Thread: Re: Simple application to retrieve MARC entry




Denmark
2007-02-15 09:35:37
Laurence Finston wrote:

>Last night, I had to send off my message in a hurry. Here are a couple of
>additional comments.
>
>On Wed, 14 Feb 2007, Timothy Murphy wrote:
>
>>On Wednesday 14 February 2007 12:43, lfinsto1gwdg.de wrote:
>>
>>Over the years I have grown more and more ashamed of this system
>>(accessible I think at <http://www.maths.tcd.ie/local/library/>),
>>and long ago decided it was time for a change.
>>
>
>I don't think there's any need to be ashamed of a program that has
>worked well for 20 years.  I've just looked up `refer' and found that,
>on a GNU/Linux system, it's part of the `groff' package.  Apparently,
>it implements a simple database in the form of a text file, and the
>manual page uses the term "database".
>
>>At present our secretaries enter new books "by hand",
>>typing in author, title, etc.
>>It seems that this could be greatly simplified by a program
>>in which the secretary simply typed in the ISBN number,
>>and which then accessed the Library of Congress database,
>>and stored the entry, probably in XML format.
>>
>
>Retrieving the XML data is a piece of cake.  Apparently, YAZ has a way
>of doing this, but I've only used YAZ so far for retrieving Pica data
>from a Z39.50 server.  To get the XML data from an OAI server,
>I used a library function `get_http' (or something) under Windows,
>and am now using GNU Wget under GNU/Linux.
>
>The usual way of approaching the problem from this point is to parse
>the XML data and store the information in a data structure, probably some
>kind of tree.  This is the tricky part, and using `libxml',
>some other library, or any of the many tools available for processing
>XML doesn't seem to reduce the amount of work one has to do significantly,
>no matter what approach one chooses.  This is just my impression, and I'd
>be interested to hear what other programmers' opinions are.  However,
>once the data is stored in the data structure, writing it to a database
>or formatting it in various ways is reasonably straightforward.
>It also makes it possible to do much more complicated things with the data.
>It might be possible to write a script or a program that can recognize
>some tags and perform simple transformations, or put together a pipeline
>of utilities to do this, as outlined by another poster.  If this would be
>adequate for your needs, great.  However, my approach would be to parse the
>data and store it in a data structure for the sake of the additional
>functionality one could implement, once that's done.
>
Guys,

I'll offer up my opinion about the XML parsing issue as a fellow
programmer. We started using XML-like data structures -- more inspired
by SGML -- long before the current family of XML tools came of age, so
we have, over the years, developed a great many different ad hoc
approaches to dealing with XML, from simple text-based pattern matching
to more elaborate parsing code. However, I think everybody at Index Data
has come to the conclusion that the benefits of using standard packages
for this far outweigh the problems, and that with libxml (and its
wrappers in various other languages), the problems are few.

Some of the benefits you get from using libxml include:

1) It is blindingly fast, and very robust, in our experience. In fact,
it is fast enough that in future versions of our Zebra indexer, we will
be recommending an XML/XSLT-based approach to designing indexing rules
as the best overall approach. We've found that this code is actually
faster, and much, much more functional than the homegrown indexing
mechanism we had originally developed.

2) Because it is a formal implementation of the whole standard, it will
reliably deal with all the different things that might go into an XML
document, like namespaces and processing instructions. If it fails, you can
generally be pretty sure it's because the document is not well formed,
and you can go hassle whoever sent you the document about it. Going a
step further and validating documents against application-specific
schemas gives you even more control. We have found that using a full,
conformant parser, and making XML the lingua franca for communicating
with customers saves us tons of time spent in parsing people's homegrown
'almost-XML' formats.

3) Once you're using libxml, if a document isn't to your liking, it's
a simple process to throw your XML tree through an XSLT
transformation... XSLT, for people who haven't played with it, is
definitely worth getting to know -- it is a powerful, flexible language
for transforming XML documents, and a pleasure to use.

4) Finally, if you happen to be a C programmer, I have been really
delighted with the 'tree' API in libxml. I find it more intuitive and
pleasant to use than many DOM-inspired APIs found in other languages
(see http://xmlsoft.org/example.html).
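
To give a flavour of the tree-style access mentioned in point 4, here is
a minimal sketch in Python. It is an illustration only, under stated
assumptions: it uses the standard library's xml.etree.ElementTree as a
stand-in rather than libxml itself, the sample MARCXML record is invented,
and the helper name marc_fields is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") records.
MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}

# Invented sample record, for illustration only (not real catalogue data).
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="020" ind1=" " ind2=" ">
    <subfield code="a">0123456789</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>"""


def marc_fields(xml_text):
    """Parse the record into a tree and return {tag: [subfield text, ...]}."""
    root = ET.fromstring(xml_text)
    fields = {}
    # Walk every <datafield> element in the tree, whatever its depth.
    for datafield in root.iter("{%s}datafield" % MARC_NS):
        values = [sf.text or "" for sf in datafield.findall("marc:subfield", NS)]
        fields.setdefault(datafield.get("tag"), []).extend(values)
    return fields


fields = marc_fields(SAMPLE)
print("ISBN:  ", fields["020"][0])
print("Author:", fields["100"][0])
print("Title: ", fields["245"][0])
```

Once the record is held in a dictionary like this, writing it to a
database or reformatting it is a separate and much simpler step.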

It does take an effort to get to know it, but, having developed several
XML-ish parsers myself, I can say that learning the libxml API is
definitely easier and faster, and well worth the effort for all of the
fringe benefits you get.
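
The well-formedness argument in point 2 can be sketched as follows. This
is a rough illustration in Python using the standard library's
xml.etree.ElementTree (again a stand-in, not libxml); the function name
is_well_formed and the sample strings are assumptions made up for the
example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if a conformant parser accepts the text, else False."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # The parser refuses anything that is not well-formed XML.
        return False
    return True


# A proper document, and two 'almost-XML' variants a strict parser rejects.
good = "<record><title>An example title</title></record>"
unclosed = "<record><title>An example title</record>"
unquoted = "<record id=42><title>An example title</title></record>"

for label, text in [("good", good), ("unclosed", unclosed), ("unquoted", unquoted)]:
    print(label, is_well_formed(text))
```

A check like this, run before any further processing, is what lets you
send a malformed document straight back to whoever produced it.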

Hope this is useful,

--Sebastian

>It takes a certain amount of effort to learn to use a database package,
>but I believe it's worth the effort.  Once one has learned how, one
>will probably think of all sorts of uses for it.  I haven't looked into this
>thoroughly, but I'm fairly sure that the simple one used with `refer' must
>be quite limited in comparison with, say, `nosql' (which I admittedly
>haven't started using yet).  On the other hand, I do believe in the principles
>"If it ain't broke, don't fix it" and "Don't use a cannon to shoot at
>sparrows".
>
>
>Laurence
>
>_______________________________________________
>Yazlist mailing list
>Yazlistlists.indexdata.dk
>http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist
>
>
>  
>

-- 
Sebastian Hammer, Index Data
quinnindexdata.com   www.indexdata.com
Ph: (603) 209-6853 Fax: (866) 383-4485


