Thread: Simple application to retrieve MARC entry




Simple application to retrieve MARC entry
2007-02-13 20:54:18
I'm looking for an application like zoomsh
which, when given an ISBN number,
will retrieve the entry from the Library of Congress in XML format,
and will then analyse the entry and output
a simplified version giving Author(s), Title, Publisher, Date of Publication
and perhaps a couple of other entries.

Is there such a program available for public consumption?

If not, what would be the best language to code it in?

Any and all suggestions gratefully received.

-- 
Timothy Murphy  
e-mail (<80k only): tim /at/ birdsnest.maths.tcd.ie
tel: +353-86-2336090, +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2,
Ireland

_______________________________________________
Yazlist mailing list
Yazlist@lists.indexdata.dk
http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist

Simple application to retrieve MARC entry
2007-02-13 21:34:31
Timothy Murphy writes:
 > I'm looking for an application like zoomsh
 > which when given an ISBN number
 > will retrieve the entry from the Library of Congress in XML format,
 > and will then analyse the entry and output
 > a simplified version giving Author(s), Title, Publisher, Date of Publication
 > and perhaps a couple of other entries.
 > 
 > Is there such a program available for public consumption?
 > 
 > If not, what would be the best language to code it in?

How about shell-script?

$ zoomsh "open z3950.loc.gov:7090/voyager" \
         "find @attr 1=7 0253333490" \
         "set preferredRecordSyntax xml" \
         "show 0 1" quit |
    sed 1,2d | xsltproc debug.xsl -

Advantage: development time of approximately 90 seconds 

As you can see, zoomsh is doing all the work; sed is just throwing
away the first two lines, which are zoomsh's chitchat, leaving only
the XML record which is then fed to the XSLT processor.
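
For readers who would rather not write a stylesheet, the last step of the pipeline could also be done in a short script. A minimal Python sketch follows, assuming the record arrives as MARCXML in the `http://www.loc.gov/MARC21/slim` namespace and that author, title, publisher and date sit in the usual MARC 21 fields (100/700 $a, 245 $a, 260 $b and $c). The sample record is made up for illustration and is not real Library of Congress output; `subfields` and `summarize` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

MARC_URI = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_URI}

# Made-up sample record for illustration; not real Library of Congress data.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">Dublin :</subfield>
    <subfield code="b">Example Press,</subfield>
    <subfield code="c">2007.</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of each subfield `code` inside each datafield `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    values = (el.text or "" for el in record.findall(path, NS))
    # Trim whitespace and trailing ISBD punctuation, but keep full stops.
    return [v.strip().rstrip(" /:;,") for v in values]


def summarize(marcxml):
    """Pick author(s), title, publisher and date out of one MARCXML record."""
    root = ET.fromstring(marcxml)
    # Accept either a bare <record> or a wrapper element that contains one.
    if root.tag == f"{{{MARC_URI}}}record":
        record = root
    else:
        record = root.find(".//marc:record", NS)
    if record is None:
        raise ValueError("no MARCXML <record> element found")

    def first(tag, code):
        found = subfields(record, tag, code)
        return found[0] if found else ""

    return {
        "authors": subfields(record, "100", "a") + subfields(record, "700", "a"),
        "title": first("245", "a"),
        "publisher": first("260", "b"),
        "date": first("260", "c"),
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

Full stops are deliberately left alone, since they may belong to initials; a real catalogue would probably want a more careful clean-up rule.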

Hope this helps.

 _/|_	 ___________________________________________________________________
/o ) /  Mike Taylor    <mike@indexdata.com>    http://www.miketaylor.org.uk
)_v__/  "Stay tuned for exciting news about chicken zygapophyses" --
	 Matt Wedel.



Re: Simple application to retrieve MARC entry
2007-02-13 21:53:51
Mike Taylor wrote:

>Timothy Murphy writes:
> > I'm looking for an application like zoomsh
> > which when given an ISBN number
> > will retrieve the entry from the Library of Congress in XML format,
> > and will then analyse the entry and output
> > a simplified version giving Author(s), Title, Publisher, Date of Publication
> > and perhaps a couple of other entries.
> > 
> > Is there such a program available for public consumption?
> > 
> > If not, what would be the best language to code it in?
>
>How about shell-script?
>
>$ zoomsh "open z3950.loc.gov:7090/voyager" "find @attr 1=7 0253333490" "set preferredRecordSyntax xml" "show 0 1" quit | sed 1,2d | xsltproc debug.xsl -
>
>Advantage: development time of approximately 90 seconds
>
>As you can see, zoomsh is doing all the work; sed is just throwing
>away the first two lines, which are zoomsh's chitchat, leaving only
>the XML record which is then fed to the XSLT processor.
>
>Hope this helps.
>
I should keep my mouth shut, but I'm just so pleased with the way that
XSLT enables re-use and sharing of business logic such as record formatting.

Perhaps the nicest thing about Mike's approach is that you don't have to
come up with the stylesheet itself. I'm sure you could get lots of help
from the XML4Lib community and elsewhere if you're not into XSLT, but
the LOC actually makes stylesheets available. You could use as a
starting point one of the stylesheets at
http://www.loc.gov/standards/marcxml/ ...

But, since the LoC server <plug type="shameless">thanks to the YAZ-Proxy
that it uses</plug> will return DC records on request, you could also
simply do

$ zoomsh "open z3950.loc.gov:7090/voyager" \
         "find @attr 1=7 0253333490" \
         "set preferredRecordSyntax xml" \
         "set elementSetName dc" \
         "show 0 1" quit |
    sed 1,2d

And, depending on your definition of 'simplified', this
might be very, 
very close to what you are looking for.
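
If the Dublin Core route is taken, very little post-processing is needed. The Python sketch below collects every element in the Dublin Core namespace (`http://purl.org/dc/elements/1.1/`) into a dictionary. The wrapper element and the sample values are assumptions made up for illustration, since the exact shape of the record the server returns is not shown in this thread; `dc_fields` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

DC_URI = "http://purl.org/dc/elements/1.1/"

# Made-up sample; the wrapper element the server really uses may differ.
SAMPLE_DC = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example title</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
  <dc:publisher>Example Press</dc:publisher>
  <dc:date>2007</dc:date>
</record>"""


def dc_fields(xml_text):
    """Map each Dublin Core element name to the list of its values."""
    fields = {}
    prefix = "{" + DC_URI + "}"
    for el in ET.fromstring(xml_text).iter():
        text = (el.text or "").strip()
        # Only elements in the Dublin Core namespace are of interest;
        # whatever wrapper surrounds them is ignored.
        if el.tag.startswith(prefix) and text:
            name = el.tag[len(prefix):]
            fields.setdefault(name, []).append(text)
    return fields


if __name__ == "__main__":
    for name, values in dc_fields(SAMPLE_DC).items():
        print(f"{name}: {'; '.join(values)}")
```

Because the function ignores the wrapper and looks only at the namespace, it should keep working if the surrounding element turns out to be named differently.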

Cheers,

--Sebastian

-- 
Sebastian Hammer, Index Data
quinn@indexdata.com   www.indexdata.com
Ph: (603) 209-6853 Fax: (866) 383-4485



Re: Simple application to retrieve MARC entry
2007-02-14 06:02:54
On Wednesday 14 February 2007 03:53, Sebastian Hammer wrote:

> I should keep my mouth shut, but I'm just so pleased with the way that
> XSLT enables re-use and sharing of business logic such as record
> formatting.

Thanks very much; I'll try that.
I was actually using zoomsh,
but wasn't sure how to process the result.
I'll look at XSLT after "set preferredRecordSyntax xml" as you suggest.

-- 
Timothy Murphy  
e-mail (<80k only): tim /at/ birdsnest.maths.tcd.ie
tel: +353-86-2336090, +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2,
Ireland


Re: Simple application to retrieve MARC entry
2007-02-14 06:43:38
Timothy Murphy wrote:

> I'm looking for an application like zoomsh
> which when given an ISBN number
> will retrieve the entry from the Library of Congress in XML format,
> and will then analyse the entry and output
> a simplified version giving Author(s), Title, Publisher, Date of
> Publication
> and perhaps a couple of other entries.
>

It depends on what you mean by analyse and what you want to
do with the
data once you've gotten it.  The task becomes more complex
if you want any
of these features, and surely others that don't occur to me
at the moment:

1.  The data should be written to a database

2.  It should be possible later to retrieve the data from a
database and
format it in one or more different ways, e.g., in HTML, in
PDF, as TeX
input, as groff input, etc.

3.  The program should be able to send queries to the server
or servers in
multiple, asynchronous threads.

4.  The program should be able to handle very large amounts
of data.

5.  The program should be able to handle likely errors
(ideally, unlikely
ones, too) and fail gracefully when it can't.

6.  The program should be accessible simultaneously to
multiple users.

7.  It should be possible for users to access the program
from a browser.

8.  The program should be able to handle data in more than
one format.

> Is there such a program available for public consumption?
> If not, what would be the best language to code it in?

There are a number of programs available.  I've looked into
this, but
haven't done a thorough survey.  I am, in fact, working on a
package that
does something similar, and in the fullness of time may
possibly
be usable for what you want to do.  It is called the LDF
Metadata Exchange
Utilities, and I've applied to have it be made part of the
GNU Project of
the Free Software Foundation.  This is the web page, if
you're interested:
 http://www.nongnu.org/iwf-mdh/

There are a lot of tools, packages, libraries, etc.,
available, such as
YAZ, so that programming an application is largely (but not
merely) a
matter of combining them.  My impression is that many of the
free packages
are in Perl or Java, and are no longer being developed or
maintained,
meaning no disrespect to any that are.

As far as language is concerned, the answer to this is
another "it
depends".  Any interpreted language such as Perl or
Java has a
considerable cost in speed and efficiency associated with
it, because the
program (generally speaking) requires an interpreter to run.
 With Java,
it may be possible to generate machine code, but I doubt
that one could
achieve the same kind of efficiency as one could by
programming in a
language that's meant for compilation in the first place. 
Since an
application like this may well have to handle very large
amounts of data,
or even just to make it possible that it could in the
future, my choice is
C++.  It might be possible to squeeze out a bit of extra
efficiency by
using C, but I find the C++ provides many useful facilities,
and I find
Bjoerne Stroustrup's (the main author of C++, in case you
don't know)
arguments plausible, that the costs of the extra features of
C++ are
reasonable with respect to their benefits.

There are libraries available for handling XML.  I've looked
into
`libxml'.  There's another one called `expat' that I haven't
looked at
yet.  I need to scan and parse XML data from OAI servers
where the format
is based on
Dublin Core.  I felt that `libxml' was more general than
what I needed,
and I've started working on a scanner/parser pair using Flex
and GNU
Bison.  I've got the framework working, but all it can parse at present is
"<record>   </record>".  However, great oaks from little acorns grow.  It
was slightly tricky getting it to work making both the
scanner and the
parser reentrant, as I was somewhat out of practice using
Flex.  I didn't
feel that using `libxml' instead would save me that much
work, especially
as I've never used
it before, but have used Flex and Bison.  Someone else might
choose a
different approach.

I hope you find this helpful.

Laurence Finston
http://wwwuser.gwdg.de/~lfinsto1/




Re: Simple application to retrieve MARC entry
2007-02-14 08:04:16
On Wednesday 14 February 2007 12:43, lfinsto1@gwdg.de wrote:

> > I'm looking for an application like zoomsh
> > which when given an ISBN number
> > will retrieve the entry from the Library of Congress in XML format,
> > and will then analyse the entry and output
> > a simplified version giving Author(s), Title, Publisher, Date of
> > Publication
> > and perhaps a couple of other entries.
>
> It depends on what you mean by analyse and what you want to do with the
> data once you've gotten it.  The task becomes more complex if you want any
> of these features, and surely others that don't occur to me at the moment:

Thanks very much for your comprehensive response.
I should probably explain (or confess) the exact use I have in mind.

We have a fairly extensive (10,000 volume) research library
in our mathematics department.
I wrote a program over 20 years ago, based on the Unix "refer" format,
for maintaining the library catalogue.
At that time our computer system, based on a pdp-11/23,
had two RL02 20MB removable disks,
so space was at a premium, to put it mildly.

Over the years I have grown more and more ashamed of this system
(accessible I think at <http://www.maths.tcd.ie/local/library/>),
and long ago decided it was time for a change.

At present our secretaries enter new books "by hand",
typing in author, title, etc.
It seems that this could be greatly simplified by a program
in which the secretary simply typed in the ISBN number,
and which then accessed the Library of Congress database,
and stored the entry, probably in XML format.
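
A data-entry front end of this kind could catch typing mistakes before any query is sent, since the last character of an ISBN-10 is a check digit. The Python sketch below validates the number and then builds the zoomsh invocation suggested earlier in the thread, with the query written in PQF as `@attr 1=7` (Bib-1 use attribute 7, ISBN). It is a sketch under assumptions: the function names are hypothetical, only 10-digit ISBNs are handled, and dropping the first two output lines simply mirrors the `sed 1,2d` step.

```python
import subprocess

DIGITS = "0123456789"


def clean_isbn(isbn):
    """Strip the hyphens and spaces people tend to type into an ISBN."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn):
    """Check length, characters and the mod-11 check digit of an ISBN-10."""
    chars = clean_isbn(isbn)
    if len(chars) != 10:
        return False
    if any(c not in DIGITS for c in chars[:9]) or chars[9] not in DIGITS + "X":
        return False
    values = [int(c) for c in chars[:9]] + [10 if chars[9] == "X" else int(chars[9])]
    # Weights run from 10 down to 1; a valid number sums to a multiple of 11.
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def zoomsh_args(isbn, target="z3950.loc.gov:7090/voyager"):
    """Build the argument list for the zoomsh call quoted in this thread."""
    if not is_valid_isbn10(isbn):
        raise ValueError(f"not a valid ISBN-10: {isbn!r}")
    return [
        "zoomsh",
        f"open {target}",
        f"find @attr 1=7 {clean_isbn(isbn)}",
        "set preferredRecordSyntax xml",
        "show 0 1",
        "quit",
    ]


def fetch_record(isbn):
    """Run zoomsh (must be installed) and return the XML part of its output."""
    result = subprocess.run(zoomsh_args(isbn), capture_output=True,
                            text=True, check=True)
    # Same job as `sed 1,2d`: drop the first two lines of output.
    return "\n".join(result.stdout.splitlines()[2:])
```

Passing the commands as a list rather than a shell string avoids any quoting problems if the input contains unexpected characters.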

> 1.  The data should be written to a database

Presently the catalogue is stored in "refer" format, as I said,
if that can be referred to as a "database".
I guess it would make sense to change to something like MySQL,
which we use for other purposes.
But in fact the ancient Unix "hunt" program
which came with the "refer" system
is perfectly adequate for our modest needs.
I mean the information sought is returned with good speed.
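
If the existing catalogue stays in refer format, a retrieved entry could be written out as a refer record directly. The Python sketch below uses the standard refer field letters (`%A` author, `%T` title, `%I` publisher, `%D` date, `%L` label). Putting the local catalogue tag (such as the "s3542" mentioned later in this message) into `%L` is purely an assumption, as is the shape of the input dictionary; the department's real layout may differ.

```python
def to_refer(fields, local_tag):
    """Format a dictionary of bibliographic fields as one refer record.

    `fields` is a hypothetical structure: a list under "authors" and
    plain strings under "title", "publisher" and "date".
    """
    lines = [f"%A {author}" for author in fields.get("authors", [])]
    for key, letter in (("title", "T"), ("publisher", "I"), ("date", "D")):
        value = fields.get(key)
        if value:
            lines.append(f"%{letter} {value}")
    # Assumption: the local shelf/catalogue tag is kept in the label field.
    lines.append(f"%L {local_tag}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "authors": ["Doe, Jane"],
        "title": "An example title",
        "publisher": "Example Press",
        "date": "2007",
    }
    print(to_refer(example, "s3542"))
```

Records in a refer file are separated by blank lines, so each result would be appended to the catalogue followed by an empty line.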

> 2.  It should be possible later to retrieve the data from a database and
> format it in one or more different ways, e.g., in HTML, in PDF, as TeX
> input, as groff input, etc.

I guess this would be useful.
We do actually have a home-grown program
for translating the data into BibTeX format,
but something more general might be useful.

> 3.  The program should be able to send queries to the server or servers in
> multiple, asynchronous threads.

I don't think this is relevant for our needs.

> 4.  The program should be able to handle very large amounts of data.

No, the opposite would be true in our case;
the data sought would be an occasional trickle rather than a stream.

> 5.  The program should be able to handle likely errors (ideally, unlikely
> ones, too) and fail gracefully when it can't.

Again, I don't think this would be an issue.
Any errors could be corrected "by hand".

> 6.  The program should be accessible simultaneously to multiple users.

I hadn't really thought of running a Z39.50 server (if that is what is meant),
though that might conceivably be useful.

> 7.  It should be possible for users to access the program from a browser.

I'm not sure which users you mean here.
I guess it would be nice if the program used by the secretary
had a browser interface, which could be used under Windows or Linux/BSD.

> 8.  The program should be able to handle data in more than one format.

I don't think that would be relevant in our case.
I think if the catalogue was kept in MARCXML format
which could be displayed in a reasonably neat way
that would be perfectly adequate.

> > Is there such a program available for public consumption?
> > If not, what would be the best language to code it in?
>
> There are a number of programs available.  I've looked into this, but
> haven't done a thorough survey.  I am, in fact, working on a package that
> does something similar, and in the fullness of time may possibly
> be usable for what you want to do.  It is called the LDF Metadata Exchange
> Utilities, and I've applied to have it be made part of the GNU Project of
> the Free Software Foundation.  This is the web page, if you're interested:
>  http://www.nongnu.org/iwf-mdh/

I'll have a look at this, thanks.

> There are a lot of tools, packages, libraries, etc., available, such as
> YAZ, so that programming an application is largely (but not merely) a
> matter of combining them.  My impression is that many of the free packages
> are in Perl or Java, and are no longer being developed or maintained,
> meaning no disrespect to any that are.

Yaz-client seems to give everything we need, in fact.
Really what I was looking for was a simpler interface,
where the user was just asked for an ISBN number,
when the major items (author, title, etc) were displayed
and then saved with our local catalogue tag (eg "s3542") added.

> As far as language is concerned, the answer to this is another "it
> depends".  Any interpreted language such as Perl or Java has a
> considerable cost in speed and efficiency associated with it, because the
> program (generally speaking) requires an interpreter to run.  With Java,
> it may be possible to generate machine code, but I doubt that one could
> achieve the same kind of efficiency as one could by programming in a
> language that's meant for compilation in the first place.  Since an
> application like this may well have to handle very large amounts of data,
> or even just to make it possible that it could in the future, my choice is
> C++.  It might be possible to squeeze out a bit of extra efficiency by
> using C, but I find the C++ provides many useful facilities, and I find
> Bjoerne Stroustrup's (the main author of C++, in case you don't know)
> arguments plausible, that the costs of the extra features of C++ are
> reasonable with respect to their benefits.

Actually, as I have explained, I don't think speed matters
in our case.
If I am writing the program, simplicity would be more
important!

> There are libraries available for handling XML.  I've looked into
> `libxml'.  There's another one called `expat' that I haven't looked at
> yet.  I need to scan and parse XML data from OAI servers where the format
> is based on
> Dublin Core.  I felt that `libxml' was more general than what I needed,
> and I've started working on a scanner/parser pair using Flex and GNU
> Bison.  I've got the framework working, but all it can parse at present is
> "<record>   </record>".  However, great oaks from little acorns grow.  It
> was slightly tricky getting it to work making both the scanner and the
> parser reentrant, as I was somewhat out of practice using Flex.  I didn't
> feel that using `libxml' instead would save me that much work, especially
> as I've never used
> it before, but have used Flex and Bison.  Someone else might choose a
> different approach.

I sort of assumed that there would be programs available
for displaying the XML format reasonably neatly,
as well as searching through the catalogue.
But I'll look at your suggestions.

> I hope you find this helpful.

Most helpful.
Thanks very much.

-- 
Timothy Murphy  
e-mail (<80k only): tim /at/ birdsnest.maths.tcd.ie
tel: +353-86-2336090, +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2,
Ireland


Re: Simple application to retrieve MARC entry
2007-02-14 14:39:20
Timothy Murphy wrote:

> On Wednesday 14 February 2007 12:43, lfinsto1@gwdg.de wrote:
>
>
> Thanks very much for your comprehensive response.

My pleasure.  I'd like to clarify that the main author of
C++ is Bjarne
Stroustrup, not Bjoerne.  Also, what I said about projects
no longer being
developed or maintained applied particularly to applications
rather than
tools and libraries.

>
> I wrote a program over 20 years ago, based on the Unix "refer" format,
> for maintaining the library catalogue.

Never heard of this, although I fancy myself as fairly knowledgeable about
Unix.

> At that time our computer system, based on a pdp-11/23,
> had two RL02 20MB removable disks,
> so space was at a premium, to put it mildly.

Ah, the Golden Age of computing!

>
> At present our secretaries enter new books "by hand",
> typing in author, title, etc.
> It seems that this could be greatly simplified by a program
> in which the secretary simply typed in the ISBN number,
> and which then accessed the Library of Congress database,
> and stored the entry, probably in XML format.

Sounds simple, but might be tricky to code.  The problem is how to tell
the computer that the data in the "author" field refers to the author.

>> 1.  The data should be written to a database
>
> Presently the catalogue is stored in "refer" format, as I said,
> if that can be referred to as a "database".

Well, it's either a database, or it provides similar functionality.

> I guess it would make sense to change to something like MySQL,
> which we use for other purposes.
> But in fact the ancient Unix "hunt" program
> which came with the "refer" system
> is perfectly adequate for our modest needs.

Boy, never heard of `hunt', either.  It might be worthwhile to look at
`nosql', which is simpler than MySQL or PostgreSQL and uses the shell.

>
>> 3.  The program should be able to send queries to the server or servers
>> in
>> multiple, asynchronous threads.
>
> I don't think this is relevant for our needs.
>

Perhaps not, but I find that it's nearly always a good idea to account for
the possibility that one might one day want to use threads.  It's not that
difficult, and Posix threads should be available on any Unix-like system.

>> 6.  The program should be accessible simultaneously to multiple users.
>
> I hadn't really thought of running a Z39.50 server (if that is what is
> meant),
> though that might conceivably be useful.

It wouldn't have to be Z39.50, it could be OAI or perhaps something else.
I don't know any other types for bibliographic data.

>
>> 7.  It should be possible for users to access the program from a
>> browser.
>
> I'm not sure which users you mean here.

Any kind.  All I meant was, if it's meant to be a web
application, that
makes the problem more complex.


> Most helpful.
> Thanks very much.

You're very welcome.  Please feel free to ask if you have
any more questions.

Laurence





Re: Simple application to retrieve MARC entry
2007-02-15 08:03:22
Hi,

On Thu, 15 Feb 2007 11:47:24 +0100 (CET) Laurence Finston
<lfinsto1@gwdg.de> wrote:

> The usual way of approaching the problem from this point is to parse
> the XML data and store the information in a data structure, probably
> some kind of tree.  This is the tricky part, and using `libxml',
> some other library, or any of the many tools available for processing
> XML doesn't seem to reduce the amount of work one has to do
> significantly, no matter what approach one chooses.
> This is just my impression, and I'd be interested to hear what other
> programmers' opinions are.

Obviously, you've not yet met the abyss that awaits you when dealing
with data from lots of different parties. Writing a spec-compatible XML
parser yourself is -- overkill. Have a look at the XML specification,
it has really a lot of definitions. Are you prepared for full XML
namespace implementation? CDATA embedded in your elements? External
Entities (humm, yes, I think I'm neither...). Using libxml *will* save
time, definitely.
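
A small illustration of the point about namespaces and CDATA, sketched in Python rather than with libxml itself: the standard-library parser (which is built on expat) copes with a namespace prefix, a CDATA section and a numeric character reference with no special code. The snippet and the helper name `title_and_creator` are made up for illustration; a hand-written scanner would need explicit rules for each of these cases.

```python
import xml.etree.ElementTree as ET

NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Made-up snippet with a namespace prefix, a CDATA section containing
# markup characters, and a numeric character reference.
TRICKY = """<oai:record xmlns:oai="http://www.openarchives.org/OAI/2.0/"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title><![CDATA[Rings & fields: a <first> course]]></dc:title>
  <dc:creator>M&#252;ller, A.</dc:creator>
</oai:record>"""


def title_and_creator(xml_text):
    """Return the decoded text of the dc:title and dc:creator elements."""
    root = ET.fromstring(xml_text)
    # Lookups go through the namespace URI, so the prefix chosen by the
    # sender ("dc" here) does not matter.
    title = root.find("dc:title", NS).text
    creator = root.find("dc:creator", NS).text
    return title, creator


if __name__ == "__main__":
    print(title_and_creator(TRICKY))
```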

> However, once the data is stored in the data
> structure, writing it to a database or formatting it in various ways
> is reasonably straightforward. It also makes it possible to do much
> more complicated things with the data. It might be possible to write
> a script or a program that can recognize some tags and perform simple
> transformations [...]

Yep, there's XSLT for doing that with XML snippets...

> or put together a pipeline of utilities to do this,
> as outlined by another poster.  If this would be adequate for your
> needs, great.  However, my approach would be to parse the data and
> store it in a data structure for the sake of the additional
> functionality one could implement, once that's done.

Well, the data actually *is* encapsulated in a data structure (your XML
snippet). There's no pressure to put it into another one. You can most
probably get the added functionality anyway, e.g. by using a dedicated
XML database.

> It takes a certain amount of effort to learn to use a database
> package, but I believe it's worth the effort.

True. But don't think that "database package" means relational
databases only. I even tend to think that their time might come soon
because they scale harder to changed data structures than (indexed) XML
does.

It was e.g. suggested to use the Koha ILS for what the OP
wanted. I
tend to agree. Koha uses a database for the document
metadata -- but
not a RDBMS, but Indexdata's fine Zebra server which really
offers
nifty functionality reg. indexing of (arbitrary) XML data.


-hwh


Re: Simple application to retrieve MARC entry
2007-02-15 10:27:19
On Thu, 15 Feb 2007, Hans-Werner Hilse wrote:

> Obviously, you've not yet met the abyss that awaits you when dealing
> with data from lots of different parties. Writing a spec-compatible XML
> parser yourself is -- overkill. Have a look at the XML specification,
> it has really a lot of definitions. Are you prepared for full XML
> namespace implementation? CDATA embedded in your elements? External
> Entities (humm, yes, I think I'm neither...). Using libxml *will* save
> time, definitely.

I agree that I will have to use a library such as `libxml' or `expat', if I
want to implement a general solution for parsing XML data.  However,
the original task I had to solve was to retrieve data from servers
using the OAI-PMH (Open Archives Initiative --- Protocol for Metadata
Harvesting) using Visual C++, and store it in a database using
Microsoft SQL Server.  It was possible to do this without having to
learn too much about XML, because MS SQL Server provides a stored
system procedure that represents arbitrary XML data in tabular form.
I simply had to write a stored procedure to extract the data from the
temporary table that was created and write it to the database tables.

I am no longer using Windows, Visual Studio, or MS SQL Server, and am
trying to find a way to use only free software under GNU/Linux.  Much
of the code that I wrote for the original package, the IWF Metadata
Harvester, will therefore be unusable for this purpose.  On the other
hand, the format prescribed by OAI-PMH is simple enough that it seemed
reasonable to try to solve the problem using Flex and GNU Bison.  One
reason I did this was because I didn't find the documentation for
`libxml' particularly easy to use.  I may return to it at a later
date, but I don't think I need it just for the OAI data.  I also think
that a scanner/parser pair for parsing XML data in some format might
be useful for other purposes, since it wouldn't be too difficult to
adapt to others, and the overhead is significantly less than linking
to a large library such as `libxml'.

As far as USMARC is concerned, another part of the IWF Metadata
Harvester retrieves records from Z39.50 servers (actually only tested
with one) in Pica format using the YAZ library.  This should be easier
to port.  I also believe that the method I used, i.e., a function for
every field and a function for the categories, where needed, should be
transferable to USMARC.  I therefore haven't given any thought yet to
retrieving data in XML format from libraries that provide it in USMARC
or Pica.  Even though it seems very old-fashioned, I like the
compactness of Pica. XML may have many advantages, but it is certainly
verbose.

> 
> > However, once the data is stored in the data
> > structure, writing it to a database or formatting it in various ways
> > is reasonably straightforward. It also makes it possible to do much
> > more complicated things with the data. It might be possible to write
> > a script or a program that can recognize some tags and perform simple
> > transformations [...]
> 
> Yep, there's XSLT for doing that with XML snippets...
> 
[...]
> 
> Well, the data actually *is* encapsulated in a data structure (your XML
> snippet). There's no pressure to put it into another one. You can most
> probably get the added functionality anyway, e.g. by using a dedicated
> XML database.

I've only just started learning about XML, so what I write may be
out-of-date, incomplete, or wrong for some other reason.  What I read
about XSLT said that it's designed for transforming XML to XML or a
related format such as SGML or HTML.  It cannot be used to transform
it to PostScript, PDF, TeX input, etc.  I would say that XML code
represents a description of a data structure, but not a data structure
itself.  A data structure in the sense that I mean would be an object
of a given type in a running program; in C++, almost certainly a
`class' type.  `libxml' doesn't do the work of defining this data
structure.  It does provide support for creating a tree structure from
the XML data and traversing it, but one has to write the code for
creating the objects and assigning values to their data elements
oneself.  This seemed like the biggest part of the task to me.  I'm
unfamiliar with `libxml', but I've already written scanners and
parsers using Flex and GNU Bison.  It therefore seemed like a
reasonable approach.
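
The step described here, going from a parsed tree to an object of a given type, is sketched below in Python rather than C++. The class `BibRecord` and its fields are hypothetical, chosen to match the Dublin Core style element names discussed in this thread; the parsing library builds the tree, and the mapping onto typed fields is the part one writes oneself.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class BibRecord:
    """A hypothetical in-program record type; not part of any library."""
    title: str = ""
    creators: list = field(default_factory=list)
    publisher: str = ""
    date: str = ""


def record_from_xml(xml_text):
    """Walk the parsed tree and assign element text to BibRecord fields."""
    record = BibRecord()
    for element in ET.fromstring(xml_text).iter():
        # Drop any "{namespace}" part so only the local element name is left.
        name = element.tag.rsplit("}", 1)[-1]
        text = (element.text or "").strip()
        if not text:
            continue
        if name == "creator":
            record.creators.append(text)  # repeatable element
        elif name in ("title", "publisher", "date") and not getattr(record, name):
            setattr(record, name, text)   # keep the first occurrence only
    return record


if __name__ == "__main__":
    doc = ("<record><title>An example title</title>"
           "<creator>Doe, Jane</creator><date>2007</date></record>")
    print(record_from_xml(doc))
```

Once the data is in an object like this, writing it to a database or formatting it for TeX or HTML is ordinary program code.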

> 
> > It takes a certain amount of effort to learn to use a database
> > package, but I believe it's worth the effort.
> 
> True. But don't think that "database package" means relational
> databases only. I even tend to think that their time might come soon
> because they scale harder to changed data structures than (indexed) XML
> does.

Again, I haven't researched this topic thoroughly, but when I was
looking for free database packages, none for XML databases leaped out
at me.  The packages that looked most usable to me were `nosql',
PostgreSQL, and MySQL.  I rather suspect that relational databases
will still be with us for some time.  Nor do I know whether an XML
database would be suitable for use with Pica or USMARC data.  It's not
that I'm prejudiced against using an XML database; it just seems that
it's not the most promising approach for me at present.

Thanks, 

Laurence




Re: Simple application to retrieve MARC entry
2007-02-15 10:41:41
On Thu, 15 Feb 2007, Sebastian Hammer wrote:

> Laurence Finston wrote:
> 
> 
> I'll offer up my opinion about the XML parsing issue as a fellow programmer.

[...]

> 
> 4) Finally, if you happen to be a C programmer, I have been really delighted
> with the 'tree' API in libxml.. I find it more intuitive and pleasant to use
> than many DOM-inspired APIs found in other languages (see
> http://xmlsoft.org/example.html).
> 
> It does take an effort to get to know it, but, having developed several
> XML-ish parsers myself, I can say that learning the libxml API is definitely
> easier and faster, and well-worth the effort for all of the fringe benefits
> you get.
> 

Well, that was certainly a ringing endorsement of `libxml'.

I shall give it serious thought, along with what Hans-Werner Hilse
wrote before.  A couple of things that put me off
were the way the documentation is organized and the naming conventions
of the datatypes, functions, and macros in the C API.

The XML format used for OAI is so simple that I'm not sure that it's
worth the effort, if I only want to parse data in this format.
However, I'll think about it some more.

> Hope this is useful,

Very much so, thank you.

Laurence

