Thread: php/yaz

php/yaz
2007-04-09 10:02:48
I asked some time ago about a program to retrieve book details in XML
given the ISBN number,
and a number of suggestions were made, all of which I followed up.
[In particular, I tried zoomsh + XSLT, but this turned out to be
too difficult for me to implement.]

In brief, I found I had to delve into PHP
(a language I knew nothing about)
to get a satisfactory solution.

I give my PHP program at the end;
however, this is more for interest than out of any pride in it!
I'm sure it is far from the polished program
many of you would produce,
though I did look around for such a program, without luck.

Any suggestions for the improvement of my program
will be gratefully received.

My solution leaves one simple question unanswered:

I based the program on the output of the Library of Congress
Z39.50 server, using their XML format.
However, I found that other servers, e.g. Oxford University
(the URLs are given at the start of the program),
gave the result in a different XML format from LoC.

I actually prefer the Oxford XML format,
and wondered if there is a simple 1-1 mapping
between the two?
Or indeed between MARC and Oxford XML?

Maybe it would be better to retrieve the data
in MARC format, even if it is to be stored in XML?
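
On the mapping question, one possible approach is to pull the few MARC
fields of interest out of a MARCXML record and write them into flat
elements like those in the Oxford output. The sketch below is only an
illustration and makes no claim that a true 1-1 mapping exists: the
function name marc_to_flat is made up, the element names (ISBN, Creator,
Title, Publisher, PubYear) are borrowed from or modelled on the Oxford
record, and the choice of MARC tags (020, 100/700, 245, 260) is an
assumption, not a published crosswalk.

```php
<?php
// Hypothetical helper: flatten a MARCXML record into Oxford-style elements.
// The tag/subfield choices are assumptions, not an official crosswalk.
function marc_to_flat(string $marcxml): SimpleXMLElement
{
    $record = new SimpleXMLElement($marcxml);
    // MARCXML elements live in the MARC21 "slim" namespace.
    $record->registerXPathNamespace('m', 'http://www.loc.gov/MARC21/slim');

    // Flat element name => XPath selecting the MARC subfields to copy.
    $map = array(
        'ISBN'      => '//m:datafield[@tag="020"]/m:subfield[@code="a"]',
        'Creator'   => '//m:datafield[@tag="100" or @tag="700"]/m:subfield[@code="a"]',
        'Title'     => '//m:datafield[@tag="245"]/m:subfield[@code="a"]',
        'Publisher' => '//m:datafield[@tag="260"]/m:subfield[@code="b"]',
        'PubYear'   => '//m:datafield[@tag="260"]/m:subfield[@code="c"]',
    );

    $flat = new SimpleXMLElement('<Resource/>');
    foreach ($map as $name => $path) {
        foreach ($record->xpath($path) as $subfield) {
            // Strip surrounding whitespace and ISBD punctuation (" :", ",", ".").
            $text = trim((string) $subfield, " \t\n\r/:;,.");
            // addChild() does not escape "&", so escape the text first.
            $flat->addChild($name, htmlspecialchars($text, ENT_NOQUOTES));
        }
    }
    return $flat;
}
```

Calling marc_to_flat() on a record string and then asXML() on the result
gives a small <Resource> document; a field that occurs several times
(such as 700) simply yields several elements of the same name.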
------------------------- isbn.php -----------------------
<html>
<body>
<?php
// For a list of servers see http://targettest.indexdata.com/
$servers = array("z3950.loc.gov:7090/voyager",
                 "library.tcd.ie:210/advance",
                 "library.ox.ac.uk:210/ADVANCE",
                 "library2.open.ac.uk:7090/voyager");
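// Validate an ISBN-10: return it without hyphens, or 0 after printing an error
function isbncheck($entry) {
    // Only digits, X and hyphens are allowed anywhere in the entry
    if (preg_match('/[^-0-9X]/', $entry)) {
        echo "ISBN number can only contain digits 0-9, letter X and hyphens (-'s)";
        return 0;
    }
    $isbn = str_replace("-", "", $entry);
    if (strlen($isbn) != 10) {
        echo "Number must be of length 10 (excluding -'s)";
        return 0;
    }
    // X stands for 10 and is valid only as the last (check) character
    if (strpos($isbn, "X") !== false && strpos($isbn, "X") != 9) {
        echo "X can only appear as the last character";
        return 0;
    }
    // Weighted sum: first digit times 10, second times 9, ..., last times 1
    $j = 0;
    for ($i = 0; $i < 9; $i++)
        $j += (int) $isbn[$i] * (10 - $i);
    if ($isbn[9] == "X")
        $j += 10;
    else
        $j += (int) $isbn[9];
    if ($j % 11 == 0)
        return $isbn;
    else {
        echo "ISBN has wrong checksum (last character)";
        return 0;
    }
}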
if (!isset($_REQUEST["ISBN"])) {
    echo '<form method="post">
    <br />
    ISBN number: <input type="text" size="15" name="ISBN" /><br />
    <input type="submit" name="action" value="Submit" />
    </form>
    ';
} else {
    $entry = $_REQUEST["ISBN"];
    $isbn = isbncheck($entry);
    if ($isbn === 0)
        exit;   // isbncheck() has already printed the reason
    echo "ISBN $isbn: ";
    // Look up ISBN number at Z39.50 servers
    $hits = 0;
    for ($server = 0; $server < count($servers); $server++) {
        $id = yaz_connect($servers[$server]);
        $query = "@attr 1=7 $isbn";   // Bib-1 use attribute 7 = ISBN
        yaz_syntax($id, "xml");
        yaz_range($id, 1, 10);
        yaz_search($id, "rpn", $query);
        yaz_wait();
        $error = yaz_error($id);
        if (!empty($error))
            echo "Error: $error";
        else {
            $hits = yaz_hits($id);
            if ($hits > 0)
                break;
        }
    }
    if ($hits == 0) {
        echo "Sorry, cannot find book with ISBN number $isbn<br />";
        exit;
    } elseif ($hits == 1)
        echo "1 match<br />";
    else
        echo "$hits matches<br />";
    // Display entry
    // Renaming "xmlns" turns the default namespace declaration into a plain
    // attribute, so the elements can be addressed by their bare names.
    $rec = "<catalog>" . str_replace("xmlns", "ns", yaz_record($id, 1, "string")) . "</catalog>";
    echo htmlentities($rec);
    $xml = new SimpleXMLElement($rec);
    $author = array("");
    $authors = 0;
    $title = $publisher = $date = "";
    foreach ($xml->record->datafield as $datafield) {
        switch ((string) $datafield['tag']) { // Get attributes as element indices
        case '100':   // main author
        case '700':   // additional authors
            foreach ($datafield->children() as $subfield) {
                switch ((string) $subfield['code']) {
                case 'a':
                    $author[$authors] = (string) $subfield;
                    $authors++;
                    break;
                }
            }
            break;
        case '245':   // title
            foreach ($datafield->subfield as $subfield) {
                switch ((string) $subfield['code']) {
                case 'a':
                    $title = strtr((string) $subfield, "/", " ");
                    break;
                }
            }
            break;
        case '260':   // publisher and date
            foreach ($datafield->children() as $subfield) {
                switch ((string) $subfield['code']) {
                case 'b':
                    $publisher = strtr((string) $subfield, ",.", "  ");
                    break;
                case 'c':
                    $date = strtr((string) $subfield, "c,.", "   ");
                    break;
                }
            }
            break;
        }
    }
    print "<table border='2' cellpadding='6' align='center' ";
    print "summary='Output of Z39.50 server'>";
    print "<tr><td><b>ISBN</b></td><td>$isbn</td></tr>";
    if ($authors == 1) {
        print "<tr><td><b>Author</b></td><td>$author[0]</td></tr>";
    } else {
        print "<tr><td rowspan=$authors><b>Authors</b></td><td>$author[0]</td></tr>";
        for ($i = 1; $i < $authors; $i++) {
            print "<tr><td>$author[$i]</td></tr>";
        }
    }
    print "<tr><td><b>Title</b></td><td>$title</td></tr>";
    print "<tr><td><b>Publisher</b></td><td>$publisher</td></tr>";
    print "<tr><td><b>Date</b></td><td>$date</td></tr>";
    print "</table>";
    // Add new book to catalog
    $docA = new DOMDocument;
    $docA->formatOutput = true;
    $docA->preserveWhiteSpace = false;
    if ($docA->load("catalog.xml")) {
        echo "Catalog parsed<br />";
    } else {
        echo "Error while parsing catalog<br />";
        exit;
    }
    $docB = new DOMDocument;
    $docB->formatOutput = true;
    $docB->preserveWhiteSpace = false;
    if ($docB->loadXML($rec)) {
        echo "New entry parsed<br />";
    } else {
        echo "Error while parsing new entry<br />";
        exit;
    }
    $xpath = new DOMXPath($docB);
    $nodes = $xpath->query('//catalog/record');
    foreach ($nodes as $n) {
        $new = $docA->importNode($n, true);
        $docA->documentElement->appendChild($new);
    }
    $output = $docA->save("/tmp/catalog.xml");
    echo "Output: $output bytes<br />";
}
?><br />
</body>
</html>
-------------------------------------------------------------
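
The checksum test in isbncheck() can also be written as a pure function
that returns a result instead of echoing messages, which makes it easier
to try out on its own. This is only a sketch: the name isbn10_normalise
is hypothetical, and it uses preg_match() on the assumption that the
ereg family is unavailable (it was removed in PHP 7).

```php
<?php
// Hypothetical helper: return the ISBN-10 without hyphens, or null if invalid.
function isbn10_normalise(string $entry): ?string
{
    $isbn = strtoupper(str_replace('-', '', trim($entry)));
    // Nine digits followed by a digit or X (X stands for 10, last place only).
    if (!preg_match('/^[0-9]{9}[0-9X]$/', $isbn)) {
        return null;
    }
    $sum = 0;
    for ($i = 0; $i < 10; $i++) {
        $value = ($isbn[$i] === 'X') ? 10 : (int) $isbn[$i];
        $sum += $value * (10 - $i);   // weights 10, 9, ..., 1
    }
    // A valid ISBN-10 has a weighted sum divisible by 11.
    return ($sum % 11 === 0) ? $isbn : null;
}
```

For example, for 1842228250 the weighted sum is
10+72+32+14+12+10+32+6+10+0 = 198 = 11 x 18, so the number passes.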
The program assumes the existence of catalog.xml, possibly empty, as in
<?xml version="1.0" encoding="iso-8859-1" ?>
<catalog>
</catalog>
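
The catalog update step can be tried without touching any files by
loading both documents from strings. The sketch below assumes the same
<catalog>/<record> layout as the program; the function name
append_records is made up for illustration.

```php
<?php
// Hypothetical helper: copy every <record> from $newXml into $catalogXml
// and return the combined catalog as an XML string.
function append_records(string $catalogXml, string $newXml): string
{
    $catalog = new DOMDocument();
    $catalog->preserveWhiteSpace = false;   // must be set before loading
    $catalog->formatOutput = true;
    $catalog->loadXML($catalogXml);

    $incoming = new DOMDocument();
    $incoming->loadXML($newXml);

    $xpath = new DOMXPath($incoming);
    foreach ($xpath->query('/catalog/record') as $node) {
        // importNode() makes a deep copy owned by the target document.
        $catalog->documentElement->appendChild($catalog->importNode($node, true));
    }
    return $catalog->saveXML();
}
```

Writing the returned string to a file would then replace the separate
load()/save() calls; whether that is preferable depends on how the
catalog is used.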
--
Timothy Murphy
e-mail (<80k only): tim /at/ birdsnest.maths.tcd.ie
tel: +353-86-2336090, +353-1-2842366
s-mail: School of Mathematics, Trinity College, Dublin 2, Ireland
_______________________________________________
Yazlist mailing list
Yazlist lists.indexdata.dk
http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist

Re: php/yaz
2007-04-09 11:18:10
Timothy,

I suspect that your program is currently requesting XML and
then allowing the Z39.50 server to return its default schema.
You may wish to take a look at the other XML schemas that
are supported by the YAZ Proxy (which sits in front of our
Voyager Z-server) and see if you like one of those better
than the MARCXML schema (which is our default). In order
to specify the schema, you would need to include a request
for the desired elementSetName prior to retrieving the
record. (Below is an example YAZ transaction log requesting
all three schemas.)

Larry
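
In PHP/YAZ the element set name can be requested with yaz_element()
before the search is sent. The fragment below is a rough sketch rather
than tested code: it assumes the yaz extension is loaded and that the
server accepts "marcxml", "dc" and "mods" as element set names, and the
function name fetch_xml_record is invented for illustration.

```php
<?php
// Hypothetical helper: fetch the first hit for an ISBN in a given XML schema.
// Returns the record as a string, or null on error or when nothing is found.
function fetch_xml_record(string $server, string $isbn, string $elementSet): ?string
{
    $id = yaz_connect($server);
    yaz_syntax($id, "xml");            // ask for XML records
    yaz_element($id, $elementSet);     // e.g. "marcxml", "dc" or "mods"
    yaz_range($id, 1, 1);              // only the first record is needed
    yaz_search($id, "rpn", "@attr 1=7 $isbn");   // Bib-1 use attribute 7 = ISBN
    yaz_wait();
    if (yaz_error($id) != "" || yaz_hits($id) < 1) {
        yaz_close($id);
        return null;
    }
    $record = yaz_record($id, 1, "string");
    yaz_close($id);
    return $record ? $record : null;
}

// Only attempt the lookup where the extension is actually available.
if (extension_loaded('yaz')) {
    $dc = fetch_xml_record("z3950.loc.gov:7090/voyager", "1842228250", "dc");
    echo htmlentities((string) $dc);
}
```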
-----------------------------------------------------
Z> open z3950.loc.gov:7090/voyager
Name : Voyager LMS - Z39.50 Server (YAZ Proxy)
Version: 2003.1.1/1.2.1.1
Options: search present
Z> f @attr 1=7 1842228250
Number of hits: 1, setno 0
Z> format xml
Z> elements marcxml [Note: request for MARCXML schema]
Z> s 1
Sent presentRequest (1+1).
Records: 1
[VOYAGER]Record type: XML
<record xmlns="http://www.loc.gov/MARC21/slim">
  <!-- Length implementation at offset 22 should hold a digit. Assuming 0 -->
  <leader>01386cam a2200349 a 4500</leader>
  <controlfield tag="001">13584118</controlfield>
  <controlfield tag="005">20050412122605.0</controlfield>
  <controlfield tag="008">040506s2003 enka 001 0beng </controlfield>
  <datafield tag="906" ind1=" " ind2=" ">
    <subfield code="a">7</subfield>
    <subfield code="b">cbc</subfield>
    <subfield code="c">copycat</subfield>
    <subfield code="d">2</subfield>
    <subfield code="e">ncip</subfield>
    <subfield code="f">20</subfield>
    <subfield code="g">y-gencatlg</subfield>
  </datafield>
  <datafield tag="925" ind1="0" ind2=" ">
    <subfield code="a">acquire</subfield>
    <subfield code="b">2 shelf copies</subfield>
    <subfield code="x">policy default</subfield>
  </datafield>
  <datafield tag="955" ind1=" " ind2=" ">
    <subfield code="a">nb11 2004-05-06 ON ORDER ; nb05 2004-09-23 to HLCD for processing</subfield>
    <subfield code="c">lk50 2004-12-30</subfield>
    <subfield code="d">lh06 2005-04-01</subfield>
    <subfield code="e">lh36 2005-04-12 to Dewey</subfield>
  </datafield>
  <datafield tag="010" ind1=" " ind2=" ">
    <subfield code="a"> 2004445757</subfield>
  </datafield>
  <datafield tag="015" ind1=" " ind2=" ">
    <subfield code="a">GBA3-Z2935</subfield>
  </datafield>
  <datafield tag="020" ind1=" " ind2=" ">
    <subfield code="a">1842228250</subfield>
  </datafield>
  <datafield tag="035" ind1=" " ind2=" ">
    <subfield code="a">(OCoLC)ocm51483185</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">UKM</subfield>
    <subfield code="c">UKM</subfield>
    <subfield code="d">OCLCQ</subfield>
    <subfield code="d">DLC</subfield>
  </datafield>
  <datafield tag="042" ind1=" " ind2=" ">
    <subfield code="a">lccopycat</subfield>
  </datafield>
  <datafield tag="043" ind1=" " ind2=" ">
    <subfield code="a">e-uk---</subfield>
  </datafield>
  <datafield tag="050" ind1="0" ind2="0">
    <subfield code="a">DA566.9.C5</subfield>
    <subfield code="b">C4765 2003</subfield>
  </datafield>
  <datafield tag="082" ind1="0" ind2="4">
    <subfield code="a">940.540092</subfield>
    <subfield code="2">21</subfield>
  </datafield>
  <datafield tag="245" ind1="0" ind2="0">
    <subfield code="a">Churchill at war :</subfield>
    <subfield code="b">his 'finest hour' in photographs, 1940-1945 /</subfield>
    <subfield code="c">[compiled by] Martin Gilbert.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">bond :</subfield>
    <subfield code="b">Carlton,</subfield>
    <subfield code="c">2003.</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">160 p. :</subfield>
    <subfield code="b">chiefly ill. ;</subfield>
    <subfield code="c">29 cm.</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">At head of title: Imperial War Museum.</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">Includes index.</subfield>
  </datafield>
  <datafield tag="600" ind1="1" ind2="0">
    <subfield code="a">Churchill, Winston,</subfield>
    <subfield code="c">Sir,</subfield>
    <subfield code="d">1874-1965</subfield>
    <subfield code="v">Portraits.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">World War, 1939-1945</subfield>
    <subfield code="z">Great Britain</subfield>
    <subfield code="v">Pictorial works.</subfield>
  </datafield>
  <datafield tag="600" ind1="1" ind2="0">
    <subfield code="a">Churchill, Winston,</subfield>
    <subfield code="c">Sir,</subfield>
    <subfield code="d">1874-1965</subfield>
    <subfield code="x">Military leadership</subfield>
    <subfield code="v">Pictorial works.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Prime ministers</subfield>
    <subfield code="z">Great Britain</subfield>
    <subfield code="v">Biography</subfield>
    <subfield code="v">Pictorial works.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Gilbert, Martin,</subfield>
    <subfield code="d">1936-</subfield>
  </datafield>
  <datafield tag="710" ind1="2" ind2=" ">
    <subfield code="a">Imperial War Museum (Great Britain)</subfield>
  </datafield>
  <datafield tag="923" ind1=" " ind2=" ">
    <subfield code="d">20040823</subfield>
    <subfield code="n">127</subfield>
    <subfield code="s">Bookspeed</subfield>
  </datafield>
</record>
Z> elements dc [Request for Dublin Core]
Z> s 1
Sent presentRequest (1+1).
Records: 1
[VOYAGER]Record type: XML
<?xml version="1.0" encoding="UTF-8"?>
<srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://purl.org/dc/elements/1.1/"
    xsi:schemaLocation="info:srw/schema/1/dc-schema http://www.loc.gov/standards/sru/dc-schema.xsd">
  <title>Churchill at war : his 'finest hour' in photographs, 1940-1945 /</title>
  <creator>Gilbert, Martin, 1936-</creator>
  <creator>Imperial War Museum (Great Britain)</creator>
  <type>text</type>
  <publisher>bond : Carlton,</publisher>
  <date>2003.</date>
  <language>eng</language>
  <subject>Churchill, Winston, Sir, 1874-1965--Portraits.</subject>
  <subject>Churchill, Winston, Sir, 1874-1965--Military leadership--Pictorial works.</subject>
  <subject>World War, 1939-1945--Great Britain--Pictorial works.</subject>
  <subject>Prime ministers--Great Britain--Biography--Pictorial works.</subject>
  <identifier>URN:ISBN:1842228250</identifier>
</srw_dc:dc>
Z> elements mods [Request for MODS]
Z> s 1
Sent presentRequest (1+1).
Records: 1
[VOYAGER]Record type: XML
<?xml version="1.0" encoding="UTF-8"?>
<mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.loc.gov/mods/v3" version="3.0"
    xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-0.xsd">
  <titleInfo>
    <title>Churchill at war</title>
    <subTitle>his 'finest hour' in photographs, 1940-1945</subTitle>
  </titleInfo>
  <name type="personal">
    <namePart>Gilbert, Martin</namePart>
    <namePart type="date">1936-</namePart>
  </name>
  <name type="corporate">
    <namePart>Imperial War Museum (Great Britain)</namePart>
  </name>
  <typeOfResource>text</typeOfResource>
  <genre authority="marc">biography</genre>
  <originInfo>
    <place>
      <placeTerm type="code" authority="marccountry">enk</placeTerm>
    </place>
    <place>
      <placeTerm type="text">bond</placeTerm>
    </place>
    <publisher>Carlton</publisher>
    <dateIssued>2003</dateIssued>
    <issuance>monographic</issuance>
  </originInfo>
  <language>
    <languageTerm authority="iso639-2b" type="code">eng</languageTerm>
  </language>
  <physicalDescription>
    <form authority="marcform">print</form>
    <extent>160 p. : chiefly ill. ; 29 cm.</extent>
  </physicalDescription>
  <note type="statement of responsibility">[compiled by] Martin Gilbert.</note>
  <note>At head of title: Imperial War Museum.</note>
  <note>Includes index.</note>
  <subject>
    <geographicCode authority="marcgac">e-uk---</geographicCode>
  </subject>
  <subject authority="lcsh">
    <name type="personal">
      <namePart type="termsOfAddress">Sir</namePart>
      <namePart>Churchill, Winston</namePart>
      <namePart type="date">1874-1965</namePart>
    </name>
    <topic>Portraits</topic>
  </subject>
  <subject authority="lcsh">
    <topic>World War, 1939-1945</topic>
    <geographic>Great Britain</geographic>
    <topic>Pictorial works</topic>
  </subject>
  <subject authority="lcsh">
    <name type="personal">
      <namePart type="termsOfAddress">Sir</namePart>
      <namePart>Churchill, Winston</namePart>
      <namePart type="date">1874-1965</namePart>
    </name>
    <topic>Military leadership</topic>
    <topic>Pictorial works</topic>
  </subject>
  <subject authority="lcsh">
    <topic>Prime ministers</topic>
    <geographic>Great Britain</geographic>
    <topic>Biography</topic>
    <topic>Pictorial works</topic>
  </subject>
  <classification authority="lcc">DA566.9.C5 C4765 2003</classification>
  <classification authority="ddc" edition="21">940.540092</classification>
  <identifier type="isbn">1842228250</identifier>
  <identifier type="lccn">2004445757</identifier>
  <recordInfo>
    <recordContentSource authority="marcorg">UKM</recordContentSource>
    <recordCreationDate encoding="marc">040506</recordCreationDate>
    <recordChangeDate encoding="iso8601">20050412122605.0</recordChangeDate>
    <recordIdentifier>13584118</recordIdentifier>
  </recordInfo>
</mods>
------------------------------------------------------
Z> open library.ox.ac.uk:210/advance
ID : 1995
Name : Geac Advance Z39.50 SERVER
Version: 6.8
Options: search present delSet scan sort extendedServices namedResultSets
Z> f @attr 1=7 1842228250
Number of hits: 1, setno 1
Z> format xml
Z> s 1
Sent presentRequest (1+1).
Records: 1
[MAIN*BIBMAST]Record type: XML
<?xml version = "1.0" encoding="UTF-8"?>
<SearchResults>
  <Resource>
    <AdvInfo ReleaseLevel="6.82.24"
        LocalStyle="GeacLocalTransform.xsl"
        LocalCitStyle="GeacLocalCitationTransform.xsl"
        LocalType="text/xsl">
    </AdvInfo>
    <LCN> 15644858 </LCN>
    <Year> 2003</Year>
    <ISBN SrchTerm="1842228250"> 1842228250 </ISBN>
    <Creator SrchTerm="Gilbert, Martin,"> Gilbert, Martin, </Creator>
    <Title SrchTerm="Churchill at war :"> Churchill at war : his 'finest hour' in photographs 1940-1945 / Martin Gilbert. </Title>
    <PubYear> 2003. </PubYear>
    <Note SrchTerm="Published in association with the Imperial War Museum."> Published in association with the Imperial War Museum. </Note>
    <Note SrchTerm="Ill. on lining papers."> Ill. on lining papers. </Note>
    <Subject SrchTerm="Churchill, Winston,"> Churchill, Winston, 1874-1965. </Subject>
    <Subject SrchTerm="World War, 1939-1945"> World War, 1939-1945 Great Britain. </Subject>
    <Subject SrchTerm="World War, 1914-1918"> World War, 1914-1918 Great Britain. </Subject>
    <Contributor SrchTerm="Imperial War Museum (Great Britain)"> Imperial War Museum (Great Britain) </Contributor>
    <CitHoldings>
      <CitHoldingsByLoc>
        <CitLocation CitCallNumber="M04.C06568" CitInstitution="Bodley"
            CitSublocation="BOD Bookstack " CitCollection="">
        </CitLocation>
        <CitHoldingAvailOwnCnt> 0/1 </CitHoldingAvailOwnCnt>
        <CitHoldingKeys> 18365590 </CitHoldingKeys>
        <LocationImage ImageDesc="" ImagePath="Bodleian - K Floor" LinkPath="230550000">
        </LocationImage>
      </CitHoldingsByLoc>
    </CitHoldings>
  </Resource>
</SearchResults>
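
If the Dublin Core form is requested, the fields of interest can be read
without walking MARC tags. A minimal sketch with SimpleXML follows; it
assumes the record has the shape shown in the log above (an srw_dc:dc
wrapper whose children are in the http://purl.org/dc/elements/1.1/
namespace), and the function name dc_summary is hypothetical.

```php
<?php
// Hypothetical helper: collect a few Dublin Core fields into an array.
function dc_summary(string $dcxml): array
{
    $xml = new SimpleXMLElement($dcxml);
    // The child elements are in the Dublin Core namespace (the record's
    // default namespace), so select the children by namespace URI.
    $dc = $xml->children('http://purl.org/dc/elements/1.1/');

    $summary = array('title' => null, 'publisher' => null, 'date' => null,
                     'creators' => array());
    foreach ($dc as $name => $value) {
        $text = trim((string) $value);
        if ($name === 'creator') {
            $summary['creators'][] = $text;    // may occur several times
        } elseif (array_key_exists($name, $summary) && $summary[$name] === null) {
            $summary[$name] = $text;           // keep the first occurrence only
        }
    }
    return $summary;
}
```

Note that the values keep their trailing punctuation (for example
"2003."), so some tidying is still needed before display.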
------------------------------------------------------------
Larry E. Dixson                  Internet: ldix loc.gov
Network Development and MARC Standards Office, LA327
Library of Congress              Telephone: (202) 707-5807
Washington, D.C. 20540-4402      Fax: (202) 707-0115