Thread: Re: ANSEL text encoding




Re: ANSEL text encoding
United States
2007-04-09 13:59:13
Jon,
I can't answer your ZOOM-related questions, but I can tell you
that you are going to find a variety of character-set support
with Z39.50 servers.  For example, our server is not currently
behaving as might be expected.  Search terms that contain
diacritics (or special characters) must have those special
characters encoded in UTF-8 (or not present in the search term)
in order to match entries in our indexes.  (We don't currently
support character-set negotiation, and we are currently configured
to return records in MARC-8 only.)

Some sample searches follow, FYI.
Larry

-----------------------------------------------
Z> open z3950.loc.gov:7090/voyager
ID     : 34
Name   : Voyager LMS - Z39.50 Server (YAZ Proxy)
Version: 2003.1.1/1.2.1.1
Options: search present

Z> f attr 1=1003 "BFohmer, GFunter"    [MARC-8 umlauts]
Number of hits: 0

Z> f attr 1=1003 "Bo&jhmer, Gu&jnter"   [UTF-8 umlauts]
Number of hits: 42

Z> s 1
Sent presentRequest (1+1).
Records: 1
[VOYAGER]Record type: USmarc      [Record returned in MARC-8]
00819cam  2200217 a 4500
001 948744
005 20030317112257.0
008 830722s1969    gw a          000 0 ger c
035    $9 (DLC)   83672065
906    $a 7 $b cbc $c orignew $d u $e ncip $f 19 $g y-gencatlg
010    $a    83672065
040    $a MH $c MH $d DLC
050 00 $a Z4.Z9 $b B83 1969
245 00 $a BFucher und Menschen / $c mit BeitrFagen von Peter Suhrkamp ... [et al.] ; und mit einem Nachwort von Georg Kurt Schauer ; Zeichnungen von Gunter BFohmer.
260    $a Frankfurt am Main : $b Mergenthaler-Verlag der Linotype, $c c1969.
300    $a 115 p. : $b col. ill. ; $c 29 cm.
650  0 $a Books $z Germany.
650  0 $a Books and reading $z Germany.
700 1  $a Suhrkamp, Peter, $d 1891-1959.
700 1  $a BFohmer, Gunter, $d 1911-

Z> f attr 1=1003 "Bo&jhmer, Gunter"    [One UTF-8 umlaut (surname)]
Number of hits: 42

Z> f attr 1=1003 "Bohmer, Gu&jnter"    [One UTF-8 umlaut (first name)]
Number of hits: 42

Z> f attr 1=1003 "bohmer, gunter"     [No umlauts]
Number of hits: 42
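
The searches above suggest two forms of a term that match this server's indexes: UTF-8 with the umlaut sent as a separate combining character after the base letter (an inference from the transcript, not documented server behavior), and plain ASCII with the diacritics dropped. A minimal Python sketch of producing both forms follows. The function names are hypothetical, and nothing here talks to a Z39.50 server; it only prepares the term.

```python
import unicodedata


def utf8_decomposed(term):
    """Return the term as UTF-8 bytes with diacritics as combining marks.

    NFD normalization splits a precomposed letter such as o-umlaut into
    the base letter followed by U+0308 (combining diaeresis).
    """
    return unicodedata.normalize("NFD", term).encode("utf-8")


def strip_diacritics(term):
    """Return the term with all combining marks removed."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The author name from the sample searches, written with escapes so the
# source file's own encoding does not matter.
name = "B\u00f6hmer, G\u00fcnter"

print(utf8_decomposed(name))
print(strip_diacritics(name))
```

Whether a given server wants the decomposed or the precomposed UTF-8 form is something to verify per target; other servers may behave differently.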


On Mon, 9 Apr 2007, jda wrote:

> >Actually, MARC-8 uses ANSEL (ANSI Extended Latin) as its default
> >character set, but also uses other non-roman character sets.
> >In that sense they are not the same: ANSEL is a separate spec that
> >is included as a subset of MARC-8 character sets.
> >
> 
> Thanks. With ZOOM, should I specify MARC8 or ANSEL, then, if
> accessing an ANSEL site (I'm using "marc8" now)?
> 
> Another question I have is whether I need to encode the query as ANSEL.
> Right now I'm sending queries as ISO-8859-1 to sites that support
> ANSEL (because I don't know how to do the ANSEL encoding myself). I'm
> getting mixed results with queries that have accented characters, but
> I don't know if that's because the query isn't encoded as ANSEL or
> whether the library just doesn't handle accented characters correctly.
> 
> If the query needs to be ANSEL/MARC8-encoded, does ZOOM handle that
> (I've pored over the docs I could find, but see nothing specific
> about ZOOM handling query encoding)?
> 
> Thanks again,
> 
> Jon
> 

------------------------------------------------------------

Larry E. Dixson                    Internet:    ldixloc.gov
Network Development and MARC
   Standards Office, LA327
Library of Congress                Telephone: (202) 707-5807
Washington, D.C.  20540-4402       Fax:       (202) 707-0115


_______________________________________________
Yazlist mailing list
Yazlistlists.indexdata.dk
http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist

Re: ANSEL text encoding
United States
2007-04-09 14:38:32
>Jon,
>I can't answer your ZOOM-related questions, but I can tell you
>that you are going to find a variety of character-set support
>with Z39.50 servers.  For example, our server is not currently
>behaving as might be expected.  Search terms that contain
>diacritics (or special characters) must have those special
>characters encoded in UTF-8 (or not present in the search term)
>in order to match entries in our indexes.  (We don't currently
>support character-set negotiation, and we are currently configured
>to return records in MARC-8 only.)
>
>Some sample searches follow, FYI.
>Larry
>

Thanks Sebastian and Larry for your answers.

So the LOC server expects UTF-8 search terms? The 
situation is indeed messy. I tried UTF-8 searches 
against other libraries and they failed, whereas 
ISO-8859-1 searches worked! (I tried a name with 
an umlauted u (ü).) I have had luck with UTF-8 
queries at libraries that return UTF-8 (e.g. 
some Japanese libraries).
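
For reference, the same umlauted u comes out as different bytes in each of the encodings mentioned in this thread, so a server that indexes one form will not match a query sent in another. The sketch below uses Python's built-in codecs for ISO-8859-1 and UTF-8. The MARC-8/ANSEL helper is hypothetical and deliberately tiny: it maps only a few diacritics (ANSEL places the combining mark before the base letter, with the umlaut at 0xE8) and is not a complete encoder.

```python
import unicodedata

# Partial map from Unicode combining marks to ANSEL/MARC-8 bytes.
# Assumption: only these five marks are handled in this sketch.
ANSEL_COMBINING = {
    "\u0300": 0xE1,  # grave
    "\u0301": 0xE2,  # acute
    "\u0302": 0xE3,  # circumflex
    "\u0303": 0xE4,  # tilde
    "\u0308": 0xE8,  # umlaut / diaeresis
}


def to_ansel_sketch(text):
    """Encode ASCII letters plus a few diacritics as ANSEL-style bytes.

    Raises KeyError for an unmapped combining mark and ValueError for
    any other non-ASCII character.
    """
    decomposed = unicodedata.normalize("NFD", text)
    out = bytearray()
    i = 0
    while i < len(decomposed):
        base = decomposed[i]
        i += 1
        # Collect the combining marks that follow this base character.
        marks = bytearray()
        while i < len(decomposed) and unicodedata.combining(decomposed[i]):
            marks.append(ANSEL_COMBINING[decomposed[i]])
            i += 1
        if ord(base) > 0x7E:
            raise ValueError(f"unsupported character: {base!r}")
        # ANSEL order: combining marks first, then the base letter.
        out += marks
        out.append(ord(base))
    return bytes(out)


u_umlaut = "\u00fc"
print(u_umlaut.encode("iso-8859-1"))  # one byte
print(u_umlaut.encode("utf-8"))       # two bytes, precomposed form
print(to_ansel_sketch(u_umlaut))      # combining byte, then 'u'
```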

It would certainly seem that the query encoding 
and the result encoding should be the same. But 
such is life.

Thanks again.

Jon
