Hi,
I'm writing a client using the ruby-zoom binding [1] to retrieve
USMARC/MARC21 records. If I set the charset option to 'UTF-8', does this
really mean: "If you understand what UTF-8 is and can provide records in
that charset, please do so; otherwise just give me the default"? In other
words, could I request UTF-8 records and be given MARC-8 records?
Do ztargets announce what charsets they can provide records in?
I want to make sure that all the records I finally use are UTF-8
encoded. Ruby-zoom has an xml method which takes charsets as parameters
for conversion [2]. Many of the ztargets I connect to will only be able
to provide MARC-8 encoded records, while others will supply UTF-8
records. What if I simply run them all through a conversion like so?
    record.xml('MARC-8', 'UTF-8')
This would properly convert MARC-8 records, but would it mangle records
that are already in UTF-8 (or in another character encoding)? Or do I
need some way to determine the character encoding of each record first,
before selecting the proper conversion? I know I can check leader byte 9
for Library of Congress records (blank = MARC-8, a = UTF-8). Any idea
whether other targets use this position consistently in the same way?
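
To make the leader check concrete, here is a minimal sketch of what I have in mind. It is plain Ruby operating on the raw ISO 2709 bytes of a record; the helper names are my own (hypothetical, not part of ruby-zoom), and it assumes the target actually sets leader position 09 the way the Library of Congress does.

```ruby
# Hypothetical helpers, not part of ruby-zoom. They only look at the
# 24-byte MARC21 leader at the start of a raw ISO 2709 record.

# Returns :marc8, :utf8 or :unknown based on leader position 09
# (blank = MARC-8, 'a' = UCS/Unicode, i.e. UTF-8 in practice).
def marc_charset_from_leader(raw_marc)
  data = raw_marc.to_s.b               # treat the record as bytes
  return :unknown if data.bytesize < 24 # too short to hold a leader

  case data[9]
  when ' ' then :marc8
  when 'a' then :utf8
  else :unknown                         # target uses the byte differently
  end
end

# True only when the leader explicitly says the record is MARC-8,
# so records already flagged as UTF-8 are never converted twice.
def needs_marc8_conversion?(raw_marc)
  marc_charset_from_leader(raw_marc) == :marc8
end
```

With something like this I would call record.xml('MARC-8', 'UTF-8') only when needs_marc8_conversion? returns true, passing in the raw form of the record (I am assuming the binding can hand me the unconverted record bytes). Records that come back :unknown would still need some other strategy, which is really the heart of my question.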
If I'm only grabbing USMARC/MARC21 records, am I likely to get records
in charsets other than MARC-8 and UTF-8?
Thanks for any help getting me through the morass of charset encodings.
--Jason Ronallo
[1] http://ruby-zoom.rubyforge.org/
[2] http://ruby-zoom.rubyforge.org/xhtml/ch03.html
_______________________________________________
Yazlist mailing list
Yazlist lists.indexdata.dk
http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist