Adam Dickmeiss wrote:
> Gary Anderson wrote:
>> I am not sure how this will help. In the application, the last 2
>> bytes of the data string are 0xEA and 0x1E - the diacritic and the
>> record mark. yaz_iconv seems to drop the diacritic because it doesn't
>> have a trailing character, but it does process the record mark. What
>> I need is something that will tell me that this case has occurred. It
>> looks to me like yaz just drops the diacritic.
> I don't see a way the iconv interface could tell you this. I'm still a
> little confused, so forgive me for asking,.. what is the behavior you
> want? (keep the diacritic?)
>
In case you want *not* to keep the diacritic, in other words you are
asking to be notified about an error .. then maybe it's best to use
EINVAL because the iconv man page says:
"EINVAL An incomplete multibyte sequence has been encountered in the
input."
Case 1: If you pass
 .. 0xEA
you get EINVAL because no characters follow 0xEA (as far as iconv is
concerned).
Case 2: If you pass
 .. 0xEA 0x1E
that would not return an error. In fact YAZ currently converts this to UTF-8:
 0x1E 0xCC 0x8A
because 0x1E is just a "character".
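
To illustrate how a caller sees the EINVAL of case 1, here is a minimal sketch using the system iconv(3) rather than yaz_iconv. It is only an analogy: the system iconv usually has no MARC-8 support, so a truncated UTF-8 sequence stands in for the dangling 0xEA. The helper name convert_errno and the choice of encodings are assumptions made for this example.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert inlen bytes of UTF-8 to UTF-16LE.
 * Returns 0 on success, or the errno value reported by iconv_open()
 * or iconv() on failure. */
static int convert_errno(const char *in, size_t inlen)
{
    char out[64];
    char *inbuf = (char *) in;  /* iconv() wants char **; input is not modified */
    char *outbuf = out;
    size_t inleft = inlen;
    size_t outleft = sizeof out;
    int err = 0;
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");

    if (cd == (iconv_t) -1)
        return errno;
    /* An input that stops in the middle of a multibyte sequence makes
     * iconv() return (size_t) -1 with errno set to EINVAL. */
    if (iconv(cd, &inbuf, &inleft, &outbuf, &outleft) == (size_t) -1)
        err = errno;
    iconv_close(cd);
    return err;
}
```

Calling convert_errno("a\xC3", 2) passes a lead byte with nothing after it, which is the same shape as case 1. A complete sequence such as "a\xC3\xA9" converts without error.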
Unfortunately for case 1, YAZ currently returns 'unknown error'. That's
no good. This has been fixed in the CVS version of YAZ.
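
For case 2, where the record mark follows the diacritic and no error is raised, the caller could inspect the field data itself before converting. The following is only a sketch and not part of YAZ. It assumes the data is MARC-8 with the default ANSEL set active, where combining diacritics occupy 0xE0..0xFE (the values reported in this thread all fall in that range), and that no escape sequence has switched character sets. Both function names are made up for the example.

```c
#include <stddef.h>

#define MARC_FIELD_TERMINATOR 0x1E

/* Assumption: in ANSEL, combining diacritics are the bytes 0xE0..0xFE. */
static int is_ansel_combining(unsigned char c)
{
    return c >= 0xE0 && c <= 0xFE;
}

/* Return 1 if the last data byte of the field is a combining diacritic,
 * i.e. the diacritic has no base character following it. One trailing
 * field terminator (0x1E), if present, is ignored. */
static int ends_with_dangling_diacritic(const unsigned char *field, size_t len)
{
    if (len > 0 && field[len - 1] == MARC_FIELD_TERMINATOR)
        len--;
    return len > 0 && is_ansel_combining(field[len - 1]);
}
```

A caller could run this check on each field before handing it to the converter and log or flag the record when it returns 1. The same test could be extended to data that ends at a subfield delimiter rather than a field terminator.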
/ Adam
> / Adam
>
>>
>> My checking indicates that on completion of conversion of the record
>> mark, the yaz_iconv library is left in its 'initial state'. The next
>> string converts just fine.
>> Gary
>>
>> Adam Dickmeiss wrote:
>>
>>> Gary Anderson wrote:
>>>
>>>> I am using the siconv interface. I have a programmatic process that
>>>> deals with very large files of records.
>>>>
>>>> Adam Dickmeiss wrote:
>>>>
>>>>> Gary Anderson wrote:
>>>>>
>>>>>> I recently ran some tests using records from the National Library
>>>>>> of Canada. Of the 600,000+ records in their name and subject
>>>>>> authority file, six records had 670 tags where the subfield a data
>>>>>> ended in a combining diacritic character with no following character.
>>>>>>
>>>>>> Submitting that data string
>>>>>> (indicators+subfieldmark+subfieldcode+data+fieldmark) to siconvert
>>>>>> resulted in an output string that did not contain the diacritic
>>>>>> character. It was dropped. The field mark character was
>>>>>> retained. Can you suggest a means for notifying the caller when
>>>>>> this condition occurs? Byte counts don't really work because UTF8
>>>>>> is one side or the other of the conversion transaction.
>>>>>>
>>>>>> The ending diacritic values were: 0xE2, 0xE5, 0xE8, 0xEA, and 0xF6.
>>>>>
>>>
>>> I think what you need to do is to "flush", i.e. reset to the "initial state".
>>> The flush would take place after a field or subfield ends.
>>>
>>> That's done by iconv and, hopefully, yaz_iconv by setting inbuf or
>>> *inbuf to NULL, but outbuf to non-NULL, i.e.
>>>
>>> yaz_iconv(cd, 0, 0, &outbuf, &outbytesleft);
>>>
>>> From 'man 3 iconv':
>>> "
>>> A different case is when inbuf is NULL or *inbuf is NULL, but outbuf is
>>> not NULL and *outbuf is not NULL. In this case, the iconv() function
>>> attempts to set cd's conversion state to the initial state and store a
>>> corresponding shift sequence at *outbuf. At most *outbytesleft bytes,
>>> starting at *outbuf, will be written. If the output buffer has no more
>>> room for this reset sequence, it sets errno to E2BIG and returns
>>> (size_t)(-1). Otherwise it increments *outbuf and decrements
>>> *outbytesleft by the number of bytes written.
>>> "
>>>
>>> Use YAZ 2.1.48 or later for this to work.
>>>
>>> / Adam
>>>
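
The flush described above can be sketched with the system iconv(3), whose argument order matches the yaz_iconv call quoted in the message. This is an illustration only, not YAZ code: the helper name convert_and_flush is made up, and the caller is assumed to have opened the descriptor with iconv_open().

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert one field, then reset the descriptor to
 * its initial state. Returns the number of bytes written to out
 * (including any shift sequence emitted by the reset), or -1 on error
 * with errno set by iconv(). */
static long convert_and_flush(iconv_t cd, const char *in, size_t inlen,
                              char *out, size_t outsize)
{
    char *inbuf = (char *) in;  /* iconv() wants char **; input is not modified */
    char *outbuf = out;
    size_t inleft = inlen;
    size_t outleft = outsize;

    if (iconv(cd, &inbuf, &inleft, &outbuf, &outleft) == (size_t) -1)
        return -1;

    /* NULL input with a non-NULL output buffer: reset to the initial
     * state and write any pending shift sequence to the output. */
    if (iconv(cd, NULL, NULL, &outbuf, &outleft) == (size_t) -1)
        return -1;

    return (long) (outsize - outleft);
}
```

With a stateless target encoding the reset writes nothing. With a stateful one it may append a shift sequence, which is why the output buffer and its remaining size are passed to the second call as well.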
>>>>>>
>>>>> Did you use yaz-marcdump for the conversion?
>>>>>
>>>>> Or did you do something else? (such as programming towards the
>>>>> siconv interface)?
>>>>>
>>>>> / Adam
>>>>>
>>>>>> Thanks
>>>>>> Gary
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Yazlist mailing list
>>>>>> Yazlist lists.indexdata.dk
>>>>>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist