Thread: Fields ending in combining diacritics




Fields ending in combining diacritics
Gary Anderson
United States
2007-03-07 14:05:15
I recently ran some tests using records from the National Library of
Canada. Of the 600,000+ records in their name and subject authority
file, six records had 670 tags where the subfield a data ended in a
combining diacritic character with no following character.

Submitting that data string
(indicators+subfieldmark+subfieldcode+data+fieldmark) to siconvert
resulted in an output string that did not contain the diacritic
character. It was dropped. The field mark character was retained. Can
you suggest a means for notifying the caller when this condition
occurs? Byte counts don't really work because UTF-8 is on one side or
the other of the conversion transaction.

The ending diacritic values were: 0xE2, 0xE5, 0xE8, 0xEA, and 0xF6.

Thanks
Gary

_______________________________________________
Yazlist mailing list
Yazlist@lists.indexdata.dk
http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist

  
Re: Fields ending in combining diacritics
Adam Dickmeiss
Denmark
2007-03-08 04:05:54
Did you use yaz-marcdump for the conversion?

Or did you do something else (such as programming against the siconv
interface)?

/ Adam


Re: Fields ending in combining diacritics
Gary Anderson
United States
2007-03-08 13:44:06
I am using the siconv interface. I have a programmatic process that
deals with very large files of records.


  
Re: Fields ending in combining diacritics
Adam Dickmeiss
Denmark
2007-03-08 16:07:29

I think what you need to do is "flush", i.e. reset the converter to
its "initial state". The flush would take place after a field or
subfield ends.

That's done by iconv and, hopefully, yaz_iconv by setting inbuf or
*inbuf to NULL, but outbuf to non-NULL, i.e.

yaz_iconv(cd, 0, 0, &outbuf, &outbytesleft);

From 'man 3 iconv':
"
A different case is when inbuf is NULL or *inbuf is NULL, but outbuf is
not NULL and *outbuf is not NULL. In this case, the iconv() function
attempts to set cd's conversion state to the initial state and store a
corresponding shift sequence at *outbuf. At most *outbytesleft bytes,
starting at *outbuf, will be written. If the output buffer has no more
room for this reset sequence, it sets errno to E2BIG and returns
(size_t)(-1). Otherwise it increments *outbuf and decrements
*outbytesleft by the number of bytes written.
"

Use YAZ 2.1.48 or later for this to work.

/ Adam
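
The flush step described above can be sketched against plain POSIX
iconv(), which takes the same argument shape as the yaz_iconv() call
quoted in this message. The helper name convert_field and its buffer
handling are illustrative assumptions, not part of YAZ or of the
poster's code. Whether a pending combining character is actually
written out at flush time depends on the converter in use; the sketch
only shows where the reset call goes.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert one field, then flush the converter so
 * it returns to its initial state before the next field is processed.
 * Returns the number of bytes written to out, or (size_t)-1 on error
 * (errno is left as set by iconv()). */
size_t convert_field(iconv_t cd, const char *in, size_t inlen,
                     char *out, size_t outcap)
{
    char *inp = (char *)in;  /* iconv() takes char **, it does not modify the input bytes */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    /* Step 1: convert the field data. */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        return (size_t)-1;

    /* Step 2: flush. NULL input with a non-NULL output buffer asks the
     * converter to reset to its initial state and write any reset
     * sequence at outp. */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        return (size_t)-1;

    return (size_t)(outp - out);
}
```

A caller would open the descriptor once with iconv_open(), call
convert_field() once per field or subfield, and compare the returned
length with what it expects for that field.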


Re: Fields ending in combining diacritics
Gary Anderson
United States
2007-03-08 16:25:51
I am not sure how this will help. In the application, the last 2 bytes
of the data string are 0xEA and 0x1E: the diacritic and the field
mark. yaz_iconv seems to drop the diacritic because it doesn't have a
trailing character, but it does process the field mark. What I need is
something that will tell me that this case has occurred. It looks to me
like yaz just drops the diacritic.

My checking indicates that on completion of conversion of the field
mark, the yaz_iconv library is left in its 'initial state'. The next
string converts just fine.
Gary
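
One way to get the notification asked for here, independent of the
converter, is to check the input before conversion. The sketch below
assumes the input is MARC-8 with the default ANSEL set, where
combining marks occupy 0xE0..0xFE (the range that contains all five
values reported in this thread) and precede their base character. The
function names are hypothetical, and the check ignores escape
sequences that switch character sets, so treat it as a starting point
rather than a complete MARC-8 parser.

```c
#include <stddef.h>

#define FIELD_TERMINATOR 0x1E  /* MARC field mark */

/* Assumption: default ANSEL set, combining marks are 0xE0..0xFE. */
static int is_marc8_combining(unsigned char c)
{
    return c >= 0xE0 && c <= 0xFE;
}

/* Returns 1 if the last data byte of the field is a combining mark
 * with no base character after it, 0 otherwise. One trailing field
 * terminator, if present, is skipped before the check. */
int has_dangling_diacritic(const unsigned char *field, size_t len)
{
    if (len > 0 && field[len - 1] == FIELD_TERMINATOR)
        len--;
    return len > 0 && is_marc8_combining(field[len - 1]);
}
```

Calling has_dangling_diacritic() on each field before handing it to
the converter lets the caller log or repair the record instead of
inferring the loss from byte counts.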


  