List Info

Thread: Error in latest SICONV.C




Error in latest SICONV.C
country flaguser name
United States
2007-03-21 20:05:51
Adam,

I've run through the latest siconv.c from your cvs
repository.  It now 
has additional problems with converting UTF8 to MARC8.  I
ran a 
simplified test.  The input string was (hex bytes, ignore
spaces):

61 E8 87 BA  E7 81 A3  E5 BE 97

The output string was:

61 1B 24 31 21 54 2B 21 49 43

Notice that only the first two hangul triples were converted
and 
output.  The ending ESC string was also not output.  This
happened 
because the normal flow of the logic does NOT call your
flush function.  
In fact in siconv.c version $Id: siconv.c,v 1.37 2007/03/20
21:37:32 
adam Exp $, yaz_iconv exits at the point shown below.

    while (1)
    {
        unsigned long x;
        size_t no_read;

        if (cd->unget_x)
        {
            x = cd->unget_x;
            no_read = cd->no_read_x;
        }
        else
        {
            if (*inbytesleft == 0)
            {
                r = *inbuf - inbuf0;
                break;    <-------------------This is the
exit point.
            }

I think what you really meant to do was this:
    while (1)
    {
        unsigned long x;
        size_t no_read;

        if (cd->unget_x)
        {
            x = cd->unget_x;
            no_read = cd->no_read_x;
        }
        else
        {
            if (*inbytesleft == 0)
            {
               if (cd->flush_handle)     // Proposed fix
to correctly 
clear the buffers
                   r = (*cd->flush_handle)(cd, outbuf,
outbytesleft);  
// Proposed fix to correctly clear the buffers.
                r = *inbuf - inbuf0;
                break;    <-------------------This is the
exit point.
            }

Anyway, I tried the code with this fix, and it works
properly.  Is this 
the correct fix, or should this problem be fixed another
way?

Gary


_______________________________________________
Yazlist mailing list
Yazlistlists.indexdata.dk
http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yaz
list

  
Re: Error in latest SICONV.C
country flaguser name
Denmark
2007-03-22 03:04:54
Gary Anderson wrote:
> Adam,
> 
> I've run through the latest siconv.c from your cvs
repository.  It now 
> has additional problems with converting UTF8 to MARC8. 
I ran a 
> simplified test.  The input string was (hex bytes,
ignore spaces):
> 
> 61 E8 87 BA  E7 81 A3  E5 BE 97
> 
> The output string was:
> 
> 61 1B 24 31 21 54 2B 21 49 43

That might very well be with your test code, which I
unfortunately don't 
know anything about.

But I think I can guess what goes on: you did not flush the
output 
buffers with yaz_iconv(cd, 0,0, ..) .

Why is _separate_ flushing a good idea? What can't we call
iconv once?

Suppose we did not have it (implicit flushing for every call
of iconv) 
and we want to convert a file. We read chunks from disk (say
2K at a 
time). We call iconv. iconv flushes every time. But when it
flushes it 
assumes that the input is complete. Sometimes it ain't.
There will be 
input sequences "crossing" block boundaries. And
so we'll end up with 
errors. So we just read the whole in memory? No? Can mmap do
it?

The opposite is much more appealing . We read chunks of
almost size we 
want. We don't care about the input.  We don't *know* the
input. So we 
call iconv for each chunk . It's a streaming process. At EOF
we flush.

Of course, if it's a small and *complete* buffer in memory
it might be 
ackward.. We'd have two calls instead of one. But I am sure
you can make 
a wrapper yaz_iconv_all(cd, a, b, c, d); which calls
yaz_iconv(cd, a, b, 
c, d) +     yaz_iconv(cd, 0, 0, c, d);
This is what your patch effectively does , I think.

/ Adam

> Notice that only the first two hangul triples were
converted and 
> output.  The ending ESC string was also not output. 
This happened 
> because the normal flow of the logic does NOT call your
flush function.  
> In fact in siconv.c version $Id: siconv.c,v 1.37
2007/03/20 21:37:32 
> adam Exp $, yaz_iconv exits at the point shown below.
> 
>    while (1)
>    {
>        unsigned long x;
>        size_t no_read;
> 
>        if (cd->unget_x)
>        {
>            x = cd->unget_x;
>            no_read = cd->no_read_x;
>        }
>        else
>        {
>            if (*inbytesleft == 0)
>            {
>                r = *inbuf - inbuf0;
>                break;    <-------------------This is
the exit point.
>            }
> 
> I think what you really meant to do was this:
>    while (1)
>    {
>        unsigned long x;
>        size_t no_read;
> 
>        if (cd->unget_x)
>        {
>            x = cd->unget_x;
>            no_read = cd->no_read_x;
>        }
>        else
>        {
>            if (*inbytesleft == 0)
>            {
>               if (cd->flush_handle)     // Proposed
fix to correctly 
> clear the buffers
>                   r = (*cd->flush_handle)(cd,
outbuf, outbytesleft);  // 
> Proposed fix to correctly clear the buffers.
>                r = *inbuf - inbuf0;
>                break;    <-------------------This is
the exit point.
>            }
> 
> Anyway, I tried the code with this fix, and it works
properly.  Is this 
> the correct fix, or should this problem be fixed
another way?
> 
> Gary
> 
> _______________________________________________
> Yazlist mailing list
> Yazlistlists.indexdata.dk
> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yaz
list


_______________________________________________
Yazlist mailing list
Yazlistlists.indexdata.dk
http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yaz
list

Re: Error in latest SICONV.C
country flaguser name
United States
2007-03-22 12:37:19
This is a different philosophy that I picked up from your
earlier 
emails.  Those indicated that I shouldn't be submitting
subfield marks 
and field marks to the converter.  The MARC-21 standard
specifies that 
fixed field and subfield strings are self-contained and ESC
sequences in 
MARC8 encoding don't cross field or subfield boundaries. 
That is why I 
suggested the fix the way I did.  Now you seem to be saying
that we can 
and should run the data in whatever size chunks we want
until an EOF is 
reached.  So, rather than relying on the library to give me
back 
self-contained data in conversions to and from MARC8, I have
two 
choices.  First, I can append a period to the end of every
string I 
convert and then kill it when the string comes back (that
works, by the 
way) or I can make a convert call then another convert call
to flush the 
buffers.  That seems a little more cumbersome and time
consuming.

Gary

Adam Dickmeiss wrote:

> Gary Anderson wrote:
>
>> Adam,
>>
>> I've run through the latest siconv.c from your cvs
repository.  It 
>> now has additional problems with converting UTF8 to
MARC8.  I ran a 
>> simplified test.  The input string was (hex bytes,
ignore spaces):
>>
>> 61 E8 87 BA  E7 81 A3  E5 BE 97
>>
>> The output string was:
>>
>> 61 1B 24 31 21 54 2B 21 49 43
>
>
> That might very well be with your test code, which I
unfortunately 
> don't know anything about.
>
> But I think I can guess what goes on: you did not flush
the output 
> buffers with yaz_iconv(cd, 0,0, ..) .
>
> Why is _separate_ flushing a good idea? What can't we
call iconv once?
>
> Suppose we did not have it (implicit flushing for every
call of iconv) 
> and we want to convert a file. We read chunks from disk
(say 2K at a 
> time). We call iconv. iconv flushes every time. But
when it flushes it 
> assumes that the input is complete. Sometimes it ain't.
There will be 
> input sequences "crossing" block boundaries.
And so we'll end up with 
> errors. So we just read the whole in memory? No? Can
mmap do it?
>
> The opposite is much more appealing . We read chunks of
almost size we 
> want. We don't care about the input.  We don't *know*
the input. So we 
> call iconv for each chunk . It's a streaming process.
At EOF we flush.
>
> Of course, if it's a small and *complete* buffer in
memory it might be 
> ackward.. We'd have two calls instead of one. But I am
sure you can 
> make a wrapper yaz_iconv_all(cd, a, b, c, d); which
calls 
> yaz_iconv(cd, a, b, c, d) +     yaz_iconv(cd, 0, 0, c,
d);
> This is what your patch effectively does , I think.
>
> / Adam
>
>> Notice that only the first two hangul triples were
converted and 
>> output.  The ending ESC string was also not output.
 This happened 
>> because the normal flow of the logic does NOT call
your flush 
>> function.  In fact in siconv.c version $Id:
siconv.c,v 1.37 
>> 2007/03/20 21:37:32 adam Exp $, yaz_iconv exits at
the point shown 
>> below.
>>
>>    while (1)
>>    {
>>        unsigned long x;
>>        size_t no_read;
>>
>>        if (cd->unget_x)
>>        {
>>            x = cd->unget_x;
>>            no_read = cd->no_read_x;
>>        }
>>        else
>>        {
>>            if (*inbytesleft == 0)
>>            {
>>                r = *inbuf - inbuf0;
>>                break;   
<-------------------This is the exit point.
>>            }
>>
>> I think what you really meant to do was this:
>>    while (1)
>>    {
>>        unsigned long x;
>>        size_t no_read;
>>
>>        if (cd->unget_x)
>>        {
>>            x = cd->unget_x;
>>            no_read = cd->no_read_x;
>>        }
>>        else
>>        {
>>            if (*inbytesleft == 0)
>>            {
>>               if (cd->flush_handle)     //
Proposed fix to correctly 
>> clear the buffers
>>                   r = (*cd->flush_handle)(cd,
outbuf, outbytesleft);  
>> // Proposed fix to correctly clear the buffers.
>>                r = *inbuf - inbuf0;
>>                break;   
<-------------------This is the exit point.
>>            }
>>
>> Anyway, I tried the code with this fix, and it
works properly.  Is 
>> this the correct fix, or should this problem be
fixed another way?
>>
>> Gary
>>
>> _______________________________________________
>> Yazlist mailing list
>> Yazlistlists.indexdata.dk
>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yaz
list
>
>
>
> _______________________________________________
> Yazlist mailing list
> Yazlistlists.indexdata.dk
> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yaz
list
>


_______________________________________________
Yazlist mailing list
Yazlistlists.indexdata.dk
http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yaz
list

  
Re: Error in latest SICONV.C
country flaguser name
Denmark
2007-03-22 13:10:45
Gary Anderson wrote:
> This is a different philosophy that I picked up from
your earlier 
> emails.  Those indicated that I shouldn't be submitting
subfield marks 
> and field marks to the converter.  The MARC-21 standard
specifies that 
> fixed field and subfield strings are self-contained and
ESC sequences in 
> MARC8 encoding don't cross field or subfield
boundaries.  That is why I 
> suggested the fix the way I did.  Now you seem to be
saying that we can 
> and should run the data in whatever size chunks we want
until an EOF is 
EOF would be end of data field too.
> reached.  So, rather than relying on the library to
give me back 
> self-contained data in conversions to and from MARC8, I
have two 
> choices.  First, I can append a period to the end of
every string I 
> convert and then kill it when the string comes back
(that works, by the 
> way) or I can make a convert call then another convert
call to flush the 
> buffers.  That seems a little more cumbersome and time
consuming.
If two calls to iconv bothers you, make a wrapper or just
change the 
source of YAZ.

/ Adam

> 
> Gary
> 
> Adam Dickmeiss wrote:
> 
>> Gary Anderson wrote:
>>
>>> Adam,
>>>
>>> I've run through the latest siconv.c from your
cvs repository.  It 
>>> now has additional problems with converting
UTF8 to MARC8.  I ran a 
>>> simplified test.  The input string was (hex
bytes, ignore spaces):
>>>
>>> 61 E8 87 BA  E7 81 A3  E5 BE 97
>>>
>>> The output string was:
>>>
>>> 61 1B 24 31 21 54 2B 21 49 43
>>
>>
>> That might very well be with your test code, which
I unfortunately 
>> don't know anything about.
>>
>> But I think I can guess what goes on: you did not
flush the output 
>> buffers with yaz_iconv(cd, 0,0, ..) .
>>
>> Why is _separate_ flushing a good idea? What can't
we call iconv once?
>>
>> Suppose we did not have it (implicit flushing for
every call of iconv) 
>> and we want to convert a file. We read chunks from
disk (say 2K at a 
>> time). We call iconv. iconv flushes every time. But
when it flushes it 
>> assumes that the input is complete. Sometimes it
ain't. There will be 
>> input sequences "crossing" block
boundaries. And so we'll end up with 
>> errors. So we just read the whole in memory? No?
Can mmap do it?
>>
>> The opposite is much more appealing . We read
chunks of almost size we 
>> want. We don't care about the input.  We don't
*know* the input. So we 
>> call iconv for each chunk . It's a streaming
process. At EOF we flush.
>>
>> Of course, if it's a small and *complete* buffer in
memory it might be 
>> ackward.. We'd have two calls instead of one. But I
am sure you can 
>> make a wrapper yaz_iconv_all(cd, a, b, c, d); which
calls 
>> yaz_iconv(cd, a, b, c, d) +     yaz_iconv(cd, 0, 0,
c, d);
>> This is what your patch effectively does , I
think.
>>
>> / Adam
>>
>>> Notice that only the first two hangul triples
were converted and 
>>> output.  The ending ESC string was also not
output.  This happened 
>>> because the normal flow of the logic does NOT
call your flush 
>>> function.  In fact in siconv.c version $Id:
siconv.c,v 1.37 
>>> 2007/03/20 21:37:32 adam Exp $, yaz_iconv exits
at the point shown 
>>> below.
>>>
>>>    while (1)
>>>    {
>>>        unsigned long x;
>>>        size_t no_read;
>>>
>>>        if (cd->unget_x)
>>>        {
>>>            x = cd->unget_x;
>>>            no_read = cd->no_read_x;
>>>        }
>>>        else
>>>        {
>>>            if (*inbytesleft == 0)
>>>            {
>>>                r = *inbuf - inbuf0;
>>>                break;   
<-------------------This is the exit point.
>>>            }
>>>
>>> I think what you really meant to do was this:
>>>    while (1)
>>>    {
>>>        unsigned long x;
>>>        size_t no_read;
>>>
>>>        if (cd->unget_x)
>>>        {
>>>            x = cd->unget_x;
>>>            no_read = cd->no_read_x;
>>>        }
>>>        else
>>>        {
>>>            if (*inbytesleft == 0)
>>>            {
>>>               if (cd->flush_handle)     //
Proposed fix to correctly 
>>> clear the buffers
>>>                   r =
(*cd->flush_handle)(cd, outbuf, outbytesleft);  
>>> // Proposed fix to correctly clear the
buffers.
>>>                r = *inbuf - inbuf0;
>>>                break;   
<-------------------This is the exit point.
>>>            }
>>>
>>> Anyway, I tried the code with this fix, and it
works properly.  Is 
>>> this the correct fix, or should this problem be
fixed another way?
>>>
>>> Gary
>>>
>>>
_______________________________________________
>>> Yazlist mailing list
>>> Yazlistlists.indexdata.dk
>>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yaz
list
>>
>>
>>
>> _______________________________________________
>> Yazlist mailing list
>> Yazlistlists.indexdata.dk
>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yaz
list
>>
> 
> _______________________________________________
> Yazlist mailing list
> Yazlistlists.indexdata.dk
> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yaz
list


_______________________________________________
Yazlist mailing list
Yazlistlists.indexdata.dk
http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yaz
list

[1-4]

about | contact  Other archives ( Real Estate discussion Medical topics )