
Thread: Another MARC8 conversion problem.




Another MARC8 conversion problem.
United States
2007-03-20 20:33:35
I am passing the following UTF8 string (Values are hangul characters
given in hex.  Ignore spaces) to the converter:

E8 87 BA  E7 81 A3  E5 9C B0  E5 8D 80  E5 9C 8B  E6 B0 91  E6 89 80  E5 BE 97.
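
For readers who want to check the input, here is a minimal sketch of a UTF-8 decoder. The helper name utf8_decode is hypothetical and is not part of YAZ. With it, the first three sequences above decode to U+81FA, U+7063 and U+5730, which lie in the CJK Unified Ideographs block.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence starting at s (at most len bytes available).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the sequence is truncated or malformed.
 * Simplified sketch: overlong forms and surrogates are not rejected. */
static size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    size_t n, i;

    assert(s && cp);
    if (len == 0)
        return 0;
    if (s[0] < 0x80)                { *cp = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
    else
        return 0;                   /* stray continuation or invalid lead byte */
    if (n > len)
        return 0;                   /* sequence is cut off */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return n;
}
```

Calling it repeatedly, advancing by the returned length each time, walks through the whole byte string one character at a time.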

YAZ correctly translates this string to (output in MARC8, hex, ignore
spaces):

1B  28  42  21 54 2B  21 49 43  21 37 79  21 34 55  21 37 6f  21 46 4d  21 3F 75  21 30 6A
esc  $    1

Notice that the ending escape sequence (ESC ( B) was not appended to
this string.  It appeared at the beginning of my next string.

I'm thinking that the yaz_write_marc8_page_chr module you sent in the
patch isn't working, or it needs to be called from somewhere else.

Gary

_______________________________________________
Yazlist mailing list
Yazlist@lists.indexdata.dk
http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist

  
Re: Another MARC8 conversion problem. - code included
United States
2007-03-20 21:27:04
The code I have in siconv.c after applying the patch is:

// This added as a result of a patch sent Mar. 16, 2007 to handle
// closing ESC sequences

static size_t yaz_write_marc8_page_chr(yaz_iconv_t cd,
                                       char **outbuf, size_t *outbytesleft,
                                       const char *page_chr)
{
    const char *old_page_chr = cd->write_marc8_page_chr;
    if (strcmp(page_chr, old_page_chr))
    {
        size_t plen = 0;
        const char *page_out = page_chr;
        if (*outbytesleft < 8)
        {
            cd->my_errno = YAZ_ICONV_E2BIG;
            return (size_t) (-1);
        }
        cd->write_marc8_page_chr = page_chr;
        if (!strcmp(old_page_chr, "\033p")
            || !strcmp(old_page_chr, "\033g")
            || !strcmp(old_page_chr, "\033b"))
        {
            /* Technique 1 leave */
            page_out = "\033s";
            if (strcmp(page_chr, "\033(B")) /* Not going ASCII page? */
            {
                /* Must leave script + enter new page */
                plen = strlen(page_out);
                memcpy(*outbuf, page_out, plen);
                (*outbuf) += plen;
                (*outbytesleft) -= plen;
                page_out = page_chr;
            }
        }
        plen = strlen(page_out);
        memcpy(*outbuf, page_out, plen);
        (*outbuf) += plen;
        (*outbytesleft) -= plen;
    }
    return 0;
}

static size_t yaz_write_marc8_2(yaz_iconv_t cd, unsigned long x,
                                char **outbuf, size_t *outbytesleft,
                                int last)
{
    int comb = 0;
    const char *page_chr = 0;
    unsigned long y = lookup_marc8(cd, x, &comb, &page_chr);
    if (!y)
        return (size_t) (-1);
    if (comb)
    {
        if (x == 0x0361)
            cd->write_marc8_second_half_char = 0xEC;
        else if (x == 0x0360)
            cd->write_marc8_second_half_char = 0xFB;
        if (cd->write_marc8_comb_no < 6)
            cd->write_marc8_comb_ch[cd->write_marc8_comb_no++] = y;
    }
    else
    {
        size_t r = flush_combos(cd, outbuf, outbytesleft);
        if (r)
            return r;
        r = yaz_write_marc8_page_chr(cd, outbuf, outbytesleft, page_chr);
        if (r)
            return r;
        cd->write_marc8_last = y;
    }
    if (last)
    {
        size_t r = flush_combos(cd, outbuf, outbytesleft);
        if (r)
        {
            if (comb)
                cd->write_marc8_comb_no--;
            else
                cd->write_marc8_last = 0;
            return r;
        }
    }
    return 0;
}



  
Re: Another MARC8 conversion problem.
Denmark
2007-03-21 03:51:39
Gary Anderson wrote:
> I am passing the following UTF8 string (Values are hangul characters
> given in hex.  Ignore spaces) to the converter:
>
> E8 87 BA  E7 81 A3  E5 9C B0  E5 8D 80  E5 9C 8B  E6 B0 91  E6 89 80  E5 BE 97.
>
> YAZ correctly translates this string to (output in MARC8, hex, ignore
> spaces):
>
> 1B  28  42  21 54 2B  21 49 43  21 37 79  21 34 55  21 37 6f  21 46 4d  21 3F 75  21 30 6A
> esc  $    1
>
> Notice that the ending escape sequence (ESC ( B) was not appended to
> this string.  It appeared at the beginning of my next string.

How did you test this? With yaz-iconv?

A call to
  yaz_iconv(cd, 0, 0, &outp, &outbytesleft);

will reset the conversion to the initial state and generate the
ESC ( B.

I can tell you this: yaz-iconv did not make that call. And that's a
mistake.
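
The effect of that reset call can be illustrated with a toy model. This is not YAZ code: toy_cd, toy_conv and the two-page scheme are made up for illustration. Only the convention that a NULL input means "return to the initial state" is taken from the iconv interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_ASCII "\033(B"   /* escape that selects the ASCII page */
#define TOY_OTHER "\033$1"   /* escape that selects the second page */

typedef struct { int in_other; } toy_cd;   /* 0 = initial (ASCII) state */

/* Append n bytes to the output buffer; returns -1 if they do not fit. */
static int toy_put(char **out, size_t *left, const char *s, size_t n)
{
    if (*left < n)
        return -1;
    memcpy(*out, s, n);
    *out += n;
    *left -= n;
    return 0;
}

/* iconv-style call: in == NULL means "return to the initial state".
 * Bytes >= 0x80 are treated as belonging to the second page. */
static size_t toy_conv(toy_cd *cd, const char *in, size_t inlen,
                       char **out, size_t *left)
{
    size_t i;

    assert(cd && out && *out && left);
    if (!in) {                              /* flush request */
        if (cd->in_other && toy_put(out, left, TOY_ASCII, 3))
            return (size_t) -1;
        cd->in_other = 0;
        return 0;
    }
    for (i = 0; i < inlen; i++) {
        int other = ((unsigned char) in[i] >= 0x80);
        if (other != cd->in_other) {        /* page change: emit escape */
            if (toy_put(out, left, other ? TOY_OTHER : TOY_ASCII, 3))
                return (size_t) -1;
            cd->in_other = other;
        }
        if (toy_put(out, left, &in[i], 1))
            return (size_t) -1;
    }
    return 0;
}
```

Converting the two bytes E8 87 leaves the output in the second page; only a following call with a NULL input appends ESC ( B. Skipping that call reproduces the symptom described above, where the escape shows up at the start of the next string.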

> I'm thinking that the yaz_write_marc8_page_chr module you sent in the
> patch isn't working, or it needs to be called from somewhere else.

Yesterday major changes were made to siconv.c. The new code is simpler,
IMHO. I really suggest you check YAZ out via CVS. One thing you'll
notice is that the last parameter is gone.

yaz_flush_marc8 and yaz_flush_ISO8859_1 do the flushing. They are
called when yaz_iconv(cd, 0, 0, &outp, &outbytesleft) is used.

You may ask: why this flushing? And why get rid of the last parameter?

The last parameter was set (to 1) for the last byte/character in a
call to yaz_iconv (with inbuf != 0). The problem is that this may not
be the last of the whole input byte sequence.

So last is problematic. Conversion of (large) files requires multiple
calls to iconv anyway, with chunks of input that are not necessarily
complete input sequences. We must therefore flush at the end anyway.

More importantly: we want yaz_iconv to have iconv semantics.

See:
http://www.gnu.org/software/libc/manual/html_node/iconv-Examples.html#iconv-Examples


In the case of MARC we want each field's data to be self-contained.
To ensure this, we flush after each field's data. For YAZ' MARC utility
that's done in marc_iconv_reset (src/marcdisp.c).
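
A per-field flush might look like the following with the standard POSIX iconv() interface, used here as a stand-in on the assumption that yaz_iconv follows the same calling convention. convert_field is a hypothetical helper, not the marc_iconv_reset function from YAZ.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert one field with an already-open descriptor, then reset the
 * descriptor so the output ends in the initial shift state.
 * Returns the number of bytes written to out, or (size_t)-1 on error. */
static size_t convert_field(iconv_t cd, const char *in, size_t inlen,
                            char *out, size_t outsize)
{
    char *inp = (char *) in;   /* iconv() takes char **, input is not modified */
    char *outp = out;
    size_t inleft = inlen, outleft = outsize;

    assert(in && out);
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t) -1)
        return (size_t) -1;    /* errno is E2BIG, EILSEQ or EINVAL */
    /* NULL input: emit any closing escape sequence and reset the state. */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t) -1)
        return (size_t) -1;
    return (size_t) (outp - out);
}
```

Because the reset happens inside the helper, every field's output starts and ends in the initial state regardless of what the previous field contained. For a stateless target encoding the reset call simply writes nothing.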

/ Adam


Re: Another MARC8 conversion problem.
United States
2007-03-21 12:20:35
Can you point me to a URL for the CVS? 


  
Re: Another MARC8 conversion problem.
Denmark
2007-03-21 12:47:50
Gary Anderson wrote:
> Can you point me to a URL for the CVS?

CVSROOT=:pserver:cvs@cvs.indexdata.dk:/cvs
password is anonymous
cvs co yaz

/ Adam

