On 9/22/07, Tels <nospam-abuse bloodgate.com> wrote:
> Moin,
>
> On Friday 21 September 2007 23:56:56 demerphq wrote:
> > On 9/21/07, demerphq <demerphq gmail.com> wrote:
> > > But we need to make sure this is fixed before 5.10 is released.
> >
> > Just to expand on this, somewhere in or around the make_trie code is
> > some logic that turns on a bit in a bit vector for every start byte in
> > the trie. In the branch for handling non-unicode data it needs to do
> > something like the following pseudo code.
> >
> > /* store first byte of utf8 representation of codepoints in the
> >    127 < cp < 256 range */
> > if (127 < cp && cp < 192) {
> >     SETBIT(CHARCLASS,194)
> > } else if (191 < cp && cp < 256) {
> >     SETBIT(CHARCLASS,195)
> > }
>
> Neither SETBIT nor "vector" appears in the source. In the end, grepping
> for "bitfield" leads to line 1392, which looks like:
>
>     if ( set_bit ) /* bitmap only alloced when !(UTF&&Folding) */
>         TRIE_BITMAP_SET(trie,*uc); /* store the raw first byte
>                                       regardless of encoding */
>
>     for ( ; uc < e ; uc += len ) {
>         TRIE_CHARCOUNT(trie)++;
>         TRIE_READ_CHAR;
>         chars++;
>         if ( uvc < 256 ) {
>             if ( !trie->charmap[ uvc ] ) {
>                 trie->charmap[ uvc ]=( ++trie->uniquecharcount );
>                 if ( folder )
>                     trie->charmap[ folder[ uvc ] ] = trie->charmap[ uvc ];
>                 TRIE_STORE_REVCHAR;
>             }
>             if ( set_bit ) {
>                 /* store the codepoint in the bitmap, and if it's ascii
>                    also store its folded equivalent. */
>                 TRIE_BITMAP_SET(trie,uvc);
>                 if ( folder ) TRIE_BITMAP_SET(trie,folder[ uvc ]);

Right there. The line that says

    if ( folder ) TRIE_BITMAP_SET(trie,folder[ uvc ]);

should probably read

    if ( folder ) { /* folder only true when pattern is not utf8 */
        TRIE_BITMAP_SET(trie,folder[ uvc ]); /* store the folded codepoint */
        /* store first byte of utf8 representation of
           codepoints in the 127 < uvc < 256 range */
        if (127 < uvc && uvc < 192) {
            TRIE_BITMAP_SET(trie,194);
        } else if (191 < uvc ) { /* && uvc < 256 -- we know uvc is < 256 already */
            TRIE_BITMAP_SET(trie,195);
        }
    }

>                 set_bit = 0; /* We've done our bit */
>             }
>         } else {
>             SV** svpp;
>             if ( !widecharmap )
>                 widecharmap = newHV();
>
>             svpp = hv_fetch( widecharmap, (char*)&uvc, sizeof( UV ), 1 );
>
>             if ( !svpp )
>                 Perl_croak( aTHX_ "error creating/fetching widecharmap entry for 0x%"UVXf, uvc );
>
>             if ( !SvTRUE( *svpp ) ) {
>                 sv_setiv( *svpp, ++trie->uniquecharcount );
>                 TRIE_STORE_REVCHAR;
>             }
>         }
>
>
> and I believe the modification needs to be done in the first branch.
> However, I am not sure what to insert where.

Thanks a lot for digging that out, it's exactly what I needed to see.

Can you try the code as I've indicated above and let me know if it
solves the problem?
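
For anyone following along, here is a minimal standalone sketch of the idea behind the proposed change, outside of the regex engine. It assumes a plain 256-bit bitmap in place of the trie's bitmap, and every name in it (start_bitmap, bitmap_set, bitmap_test, utf8_first_byte, record_start_bytes, start_bytes_selfcheck) is hypothetical and not taken from the perl source. It only illustrates why 194 and 195 are the two extra start bytes: a codepoint in the 128..255 range is two bytes long in utf8, and its first byte is 0xC0 | (cp >> 6).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the trie's start-byte bitmap: one bit per byte value. */
typedef struct { unsigned char bits[32]; } start_bitmap;

static void bitmap_set(start_bitmap *bm, unsigned int byte)
{
    bm->bits[(byte >> 3) & 31] |= (unsigned char)(1u << (byte & 7));
}

static int bitmap_test(const start_bitmap *bm, unsigned int byte)
{
    return (bm->bits[(byte >> 3) & 31] >> (byte & 7)) & 1;
}

/* First byte of the utf8 encoding of a codepoint below 256 (assumption:
 * callers never pass cp >= 256). ASCII encodes as itself; 128..191 start
 * with 194 (0xC2) and 192..255 start with 195 (0xC3). */
static unsigned int utf8_first_byte(unsigned int cp)
{
    return cp < 128 ? cp : (0xC0u | (cp >> 6));
}

/* Record every byte that a match on cp could start with: the raw byte for
 * a non-utf8 string, plus the utf8 lead byte for a utf8 string. */
static void record_start_bytes(start_bitmap *bm, unsigned int cp)
{
    bitmap_set(bm, cp);
    bitmap_set(bm, utf8_first_byte(cp));
}

/* Small usage example: 0xE9 lands in the 192..255 half, 0xA9 in 128..191. */
static void start_bytes_selfcheck(void)
{
    start_bitmap bm;
    memset(&bm, 0, sizeof bm);

    record_start_bytes(&bm, 0xE9);
    assert(bitmap_test(&bm, 0xE9));
    assert(bitmap_test(&bm, 195));
    assert(!bitmap_test(&bm, 194));

    record_start_bytes(&bm, 0xA9);
    assert(bitmap_test(&bm, 194));
}
```

The explicit range tests in the proposed patch and the shift used here should pick the same byte for every codepoint from 128 to 255; the shift is just a more compact way of writing it in the sketch.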
Cheers,
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"