Thread: Perl 5.8.3 requirement




Perl 5.8.3 requirement
United States
2007-02-22 11:40:50

How real is the Perl 5.8.3 requirement, and why is it in place?

I'd like to use KinoSearch on a couple of our boxes that still run
Red Hat 3, with Perl 5.8.0, and I'm wondering what the risks are, or if
I can avoid certain features to make this possible.

Thanks,
-Miles



_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch


Re: Perl 5.8.3 requirement
United States
2007-02-22 12:21:07

Miles Crawford scribbled on 2/22/07 11:40 AM:
>
> How real is the Perl 5.8.3 requirement, and why is it in place?
>
> I'd like to use KinoSearch on a couple of our boxes that still run
> Red Hat 3, with Perl 5.8.0, and I'm wondering what the risks are, or if I
> can avoid certain features to make this possible.
>


IIRC, 5.8.3 was the first version to really get UTF-8 working correctly,
and newer versions of KS use UTF-8 internally.

Whether you can cheat depends on which version of KS you use, I expect.

-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com



Re: Perl 5.8.3 requirement
United States
2007-02-26 12:11:19
On Feb 22, 2007, at 10:21 AM, Peter Karman wrote:

> IIRC, 5.8.3 was the first version to really get UTF-8 working
> correctly, and newer versions of KS use UTF-8 internally.

You recall correctly.  

http://www.rectangular.com/pipermail/kinosearch/2006-November/000527.html

> Whether you can cheat depends on which version of KS you use, I
> expect.

DO NOT use KS -- especially version 0.20_01 and subsequent releases --
with versions of Perl prior to 5.8.3.  Those Unicode bugs are vicious
and very hard to trace.  I do not intend to squander any development
time working hard to track down what appear to be bugs in KS but
actually turn out to be bugs in Perl's Unicode handling, now fixed.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/





Re: Perl 5.8.3 requirement
United States
2007-02-26 12:41:34

On Mon, 26 Feb 2007, Marvin Humphrey wrote:

>
> DO NOT use KS -- especially version 0.20_01 and subsequent releases -- with
> versions of Perl prior to 5.8.3.  Those Unicode bugs are vicious and very

Do not use KS, or do not use KS with Unicode content?  Does only the
Unicode handling fail, or are there further-reaching consequences?

-Miles



Re: Perl 5.8.3 requirement
United States
2007-02-26 13:28:19
On Feb 26, 2007, at 10:41 AM, Miles Crawford wrote:

>> DO NOT use KS -- especially version 0.20_01 and subsequent
>> releases -- with versions of Perl prior to 5.8.3.  Those Unicode
>> bugs are vicious and very
>
> Do not use KS, or do not use KS with Unicode content?  Does only the
> Unicode handling fail, or are there further-reaching consequences?

Do not use KS.  KS now converts everything to Unicode at the front end,
and all text is handled internally as Unicode.  For instance, in
InvIndexer->add_doc, there's this:

     for my $field_name ( keys %$doc ) {
         next unless $utf8_fields->{$field_name};
         utf8::upgrade( $doc->{$field_name} );
     }

If you supply Latin-1 text, that utf8::upgrade call will convert it to
Perl's internal UTF-8 representation.  So, no matter what the source
material, KS will be vulnerable to Perl's Unicode bugs.

You also get Unicode text back from KS, e.g. from
Hits->fetch_hit_hashref().  However, if the Unicode text contains no
characters outside of Latin-1, Perl can convert back and forth
transparently and you shouldn't notice anything different.
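
A minimal, self-contained sketch of that behavior, using only Perl's
core utf8:: functions (no KinoSearch needed; the variable names are
illustrative).  It assumes Perl 5.8.3 or later and shows that
utf8::upgrade, the call add_doc makes, switches the internal
representation without changing the characters, so Latin-1 text still
compares equal and can be downgraded back:

```perl
use strict;
use warnings;

# "cafe" with an e-acute, stored as Latin-1: the accented letter is the
# single byte 0xE9 and the string's internal UTF-8 flag is off.
my $latin1 = "caf\xe9";

# The same treatment add_doc gives each field: switch the internal
# representation to UTF-8 without changing the characters.
my $upgraded = $latin1;
utf8::upgrade($upgraded);

printf "flag before: %d, flag after: %d\n",
    ( utf8::is_utf8($latin1) ? 1 : 0 ), ( utf8::is_utf8($upgraded) ? 1 : 0 );

# Still four characters, and still equal to the original string.
printf "length: %d, equal: %s\n",
    length($upgraded), ( $latin1 eq $upgraded ? 'yes' : 'no' );

# Every character fits in Latin-1, so converting back succeeds;
# utf8::downgrade would die here if any character were above 0xFF.
my $back = $upgraded;
utf8::downgrade($back);
printf "round trip equal: %s\n", ( $back eq $latin1 ? 'yes' : 'no' );
```

The round trip only stays invisible while every character is at or
below 0xFF; that is the "shouldn't notice anything different" case.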

The bottom line is that Latin-1 source material should work without you
having to think about it, but you have to be using Perl 5.8.3 or above.
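
One way to enforce that floor in your own indexing and search scripts,
offered as a sketch rather than anything KinoSearch itself ships: Perl's
built-in `use VERSION` makes an older interpreter (such as the 5.8.0 on
those Red Hat 3 boxes) refuse to compile the script, and `$]` exposes
the running version as a decimal for a softer check.  The helper
perl_is_new_enough is hypothetical.

```perl
use strict;
use warnings;

# Compile-time guard: an interpreter older than 5.8.3 dies here,
# before any KinoSearch code is loaded.
use 5.008003;

# Hypothetical helper for code that would rather report the problem
# than die.  Takes a version in $] style, e.g. 5.008000 for 5.8.0.
sub perl_is_new_enough {
    my ($version) = @_;
    return $version >= 5.008003;
}

for my $v (qw( 5.008000 5.008003 5.008008 )) {
    printf "%s: %s\n", $v, perl_is_new_enough($v) ? 'ok' : 'too old';
}
printf "this perl (%s): %s\n", $],
    perl_is_new_enough($]) ? 'ok' : 'too old';
```

In practice the single `use 5.008003;` line is usually enough; the
helper only matters if a script must keep running and warn instead.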

But now, unlike KS prior to 0.20_01, if you want to supply Unicode text
you've prepared yourself, things will work.

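    # invindexer.plx
    use Encode qw( decode );

    # decode() turns the KOI8-R bytes into a Perl character string
    my %doc = (
        content => decode( 'KOI8-R', $source_bytes ),
    );
    $invindexer->add_doc( \%doc );

    # searcher.cgi
    use Encode qw( encode );

    # encode() turns the Unicode text KS returns back into KOI8-R bytes
    while ( my $hit = $hits->fetch_hit_hashref ) {
        $_ = encode( 'KOI8-R', $_ ) for values %$hit;
        ...
    }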
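
The decode/encode pattern above can be tried on its own with the core
Encode module, independent of KinoSearch.  In this sketch a six-byte
KOI8-R string (the Russian word "Привет", chosen only as sample data) is
decoded into Perl characters, the form add_doc works with, and then
encoded back to the original bytes, as searcher.cgi does with each hit
field:

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# The Russian word "privet" (hello) as raw KOI8-R bytes, as it might
# arrive from a file or socket.
my $source_bytes = "\xF0\xD2\xC9\xD7\xC5\xD4";

# Indexing side: bytes -> Perl character string, one char per letter.
my $text = decode( 'KOI8-R', $source_bytes );
printf "%d characters, first is U+%04X\n", length($text), ord($text);

# Search side: characters -> KOI8-R bytes for output.
my $out_bytes = encode( 'KOI8-R', $text );
print "round trip ", ( $out_bytes eq $source_bytes ? 'ok' : 'FAILED' ), "\n";
```

Keeping decode at the input boundary and encode at the output boundary
means everything in between, including KS, only ever sees characters.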

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/




