Thread: Re: Internal error: field_num 111 > max_field_num 6




Re: Internal error: field_num 111 > max_field_num 6
2007-06-11 16:42:06
Hi Marvin,

First off, thanks for all your great work with KinoSearch!

I'm running into a little problem that sounds a lot like the one
posted here:

http://www.rectangular.com/pipermail/kinosearch/2006-May/000157.html

Specifics:

	- index has about 15m docs
	- using KinoSearch 0.15
	- running on Debian (Linux oldev4.sea 2.6.18.5-amd64 #2 SMP Fri Dec 15 12:21:08 PST 2006 x86_64 GNU/Linux)
	- the error is repeatable; here's the error:

Internal error: field_num 111 > max_field_num 6 at /site/perl/perl-5.8.8/site_perl/5.8.8/x86_64-linux/KinoSearch/Index/TermInfosReader.pm line 92
	KinoSearch::Index::TermInfosReader::_scan_enum('KinoSearch::Index::TermInfosReader=HASH(0x93a730)', '\x\x23843') called at /site/perl/perl-5.8.8/site_perl/5.8.8/x86_64-linux/KinoSearch/Index/TermInfosReader.pm line 64
	KinoSearch::Index::TermInfosReader::fetch_term_info('KinoSearch::Index::TermInfosReader=HASH(0x93a730)', 'KinoSearch::Index::Term=HASH(0x235bc20)') called at /site/perl/perl-5.8.8/site_perl/5.8.8/x86_64-linux/KinoSearch/Index/SegReader.pm
ETC.

	- the indexer has 7 spec_fields, 1 of which is not indexed
	- the index file has been optimized, and is about 4.1G

I'm going to set K_DEBUG to 1 and see what I can see. Any thoughts?

Thanks,

Matthew


_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch


Re: Internal error: field_num 111 > max_field_num 6
2007-06-11 18:32:15
One additional bit of info. I'm running this under mod_perl (latest),
with the searcher held in class data. I suspect it's a concurrency
issue, actually. Any thoughts?

Thanks again,

Matthew




Re: Re: Internal error: field_num 111 > max_field_num 6
2007-06-11 18:50:55
On Jun 11, 2007, at 4:32 PM, Matthew Berk wrote:

> One additional bit of info. I'm running this under mod_perl
> (latest), with the searcher held in class data. I suspect it's a
> concurrency issue, actually. Any thoughts?

Yes.  KinoSearch isn't thread safe.

The error message you saw suggests that an input stream has gotten
out of sync.  It's at the wrong place in the file and reading data
that the module considers insane; there's a sanity check in place to
give you that exception and head off the looming segfault.

The error from a year ago was due to a 32-bit bottleneck I'd missed
truncating a file pointer.  Since then, I don't think I've received
any reports of 0.15 i/o sync errors other than yours.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/





RE: Re: Internal error: field_num 111 > max_field_num 6
2007-06-11 23:32:45
Would a simple locking mechanism (I know you have one for
.2) work to alleviate the problem under mod_perl?



  
Re: Re: Internal error: field_num 111 > max_field_num 6
2007-06-12 09:53:14
On Jun 11, 2007, at 9:32 PM, Matthew Berk wrote:

> Would a simple locking mechanism (I know you have one for .2) work
> to alleviate the problem under mod_perl?

Serializing access, say by forcing each complete search into a
subroutine and preventing everybody else from using the shared
resources until the call returns... I think that would solve the
concurrency issue.
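
For illustration, a minimal sketch of that kind of serialization,
assuming the callers are separate processes that can all open the same
lock file.  The name with_search_lock and the lock path are made up for
this example and are not part of KinoSearch; the callback stands in for
a complete search.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw( :flock );
use File::Spec;

# Hypothetical lock file; any path every worker can open would do.
my $LOCK_PATH = File::Spec->catfile( File::Spec->tmpdir, 'ks_search.lock' );

# Run $code while holding an exclusive lock, so only one caller at a
# time touches the shared resources.
sub with_search_lock {
    my ( $lock_path, $code ) = @_;
    open my $fh, '>>', $lock_path or die "Can't open $lock_path: $!";
    flock( $fh, LOCK_EX ) or die "Can't lock $lock_path: $!";
    my @result = eval { $code->() };
    my $err = $@;
    close $fh;    # closing the handle releases the lock
    die $err if $err;
    return wantarray ? @result : $result[0];
}

# Stand-in for a complete search: a real callback would run the query
# and fetch every hit before returning.
my $hits = with_search_lock( $LOCK_PATH, sub { return "3 hits" } );
print "$hits\n";
```

The lock is held for the whole callback, so anything that reads from
the shared searcher has to happen inside it.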

However, you're going to get memory errors, because when one of
KinoSearch's C struct objects gets copied to multiple threads, the
internal refcount doesn't get incremented.  (KS objects keep their
own refcounts, distinct from Perl's SvREFCNT.)  When the first of
several Perl wrapper objects referencing the C object gets DESTROYed,
the KS refcount will fall to 0 and the C object will clean itself
up.  From now on, any other Perl wrapper objects referencing that C
struct are pointing at freed memory.  That's Bad.

A script which segfaults to illustrate the problem is below.

Solving this is hard.  Incrementing the KS refcounts of objects
copied when Perl spawns a thread is a PITA, because CLONE gets
invoked as a *package* method.  You have to maintain global
registries keeping track of all live object references, then iterate
over all the items in the registries when CLONE tips you off that a
thread is starting.
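
To give an idea of the bookkeeping involved, here is a toy model in
pure Perl.  Nothing in it is KinoSearch code: ToyObj, %c_refcount and
%registry are invented for illustration, the hash stands in for the
refcount field of a C struct, and thread creation is simulated in a
single thread by copying the wrapper and calling CLONE by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ToyObj;
use Scalar::Util qw( weaken refaddr );

my %c_refcount;       # stand-in for the refcount inside each C struct
my %registry;         # weak refs to every live wrapper in this package
my $next_addr = 1;    # fake "struct address" generator

sub new {
    my $class = shift;
    my $self = bless { addr => $next_addr++ }, $class;
    $c_refcount{ $self->{addr} } = 1;
    $self->_register;
    return $self;
}

sub _register {
    my $self = shift;
    my $key  = refaddr($self);
    $registry{$key} = $self;
    weaken( $registry{$key} );    # the registry must not keep objects alive
}

# Returns undef once the simulated struct has been freed.
sub refcount { return $c_refcount{ $_[0]{addr} } }

# Called as a package method, so the only way to find the objects is
# to walk the registry.
sub CLONE {
    my @live = grep { defined } values %registry;
    %registry = ();
    for my $obj (@live) {
        $c_refcount{ $obj->{addr} }++;
        $obj->_register;
    }
}

sub DESTROY {
    my $self = shift;
    delete $registry{ refaddr($self) };
    return unless exists $c_refcount{ $self->{addr} };
    my $left = --$c_refcount{ $self->{addr} };
    delete $c_refcount{ $self->{addr} } if $left == 0;    # "free" the struct
}

package main;

my $obj  = ToyObj->new;
my $copy = bless { %$obj }, 'ToyObj';    # second wrapper, same "struct"
ToyObj->CLONE;                           # simulated thread start
print "after CLONE: ", $obj->refcount, "\n";             # prints 2
undef $copy;                             # one wrapper is DESTROYed
print "after copy destroyed: ", $obj->refcount, "\n";    # prints 1
```

Without the CLONE call, destroying the copy would take the count to 0
while $obj still refers to the struct.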

Maintaining such a scheme is messy, labor-intensive and error-prone
-- and that's the biggest reason among several why KS doesn't do
threads.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

#----------------------------------------------------------

#!/usr/bin/perl
use strict;
use warnings;

use KinoSearch::Store::FSFolder;
use Carp qw( cluck );
use File::Path qw( rmtree );

# Announce each destruction with a stack trace, then hand off to the
# real destructor.
sub warn_destroy {
     my $self = shift;
     cluck "Destroying";
     KinoSearch::Util::Obj::DESTROY($self);
}

{
     no warnings 'once';
     *KinoSearch::Store::FSFolder::DESTROY = *warn_destroy;
}

use threads;

rmtree 'foofolder';
mkdir 'foofolder' or die $!;

my $folder = KinoSearch::Store::FSFolder->new(
     path => 'foofolder',
);
warn $folder;

# Each new thread gets its own copy of the Perl wrapper around the
# same C struct.
for my $num ( 0 .. 10 ) {
     my $thread = threads->create( sub {
         $folder->refcount_inc;
         my $out = $folder->open_outstream("file_$num");
     });
     $thread->join;
}







RE: Re: Internal error: field_num 111 > max_field_num 6
2007-06-12 10:14:05
Does this problem also hold under MP, where each use is a separate
proc? I take it the answer is yes, because child processes get copies
of class data from the parent process. Any other recommendations for
safely running this under MP?

Thanks!

Matthew
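
One pattern sometimes used under a preforking server, offered here as
an assumption rather than anything confirmed in this thread: build the
searcher lazily in each child, keyed by process ID, instead of
inheriting one that was opened in the parent.  The names below are
hypothetical, and $build stands in for whatever actually opens the
searcher.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One cached searcher per process ID, so a forked child never reuses
# a searcher that was opened by its parent.
my %searcher_for_pid;

# $build is a hypothetical callback that opens a fresh searcher.
sub searcher_for_this_process {
    my $build = shift;
    return $searcher_for_pid{$$} ||= $build->();
}

# Stand-in builder: counts how often it runs and records the PID.
my $opened = 0;
my $build  = sub { $opened++; return { opened_by => $$ } };

my $first  = searcher_for_this_process($build);
my $second = searcher_for_this_process($build);
print "opened $opened time(s) in process $$\n";
```

Within one process the builder runs once; a child forked afterwards
has a different $$ and so would build its own.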





  