Thread: Re: Internal error: field_num 111 > max_field_num 6

Re: Internal error: field_num 111 > max_field_num 6
United States, 2007-06-11 16:42:06

Hi Marvin,
First off, thanks for all your great work with KinoSearch!
I'm running into a little problem that sounds a lot like the one posted here:
http://www.rectangular.com/pipermail/kinosearch/2006-May/000157.html
Specifics:
- index has about 15m docs
- using KinoSearch 0.15
- running on Debian (Linux oldev4.sea 2.6.18.5-amd64 #2 SMP Fri Dec 15 12:21:08 PST 2006 x86_64 GNU/Linux)
- the error is repeatable; here's the error:

    Internal error: field_num 111 > max_field_num 6 at /site/perl/perl-5.8.8/site_perl/5.8.8/x86_64-linux/KinoSearch/Index/TermInfosReader.pm line 92
        KinoSearch::Index::TermInfosReader::_scan_enum('KinoSearch::Index::TermInfosReader=HASH(0x93a730)', '\x\x
    23843') called at /site/perl/perl-5.8.8/site_perl/5.8.8/x86_64-linux/KinoSearch/Index/TermInfosReader.pm line 64
        KinoSearch::Index::TermInfosReader::fetch_term_info('KinoSearch::Index::TermInfosReader=HASH(0x93a730)', 'KinoSearch::Index::Term=HASH(0x235bc20)') called at /site/perl/perl-5.8.8/site_perl/5.8.8/x86_64-linux/KinoSearch/Index/SegReader.pm ETC.

- the indexer has 7 spec_fields, 1 of which is not indexed
- the index file has been optimized, and is about 4.1G
I'm going to set K_DEBUG to 1 and see what I can see. Any thoughts?
Thanks,
Matthew
_______________________________________________
KinoSearch mailing list
KinoSearch rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch

Re: Internal error: field_num 111 > max_field_num 6
United States, 2007-06-11 18:32:15

One additional bit of info. I'm running this under mod_perl (latest), with the searcher held in class data. I suspect it's a concurrency issue, actually. Any thoughts?
Thanks again,
Matthew
Marchex, Inc.
http://www.marchex.com
This email message and any attachments are solely for intended recipients, and may contain information that is privileged and confidential. If you are not the intended recipient, any dissemination, distribution or copying is strictly prohibited. If you believe that you may have received this message in error, please immediately notify the sender by replying to this e-mail message.

Re: Re: Internal error: field_num 111 > max_field_num 6
United States, 2007-06-11 18:50:55

On Jun 11, 2007, at 4:32 PM, Matthew Berk wrote:
> One additional bit of info. I'm running this under mod_perl
> (latest), with the searcher held in class data. I suspect it's a
> concurrency issue, actually. Any thoughts?

Yes. KinoSearch isn't thread-safe.

The error message you saw suggests that an input stream has gotten out of sync: it's at the wrong place in the file and reading data that the module considers insane. There's a sanity check in place to give you that exception and head off the looming segfault.
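
To make the idea concrete, here is a toy Perl sketch of such a sanity check. The record layout (one byte of field number, one byte of length, then the term text), the `read_term` helper and `$MAX_FIELD_NUM` are hypothetical and do not reflect KinoSearch's actual file format. The only point is that a reader positioned at the wrong offset decodes text bytes as a field number, and a range check turns that into an exception instead of a bad memory access.

```perl
use strict;
use warnings;

# Hypothetical limit, matching the max_field_num in the report above.
my $MAX_FIELD_NUM = 6;

# Hypothetical record layout: field number (1 byte), text length (1 byte), text.
sub encode_term {
    my ( $field_num, $text ) = @_;
    return pack( 'C C a*', $field_num, length $text, $text );
}

# Decode one record starting at byte offset $pos, refusing insane field numbers.
sub read_term {
    my ( $buf, $pos ) = @_;
    my $field_num = unpack( 'C', substr( $buf, $pos, 1 ) );
    die "Internal error: field_num $field_num > max_field_num $MAX_FIELD_NUM\n"
        if $field_num > $MAX_FIELD_NUM;
    my $len = unpack( 'C', substr( $buf, $pos + 1, 1 ) );
    return ( $field_num, substr( $buf, $pos + 2, $len ) );
}

my $buf = encode_term( 3, 'foo' );

my ( $field, $text ) = read_term( $buf, 0 );    # in sync: (3, 'foo')
print "field $field, text '$text'\n";

eval { read_term( $buf, 2 ) };    # out of sync: lands on 'f', which decodes as 102
print "caught: $@";    # caught: Internal error: field_num 102 > max_field_num 6
```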

The error from a year ago was caused by a 32-bit bottleneck I'd missed, which truncated a file pointer. Since then, I don't think I've received any reports of 0.15 I/O sync errors other than yours.
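
For illustration only, a minimal Perl sketch of what a 32-bit bottleneck does to a file pointer, assuming a perl built with 64-bit integers (as on the x86_64 machine above). Keeping only the low 32 bits silently wraps any position past 4 GiB back toward the start of the file, so the reader lands in the wrong place. The `truncate_to_32_bits` helper and the offset are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the low 32 bits, as a 32-bit variable would.
sub truncate_to_32_bits { return $_[0] & 0xFFFF_FFFF }

my $real_offset = 4_400_000_000;    # a position past the 4 GiB mark
my $truncated   = truncate_to_32_bits($real_offset);
print "$real_offset -> $truncated\n";    # prints "4400000000 -> 105032704"
```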
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

RE: Re: Internal error: field_num 111 > max_field_num 6
United States, 2007-06-11 23:32:45

Would a simple locking mechanism (I know you have one for .2) work to alleviate the problem under mod_perl?

Re: Re: Internal error: field_num 111 > max_field_num 6
United States, 2007-06-12 09:53:14

On Jun 11, 2007, at 9:32 PM, Matthew Berk wrote:
> Would a simple locking mechanism (I know you have one for .2) work
> to alleviate the problem under mod_perl?

Serializing access, say by forcing each complete search into a subroutine and preventing everybody else from using the shared resources until the call returns... I think that would solve the concurrency issue.
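
A minimal Perl sketch of that kind of serialization, assuming a lock file that every worker can open. The lock-file path, `serialized_search` and the `$do_search` callback are hypothetical, not KinoSearch API. Each complete search runs inside one subroutine that holds an exclusive `flock` until it returns. As Marvin goes on to explain, this addresses only the concurrency issue, not the refcount problem with KinoSearch objects shared across threads.

```perl
use strict;
use warnings;
use Fcntl qw( :flock );
use File::Spec;

# Hypothetical lock file; any path that every worker can open will do.
my $LOCK_FILE = File::Spec->catfile( File::Spec->tmpdir, 'ks_search.lock' );

# Run one complete search while holding an exclusive lock, so nobody else can
# use the shared resources until the call returns. $do_search is a hypothetical
# callback standing in for the real search code. If it dies, the lexical handle
# goes out of scope and the lock is released anyway.
sub serialized_search {
    my ( $do_search, @args ) = @_;
    open my $lock_fh, '>>', $LOCK_FILE
        or die "Can't open $LOCK_FILE: $!";
    flock( $lock_fh, LOCK_EX ) or die "Can't lock $LOCK_FILE: $!";
    my @results = $do_search->(@args);
    close $lock_fh;    # closing the handle releases the lock
    return @results;
}

# Usage with a dummy search callback:
my @hits = serialized_search( sub { return "results for $_[0]" }, 'foo' );
print "$hits[0]\n";    # prints "results for foo"
```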

However, you're going to get memory errors, because when one of KinoSearch's C struct objects gets copied to multiple threads, the internal refcount doesn't get incremented. (KS objects keep their own refcounts, distinct from Perl's SvREFCNT.) When the first of several Perl wrapper objects referencing the C object gets DESTROYed, the KS refcount will fall to 0 and the C object will clean itself up. From then on, any other Perl wrapper objects referencing that C struct are pointing at freed memory. That's Bad.
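
The failure described above can be modeled without threads at all. The following is a pure-Perl toy, not KinoSearch code: `ToyStruct` stands in for a C struct that keeps its own refcount, `ToyWrapper` for the Perl object wrapping it, and the second wrapper plays the part of the copy made when Perl spawns a thread without incrementing the KS refcount. In real KinoSearch the memory would actually be freed; here a `freed` flag only records when that would happen.

```perl
use strict;
use warnings;

# Stand-in for a KinoSearch C struct: it keeps its own refcount,
# separate from Perl's SvREFCNT. (Toy model, not KinoSearch code.)
package ToyStruct;

sub new { return bless { refcount => 1, freed => 0 }, shift }

sub refcount_dec {
    my $self = shift;
    $self->{refcount}--;
    $self->{freed} = 1 if $self->{refcount} == 0;    # "free" the struct
}

# Stand-in for the Perl wrapper object; its DESTROY gives up one refcount.
package ToyWrapper;

sub new {
    my ( $class, $struct ) = @_;
    return bless { struct => $struct }, $class;
}

sub DESTROY {
    my $struct = $_[0]{struct};
    $struct->refcount_dec if $struct;
}

package main;

my $struct = ToyStruct->new;              # refcount 1, owned by $orig
my $orig   = ToyWrapper->new($struct);
my $copy   = ToyWrapper->new($struct);    # like a thread's copy: no refcount increment
undef $orig;                              # first DESTROY: refcount 1 -> 0
print "freed: $struct->{freed}\n";        # prints "freed: 1" while $copy still points at it
```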

A script which segfaults to illustrate the problem is below.

Solving this is hard. Incrementing the KS refcounts of objects copied when Perl spawns a thread is a PITA, because CLONE gets invoked as a *package* method. You have to maintain global registries keeping track of all live object references, then iterate over all the items in the registries when CLONE tips you off that a thread is starting.
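
A minimal sketch of that registry-plus-CLONE bookkeeping, in pure Perl and under stated assumptions: `ToyStruct`, `RegisteredWrapper` and the hash-based refcount are hypothetical stand-ins, not KinoSearch classes. Each wrapper registers a weakened reference to itself, and `CLONE`, which Perl calls with only the package name, walks the registry to bump every refcount and re-key entries whose addresses changed in the new thread. Because pure-Perl data is itself copied into each thread, this only models the bookkeeping; in KinoSearch the increment would have to land on the C struct shared by both threads.

```perl
use strict;
use warnings;

# Toy stand-in for a C struct with its own refcount (not KinoSearch code).
package ToyStruct;

sub new { return bless { refcount => 1 }, shift }

# Perl wrapper that records every live instance in a class-level registry,
# because CLONE receives only the package name, never the objects.
package RegisteredWrapper;

use Scalar::Util qw( refaddr weaken );

my %registry;    # refaddr => weakened ref to each live wrapper

sub new {
    my ( $class, $struct ) = @_;
    my $self = bless { struct => $struct }, $class;
    $registry{ refaddr $self } = $self;
    weaken( $registry{ refaddr $self } );    # the registry must not keep wrappers alive
    return $self;
}

sub DESTROY {
    my $self = shift;
    delete $registry{ refaddr $self };
    $self->{struct}{refcount}-- if $self->{struct};
}

# Perl invokes CLONE as a package method in each new thread. Every wrapper
# copied into that thread needs its own refcount, and its address has changed,
# so walk the registry, bump the refcounts and rebuild the keys.
sub CLONE {
    my @live = grep { defined } values %registry;
    %registry = ();
    for my $obj (@live) {
        $obj->{struct}{refcount}++;
        $registry{ refaddr $obj } = $obj;
        weaken( $registry{ refaddr $obj } );
    }
}

package main;

my $struct  = ToyStruct->new;
my $wrapper = RegisteredWrapper->new($struct);
RegisteredWrapper->CLONE;    # simulate what Perl does when a thread starts
print "refcount after clone: $struct->{refcount}\n";    # prints 2
```

Calling `RegisteredWrapper->CLONE` directly stands in for thread creation, so the sketch runs on a perl built without thread support.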

Maintaining such a scheme is messy, labor-intensive and error-prone, and that's the biggest reason among several why KS doesn't do threads.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

#----------------------------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use KinoSearch::Store::FSFolder;
use Carp qw( cluck );
use File::Path qw( rmtree );

# Log a stack trace every time an FSFolder wrapper is destroyed, then
# hand off to the normal KinoSearch destructor.
sub warn_destroy {
    my $self = shift;
    cluck "Destroying";
    KinoSearch::Util::Obj::DESTROY($self);
}

{
    no warnings 'once';
    *KinoSearch::Store::FSFolder::DESTROY = *warn_destroy;
}

use threads;

rmtree 'foofolder';
mkdir 'foofolder' or die $!;
my $folder = KinoSearch::Store::FSFolder->new(
    path => 'foofolder',
);
warn $folder;

# Each thread gets its own copy of the Perl wrapper around the same C struct;
# when the thread finishes, its copy is DESTROYed.
for my $num ( 0 .. 10 ) {
    my $thread = threads->create( sub {
        $folder->refcount_inc;
        my $out = $folder->open_outstream("file_$num");
    });
    $thread->join;
}

RE: Re: Internal error: field_num 111 > max_field_num 6
United States, 2007-06-12 10:14:05

Does this problem also hold under mod_perl (MP), where each use runs in a separate process? I take it the answer is yes, because child processes get copies of class data from the parent process.
Any other recommendations for safely running this under MP?
Thanks!
Matthew