minor problem
United States
2007-12-23 13:38:14
I am running some custom index code. I have a process that
all other processes communicate with to insert documents
(and other update functions such as delete, but for right
now just inserts). I index and hand it over to TermGenerator
and all the other stuff to add a document. This works.
However, it runs really slow (a document every several
seconds or so, input document size about 2k-40k). When I do
a "ps -ef" command from the command line I see a
task belonging to my daemon that shows the command being run
as "/bin/cat". Looking in the xapian source code I
have found that to be in the flint backend locking code.
 
Since I am serializing my updates (one after another) and
only from a single process, why am I seeing what appears to
be long-term locks?
 
This index code ran very fast in pre-1.0 versions of the
indexer. I upgraded to 1.0.0, then 1.0.1, etc. But I didn't
need to index until recently.

There are currently only 10,000 documents in the database.
 
Thanks,
 
-Michael Lewis
_______________________________________________
Xapian-discuss mailing list
Xapian-discuss@lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-discuss

Re: minor problem
United Kingdom
2007-12-23 15:27:27
On Sun, Dec 23, 2007 at 02:38:14PM -0500, Michael A. Lewis wrote:

[flint indexing processing apparently blocking on /bin/cat]
> Since I am serializing my updates (one after another) and only from
> a single process, why am I seeing what appears to be long-term
> locks?

Which OS? We had problems on Mac OS X which I don't remember ever
actually getting to the bottom of, that involved its sitting on
/bin/cat. (I noticed it while running the test suite.)

IIRM that was using the remotetcp backend, with flint behind it; the
tests were failing to get the remote backend up into listening state,
because the locking was getting stuck. AFAIK we haven't seen the same
thing on other operating systems (Mac OS X still fails these tests in
HEAD for me, and there has been discussion of writing our own /bin/cat
replacement in an effort, amongst other things, to fix this).

J

-- 
/--------------------------------------------------------------------------
  James Aylett                                                  xapian.org
  james@tartarus.org                               uncertaintydivision.org


RE: minor problem
United States
2007-12-23 15:28:08
Sorry, uname displays:
 
Linux xelent.net 2.6.18-1.2798.fc6 #1 SMP Mon Oct 16 14:39:22 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux


RE: minor problem
United States
2007-12-23 16:17:34
The last server ID was incorrect. That was my old Xapian
server. My software is currently running on:
 
Linux localhost.localdomain 2.6.18-1.2798.fc6 #1 SMP Mon Oct 16 14:37:32 EDT 2006 i686 i686 i386 GNU/Linux

Re: minor problem
United Kingdom
2007-12-23 21:58:41
On Sun, Dec 23, 2007 at 02:38:14PM -0500, Michael A. Lewis wrote:
> When I do a "ps -ef" command from the command line I see a task
> belonging to my daemon that shows the command being run as "/bin/cat".
> Looking in the xapian source code I have found that to be in the flint
> backend locking code.

The semantics of fcntl() locking within a process are rather unhelpful,
so we fork a child process to take and hold the lock for us.  To
minimise VM use, we just exec /bin/cat once the lock is obtained.

> Since I am serializing my updates (one after another) and only from a
> single process, why am I seeing what appears to be long-term locks?

The lock is held (and so the /bin/cat child process exists) for as long
as you have the WritableDatabase open.  So unless you're closing and
reopening the database for each addition (which generally is probably
not a good idea) then this sounds like what I'd expect.
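
For readers who want to see the shape of this, here is a minimal sketch of a lock-holding child process. It is not Xapian's actual code: the names ChildLock, acquire_lock and release_lock are hypothetical, and as a simplifying assumption the child just blocks in read() instead of exec'ing /bin/cat. The child takes an exclusive fcntl() lock and keeps it until the parent closes its end of a socket pair, so the lock lives exactly as long as the parent's handle.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical handle for a lock that is held by a helper child process.
struct ChildLock {
    pid_t pid = -1;  // child that owns the fcntl() lock
    int fd = -1;     // parent's end of the socket pair; closing it frees the lock
    bool held() const { return fd >= 0; }
};

// Fork a child that takes an exclusive, non-blocking fcntl() lock on `path`
// and then waits until the parent closes its end of the socket pair.
ChildLock acquire_lock(const char* path) {
    ChildLock lock;
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return lock;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return lock;
    }

    if (pid == 0) {
        // Child. A production version would also close unrelated inherited
        // descriptors here; this sketch skips that step.
        close(sv[0]);
        char status = 'n';
        int lockfd = open(path, O_WRONLY | O_CREAT, 0666);
        if (lockfd >= 0) {
            struct flock fl = {};
            fl.l_type = F_WRLCK;
            fl.l_whence = SEEK_SET;  // l_start = l_len = 0 locks the whole file
            if (fcntl(lockfd, F_SETLK, &fl) == 0) status = 'y';
        }
        // Report the outcome to the parent; give up if the lock was not taken.
        if (write(sv[1], &status, 1) != 1 || status != 'y') _exit(1);
        // Hold the lock until the parent closes its end (read() returns 0).
        for (;;) {
            char buf;
            ssize_t r = read(sv[1], &buf, 1);
            if (r == 0) break;
            if (r < 0 && errno != EINTR) break;
        }
        _exit(0);  // exiting releases the lock
    }

    // Parent: wait for the child's one-byte status report.
    close(sv[1]);
    char status = 'n';
    ssize_t n;
    do {
        n = read(sv[0], &status, 1);
    } while (n < 0 && errno == EINTR);
    if (n == 1 && status == 'y') {
        lock.pid = pid;
        lock.fd = sv[0];
        return lock;
    }
    close(sv[0]);
    waitpid(pid, nullptr, 0);
    return lock;
}

// Release the lock by closing our end of the socket pair and reaping the child.
void release_lock(ChildLock& lock) {
    if (!lock.held()) return;
    close(lock.fd);
    waitpid(lock.pid, nullptr, 0);
    lock = ChildLock();
}
```

In this sketch every acquire costs a fork() and every release waits for the child to exit, which is why taking and dropping the lock once per document is expensive compared with holding it across many additions.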

> This index code ran very fast in pre-1.0 versions of the indexer. I
> upgraded to 1.0.0, then 1.0.1, etc. But I didn't need to index until
> recently.

It's hard to know what's going on from the information given.  You said
you're using TermGenerator, which is new in 1.0.0, so that may be
indexing significantly differently to whatever you were using before.
Though several seconds per document for a 10,000 document database
really is excessively slow anyway.

Could you show us what the indexing code looks like?

Cheers,
    Olly


Re: minor problem
United Kingdom
2007-12-23 22:14:48
On Sun, Dec 23, 2007 at 09:27:27PM +0000, James Aylett wrote:
> On Sun, Dec 23, 2007 at 02:38:14PM -0500, Michael A. Lewis wrote:
>
> [flint indexing processing apparently blocking on /bin/cat]
> > Since I am serializing my updates (one after another) and only from
> > a single process, why am I seeing what appears to be long-term
> > locks?
>
> Which OS? We had problems on Mac OS X which I don't remember ever
> actually getting to the bottom of, that involved its sitting on
> /bin/cat. (I noticed it while running the test suite.)

I don't think this is the same issue.  Michael reports slow indexing
rather than the process hanging.

> IIRM that was using the remotetcp backend, with flint behind it; the
> tests were failing to get the remote backend up into listening state,
> because the locking was getting stuck. AFAIK we haven't seen the same
> thing on other operating systems (Mac OS X still fails these tests in
> HEAD for me, and there has been discussion of writing our own /bin/cat
> replacement in an effort, amongst other things, to fix this).

This thread I guess?

http://thread.gmane.org/gmane.comp.search.xapian.devel/1082/focus=1087

Though the error we were pursuing there seems to be unrelated to remote
backend use if I read the messages correctly.

I don't think I've heard similar reports for any other OS.  Or from
another OS X user - could it be something odd about your setup perhaps?

I don't have access to Mac OS X, so I'm going to need help on this one!

Cheers,
    Olly


RE: minor problem
United States
2007-12-23 22:31:17
Thanks for the response Olly. My indexing code appears
below. A note about the speed. It was this slow (at least to
the naked eye) even when there were only a couple of hundred
documents. After this code, the child process which contains
this code just exits.
 
try {
        // Open (or create) the database for writing.
        Xapian::WritableDatabase database(dbname, DB_CREATE_OR_OPEN);
        Xapian::TermGenerator indexer;
        Xapian::Stem stemmer("english");
        indexer.set_stemmer(stemmer);

        Xapian::Document doc;
        doc.set_data(line);
        indexer.set_document(doc);
        indexer.index_text(line);
        if (meta) {
                // Store the metadata text as the document data instead.
                doc.set_data(metatext);
        }
        docid = database.add_document(doc);
        // Send the new document ID back over the socket.
        sprintf( tmp1, "%lu", (unsigned long)docid );
        x = write( c_id, tmp1, strlen(tmp1) );
        if ( x != strlen(tmp1) ) {
            log_it( "ERROR: insert could not write to socket" );
        }
}

 

Re: minor problem
2007-12-25 17:18:53
Michael,

I have been tracking indexing performance for a second year now, and
it has been deteriorating dramatically. Only users who haven't
re-indexed their data for a long time, as in Michael's case, notice
the slowdown in Xapian indexing. As you can see from my postings from
about a year ago, I used to index 20 million documents within an hour.
Now I am indexing 50 million documents in about 29 hours.

The biggest drop in Xapian indexing performance came with the
introduction of the Flint database with compression. The second drop
came with the introduction of locking for Flint databases. However,
Xapian indexing is still the fastest compared to other technologies,
otherwise we wouldn't be here ...

Cheers
__________________________________
  Kevin Duraj
  http://UncensoredWebSearch.com



Re: minor problem
United Kingdom
2007-12-26 16:56:03
On Sun, Dec 23, 2007 at 11:31:17PM -0500, Michael A. Lewis wrote:
> Thanks for the response Olly. My indexing code appears below. A note
> about the speed. It was this slow (at least to the naked eye) even
> when there were only a couple of hundred documents. After this code,
> the child process which contains this code just exits.
>
> try {
>         Xapian::WritableDatabase database(dbname, DB_CREATE_OR_OPEN);

The reason why this runs slowly is that you are reopening the database
for every single document you're indexing.  If you just open the
database once and keep it open, this will run a lot more quickly.  Also
you'll be able to add new documents in batches, which is a lot more
efficient.
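
One way to picture the "open once, add in batches" pattern is the sketch below. It is only an illustration under stated assumptions: BatchInserter is a hypothetical helper, and it is written as a template so that it compiles without Xapian installed. It assumes a database type offering add_document() and flush(), which is the shape of Xapian::WritableDatabase in 1.0.x.

```cpp
#include <cstddef>

// Hypothetical helper: wraps an already-open writable database and flushes
// once every `batch_size` additions instead of reopening it per document.
// `Database` must provide add_document(const Document&) and flush().
template <class Database, class Document>
class BatchInserter {
public:
    BatchInserter(Database& db, std::size_t batch_size)
        : db_(db), batch_size_(batch_size ? batch_size : 1), pending_(0) {}

    // Add one document; returns the ID the database assigned to it.
    unsigned add(const Document& doc) {
        unsigned id = db_.add_document(doc);
        if (++pending_ >= batch_size_) flush();
        return id;
    }

    // Push any buffered additions to disk, e.g. at the end of a transaction.
    void flush() {
        if (pending_ == 0) return;
        db_.flush();
        pending_ = 0;
    }

    // Number of documents added since the last flush.
    std::size_t pending() const { return pending_; }

private:
    Database& db_;            // opened once by the caller and kept open
    std::size_t batch_size_;  // additions per flush (at least 1)
    std::size_t pending_;     // additions since the last flush
};
```

With Xapian this would presumably be instantiated as BatchInserter<Xapian::WritableDatabase, Xapian::Document> around a database object that the daemon opens once at startup, with the TermGenerator calls from the posted code building each document before add() is called. Later Xapian releases name the flush operation commit().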

The reason that this is slower with 1.0.x than 0.9.x is that flint's
locking code is different to that used by quartz (this was changed to
eliminate the problem of leaving stale lock files behind if an indexing
process was killed).  Part of the extra overhead will be because we now
call fork(), but in this situation there's also the overhead of the
lock-holding child process needing to exit (releasing the lock) before
the database can be reopened for writing.

Cheers,
    Olly


RE: minor problem
United States
2007-12-26 18:27:48
Thanks, Olly. After your last email this was my suspicion
also. I am recoding to keep the DB open. Unlike most Xapian
apps I've seen, this app is transaction based: records are
added one or two or 100 at a time. I'll let you know how it
turns out.
 
-Michael

