
Thread: Re: Search performance issues and profiling/debugging




Re: Search performance issues and profiling/debugging
Israel
2007-10-24 11:14:22
 Hi..

Alex:
Undoubtedly, XenSource is the cause of the OProfile problem. We looked
into a patch for that, but there is one only for version 3 and we are
using version 4 (and yes, we are using the "official" XenSource one).
As for the disks, each of the 6 disks is defined as an LVM, where one
is allocated 80GB for system/log and the other 5 are allocated fully
(465GB of the 465.5GB) to the databases. Each pair of databases sits on
the same disk. No other instance (since none exist) or Dom0 is using
the 5 database disks.

Hope that clarifies the disk mapping.

James:
While anything we put on the server complicates things a bit, I don't
believe Xen is really the issue here. If the problem were Xen related,
why would a scheduling problem affect "no recip" in such a consistent
way, even after compacting the databases and moving them around? If
Xapian used DMA, undocumented interrupts or something else out of the
ordinary, I would understand why it would be something to look into
first, but what makes you think that Xen in the mix can explain the
variation in estimates, the strange performance issues with specific
queries only, and the other strange things we see?
We will certainly try to profile things, and even test without Xen if
we can't profile on it. Again, as this is the only VM instance running
on that machine, there is little scheduling to do and no competition
over IO and other resources. But even if there were, why would it be so
constant on the "no recip" search? It doesn't make much sense unless we
are missing something.

We indeed tested things well over 3 times. This is why I picked "no
recip" as a search. It consistently performs badly, even when searched
a second or third time right after the first (see debug output).
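
One way to check that kind of repeatability is to time the same search
several times in a row on the same open database handle. The following
is only a minimal sketch in Python, not the code used for these tests:
`time_repeated` and `fake_search` are hypothetical names, and
`fake_search` is a placeholder for whatever call runs the real query.

```python
import time


def time_repeated(search, runs=100):
    """Call search() `runs` times in a row; return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        search()
        durations.append(time.perf_counter() - start)
    return durations


def fake_search():
    # Placeholder workload standing in for the real query (an assumption).
    sum(range(10000))


durations = time_repeated(fake_search, runs=5)
for i, seconds in enumerate(durations, 1):
    print(f"run {i}: {seconds:.6f}s")
```

Keeping the per-run times, rather than only an average, makes it
possible to see whether the first run is the slow one or whether the
slowness persists on the second and third runs.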

Below are stats from 100 runs:

Chris:
We will try to test it without Xen as well later on. Keep in mind that
to do so we will have to move aside 10 databases of 50GB, reinstall the
machine and move the databases back into place. We would do it first
thing if we believed it was a Xen issue, although if we can't profile
things we might do this anyway (or test it on a different machine).

Olly: Sorry, we removed the old Database10 after compacting it. Since
then we haven't seen the seg fault. We will keep a close eye on it and
contact you as soon as we see such an error again.

Best regards,
Ron.



Alexandre Gauthier wrote:
> Chris Good wrote:
>> Ron Kass wrote:
>>
>>> Not sure what you mean by "other VMs could well be confusing your
>>> results"
>>> We use XenServer on this machine, but we have only one instance
>>> (DomU), and only this instance is running everything locally. So
>>> there are no other VMs to confuse things, and even if there were,
>>> they have nothing to do with the VM we run the test on or with the
>>> test itself.
>>> (Can you clarify what you mean?)
>>>
>>
>> If you have multiple VMs sharing the same hardware then activity on
>> one will obviously affect the performance on other VMs.  Since you're
>> running a lone DomU other DomUs aren't going to be competing for
>> resources but it's possible that something in Dom0 is getting swapped
>> in and running.
>>
>> How are you accessing your drives, is DomU accessing the raw devices
>> or is it mapped via virtual files from Dom0?
>>
>> Is it possible to run these tests either directly from Dom0 or even
>> better with a non-xen kernel?
>>
>> Given your current configuration of a single VM xen isn't adding
>> anything so removing it would eliminate any side-effects of it.  I
>> also suspect that it would cure your oprofile issue.
>>
>> Chris
>>
>>
> Sorry to intrude, but if I may offer some insight, the Dom0 instance
> in a Xen set-up is just as paravirtualized as a DomU -- it just has
> control access to memory inside DomUs, and offers the drivers back-end
> interfaces. The Dom0 and DomUs both run on top of the Xen kernel.
>
> Also, if he is running a commercial Xen from XenSource, he won't have
> access to the Dom0, which is a custom frankenstein mix of SuSE and
> RHEL with no other purpose but to control the DomUs, a bit like ESX.
>
> The question of the DomU's disk mapping is still valid, and I'd be
> curious to hear the answer. I also think Xen is responsible for the
> oprofile troubles, I get that on a Debian DomU as well.
>
> I hope this vaguely helps...
>
> Alex
>
>

James Aylett wrote:
> On Wed, Oct 24, 2007 at 04:04:22PM +0200, Ron Kass wrote:
>
>> Although we should never rule out something completely without
>> checking, I believe quite strongly that the issues we are seeing are
>> not coming from Xen, as per this instance it is a regular dedicated
>> Linux (centos 5) machine and the resources are fully dedicated to it.
>
> It seems to me that there are two distinct problems. You have some
> queries that are underperforming, which with some profiling will
> expose either something unusual about your database or code, or a
> bottleneck or optimisation problem in Xapian.
>
> The other is the variation. I agree with Chris that adding Xen into
> the mix is complicating matters considerably. Things like IO
> scheduling, for instance, become harder in even the best
> virtualisation systems. It's bad enough that a single instance of an
> OS can suddenly start doing things you don't expect, even with no
> other significant userspace clients :-/
>
> Out of interest, are your figures averages of multiple runs? If not,
> I'd be interested in seeing 1st, 2nd and 3rd query times (broken down
> as Olly suggests), but with mean & sd over say 100 runs.
>
> (Apologies if you have done that - I've been trying to follow this
> thread closely, but an explosion of posts has combined with a busy
> period at my end.)
>
> J
>
>   
Chris Good wrote:
> Ron Kass wrote:
>
>> I believe quite strongly that the issues we are seeing are not coming
>> from Xen, as per this instance it is a regular dedicated Linux
>> (centos 5) machine and the resources are fully dedicated to it.
>
> I'd still encourage you to give it a go if only to rule it out and let
> you run oprofile.  Running inside Xen certainly shouldn't affect your
> match sets but the diskwriter process kicking in could fully explain
> some of the timing variances that you've seen when re-running queries.
>   


Olly Betts wrote:
>> Anyway, we have actually used xapian-compact on the databases to see
>> if it helps. It appears to have got rid of the segmentation fault
>> error on database 10, but the slowness and the variations in
>> estimates still exist.
>
> A seg fault is clearly a bug somewhere, and I'd really like to know
> where.  Do you still have the un-compacted database, or if not can you
> recreate it?  If so, please rerun the test on it under gdb as I
> requested in my previous mail!
>
> Cheers,
>     Olly

_______________________________________________
Xapian-discuss mailing list
Xapian-discuss@lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-discuss

Re: Search performance issues and profiling/debugging
Israel
2007-10-24 16:04:01
Hi all
Sorry, it seems I forgot to paste the statistics for the 100
consecutive runs we did on the 'no recip' search.
Here they are:

    Max     : 40.845
    Min     : 0.973
    Average : 1.739141414
    StDev   : 4.161330613

Following is an Excel file of the total times and the detailed output
of the debug info:

http://www.pidgintech.com/other/fts/test/100-stats.xls

http://www.pidgintech.com/other/fts/test/100-stats.out

Note again that those searches happen on the same opened instance of
the database, on unchanged databases.
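
As a rough illustration of how much a single slow run can move summary
figures like the ones above, here is a small Python sketch. The timings
in it are made up for the example (99 runs of one second plus one run
of 40 seconds) and are not the data from the linked files; `summarize`
is a hypothetical helper name.

```python
import statistics


def summarize(times):
    """Return min, max, mean, sample standard deviation and median."""
    return {
        "min": min(times),
        "max": max(times),
        "mean": statistics.mean(times),
        "stdev": statistics.stdev(times),
        "median": statistics.median(times),
    }


# Hypothetical timings in seconds: 99 ordinary runs and one outlier.
times = [1.0] * 99 + [40.0]

stats = summarize(times)
without_outlier = summarize(times[:-1])

print("all runs:       ", stats)
print("without outlier:", without_outlier)
```

With these made-up numbers the one outlier lifts the mean to 1.39 and
the standard deviation to 3.9, while the median stays at 1.0, so
looking at the median and the individual run times alongside the
average may help separate one-off spikes from consistently slow
searches.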

Your thoughts?

Ron

Ron Kass wrote:
>
> Below are stats from 100 runs:
>