List Info

Thread: Sqlite performance




Sqlite performance
user name
2007-03-21 09:39:35
Hi!

I had the feeling that the yum sqlite db is slow - too
slow.

I wrote a small python program that shows the problem:
Joining with packages 
to map the pkgId to the pkgKey is expensive.

The values strongly differ depending on the presence of
indices (surprise!).
So my program runs the test twice and tries to create/drop
the index. Write 
access is needed for altering the index.

Output on my computer (FC6 core) is:

Reading requires.name for pkg 'glibc'
Access via pkgId:   7.07 0.0707
Access via pkgKey:  1.39 0.0139
Access via pkgKey is 5.1 times faster
creating index took 0.38s
Access via pkgId:   2.21 0.0221
Access via pkgKey:  0.01 0.0001
Access via pkgKey is 212.7 times faster

The first number is the overall time and the second the time
per select.

have fun

	Florian

_______________________________________________
Yum-devel mailing list
Yum-devellinux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel

  
Re: Sqlite performance
country flaguser name
United States
2007-03-22 00:17:23
On Wed, 2007-03-21 at 15:39 +0100, Florian Festi wrote:
> Hi!
> 
> I had the feeling that the yum sqlite db is slow - too
slow.
> 
> I wrote a small python program that shows the problem:
Joining with packages 
> to map the pkgId to the pkgKey is expensive.
> 
> The values strongly differ depending on the presence of
indices (surprise!).
> So my program runs the test twice and tries to
create/drop the index. Write 
> access is needed for altering the index.
> 

So anyplace we retrieve pkgIds as a unique identifier it
would seem to
make sense to grab pkgKey, too and stuff that in the package
object so
we can get to it later for doing joins with it instead of
pkgId. Is that
what you were thinking?

-sv



_______________________________________________
Yum-devel mailing list
Yum-devellinux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel

Re: Sqlite performance
user name
2007-03-22 10:08:30
> So anyplace we retrieve pkgIds as a unique identifier
it would seem to
> make sense to grab pkgKey, too and stuff that in the
package object so
> we can get to it later for doing joins with it instead
of pkgId. Is that
> what you were thinking?

I did that but it didn't help (see attachment). I tried to
optimize 
returnPrco() but it kept using up a lot of time. I was
wondering if there 
was a problem with the caching, but it turned out that it
really has to load 
lots of Prcos.

Reason for this is that the sqlitesack.search*(name) returns
a list of 
package objects. Because of that the surrounding code has to
load the 
provides for all returned packages and has to search through
them.

Easiest solution would be to return a list of (pkg, name,
flag, (e,v,r)) 
tuples or a dict {pkg -> list of (n, f, (e,v,r))} (or use
new Requires and 
Conflicts objects). Even better would be to also pass flag
and (e,v,r) to 
the search methods. That way the sqlitesack could put as
much intelligence 
into the SQL queries as it wants to further reduce the
amount of data being 
processed.

The attached patch changes all pkgId to pkgKey lookups and
loads every 
package only once. I renamed ._excluded to .excluded to
avoid usage with 
pkgId insted of pkgKey.

Florian


_______________________________________________
Yum-devel mailing list
Yum-devellinux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel

  
Re: Sqlite performance
country flaguser name
United States
2007-03-22 10:21:08
On Thu, 2007-03-22 at 16:08 +0100, Florian Festi wrote:
> Easiest solution would be to return a list of (pkg,
name, flag, (e,v,r)) 
> tuples or a dict {pkg -> list of (n, f, (e,v,r))}
(or use new Requires and 
> Conflicts objects). Even better would be to also pass
flag and (e,v,r) to 
> the search methods. That way the sqlitesack could put
as much intelligence 
> into the SQL queries as it wants to further reduce the
amount of data being 
> processed.

the flags and evr for comparisons in SQL aren't the same as
in rpm. So
passing them in won't help you.

ie: if I want foo > 1.1

It can't just look at the version field in the provides of
the packages
db.

so we search names to get as close as we can and then do the
search
using checkPrco.

-sv


_______________________________________________
Yum-devel mailing list
Yum-devellinux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel

Re: Sqlite performance
country flaguser name
United States
2007-03-23 13:35:40
On Fri, 2007-03-23 at 15:51 +0100, Florian Festi wrote:

> I see absolutely no reason why this shouldn't be
possible in the search 
> methods. Of course you would have to move the
.inPrcoRange() code into the 
> package sack. In fact the rpmsack supports just that
(whatProvides(), 
> whatRequires()).

It is possible in the search method, except it'll mean doing
the same
operation inside the sack anyway. Putting it per-package
object means
that no matter the type of sack we're using it'll work. I'll
be happy to
give it a try if it'll help speed things up further.


> If the sqlitesack and the transaction (as virtual sack
representing the 
> future rpmdb) provide this interface, too, the resolver
will look much 
> cleaner and changes on 'who searches where' will be
much easier.

A sack object from the txmbr's in the tsInfo would be
useful. I've
thought of that, too. However, it seems like we would need
to be able to
get back a ListPackageSack based on:
  - ts state
  - pkg nevra
  - all

I might be able to whip up a simple method to let us
interact with it as
a PackageSack so we can use the same methods. But that's
even more
reason the search method needs to be independent of the type
of sack it
is.

-sv


_______________________________________________
Yum-devel mailing list
Yum-devellinux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel

[1-5]

about | contact  Other archives ( Real Estate discussion Medical topics )