|
List Info
Thread: Sqlite performance
|
|
| Sqlite performance |

|
2007-03-21 09:39:35 |
Hi!
I had the feeling that the yum sqlite db is slow - too
slow.
I wrote a small python program that shows the problem:
Joining with packages
to map the pkgId to the pkgKey is expensive.
The values strongly differ depending on the presence of
indices (surprise!).
So my program runs the test twice and tries to create/drop
the index. Write
access is needed for altering the index.
Output on my computer (FC6 core) is:
Reading requires.name for pkg 'glibc'
Access via pkgId: 7.07 0.0707
Access via pkgKey: 1.39 0.0139
Access via pkgKey is 5.1 times faster
creating index took 0.38s
Access via pkgId: 2.21 0.0221
Access via pkgKey: 0.01 0.0001
Access via pkgKey is 212.7 times faster
The first number is the overall time and the second the time
per select.
have fun
Florian
_______________________________________________
Yum-devel mailing list
Yum-devel linux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel
a>
|
|
|
| Re: Sqlite performance |
  United States |
2007-03-22 00:17:23 |
On Wed, 2007-03-21 at 15:39 +0100, Florian Festi wrote:
> Hi!
>
> I had the feeling that the yum sqlite db is slow - too
slow.
>
> I wrote a small python program that shows the problem:
Joining with packages
> to map the pkgId to the pkgKey is expensive.
>
> The values strongly differ depending on the presence of
indices (surprise!).
> So my program runs the test twice and tries to
create/drop the index. Write
> access is needed for altering the index.
>
So anyplace we retrieve pkgIds as a unique identifier it
would seem to
make sense to grab pkgKey, too and stuff that in the package
object so
we can get to it later for doing joins with it instead of
pkgId. Is that
what you were thinking?
-sv
_______________________________________________
Yum-devel mailing list
Yum-devel linux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel
a>
|
|
| Re: Sqlite performance |

|
2007-03-22 10:08:30 |
> So anyplace we retrieve pkgIds as a unique identifier
it would seem to
> make sense to grab pkgKey, too and stuff that in the
package object so
> we can get to it later for doing joins with it instead
of pkgId. Is that
> what you were thinking?
I did that but it didn't help (see attachment). I tried to
optimize
returnPrco() but it kept using up a lot of time. I was
wondering if there
was a problem with the caching, but it turned out that it
really has to load
lots of Prcos.
Reason for this is that the sqlitesack.search*(name) returns
a list of
package objects. Because of that the surrounding code has to
load the
provides for all returned packages and has to search through
them.
Easiest solution would be to return a list of (pkg, name,
flag, (e,v,r))
tuples or a dict {pkg -> list of (n, f, (e,v,r))} (or use
new Requires and
Conflicts objects). Even better would be to also pass flag
and (e,v,r) to
the search methods. That way the sqlitesack could put as
much intelligence
into the SQL queries as it wants to further reduce the
amount of data being
processed.
The attached patch changes all pkgId to pkgKey lookups and
loads every
package only once. I renamed ._excluded to .excluded to
avoid usage with
pkgId insted of pkgKey.
Florian
_______________________________________________
Yum-devel mailing list
Yum-devel linux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel
a>
|
|
|
| Re: Sqlite performance |
  United States |
2007-03-22 10:21:08 |
On Thu, 2007-03-22 at 16:08 +0100, Florian Festi wrote:
> Easiest solution would be to return a list of (pkg,
name, flag, (e,v,r))
> tuples or a dict {pkg -> list of (n, f, (e,v,r))}
(or use new Requires and
> Conflicts objects). Even better would be to also pass
flag and (e,v,r) to
> the search methods. That way the sqlitesack could put
as much intelligence
> into the SQL queries as it wants to further reduce the
amount of data being
> processed.
the flags and evr for comparisons in SQL aren't the same as
in rpm. So
passing them in won't help you.
ie: if I want foo > 1.1
It can't just look at the version field in the provides of
the packages
db.
so we search names to get as close as we can and then do the
search
using checkPrco.
-sv
_______________________________________________
Yum-devel mailing list
Yum-devel linux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel
a>
|
|
| Re: Sqlite performance |
  United States |
2007-03-23 13:35:40 |
On Fri, 2007-03-23 at 15:51 +0100, Florian Festi wrote:
> I see absolutely no reason why this shouldn't be
possible in the search
> methods. Of course you would have to move the
.inPrcoRange() code into the
> package sack. In fact the rpmsack supports just that
(whatProvides(),
> whatRequires()).
It is possible in the search method, except it'll mean doing
the same
operation inside the sack anyway. Putting it per-package
object means
that no matter the type of sack we're using it'll work. I'll
be happy to
give it a try if it'll help speed things up further.
> If the sqlitesack and the transaction (as virtual sack
representing the
> future rpmdb) provide this interface, too, the resolver
will look much
> cleaner and changes on 'who searches where' will be
much easier.
A sack object from the txmbr's in the tsInfo would be
useful. I've
thought of that, too. However, it seems like we would need
to be able to
get back a ListPackageSack based on:
- ts state
- pkg nevra
- all
I might be able to whip up a simple method to let us
interact with it as
a PackageSack so we can use the same methods. But that's
even more
reason the search method needs to be independent of the type
of sack it
is.
-sv
_______________________________________________
Yum-devel mailing list
Yum-devel linux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel
a>
|
|
[1-5]
|
|
|
about | contact Other archives ( Real Estate discussion Medical topics )
|