List Info

Thread: Sqlite performance + tuning




Sqlite performance + tuning
user name
2007-03-19 11:29:08
Hi!

As yum now also resolves only on the xml/sqlite data some of
our results 
might interesting for the yum development, too.

It turned out that there where no indices in the sqlite
database for some of 
the common use cases:

returnPrco() in sqlitesack.py:
CREATE INDEX pkgprovides ON provides (pkgKey)
CREATE INDEX pkgrequires ON requires (pkgKey)
CREATE INDEX pkgconflicts ON conflicts (pkgKey)
CREATE INDEX pkgobsoletes ON obsoletes (pkgKey)

Those could be handy if the
YumSqlitePackageSack.searchFiles() would try to 
make direct use of those columns:
CREATE INDEX filenames ON files (name)
CREATE INDEX dirnames ON filelist (dirname)

I doubt that this will give any visible improvements with
the current 
depsolver as it is currently hidden by other problems but as
the other 
problems will get fixed the difference will increase. With
our own resolver 
the time needed for resolving an "install *" on
FC6 + extras + updates (6580 
packages) dropped from 10 to 3 Minutes.

As there are plans to directly ship the sqlite dbs insted of
the XML 
metadata, it might be worth checking if it is better to
create the indices 
on the client side (sorry, didn't any checks on myself
yet).

It also turned out that if any quadratic behavior and
looping over files is 
avoided resolving is reasonably fast.

If the upper point is observed the sqlite db is fast enough
to do all 
searching. That way it can replace the memory and time
consuming creation of 
in memory indices/hashes.

At several places it is possible to choose which way to
resolve thing: from 
the requesting or from the fulfilling side. The numbers
differ a lot in some 
cases to the advantage of the non obvious (obsoletes, file
requires, 
obsolete conflicts)

I kept the single points short to avoid too much text. Feel
free to ask 
about details.


Have fun

	Florian
_______________________________________________
Yum-devel mailing list
Yum-devellinux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel

Re: Sqlite performance + tuning
country flaguser name
United States
2007-03-21 02:37:59
On Mon, 2007-03-19 at 17:29 +0100, Florian Festi wrote:
> Hi!
> 
> As yum now also resolves only on the xml/sqlite data
some of our results 
> might interesting for the yum development, too.
> 
> It turned out that there where no indices in the sqlite
database for some of 
> the common use cases:
> 
> returnPrco() in sqlitesack.py:
> CREATE INDEX pkgprovides ON provides (pkgKey)
> CREATE INDEX pkgrequires ON requires (pkgKey)
> CREATE INDEX pkgconflicts ON conflicts (pkgKey)
> CREATE INDEX pkgobsoletes ON obsoletes (pkgKey)
> 
> Those could be handy if the
YumSqlitePackageSack.searchFiles() would try to 
> make direct use of those columns:
> CREATE INDEX filenames ON files (name)
> CREATE INDEX dirnames ON filelist (dirname)
> 
> I doubt that this will give any visible improvements
with the current 
> depsolver as it is currently hidden by other problems
but as the other 
> problems will get fixed the difference will increase.
With our own resolver 
> the time needed for resolving an "install *"
on FC6 + extras + updates (6580 
> packages) dropped from 10 to 3 Minutes.
> 
> As there are plans to directly ship the sqlite dbs
insted of the XML 
> metadata, it might be worth checking if it is better to
create the indices 
> on the client side (sorry, didn't any checks on myself
yet).
> 
> It also turned out that if any quadratic behavior and
looping over files is 
> avoided resolving is reasonably fast.

How much bigger do these indexes make the sqlite db files?

-sv


_______________________________________________
Yum-devel mailing list
Yum-devellinux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel

Re: Sqlite performance + tuning
user name
2007-03-21 03:26:14
seth vidal wrote:
> On Mon, 2007-03-19 at 17:29 +0100, Florian Festi
wrote:
>> Hi!
>>
>> As yum now also resolves only on the xml/sqlite
data some of our results 
>> might interesting for the yum development, too.
>>
>> It turned out that there where no indices in the
sqlite database for some of 
>> the common use cases:
>>
>> returnPrco() in sqlitesack.py:
>> CREATE INDEX pkgprovides ON provides (pkgKey)
>> CREATE INDEX pkgrequires ON requires (pkgKey)
>> CREATE INDEX pkgconflicts ON conflicts (pkgKey)
>> CREATE INDEX pkgobsoletes ON obsoletes (pkgKey)
>>
>> Those could be handy if the
YumSqlitePackageSack.searchFiles() would try to 
>> make direct use of those columns:
>> CREATE INDEX filenames ON files (name)
>> CREATE INDEX dirnames ON filelist (dirname)
>>
>> I doubt that this will give any visible
improvements with the current 
>> depsolver as it is currently hidden by other
problems but as the other 
>> problems will get fixed the difference will
increase. With our own resolver 
>> the time needed for resolving an "install
*" on FC6 + extras + updates (6580 
>> packages) dropped from 10 to 3 Minutes.
>>
>> As there are plans to directly ship the sqlite dbs
insted of the XML 
>> metadata, it might be worth checking if it is
better to create the indices 
>> on the client side (sorry, didn't any checks on
myself yet).
>>
>> It also turned out that if any quadratic behavior
and looping over files is 
>> avoided resolving is reasonably fast.
> 
> How much bigger do these indexes make the sqlite db
files?

FC6 primary.xml.gz.sqlite
-rw-r--r-- 1 ffesti ffesti  6289408 Mar 21 09:18 core
-rw-r--r-- 1 ffesti ffesti  7028736 Mar 21 09:17 core.index
-rw-r--r-- 1 ffesti ffesti  9871360 Mar 21 09:18 extras
-rw-r--r-- 1 ffesti ffesti 11011072 Mar 21 09:17
extras.index
-rw-r--r-- 1 ffesti ffesti  4164608 Mar 21 09:18 updates
-rw-r--r-- 1 ffesti ffesti  4686848 Mar 21 09:17
updates.index

-rw-r--r-- 1 ffesti ffesti 1441465 Mar 21 09:18 core.bz2
-rw-r--r-- 1 ffesti ffesti 1577672 Mar 21 09:17
core.index.bz2
-rw-r--r-- 1 ffesti ffesti 2178462 Mar 21 09:18 extras.bz2
-rw-r--r-- 1 ffesti ffesti 2393192 Mar 21 09:17
extras.index.bz2
-rw-r--r-- 1 ffesti ffesti  887487 Mar 21 09:18 updates.bz2
-rw-r--r-- 1 ffesti ffesti  985659 Mar 21 09:17
updates.index.bz2

So quite exactly 10% bigger. But we can create the indices
on the client 
side if we want to save the bandwidth. Creating the indices
shouldn't be 
that expensive.

Florian
_______________________________________________
Yum-devel mailing list
Yum-devellinux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel

Re: Sqlite performance + tuning
country flaguser name
United States
2007-03-21 09:34:56
On Wed, 2007-03-21 at 09:26 +0100, Florian Festi wrote:
> seth vidal wrote:
> > On Mon, 2007-03-19 at 17:29 +0100, Florian Festi
wrote:
> >> Hi!
> >>
> >> As yum now also resolves only on the
xml/sqlite data some of our results 
> >> might interesting for the yum development,
too.
> >>
> >> It turned out that there where no indices in
the sqlite database for some of 
> >> the common use cases:
> >>
> >> returnPrco() in sqlitesack.py:
> >> CREATE INDEX pkgprovides ON provides (pkgKey)
> >> CREATE INDEX pkgrequires ON requires (pkgKey)
> >> CREATE INDEX pkgconflicts ON conflicts
(pkgKey)
> >> CREATE INDEX pkgobsoletes ON obsoletes
(pkgKey)
> >>
> >> Those could be handy if the
YumSqlitePackageSack.searchFiles() would try to 
> >> make direct use of those columns:
> >> CREATE INDEX filenames ON files (name)
> >> CREATE INDEX dirnames ON filelist (dirname)
> >>
> >> I doubt that this will give any visible
improvements with the current 
> >> depsolver as it is currently hidden by other
problems but as the other 
> >> problems will get fixed the difference will
increase. With our own resolver 
> >> the time needed for resolving an "install
*" on FC6 + extras + updates (6580 
> >> packages) dropped from 10 to 3 Minutes.
> >>
> >> As there are plans to directly ship the sqlite
dbs insted of the XML 
> >> metadata, it might be worth checking if it is
better to create the indices 
> >> on the client side (sorry, didn't any checks
on myself yet).
> >>
> >> It also turned out that if any quadratic
behavior and looping over files is 
> >> avoided resolving is reasonably fast.
> > 
> > How much bigger do these indexes make the sqlite
db files?
> 
> FC6 primary.xml.gz.sqlite
> -rw-r--r-- 1 ffesti ffesti  6289408 Mar 21 09:18 core
> -rw-r--r-- 1 ffesti ffesti  7028736 Mar 21 09:17
core.index
> -rw-r--r-- 1 ffesti ffesti  9871360 Mar 21 09:18
extras
> -rw-r--r-- 1 ffesti ffesti 11011072 Mar 21 09:17
extras.index
> -rw-r--r-- 1 ffesti ffesti  4164608 Mar 21 09:18
updates
> -rw-r--r-- 1 ffesti ffesti  4686848 Mar 21 09:17
updates.index
> 
> -rw-r--r-- 1 ffesti ffesti 1441465 Mar 21 09:18
core.bz2
> -rw-r--r-- 1 ffesti ffesti 1577672 Mar 21 09:17
core.index.bz2
> -rw-r--r-- 1 ffesti ffesti 2178462 Mar 21 09:18
extras.bz2
> -rw-r--r-- 1 ffesti ffesti 2393192 Mar 21 09:17
extras.index.bz2
> -rw-r--r-- 1 ffesti ffesti  887487 Mar 21 09:18
updates.bz2
> -rw-r--r-- 1 ffesti ffesti  985659 Mar 21 09:17
updates.index.bz2
> 
> So quite exactly 10% bigger. But we can create the
indices on the client 
> side if we want to save the bandwidth. Creating the
indices shouldn't be 
> that expensive.

If you have some example code to make the indexes on the fly
I'll give
it a run on the XO and see how bad it really ends up being.

-sv


_______________________________________________
Yum-devel mailing list
Yum-devellinux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel

Re: Sqlite performance + tuning
country flaguser name
United States
2007-03-21 23:14:58
On Wed, 2007-03-21 at 15:39 +0100, Florian Festi wrote:
> > If you have some example code to make the indexes
on the fly I'll give
> > it a run on the XO and see how bad it really ends
up being.
> 
> Just execute the following SQL statements (terminate
with ";" when in the 
> sqlite3 shell)
> 
> CREATE INDEX IF NOT EXISTS pkgprovides ON provides
(pkgKey)
> CREATE INDEX IF NOT EXISTS pkgrequires ON requires
(pkgKey)
> CREATE INDEX IF NOT EXISTS pkgconflicts ON conflicts
(pkgKey)
> CREATE INDEX IF NOT EXISTS pkgobsoletes ON obsoletes
(pkgKey)
> 

it's about 15s for fc6 core-only on the XO - a bit long.
Moreover it
creates a problem if we make it on the client in that the
sqlite db will
no longer match the checksum in the repomd.xml. So we'll be
downloading
metadata that isn't, otherwise, needing it. Not much fun,
that.

So if we're going to get the advantage from the indexes we
need to do
them on the repo side. The only decision now is if the
additional speed
is worth the tradeoff of more to download.

-sv


_______________________________________________
Yum-devel mailing list
Yum-devellinux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel

Re: Sqlite performance + tuning
country flaguser name
United States
2007-03-23 02:19:41
On Thu, 22 Mar 2007, seth vidal wrote:

> On Wed, 2007-03-21 at 15:39 +0100, Florian Festi
wrote:
>>> If you have some example code to make the
indexes on the fly I'll give
>>> it a run on the XO and see how bad it really
ends up being.
>>
>> Just execute the following SQL statements
(terminate with ";" when in the
>> sqlite3 shell)
>>
>> CREATE INDEX IF NOT EXISTS pkgprovides ON provides
(pkgKey)
>> CREATE INDEX IF NOT EXISTS pkgrequires ON requires
(pkgKey)
>> CREATE INDEX IF NOT EXISTS pkgconflicts ON
conflicts (pkgKey)
>> CREATE INDEX IF NOT EXISTS pkgobsoletes ON
obsoletes (pkgKey)
>>
>
> it's about 15s for fc6 core-only on the XO - a bit
long. Moreover it
> creates a problem if we make it on the client in that
the sqlite db will
> no longer match the checksum in the repomd.xml. So
we'll be downloading
> metadata that isn't, otherwise, needing it. Not much
fun, that.
>
> So if we're going to get the advantage from the indexes
we need to do
> them on the repo side. The only decision now is if the
additional speed
> is worth the tradeoff of more to download.

Well, for current apt-rpm implementation of the sqlite
repodata, the 
difference for internal cache generation for FE development
repo on my 
laptop is 15 minutes vs 15 seconds.

If those indexes aren't there, to get to a decent
performance level I'd 
need to read all of PRCO data for a repo into memory at a
time.

Anyway, just a datapoint that for different usage patterns
the time 
difference can be *huge*.

 	- Panu -
_______________________________________________
Yum-devel mailing list
Yum-devellinux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel

Re: Sqlite performance + tuning
country flaguser name
United States
2007-03-23 02:26:21
On Fri, 2007-03-23 at 09:19 +0200, Panu Matilainen wrote:
> Well, for current apt-rpm implementation of the sqlite
repodata, the 
> difference for internal cache generation for FE
development repo on my 
> laptop is 15 minutes vs 15 seconds.
> 
> If those indexes aren't there, to get to a decent
performance level I'd 
> need to read all of PRCO data for a repo into memory at
a time.
> 
> Anyway, just a datapoint that for different usage
patterns the time 
> difference can be *huge*.


a useful one, though. Thanks,
-sv


_______________________________________________
Yum-devel mailing list
Yum-devellinux.duke.edu
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel

[1-7]

about | contact  Other archives ( Real Estate discussion Medical topics )