List Info

Thread: Faster purging of SQL logging database




Faster purging of SQL logging database
country flaguser name
Slovenia
2007-06-15 05:44:25
The README.sql-pg in amavisd-new distribution suggests
the following SQL clauses to purge old records from a
database:

DELETE FROM msgs WHERE time_iso < now() - INTERVAL '3
weeks';
DELETE FROM msgs WHERE time_iso < now() - INTERVAL '1 h'
AND content IS NULL;

DELETE FROM maddr
  WHERE NOT EXISTS (SELECT 1 FROM msgs    WHERE sid=id)
    AND NOT EXISTS (SELECT 1 FROM msgrcpt WHERE rid=id);

The 'DELETE FROM maddr' is the worst, and I spent some time
trying to optimize it. This is what I came up with, it
works
faster by a factor of 1.5 to 2.

DELETE FROM maddr WHERE id IN (
  SELECT id FROM maddr LEFT JOIN (
    SELECT sid AS id, 1 AS f FROM msgs UNION ALL
    SELECT rid AS id, 1 AS f FROM msgrcpt
  ) AS u USING(id) WHERE u.f IS NULL);

I'd be interested in your experience before I update
the documentation.

And I never tried it on MySQL - does it work there
and if it does, is it any faster?

  Mark

------------------------------------------------------------
-------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and
take
control of your XML. No limits. Just data. Click to get it
now.
http://sourcefor
ge.net/powerbar/db2/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

Re: Faster purging of SQL logging database
user name
2007-06-26 08:05:08
On 6/15/07, Mark Martinec <Mark.Martinec+amavisijs.si> wrote:
> The README.sql-pg in amavisd-new distribution suggests
> the following SQL clauses to purge old records from a
database:
>
> DELETE FROM msgs WHERE time_iso < now() - INTERVAL
'3 weeks';

The script that does this on my amavis-setup had a
convenient bug in
it so that it refused to run, leading to no deletion,
leading to a
database larger than 4 gigabytes, leading to the above
taking days. (I
only let it run for 48 hours though before I started looking
for more
efficient methods.)

The problem is the foreign key constraints on msgrcpt.

I found a solution:
* put/have a proxy in front of the amavis-server to queue
mail
* stop postfix+amavis on the amavis-server
* rename msgrcpt to msgrcpt_old (alter table...)
* record the grants and foreign key constraints somewhere,
for
instance via  pg_dump(1)
* make a new msgrcpt that lacks all foreign key constraints
* make a temporary table of all "new" mail_id in
msgs (flip the comparison)
* copy from msgrcpt_old all fields with new mail_ids (insert
with
inner join with temp-table)
* drop msgrcpt_old (and suddenly the db was two gigs
smaller...)
* THEN delete old msgs with the standard method
* add foreign keys to the "new" msgrcpt
* fix the grants
* turn on amavis and postfix

Voila, takes less than 30 minutes, including typing/pasting,
on 3.4
million rows in msgs/4.2 million rows in msgrcpt which
should be a
small enough delay for most uses.

Also, if on a 32-bit linux system, set SHMALL (sysctl(1)) to
the
amount of memory on the server and SHMMAX to at least a
gigabyte
provided you have more than a gig on the system.

> DELETE FROM msgs WHERE time_iso < now() - INTERVAL
'1 h' AND content IS NULL;

This was only a few hundred messages so with an index on
time_iso this
was quick.

> DELETE FROM maddr WHERE id IN (
>   SELECT id FROM maddr LEFT JOIN (
>     SELECT sid AS id, 1 AS f FROM msgs UNION ALL
>     SELECT rid AS id, 1 AS f FROM msgrcpt
>   ) AS u USING(id) WHERE u.f IS NULL);
>
> I'd be interested in your experience before I update
> the documentation.

DELETEs are expensive but SELECTs are quick. The cost
(through EXPLAIN
<query>) of the above for my data (after getting rid
of old messages)
was "cost=1623113.61..1765429.58 rows=1664586
width=6".

So instead, I used temporary tables to get a list of
maddr.ids to
delete, and the cost for the final delete was
"cost=86070.32..177104.67 rows=1247620 width=6",
an order of magnitude
less.

As it says in the postgres-docs, what is speedy for a small
db might
not be speedy for a large one. It seems that for postgres,
using
temporary tables can pay in a big way.


HM

------------------------------------------------------------
-------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and
take
control of your XML. No limits. Just data. Click to get it
now.
http://sourcefor
ge.net/powerbar/db2/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

Re: Faster purging of SQL logging database
country flaguser name
Slovenia
2007-06-26 20:28:06
HM,

> The problem is the foreign key constraints on msgrcpt.
> I found a solution:
...
> Voila, takes less than 30 minutes, including
typing/pasting, on 3.4
> million rows in msgs/4.2 million rows in msgrcpt which
should be a
> small enough delay for most uses.

Thanks, useful for purging majority of records.

Perhaps using old style deletion when ON DELETE CASCADE
constraint
was not yet suggested could give similar results
(README.sql-mysql):

--  (optional) just in case the ON DELETE CASCADE did not do
its job, we may
--  explicitly delete orphaned records (with no
corresponding msgs entry);
--  if ON DELETE CASCADE did work, there should be no
deletions at this step
DELETE FROM quarantine
  WHERE NOT EXISTS (SELECT 1 FROM msgs WHERE
mail_id=quarantine.mail_id);
DELETE FROM msgrcpt
  WHERE NOT EXISTS (SELECT 1 FROM msgs WHERE
mail_id=msgrcpt.mail_id);


> > DELETE FROM maddr WHERE id IN (
> >   SELECT id FROM maddr LEFT JOIN (
> >     SELECT sid AS id, 1 AS f FROM msgs UNION ALL
> >     SELECT rid AS id, 1 AS f FROM msgrcpt
> >   ) AS u USING(id) WHERE u.f IS NULL);
> >
> > I'd be interested in your experience before I
update
> > the documentation.
>
> DELETEs are expensive but SELECTs are quick. The cost
(through EXPLAIN
> <query>) of the above for my data (after getting
rid of old messages)
> was "cost=1623113.61..1765429.58 rows=1664586
width=6".
>
> So instead, I used temporary tables to get a list of
maddr.ids to
> delete, and the cost for the final delete was
> "cost=86070.32..177104.67 rows=1247620
width=6", an order of magnitude
> less.
>
> As it says in the postgres-docs, what is speedy for a
small db might
> not be speedy for a large one. It seems that for
postgres, using
> temporary tables can pay in a big way.

I don't know. Trying the CREATE TEMPORARY TABLE AS, followed
by a DELETE
took about the same time as combining both in the same
statement.
Building a table of maddr id-s to be purged took majority of
time
(on frequent deletes where deletion does not remove lots of
records):

CREATE TEMPORARY TABLE unused AS (
  SELECT id FROM maddr LEFT JOIN (
    SELECT sid AS id, 1 AS f FROM msgs UNION ALL
    SELECT rid AS id, 1 AS f FROM msgrcpt)
  AS u USING(id) WHERE u.f IS NULL);

DELETE FROM maddr WHERE id IN (SELECT * FROM unused);

Perhaps you had something else in mind?

Btw, both statements should probably be combined in a
transaction
if amavisd is left running during purging.

  Mark



------------------------------------------------------------
-------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and
take
control of your XML. No limits. Just data. Click to get it
now.
http://sourcefor
ge.net/powerbar/db2/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

Re: Faster purging of SQL logging database
user name
2007-06-27 02:47:27
On 6/27/07, Hanne Moa <hanne.moagmail.com> wrote:
> On 6/27/07, Mark Martinec <Mark.Martinec+amavisijs.si> wrote:
> > CREATE TEMPORARY TABLE unused AS (
> >   SELECT id FROM maddr LEFT JOIN (
> >     SELECT sid AS id, 1 AS f FROM msgs UNION ALL
> >     SELECT rid AS id, 1 AS f FROM msgrcpt)
> >   AS u USING(id) WHERE u.f IS NULL);
> >
> > DELETE FROM maddr WHERE id IN (SELECT * FROM
unused);
> >
> > Perhaps you had something else in mind?
>
> I had this in mind, yes (well, a little simpler). This
still doesn't
> work very well while FOREIGN KEYS exist on msgrcpt.

I wound up dropping the FOREIGN KEYS linking msgrcpt to
maddr, and the
delete took seconds...

Is it guaranteed that, in msgrcpt, the pair mail_id+rid will
be
unique? In that case we have a candidate for PRIMARY KEY.
The indexes
made might speed things up, I'll experiment a little.


HM

------------------------------------------------------------
-------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and
take
control of your XML. No limits. Just data. Click to get it
now.
http://sourcefor
ge.net/powerbar/db2/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

[1-4]

about | contact  Other archives ( Real Estate discussion Medical topics )