List Info

Thread: Foreign Keys Necessary?




Foreign Keys Necessary?
country flaguser name
United States
2007-08-15 11:32:30
I've read all the back threads regarding purge performance
in the 
logging database, specifically in mysql.  It seems like a
still 
pending issue.  In our application where we are increasing
the rows 
in the msgs table by roughly 1million per day, purging the
database 
creates an *extreme* load.

In testing, I've attempted purge of the msgs table, and
under load 
I'm getting roughly 25records/second deleted.  Regardless
the method 
(individual record deletes, or as a single query).  I pull
the 
foreign keys and I can get 3000records/second deleted.

Is the only reason for foreign keys to keep the database
clean?  Can 
I do away with them and clean it up manually without causing
any problems?

I'm also curious, what type of stats do the pgsql people see
with 
foreign keys on?

Thanks,

- Nate


------------------------------------------------------------
-------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and
a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

Re: Foreign Keys Necessary?
country flaguser name
Slovenia
2007-08-15 17:49:37
Nate,

> I've read all the back threads regarding purge
performance in the
> logging database, specifically in mysql.  It seems like
a still
> pending issue.  In our application where we are
increasing the rows
> in the msgs table by roughly 1million per day, purging
the database
> creates an *extreme* load.
>
> In testing, I've attempted purge of the msgs table, and
under load
> I'm getting roughly 25records/second deleted. 
Regardless the method
> (individual record deletes, or as a single query).  I
pull the
> foreign keys and I can get 3000records/second deleted.
>
> Is the only reason for foreign keys to keep the
database clean?  Can
> I do away with them and clean it up manually without
causing any problems?


A purpose of 'FOREIGN KEY ... ON DELETE RESTRICT' is to keep
a database
tidy, not allowing to remove a record that is still being
referenced.
It may be removed if desired.

A purpose of 'FOREIGN KEY ... ON DELETE CASCADE' is to let
deletion
remove dependent records automatically, along with a record
being deleted.
If ON DELETE CASCADE is removed, these other records need to
be removed
explicitly in a separate operation. If both deletions
together
take less time than a single deletion with an implied second
deletion,
then it would be worth removing ON DELETE CASCADE.

With my tests using PostgreSQL it was worth letting SQL do
a
cascading deletion on its own. Don't know about MySQL,
which
I abandoned for my amavisd SQL logging because purging of
old
records was so slow.

  Mark

------------------------------------------------------------
-------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and
a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

Re: Foreign Keys Necessary?
country flaguser name
United States
2007-08-17 11:18:57
>>A purpose of 'FOREIGN KEY ... ON DELETE RESTRICT' is
to keep a database
>>tidy, not allowing to remove a record that is still
being referenced.
>>It may be removed if desired.
>>
>>A purpose of 'FOREIGN KEY ... ON DELETE CASCADE' is
to let deletion
>>remove dependent records automatically, along with a
record being deleted.
>>If ON DELETE CASCADE is removed, these other records
need to be removed
>>explicitly in a separate operation. If both
deletions together
>>take less time than a single deletion with an
implied second deletion,
>>then it would be worth removing ON DELETE CASCADE.
>>
>>With my tests using PostgreSQL it was worth letting
SQL do a
>>cascading deletion on its own. Don't know about
MySQL, which
>>I abandoned for my amavisd SQL logging because
purging of old
>>records was so slow.
>
>In MySQL it clearly makes a huge performance increase to
remove 
>them.  I went from over 3 hours with foreign keys to do
a purge down 
>to about 6 minutes without them and doing the queries
manually to 
>cleanup the database.  MySQL clearly is not as capable
as pgSQL when 
>it comes to this it sounds like.
>
>I'm curious if anyone else has found alternate ways to
tune mysql 
>for better performance in the purge.  Even at 6 minutes,
some of the 
>queries run such a load on the database that inserts and
updates get 
>held up so long that amavisd drops the connection and
has to retry 
>later.  It's not a big deal because postfix requeue's
the message in 
>about 15 minutes, but any tips would be happily
accepted.

Just for reference, the queries I'm doing to clean the
database now 
with foreign keys removed in mysql.  I believe this is
catching 
everything, and taking roughly 7 minutes per day, down from
3 hours 
when using foreign keys.

# clear msgs table for anything older than $days_quarantine
DELETE LOW_PRIORITY FROM msgs WHERE time_num <
UNIX_TIMESTAMP() - 
$days_quarantine*24*60*60

# delete anythign that's not quarantined longer than we keep
data for penpals
DELETE LOW_PRIORITY FROM msgs WHERE quar_type != 'Q' AND
(time_num < 
UNIX_TIMESTAMP() - $penpal_days*24*60*60)

# clean maddr table from orphaned records
DELETE LOW_PRIORITY FROM maddr WHERE NOT EXISTS (SELECT 1
FROM msgs 
WHERE sid=id) AND NOT EXISTS (SELECT 1 FROM msgrcpt WHERE
rid=id)

# clean msgrcpt table from orphaned records
DELETE LOW_PRIORITY FROM msgrcpt WHERE NOT EXISTS (SELECT 1
FROM msgs 
WHERE mail_id=msgrcpt.mail_id)

  # clean quarantine table from orphaned records
DELETE LOW_PRIORITY FROM quarantine WHERE NOT EXISTS (SELECT
1 FROM 
msgs WHERE mail_id=quarantine.mail_id)

- Nate



------------------------------------------------------------
-------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and
a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

Re: Foreign Keys Necessary?
user name
2007-08-19 16:04:32
Hello,

On 8/17/07, Nate <nm_listvisp.net> wrote:
>
> Just for reference, the queray, down from 3 hours
> when using foreign keys.ies I'm doing to clean the
database now
> with foreign keys removed in mysql.  I believe this is
catching
> everything, and taking roughly 7 minutes per d
>
> # clear msgs table for anything older than
$days_quarantine
> DELETE LOW_PRIORITY FROM msgs WHERE time_num <
UNIX_TIMESTAMP() -
> $days_quarantine*24*60*60

Maybe get the value of
UNIX_TIMESTAMP() - $days_quarantine*24*60*60
assigned to a variable before executing the query to avoid
calculating
UNIX_TIMESTAMP() for each scanned row.

> # delete anythign that's not quarantined longer than we
keep data for penpals
> DELETE LOW_PRIORITY FROM msgs WHERE quar_type != 'Q'
AND (time_num <
> UNIX_TIMESTAMP() - $penpal_days*24*60*60)

!= is very slow operator, testing the nonexistance of a
value can lead
to full table scan in many cases. Change it to
IN (' ','F','Z','B','Q','M','L')
this should work much faster.

And apply the above note for the
UNIX_TIMESTAMP() - $penpal_days*24*60*60
expression.

> # clean maddr table from orphaned records
> DELETE LOW_PRIORITY FROM maddr WHERE NOT EXISTS (SELECT
1 FROM msgs
> WHERE sid=id) AND NOT EXISTS (SELECT 1 FROM msgrcpt
WHERE rid=id)

NOT EXISTS is another slow operator. Try to change the query
to:

DELETE maddr FROM maddr LEFT OUTER JOIN msgs ON sid=id LEFT
OUTER JOIN
msgrcpt ON rid=id WHERE sid IS NULL OR rid IS NULL;

This should execute much faster.

> # clean msgrcpt table from orphaned records
> DELETE LOW_PRIORITY FROM msgrcpt WHERE NOT EXISTS
(SELECT 1 FROM msgs
> WHERE mail_id=msgrcpt.mail_id)

Try to change to:
DELETE msgrcpt FROM msgrcpt LEFT OUTER JOIN msgs ON
msgrcpt.mail_id=msgs.mail_id WHERE msgs.mail_id IS NULL;

>   # clean quarantine table from orphaned records
> DELETE LOW_PRIORITY FROM quarantine WHERE NOT EXISTS
(SELECT 1 FROM
> msgs WHERE mail_id=quarantine.mail_id)

DELETE quarantine FROM quarantine LEFT OUTER JOIN msgs ON
quarantine.mail_id=msgs.mail_id WHERE msgs.mail_id IS NULL;

Regards,
Andrzej

------------------------------------------------------------
-------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and
a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

Re: Foreign Keys Necessary?
user name
2007-08-20 01:29:35
On 8/19/07, Andrzej Kukula <akukulagmail.com> wrote:
> Hello,
>
> On 8/17/07, Nate <nm_listvisp.net> wrote:
> >
> > Just for reference, the queray, down from 3 hours
> > when using foreign keys.ies I'm doing to clean the
database now
> > with foreign keys removed in mysql.  I believe
this is catching
> > everything, and taking roughly 7 minutes per d
> >
> > # clear msgs table for anything older than
$days_quarantine
> > DELETE LOW_PRIORITY FROM msgs WHERE time_num <
UNIX_TIMESTAMP() -
> > $days_quarantine*24*60*60
>
> Maybe get the value of
> UNIX_TIMESTAMP() - $days_quarantine*24*60*60
> assigned to a variable before executing the query to
avoid calculating
> UNIX_TIMESTAMP() for each scanned row.
>
> > # delete anythign that's not quarantined longer
than we keep data for penpals
> > DELETE LOW_PRIORITY FROM msgs WHERE quar_type !=
'Q' AND (time_num <
> > UNIX_TIMESTAMP() - $penpal_days*24*60*60)
>
> != is very slow operator, testing the nonexistance of a
value can lead
> to full table scan in many cases. Change it to
> IN (' ','F','Z','B','Q','M','L')
> this should work much faster.

Sorry, without 'Q' of course...

> Regards,
> Andrzej

------------------------------------------------------------
-------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and
a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

Re: Foreign Keys Necessary?
country flaguser name
Slovenia
2007-08-24 18:18:35
Nate,

Btw, versions of amavisd prior to 2.4.0 (which started
suggesting
the use of a foreign key) had the following example in
README.sql:


DELETE FROM msgs WHERE UNIX_TIMESTAMP()-time_num >
7*24*60*60;
DELETE FROM msgs WHERE UNIX_TIMESTAMP()-time_num > 60*60
AND content IS NULL;

DELETE quarantine FROM quarantine LEFT JOIN msgs
USING(mail_id)
  WHERE msgs.mail_id IS NULL;

DELETE msgrcpt    FROM msgrcpt    LEFT JOIN msgs
USING(mail_id)
  WHERE msgs.mail_id IS NULL;

DELETE FROM maddr
  WHERE NOT EXISTS (SELECT sid FROM msgs    WHERE sid=id)
    AND NOT EXISTS (SELECT rid FROM msgrcpt WHERE rid=id);


Don't know how it compares in performance to your
suggestion.


Also, you should swap the order of deletions:

> # clean maddr table from orphaned records
> DELETE LOW_PRIORITY FROM maddr WHERE NOT EXISTS (SELECT
1 FROM msgs
> WHERE sid=id) AND NOT EXISTS (SELECT 1 FROM msgrcpt
WHERE rid=id)
>
> # clean msgrcpt table from orphaned records
> DELETE LOW_PRIORITY FROM msgrcpt WHERE NOT EXISTS
(SELECT 1 FROM msgs
> WHERE mail_id=msgrcpt.mail_id)

The msgrcpt would better be cleaned prior to purging maddr,
as it
may release a need to keep unnecessary recipient addresses
around.

  Mark

------------------------------------------------------------
-------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and
a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

Re: Foreign Keys Necessary?
country flaguser name
Slovenia
2007-08-28 12:44:47
Andrzej,

> > Btw, versions of amavisd prior to 2.4.0 (which
started suggesting
> > the use of a foreign key) had the following
example in README.sql:
> >
> > DELETE FROM msgs WHERE UNIX_TIMESTAMP()-time_num
> 7*24*60*60;
> > DELETE FROM msgs WHERE UNIX_TIMESTAMP()-time_num
> 60*60
> >   AND content IS NULL;

> Mark, that would surely prevent index on time_num from
being used, so
> this query would be the slowest one...

I was quoting the old README.sql.


The current release notes (2.5.2) suggests a swapped order:

  README.sql-mysql:
DELETE FROM msgs WHERE time_num <
UNIX_TIMESTAMP()-14*24*60*60;
DELETE FROM msgs WHERE time_num < UNIX_TIMESTAMP()-60*60
AND content IS NULL;

  README.sql-pg:
DELETE FROM msgs WHERE time_iso < now() - INTERVAL '3
weeks';
DELETE FROM msgs WHERE time_iso < now() - INTERVAL '1 h'
AND content IS NULL;

At least on PostgreSQL it does use an index on time_iso.

  Mark

------------------------------------------------------------
-------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and
a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

Re: Foreign Keys Necessary?
user name
2007-08-28 15:37:33
Mark,

On 8/28/07, Mark Martinec <Mark.Martinec+amavisijs.si> wrote:
> Andrzej,
>
> > > Btw, versions of amavisd prior to 2.4.0
(which started suggesting
> > > the use of a foreign key) had the following
example in README.sql:
> > >
> > > DELETE FROM msgs WHERE
UNIX_TIMESTAMP()-time_num > 7*24*60*60;
> > > DELETE FROM msgs WHERE
UNIX_TIMESTAMP()-time_num > 60*60
> > >   AND content IS NULL;
>
> > Mark, that would surely prevent index on time_num
from being used, so
> > this query would be the slowest one...
>
> I was quoting the old README.sql.

Ok nothing wrong with it, just that OP wanted high
performance out of
the queries, because of huge tables. I looked at it and
found some
things we can optimize.

> The current release notes (2.5.2) suggests a swapped
order:
>
>   README.sql-mysql:
> DELETE FROM msgs WHERE time_num <
UNIX_TIMESTAMP()-14*24*60*60;
> DELETE FROM msgs WHERE time_num <
UNIX_TIMESTAMP()-60*60 AND content IS NULL;

For large tables I'd suggest one improvement:
SET min_time := UNIX_TIMESTAMP()-14*24*60*60;
DELETE FROM msgs WHERE time_num < mintime;

That would avoid calculating UNIX_TIMESTAMP() and the
subtraction for
every row in the table (thousands or millions times).

Of course there's also problem with deleting large batch of
records,
they first must be written to the transaction log (and maybe
to binary
log used for replication), what surely take much time; it's
better to
delete them in batches, say 1000 rows in a batch, to reduce
transaction log contention. This is pretty trivial when
there are no
joins and somewhat more sophisticated with joins; in both
cases it
needs a stored procedure; I'll write some tomorrow.

>   README.sql-pg:
> DELETE FROM msgs WHERE time_iso < now() - INTERVAL
'3 weeks';
> DELETE FROM msgs WHERE time_iso < now() - INTERVAL
'1 h' AND content IS NULL;
>
> At least on PostgreSQL it does use an index on
time_iso.

They, too, force SQL to calculate now() and the subtraction
for every
row in the table (see EXPLAIN VERBOSE), and also exploit
the
transaction log problem with large deletes, but I don't know
how to
optimize these queries in pure SQL as understood by PgSQL.

Regards,
Andrzej

------------------------------------------------------------
-------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and
a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

Re: Foreign Keys Necessary?
country flaguser name
Slovenia
2007-08-28 17:52:07
Andrzej,

> > DELETE FROM msgs WHERE time_num <
UNIX_TIMESTAMP()-14*24*60*60;
> > DELETE FROM msgs WHERE time_num <
UNIX_TIMESTAMP()-60*60 AND content IS
> > NULL;
>
> For large tables I'd suggest one improvement:
> SET min_time := UNIX_TIMESTAMP()-14*24*60*60;
> DELETE FROM msgs WHERE time_num < mintime;
>
> That would avoid calculating UNIX_TIMESTAMP() and the
subtraction for
> every row in the table (thousands or millions times).

Did you try it and makes a difference, or is it just a
theory?

I'd think the constant subexpression would be factored out
of the loop
by an optimizer.

> >   README.sql-pg:
> > DELETE FROM msgs WHERE time_iso < now() -
INTERVAL '3 weeks';
> > DELETE FROM msgs WHERE time_iso < now() -
INTERVAL '1 h' AND content IS
> > NULL;
> >
> > At least on PostgreSQL it does use an index on
time_iso.
>
> They, too, force SQL to calculate now() and the
subtraction for every
> row in the table (see EXPLAIN VERBOSE), but I don't
know how to
> optimize these queries in pure SQL as understood by
PgSQL.

I don't think it makes any difference with PostgreSQL:


SELECT now() - INTERVAL '30 days';
2007-07-30 00:02:57


EXPLAIN DELETE FROM msgs WHERE time_iso < '2007-07-30';
                                       QUERY PLAN
------------------------------------------------------------
-----------------------------
 Bitmap Heap Scan on msgs  (cost=18068.35..127812.26
rows=548073 width=6)
   Recheck Cond: (time_iso < '2007-07-30
00:00:00+02'::timestamp with time zone)
   ->  Bitmap Index Scan on msgs_idx_time_iso 
(cost=0.00..17931.33 rows=548073 width=0)
         Index Cond: (time_iso < '2007-07-30
00:00:00+02'::timestamp with time zone)


EXPLAIN DELETE FROM msgs WHERE time_iso < now() -
INTERVAL '30 days';
                                       QUERY PLAN
------------------------------------------------------------
-----------------------------
 Bitmap Heap Scan on msgs  (cost=18073.10..130559.08
rows=548170 width=6)
   Recheck Cond: (time_iso < (now() - '30
days'::interval))
   ->  Bitmap Index Scan on msgs_idx_time_iso 
(cost=0.00..17936.06 rows=548170 width=0)
         Index Cond: (time_iso < (now() - '30
days'::interval))



(similar on actual deletions with EXPLAIN ANALYZE, even
without a foreign key constraint)

  Mark

------------------------------------------------------------
-------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and
a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

Re: Foreign Keys Necessary?
country flaguser name
France
2007-08-29 17:54:48
Nate wrote:
> I've read all the back threads regarding purge
performance in the 
> logging database, specifically in mysql.  It seems like
a still 
> pending issue.  In our application where we are
increasing the rows 
> in the msgs table by roughly 1million per day, purging
the database 
> creates an *extreme* load.
> 
> In testing, I've attempted purge of the msgs table, and
under load 
> I'm getting roughly 25records/second deleted. 
Regardless the method 
> (individual record deletes, or as a single query).  I
pull the 
> foreign keys and I can get 3000records/second deleted.
> 
> Is the only reason for foreign keys to keep the
database clean?  Can 
> I do away with them and clean it up manually without
causing any problems?
> 
> I'm also curious, what type of stats do the pgsql
people see with 
> foreign keys on?


you can temporarily disable the foreign keys
(FOREIGN_KEY_CHECKS=0) 
before the purge, and restore them at the end.

it may be good to use temporary tables to store results
(either those 
that will be deleted, or those that will be kept).




------------------------------------------------------------
-------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and
a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
AMaViS-user mailing list
AMaViS-userlists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user

AMaViS-FAQ:http://www.amav
is.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/ho
wto/

[1-10] [11]

about | contact  Other archives ( Real Estate discussion Medical topics )