Thread: jobs, bric_queued, bric_dist_mon, publishing times exceedingly long.

jobs, bric_queued, bric_dist_mon, publishing times exceedingly long.
2007-09-12 15:39:35

Hi all,

Our bric system for ET (1.10.2) has been running a very long time (data going back to 2002 or 2003). At the moment, we're experiencing unreasonably long publish times for objects whether they have templates or not. Our systems are not overloaded at all, and we have no problems accessing the publish location (by FTP) quickly...

We have QUEUE_PUBLISH_JOBS set to 0 and are running bric_dist_mon from cron every minute.

In the past, I've found that the jobs table would fill up with "stuck jobs", and I would just empty the table. I suspect this has caused some mismatch between the jobs table and the job_member, job__resource, and job__server_type tables. How can I reconcile this? I don't need any historical information about past jobs, but I don't want to damage anything (else), and I also suspect that the member table is involved.

Could these issues with the jobs table be the reason for the extreme publish times? Even publishing a single media object takes upward of 6 minutes.

I would like to get the system working using bric_queued, but when I tried switching the conf directive and running bric_queued from cron instead of bric_dist_mon, nothing happened: nothing was getting published. Until I get the publish times sped up / back to normal, I don't want to leave it up to bric_queued.

Any advice is greatly appreciated.

John Durkin
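
The mismatch John suspects (rows in child tables that still point at job rows he deleted) can be checked with an anti-join before anything else is emptied. The sketch below uses Python's built-in sqlite3 module purely for illustration. The table names job__resource and job__server_type come from the thread, but the parent table name job, the job__id / resource__id / server_type__id columns, and the reduced schema are all hypothetical assumptions, not Bricolage's actual PostgreSQL schema; verify the real foreign keys before running anything similar against a live database.

```python
import sqlite3

# Hypothetical, reduced schema for illustration only; Bricolage's real job
# tables live in PostgreSQL and have more (and possibly different) columns.
SCHEMA = """
CREATE TABLE job (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE job__resource (job__id INTEGER, resource__id INTEGER);
CREATE TABLE job__server_type (job__id INTEGER, server_type__id INTEGER);
"""

# Child tables assumed to reference job.id through a job__id column.
CHILD_TABLES = ("job__resource", "job__server_type")


def _check_table(child):
    # Table names cannot be bound as SQL parameters, so whitelist them.
    if child not in CHILD_TABLES:
        raise ValueError(f"unexpected table: {child}")


def find_orphans(conn, child):
    """Return the distinct job ids referenced by `child` that are missing from `job`."""
    _check_table(child)
    rows = conn.execute(
        f"SELECT DISTINCT c.job__id FROM {child} AS c "
        "LEFT JOIN job AS j ON j.id = c.job__id "
        "WHERE j.id IS NULL ORDER BY c.job__id"
    ).fetchall()
    return [row[0] for row in rows]


def delete_orphans(conn, child):
    """Delete rows in `child` whose job no longer exists; return how many were removed."""
    _check_table(child)
    cur = conn.execute(
        f"DELETE FROM {child} WHERE job__id NOT IN (SELECT id FROM job)"
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO job (id, name) VALUES (1, 'still queued')")
    # Rows for jobs 2 and 3 simulate leftovers after emptying the job table.
    conn.executemany("INSERT INTO job__resource VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
    conn.executemany("INSERT INTO job__server_type VALUES (?, ?)", [(1, 5), (2, 5)])
    for table in CHILD_TABLES:
        print(table, find_orphans(conn, table))
```

Adding job_member to CHILD_TABLES would work the same way only if it really has a job__id column, which is another assumption. In PostgreSQL, a VACUUM after such deletes is what reclaims the space left by the removed rows.
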

Re: jobs, bric_queued, bric_dist_mon, publishing times exceedingly long.
2007-09-12 20:06:17

I've now gotten bric_queued running smoothly on the other system (for www.theinsideronline.com). I set its cron job to run every five minutes and set the delay to 60 seconds (the -d option, which sets how long to wait after finding an empty queue before re-polling).

Now my question is: with bric_queued, will I run into trouble if an instance of bric_queued gets started at 9:00 and is still busy at 9:05 when another one gets started? Or does it notice that the one which started five minutes earlier is still busy and exit?

What are typical cron intervals for other people? My publish jobs sometimes take a very long time because our "articles" use keywords as indicators of the article's presence as a post on a keyword-tag page. So when a new article is made with a given keyword (say, "britney spears"), then in addition to the article page getting published, the britney spears tag page will also be published, as well as any other tag pages the article is associated with, and also the home page (which is a collection of chronologically ordered "posts", like a blog).

Our systems are heavily used by 5-15 people simultaneously publishing articles, media, etc.

best,
jd
On 9/12/07, John Durkin <john.durkin gmail.com> wrote:
>
> We emptied the other job_* tables and then vacuumed each one, and also
> vacuumed the member table. Now our publishing times are back to normal, and
> we're going to set up bric_queued. I realized we did not need bric_dist_mon
> running, as we no longer schedule publish jobs in advance... all our
> publishing is done in real time as a user requests it, so I took that out
> of the cron.
>
> When running bric_queued, it suggests a "fairly aggressive" cron job. Is
> once every two minutes "fairly aggressive"? And do I need to specify the
> PID file, or will it create its own without that flag?
>
> thanks!
>
> JD
> On 9/12/07, Beaudet, David P. <D-Beaudet nga.gov> wrote:
> >
> > Have you tried vacuuming / analyzing your database?
> >
> > What does your postmaster vs. httpd process utilization look like
> > during a publish job? Also, check your memory and swap utilization:
> > if your system is underpowered in terms of memory, swapping can kill
> > performance without spiking the CPU.
> >
> > Also, I wouldn't run the bric_dist_mon included with 1.10.2 from CRON
> > with any frequency... there's no check to see whether another
> > bric_dist_mon is currently running, so you run the risk of many
> > bric_dist_mons running simultaneously, adding more and more publication
> > jobs and further hampering performance. I have a patched bric_dist_mon
> > that writes out a PID / lock file to behave more gracefully when running
> > from CRON, and I can send it to you if it turns out to be the issue.
> >
> >
> > -----Original Message-----
> > From: John Durkin [mailto:john.durkin gmail.com]
> > Sent: Wednesday, September 12, 2007 4:40 PM
> > To: Bricolage Developers; Bricolage Users
> > Subject: jobs, bric_queued, bric_dist_mon, publishing times exceedingly long.

RE: jobs, bric_queued, bric_dist_mon, publishing times exceedingly long.
2007-09-12 22:21:34

You'll benefit from the modifications that David Wheeler wrote and that I've been testing and tweaking for a few months. They prevent unnecessary burns by eliminating duplicate burn requests. We're about to start our Bric pilot tomorrow after about 9 months of off-again / on-again work, so hopefully after we go live I'll have some time to commit that stuff to trunk so it can be included in the next major release.
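
The idea of eliminating duplicate burn requests is easiest to see with the fan-out John describes, where every article publish also re-burns its tag pages and the home page. The following is a minimal Python sketch of collapsing repeated burn requests; it is not David Wheeler's patch, and the BurnRequest fields (asset_id, output_channel) and the fan_out() helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRequest:
    """Hypothetical burn request: which asset to burn, to which output channel."""
    asset_id: int
    output_channel: str


def fan_out(article_id, tag_page_ids, home_page_id, channel="web"):
    """Hypothetical fan-out: burning an article also burns its tag pages and the home page."""
    requests = [BurnRequest(article_id, channel)]
    requests += [BurnRequest(tag_id, channel) for tag_id in tag_page_ids]
    requests.append(BurnRequest(home_page_id, channel))
    return requests


def dedupe_burns(requests):
    """Drop repeated requests, keeping the first occurrence of each, in order.

    frozen=True makes BurnRequest hashable, so dict.fromkeys can use it as a
    key; dicts preserve insertion order in Python 3.7+.
    """
    return list(dict.fromkeys(requests))


if __name__ == "__main__":
    HOME = 999
    # Two articles that share tag page 100 (say, "britney spears").
    queue = fan_out(1, [100], HOME) + fan_out(2, [100, 101], HOME)
    unique = dedupe_burns(queue)
    print(f"{len(queue)} requests -> {len(unique)} burns")
```

Keeping the first occurrence burns the shared home page once, at its earliest position in the batch; a real implementation might instead burn shared pages last so they reflect every article in the batch.
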

I use a cron interval of 1 minute with bric_dist_mon, but that's with a modification to detect whether it's already running. I haven't used bric_queued, but a look at the code suggests that you don't have to worry about overlapping processes as long as you're running it as a daemon. If you run it in single-job mode, you could run into that problem (single-job mode would benefit from a PID / lock file to prevent this). You'll also have to make sure the parent process is restarted if it dies (with a shell script, perhaps).

I don't believe you should run bric_queued from cron if you're running it in daemon mode, because you could end up with a bunch of daemons running by tomorrow morning. There is some code for checking the writability of a PID file (in daemon mode only), but I don't think that's going to prevent another daemon from starting via cron, since write_pid() closes the PID file after storing the PID...
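
The point about write_pid() is that a PID file which is written and then closed does not, by itself, stop a second copy from starting; something has to hold a lock for the lifetime of the run. Below is a minimal Python sketch of such a single-instance guard for a cron-launched job using flock(2). It is not the patched bric_dist_mon, and the function name and lock path are hypothetical. Because the kernel drops a flock lock once the holding process's descriptor is closed (including when the process exits or crashes), a dead run cannot leave a stale lock behind.

```python
import fcntl
import os
import sys
import tempfile


def acquire_single_instance_lock(path):
    """Try to take an exclusive, non-blocking flock on `path`.

    Returns the open file descriptor (keep it open for the whole run),
    or None if another process already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB: fail immediately instead of waiting for the other run to finish.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # The PID is recorded only for humans and monitoring tools; the held lock,
    # not the file's contents, is what keeps a second instance out.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.gettempdir(), "bric_dist_mon.lock")  # hypothetical path
    lock_fd = acquire_single_instance_lock(lock_path)
    if lock_fd is None:
        sys.exit(0)  # the run started by the previous cron tick is still busy
    print("lock held; the distribution work would run here")
```

With a guard like this, a one-minute cron interval simply skips a tick whenever the previous run is still busy, instead of stacking up overlapping processes.
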
-----Original Message-----
From: John Durkin [mailto:john.durkin gmail.com]
Sent: Wed 9/12/2007 9:06 PM
To: devel lists.bricolage.cc; Bricolage Users
Subject: Re: jobs, bric_queued, bric_dist_mon, publishing times exceedingly long.

Re: jobs, bric_queued, bric_dist_mon, publishing times exceedingly long.
2007-09-13 05:07:00

On Wed, 12 Sep 2007, John Durkin wrote:
> So, since I'd like to use bric_queued to eliminate the long delays in the UI
> for users trying to publish items, I should try running it as a daemon?

Yes

> I was confused by the documentation about this, because in the same breath
> that it suggests running it as a daemon, it also says to run it from a "very
> aggressive cron job"

Try perldoc bric_queued .

> so - to run it as a daemon means to just start it up
> from the command line once?

Yes

> - otherwise it would run as long as there are still jobs
> to be done and then die?

No

"In cases where the program finds no jobs in the queue it will wait
a specified amount of time (defaulting to 30 seconds) and then re-poll."

I'd like its current behavior to be a little different.
I don't want any ponies, however.

Currently the job queue has two loops.
The outer loop is an infinite loop that sleeps.
The inner loop grabs ALL jobs scheduled before NOW
and executes them. So if you have 1000 jobs scheduled
for 14:30 today, when that time passes, bric_queued will
go into the loop over 1000 items and not check back again
until that loop has completed. If that takes 5 hours,
it doesn't matter; once the loop has started, no other jobs
scheduled in the meantime will be published,
even if you (retroactively) schedule something
to publish before 14:30. (I'm not saying that's "incorrect",
just that I don't like it, and no, I don't have a patch.)
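
The two-loop behavior described above can be modelled in a few lines. This Python sketch is an illustration, not bric_queued's Perl code; Job, the in-memory queue, and all function names are hypothetical. drain_snapshot() mirrors the inner loop (grab every job due now, then run the whole batch without looking again), poll_forever() is the outer loop with the documented 30-second default, and drain_repolling() is one possible alternative that re-queries the queue before each job, so a back-dated job is not stuck behind a long batch.

```python
import time
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    sched_time: float  # epoch seconds at which the job becomes due


def due_jobs(queue, now):
    """All jobs scheduled at or before `now`, earliest first."""
    return sorted((job for job in queue if job.sched_time <= now),
                  key=lambda job: job.sched_time)


def drain_snapshot(queue, now, run):
    """Inner loop as described: take ALL due jobs once, then run them in order.

    Jobs added while the batch runs wait for the next poll, even back-dated ones.
    """
    batch = due_jobs(queue, now)
    for job in batch:
        queue.remove(job)
        run(job)
    return [job.name for job in batch]


def drain_repolling(queue, clock, run):
    """Alternative: re-check the queue before every job, earliest due job first."""
    done = []
    while True:
        batch = due_jobs(queue, clock())
        if not batch:
            return done
        job = batch[0]
        queue.remove(job)
        run(job)
        done.append(job.name)


def poll_forever(queue, run, interval=30.0, max_polls=None,
                 clock=time.time, sleep=time.sleep):
    """Outer loop: drain due jobs; if none were due, sleep `interval` and re-poll.

    `max_polls` exists only so the sketch can be exercised; a daemon loops forever.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if not drain_snapshot(queue, clock(), run):
            sleep(interval)
```

The trade-off is cost: re-polling issues one queue query per job instead of one per batch, which is trivial for an in-memory list but worth measuring against a real database.
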

RE: jobs, bric_queued, bric_dist_mon, publishing times exceedingly long.
2007-09-13 05:07:28

>So, since I'd like to use bric_queued to eliminate the long delays in the UI
>for users trying to publish items, I should try running it as a daemon? I
>was confused by the documentation about this, because in the same breath
>that it suggests running it as a daemon, it also says to run it from a "very
>aggressive cron job" - so - to run it as a daemon means to just start it up
>from the command line once? From what I got out of the documentation, it
>doesn't seem like it would "persist" for longer than my -d value (if there
>are no jobs in the queue) - otherwise it would run as long as there are
>still jobs to be done and then die?

Oops, sorry. You're totally right (RTFM, right?). The daemon exits after the job queue is empty, so it would probably also suffer from the potential problem of overlapping with itself if the CRON frequency is more aggressive than the time it takes to empty the job queue. With bric_dist_mon, I noticed this happening with long initial job queues or with job queues that grow significantly as templates burn additional resources. It seems likely that the same thing would happen with bric_queued.

So, without trying out bric_queued in my environment, I can't say definitively that this problem would crop up, but it appears possible. I'm not sure I agree with the "aggressive cron job" statement in the documentation without a mechanism to prevent two bric_queue processes from running simultaneously.

There's a read_pid() method in bric_queue that doesn't appear to be used anywhere. I wonder if someone was thinking of adding lock functionality and never completed it?
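
An unused read_pid() suggests the "check" half of the PID-file pattern was never wired up. A common way to finish it is to read the stored PID and probe it with signal 0, which runs the kernel's existence and permission checks without delivering a signal. The Python sketch below is hypothetical (it is not bric_queued's read_pid()), and it has a known weakness compared with holding a lock on the file: if the old process died and its PID was reused by an unrelated process, the check reports a false positive.

```python
import os


def read_pid(path):
    """Return the positive PID stored in `path`, or None if missing or malformed."""
    try:
        with open(path) as fh:
            pid = int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return None
    # 0 and negative values would address process groups in kill(), so reject them.
    return pid if pid > 0 else None


def pid_is_running(pid):
    """Probe `pid` with signal 0: nothing is delivered, but existence is checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: the PID file is stale
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def another_instance_running(pid_path):
    """True if the PID file names a live process other than this one."""
    pid = read_pid(pid_path)
    return pid is not None and pid != os.getpid() and pid_is_running(pid)
```

A cron-launched wrapper would call another_instance_running() first and exit if it returns True, then write its own PID; note that this check-then-write sequence is itself racy if two runs start at the same instant.
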

It might also make sense for the daemon to just not terminate when its child distribution job has finished. That way, you start it up as a daemon at boot time and it just runs continuously in the background.

Does bric_queue run properly even when the Bricolage Apache process is down?

RE: jobs, bric_queued, bric_dist_mon, publishing times exceedingly long.
2007-09-13 05:14:50

>> - otherwise it would run as long as there are still jobs
>> to be done and then die?
>
>No
>"In cases where the program finds no jobs in the queue it will wait
> a specified amount of time (defaulting to 30 seconds) and then re-poll."

I obviously still haven't RTFM'd, but apparently that doesn't help.

Scott's right. The parent process (pub) only terminates if it catches a termination signal... not by default. It's the child process that terminates upon completion each time.

Is the documentation wrong?
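
The parent/child split described above (a long-lived parent that exits only on a termination signal, and a child that exits after each distribution run) is the classic fork-per-batch daemon pattern. The Python sketch below assumes that structure; supervise(), the work callable, and the max_cycles test hook are hypothetical, and it is POSIX-only because it relies on os.fork().

```python
import os
import signal
import time

_stop_requested = False


def _request_stop(signum, frame):
    """Signal handler: ask the parent loop to finish after the current child."""
    global _stop_requested
    _stop_requested = True


def supervise(work, interval=30.0, max_cycles=None):
    """Parent loop: fork one child per cycle; the child runs work() and exits.

    The parent keeps looping until SIGTERM or SIGINT arrives (or until
    `max_cycles`, a hook for testing, is reached). Returns the children's
    exit codes. Must be called from the main thread (signal handlers).
    """
    signal.signal(signal.SIGTERM, _request_stop)
    signal.signal(signal.SIGINT, _request_stop)
    exit_codes = []
    cycles = 0
    while not _stop_requested and (max_cycles is None or cycles < max_cycles):
        cycles += 1
        pid = os.fork()
        if pid == 0:
            # Child: one round of work, then exit without running the parent's
            # cleanup. os._exit in `finally` guarantees the child never falls
            # back into the parent's loop, even if work() raises.
            code = 1
            try:
                work()
                code = 0
            finally:
                os._exit(code)
        _, status = os.waitpid(pid, 0)
        exit_codes.append(os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1)
        if not _stop_requested and (max_cycles is None or cycles < max_cycles):
            time.sleep(interval)
    return exit_codes
```

In this shape the child's exit is routine and only a signal stops the parent; restarting the parent if it dies, as suggested earlier in the thread, would still be the job of something external such as an init script.
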

RE: jobs, bric_queued, bric_dist_mon, publishing times exceedingly long.
2007-09-13 05:21:40

On Thu, 13 Sep 2007, Beaudet, David P. wrote:
> Oops, sorry. You're totally right (RTFM, right?). The daemon exits after the
> job queue is empty, so it would probably also suffer from the potential
> problem of overlapping with itself if the CRON frequency is more aggressive
> than the time it takes to empty the job queue

It's a daemon. bric_queueD. D for daemon. Server. Stays running.
Unless you tell it to one-off execute jobs.

> Does bric_queue run properly even when the Bricolage Apache process is down?

Yes, assuming the problem causing httpd to be down
doesn't also affect bric_queued.

Re: jobs, bric_queued, bric_dist_mon, publishing times exceedingly long.
2007-09-13 05:09:27

On Thu, 13 Sep 2007, Steffen Schwigon wrote:
> "Beaudet, David P." <D-Beaudet NGA.GOV> writes:
>> so hopefully after we go live I'll have some time to commit that
>> stuff to trunk so it can be included in the next major release.
>
> I suggest also applying it to rev_1_10, so that the current branch
> participates in the evolution. Does this make sense to the others?

IMO, it depends on the impact of the changes.
If it's big, I'd prefer it in trunk,
and maybe someone will be motivated
to finish off rev_1_10 so trunk can see the day. :}

RE: jobs, bric_queued, bric_dist_mon, publishing times exceedingly long.
2007-09-13 05:34:30

>It's a daemon. bric_queueD. D for daemon. Server. Stays running.
>Unless you tell it to one-off execute jobs.

Avast! The "D" should be capitalized or separated by another _ then!