
Thread: Kernel scheduler seems to be making mistakes




Kernel scheduler seems to be making mistakes
United Kingdom
2007-05-20 09:11:52
This is turning into a bad month  :-(

I'm running BOINC clients on this box, and the kernel seems unable to schedule them properly. I'm subscribed to several projects, so I should have one on each CPU all the time, running at nice 19 and therefore mopping up all available CPU cycles. That's how it used to run. But nowadays the kernel scheduler insists on allocating both of them to the same CPU, thus limiting them to 50% load. Occasionally it will start up correctly, but only if I've started the BOINC client interactively rather than from a startup script, and even then it reverts to its bad behaviour after a while. So far I haven't been able to spot any particular influence that might cause this reversion, and the time before it happens is apparently random.
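One way to confirm which CPU the nice-19 BOINC work is landing on is the `psr` column of procps `ps`, which shows the processor a task is currently assigned to. The sketch below counts nice-19 processes per CPU; the helper name `count_per_cpu` is hypothetical, and it assumes procps `ps` and a POSIX `awk`.

```shell
#!/bin/sh
# Hypothetical helper: read "nice cpu" pairs (e.g. from
# `ps -eo ni=,psr=`) on stdin and print how many nice-19
# processes are currently assigned to each CPU.
count_per_cpu() {
    awk '$1 == 19 { n[$2]++ } END { for (c in n) print "CPU" c ": " n[c] }' | sort
}

# Usage on a live system (the "=" suffixes suppress the ps headers):
#   ps -eo ni=,psr= | count_per_cpu
```

If both clients are on one processor you would expect a single line such as `CPU1: 2` rather than one process on each CPU.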

The box is a Supermicro H8DCE with 2 x Opteron 246 CPUs and 2 x 2GB RAM. This board divides the DIMM slots into two banks of four, one bank next to each CPU and associated with it. I've tried various kernels from 2.6.16-r13 to 2.6.21-r1. I've tried unsetting all the clever-looking optimisations in the kernel, I've tried all three scheduling algorithms and I've tried resetting the BIOS to "optimised" defaults. I've even tried a genkernel kernel with default config, but that version couldn't see the root disk /dev/sda for some reason, and of course it wouldn't boot.

It's also odd that CPU1 runs 5-6 °C hotter than CPU0, whether loaded or not.

Sometimes I suspect a problem with APIC or perhaps the IOMMU, re which I have mostly default or conservative settings in the kernel. Has anyone here some experience they could offer?

I've also been to the BOINC project sites and changed my preferences to the most conservative I could find, but still I can't get proper allocation of BOINC clients to processors. I've tried the forums and got some useful help, but not yet a solution.

This all started some time ago, at about the time when I had to replace the motherboard, but as I wasn't following it very closely at the time I haven't been able to pinpoint the factor that caused the change in kernel scheduling behaviour.

-- 
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93
-- 
gentoo-amd64gentoo.org mailing list


Re: Re: Kernel scheduler seems to be making mistakes
2007-05-20 16:35:12

On Sun, 20 May 2007, Duncan wrote:

> Date: Sun, 20 May 2007 16:14:45 +0000 (UTC)
> From: Duncan <1i5t5.duncancox.net>
> Reply-To: gentoo-amd64lists.gentoo.org
> To: gentoo-amd64lists.gentoo.org
> Subject: [gentoo-amd64]  Re: Kernel scheduler seems to be making mistakes
> 
> Peter Humphrey <prhgotadsl.co.uk> posted
> 200705201511.52195.prhgotadsl.co.uk, excerpted below, on Sun, 20 May
> 2007 15:11:52 +0100:
>
> So, if you emerge schedutils, one of the binaries you get is called
> taskset.  Once it's emerged, you can read the notes on taskset in
> /usr/share/doc/schedutils-*/README.bz2, and/or the taskset manpage.
> It's pretty simple to use, however.  For example, on a two-core or
> two-CPU system (CPU0 and CPU1), setting an already running X to run on
> CPU0 only, on CPU1 only, or on both, is done with the following
> commands (the number being a CPU bitmask, obviously):
>
> taskset -p 1 `pidof X`
> taskset -p 2 `pidof X`
> taskset -p 3 `pidof X`
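The bitmask in those commands has bit n set for each CPU n the task may use, so 1 means CPU0, 2 means CPU1 and 3 means both. As a small sketch, assuming a POSIX shell with arithmetic expansion, here is a hypothetical helper (`cpu_mask` is not part of schedutils) that builds the hex mask `taskset` expects from a list of CPU numbers:

```shell
#!/bin/sh
# Hypothetical helper (not part of schedutils): build the hex CPU
# bitmask taskset expects. Setting bit n allows the task on CPU n.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0      # CPU0 only  -> 1
cpu_mask 1      # CPU1 only  -> 2
cpu_mask 0 1    # both CPUs  -> 3
```

Used with the quoted command, `taskset -p "$(cpu_mask 1)" $(pidof X)` is equivalent to `taskset -p 2 $(pidof X)`.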

The problem with that is that the BOINC client starts a new thread for every work unit.

I've noticed the same problem, and I think it should be up to BOINC to set that up properly.
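Because each new work unit gets a fresh task, a one-off `taskset` only covers the work units already running. A possible workaround, sketched here under several assumptions, is to re-apply affinity periodically to the client's child processes, spreading them round-robin across CPUs. The assumptions are: the client process is named `boinc_client` (check with `ps` on your system), each work unit runs as a direct child process of the client, and `pgrep` (procps), `getconf` and `taskset` are installed. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only. ASSUMPTIONS: client process named "boinc_client",
# work units are direct children of it, pgrep/getconf/taskset present.

# Print "mask pid" pairs, assigning the given PIDs to CPUs
# 0..ncpus-1 in turn (round-robin).
plan_pinning() {
    ncpus=$1; shift
    i=0
    for pid in "$@"; do
        printf '%x %s\n' $(( 1 << (i % ncpus) )) "$pid"
        i=$(( i + 1 ))
    done
}

# Pin every current child of the BOINC client with taskset -p MASK PID.
pin_boinc_children() {
    client=$(pgrep -x boinc_client | head -n 1)
    [ -n "$client" ] || return 1
    ncpus=$(getconf _NPROCESSORS_ONLN)
    # $(pgrep -P ...) is left unquoted so each child PID becomes its own argument
    plan_pinning "$ncpus" $(pgrep -P "$client") | while read -r mask pid; do
        taskset -p "$mask" "$pid"
    done
}
```

Running it from a loop such as `while sleep 60; do pin_boinc_children; done` would catch new work units within a minute. This overrides the kernel's own load balancing rather than fixing it, and changing another user's tasks needs root.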


Re: Re: Kernel scheduler seems to be making mistakes
United Kingdom
2007-05-21 03:10:13
On Sunday 20 May 2007 22:35:12 Nuitari wrote:

> I've noticed the same problem and I think that it should be up to boinc
> to set that up properly

Thanks for verifying that it's not just me. Do you also use a Supermicro H8DCE?

I did raise a bug report with BOINC a few weeks ago, after a long search for their bug system, but it was closed with no action. By "no action" I mean that I received no acknowledgement of the report's creation, nor of its change of status; and when the report was closed, it was "moved to trac", whatever that is, with an HTTP link to a page on which I couldn't find a single trace of fault reports. Evidently my bug report had been buried, and I see no sign of any recognition that they might have a problem in their scheduling algorithm.

Readers with long memories may remember that I lost my temper with them and asked on this list for ways of deleting my existence from the BOINC project. I later relented and decided to have a go myself, but this is as far as I've got.

-- 
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93

