Thread: ULE vs. 4BSD in RELENG_7
|
|
| ULE vs. 4BSD in RELENG_7 |

|
2007-10-23 11:02:41 |
Hello,

I posted this to the stable mailing list, as I thought it was pertinent
there, but I think it will get better attention here. So I apologize in
advance for cross-posting if this is a faux pas.

Anyway, in summary, ULE is about 5-6% slower than 4BSD for two
workloads that I am sensitive to: building world with -j X, and ffmpeg
-threads X. Other benchmarks seem to indicate relatively equal
performance between the two. MySQL, on the other hand, is
significantly faster in ULE.

I'm trying to understand why ffmpeg and buildworld are slower in ULE
than 4BSD, since it seems to me that ULE was supposed to be the better
scaling scheduler.

Here is a link to the original thread on the stable mailing list:

http://lists.freebsd.org/pipermail/freebsd-stable/2007-October/037379.html

Remy replied with some interesting results for building world between
the two schedulers on an 8-way system. It seems that ULE suffers as
more threads/processes are thrown at it; at least it appears that way
from Remy's data.

Does anyone have any additional performance tests I can run that might
help indicate where the deficiency is in the ULE scheduler? MySQL
performance is excellent, so I'm wondering if it was tuned to that
particular workload?

I'm not sure if Remy subscribes to this list, so I am CC'ing him. Hope
you don't mind, Remy.

Regards,
Josh
_______________________________________________
freebsd-performance freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to
"freebsd-performance-unsubscribe freebsd.org"
|
|
| Re: ULE vs. 4BSD in RELENG_7 |
  United States |
2007-10-23 12:49:48 |
Josh Carroll wrote:
> Anyway, in summary, ULE is about 5-6% slower than 4BSD for two
> workloads that I am sensitive to: building world with -j X, and ffmpeg
> -threads X. Other benchmarks seem to indicate relatively equal
> performance between the two. MySQL, on the other hand, is
> significantly faster in ULE.
>
> Does anyone have any additional performance tests I can run that might
> help indicate where the deficiency is in the ULE scheduler? MySQL
> performance is excellent, so I'm wondering if it was tuned to that
> particular workload?

One major difference is that your workload is 100% user. Also, you were
reporting ULE had more idle time, which looks like a bug, since I would
expect it to be basically 0% idle on such a workload.

Kris
|
|
| Re: ULE vs. 4BSD in RELENG_7 |

|
2007-10-23 12:47:45 |
On 10/23/07, Josh Carroll <josh.carroll gmail.com> wrote:
> Anyway, in summary, ULE is about 5-6% slower than 4BSD for two
> workloads that I am sensitive to: building world with -j X, and ffmpeg
> -threads X. Other benchmarks seem to indicate relatively equal
> performance between the two. MySQL, on the other hand, is
> significantly faster in ULE.
>
> Does anyone have any additional performance tests I can run that might
> help indicate where the deficiency is in the ULE scheduler? MySQL
> performance is excellent, so I'm wondering if it was tuned to that
> particular workload?

ULE is tuned towards providing CPU affinity. Compilation and, evidently,
encoding are workloads that do not benefit from affinity. Before we
conclude that it is slower, try building with -j5, -j6, -j7.

-Kip
|
|
| Re: ULE vs. 4BSD in RELENG_7 |

|
2007-10-23 14:57:45 |
> ULE is tuned towards providing CPU affinity. Compilation and, evidently,
> encoding are workloads that do not benefit from affinity. Before we
> conclude that it is slower, try building with -j5, -j6, -j7.

Here are the results of running ffmpeg with 4 through 8 threads on
both schedulers:

4 threads 4bsd: 117.21
5 threads 4bsd: 95.75
6 threads 4bsd: 93.10
7 threads 4bsd: 92.19
8 threads 4bsd: 92.38

4 threads ule: 122.19
5 threads ule: 107.26
6 threads ule: 101.40
7 threads ule: 98.72
8 threads ule: 96.38

4 threads difference: 4.25 %
5 threads difference: 12.02 %
6 threads difference: 8.92 %
7 threads difference: 7.08 %
8 threads difference: 4.33 %

I'm not sure why the performance differential is not consistent
(probably something very technical a scheduler developer could
explain).

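For reference, the difference figures above can be reproduced from the raw timings. This is only a minimal sketch in Python, assuming the numbers are elapsed times (lower is better) and that "difference" means the ULE time relative to the 4BSD time; the variable and function names are illustrative and not part of any tool used in this thread.

```python
# Timings copied from the ffmpeg results above, keyed by thread count.
bsd4 = {4: 117.21, 5: 95.75, 6: 93.10, 7: 92.19, 8: 92.38}
ule = {4: 122.19, 5: 107.26, 6: 101.40, 7: 98.72, 8: 96.38}


def slowdown_pct(baseline, candidate):
    """Percentage by which candidate takes longer than baseline."""
    return (candidate / baseline - 1.0) * 100.0


# One percentage per thread count, rounded the same way as in the post.
differences = {n: round(slowdown_pct(bsd4[n], ule[n]), 2) for n in sorted(bsd4)}

for n, pct in differences.items():
    print(f"{n} threads difference: {pct:.2f} %")
```
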
Do these results help at all? When running with 9 or more threads,
ffmpeg spits out a lot of errors, so 8 was as high as I could go:

Error while decoding stream #0.0
[h264 0x264ae180]too many threads
[h264 0x264ae180]decode_slice_header error
[h264 0x264ae180]no frame!

My next step is to run some transcodes with mencoder to see if it has
similar performance between the two schedulers. When I have those
results, I'll post them to this thread.

Thanks for the attention,
Josh
|
|
| Re: ULE vs. 4BSD in RELENG_7 |

|
2007-10-23 15:16:54 |
> Just curious, but are these results obtained while you are
> overclocking your 2.4 GHz CPU to 3.4 GHz? That might be a useful
> datapoint.

Yes, they are with the CPU overclocked. I have verified the results
when not overclocked as well (running at stock).

> It also might be useful to know what sort of disks you are using.
> SATA is notoriously bad at parallel access, and compiling is of
> course horribly disk bound to begin with.

I'm sure disk I/O is a factor here. ULE is supposed to provide better
interactiveness during high load (and I/O load), right? Perhaps the
scheduler is being too liberal with time slices for I/O?

Josh
|
|
| Re: ULE vs. 4BSD in RELENG_7 |
  United States |
2007-10-23 15:09:00 |
On Tuesday 23 October 2007, Josh Carroll wrote:
> > ULE is tuned towards providing CPU affinity. Compilation and,
> > evidently, encoding are workloads that do not benefit from
> > affinity. Before we conclude that it is slower, try building with
> > -j5, -j6, -j7.
>
> Here are the results of running ffmpeg with 4 through 8 threads on
> both schedulers:
>
> 4 threads 4bsd: 117.21
> 5 threads 4bsd: 95.75
> 6 threads 4bsd: 93.10
> 7 threads 4bsd: 92.19
> 8 threads 4bsd: 92.38
>
> 4 threads ule: 122.19
> 5 threads ule: 107.26
> 6 threads ule: 101.40
> 7 threads ule: 98.72
> 8 threads ule: 96.38
>
> 4 threads difference: 4.25 %
> 5 threads difference: 12.02 %
> 6 threads difference: 8.92 %
> 7 threads difference: 7.08 %
> 8 threads difference: 4.33 %
>
> I'm not sure why the performance differential is not consistent
> (probably something very technical a scheduler developer could
> explain).

Just curious, but are these results obtained while you are
overclocking your 2.4 GHz CPU to 3.4 GHz? That might be a useful
datapoint.

It also might be useful to know what sort of disks you are using.
SATA is notoriously bad at parallel access, and compiling is of
course horribly disk bound to begin with.

make buildworld also was never designed for massive parallelism at
all, and slows down considerably as you try to scale it up with more
CPUs and increasing -j past a certain point. I don't know where the
break is, but it definitely has been hit at 16 cores.

--
Thanks,
Josh Paetzel
|
|
| Re: ULE vs. 4BSD in RELENG_7 |

|
2007-10-23 16:55:34 |
> My next step is to run some transcodes with mencoder to see if it has
> similar performance between the two schedulers. When I have those
> results, I'll post them to this thread.

mencoder is linked against the same libx264 library that ffmpeg uses
for h.264 encoding, so I was expecting similar results as ffmpeg.
However, the results are slightly different:

4BSD (threads=2):  93.82 real  182.82 user  0.30 sys
4BSD (threads=3):  64.79 real  184.27 user  0.41 sys
4BSD (threads=4):  51.36 real  185.76 user  0.31 sys
4BSD (threads=5):  49.88 real  186.11 user  0.24 sys
4BSD (threads=6):  49.53 real  186.28 user  0.32 sys
4BSD (threads=7):  49.45 real  186.32 user  0.33 sys
4BSD (threads=8):  49.36 real  186.39 user  0.34 sys

ULE (threads=2):   92.81 real  182.41 user  0.36 sys
ULE (threads=3):   64.28 real  184.57 user  0.39 sys
ULE (threads=4):   56.83 real  185.83 user  0.32 sys
ULE (threads=5):   55.30 real  185.95 user  0.42 sys
ULE (threads=6):   55.38 real  186.12 user  0.45 sys
ULE (threads=7):   55.24 real  186.14 user  0.60 sys
ULE (threads=8):   55.08 real  186.28 user  0.52 sys

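One way to read the mencoder figures above is to divide user time by real time, which approximates how many CPUs were kept busy on average during the run. The following is a rough sketch in Python, assuming the figures are time(1) output in seconds; the function name and the choice of the threads=4 and threads=8 rows are illustrative only.

```python
# (real, user) seconds copied from the mencoder results above.
runs = {
    ("4BSD", 4): (51.36, 185.76),
    ("ULE", 4): (56.83, 185.83),
    ("4BSD", 8): (49.36, 186.39),
    ("ULE", 8): (55.08, 186.28),
}


def busy_cores(real, user):
    """Average number of CPUs executing user code over the run."""
    return user / real


utilization = {key: busy_cores(*times) for key, times in runs.items()}

for (sched, threads), cores in utilization.items():
    print(f"{sched} threads={threads}: {cores:.2f} CPUs busy on average")
```

Since user time is nearly identical on both schedulers, the longer real time under ULE shows up here as fewer busy CPUs (about 3.6 for 4BSD versus 3.3 for ULE at threads=4). That would be consistent with the extra idle time mentioned earlier in the thread, although it does not by itself explain the cause.
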
What's interesting is that for threads=2 and threads=3, ULE and 4BSD
are performing the same. After that, though, there's a gap of roughly
10-12% for the remaining data points. Also interesting is that they
both reach a plateau at threads=5. I suppose this means mencoder is
more efficient than ffmpeg? Anyway, ULE is still at least 10% slower
with mencoder, which is "worse" than the 5% drop with ffmpeg.

I decided to run pbzip2 also. The -p argument doesn't seem to
necessarily create as many threads as you request (or it's completely
I/O bound):

4BSD(-p 4):  30.91 real  117.32 user  4.67 sys
4BSD(-p 5):  31.45 real  119.49 user  5.02 sys
4BSD(-p 6):  31.85 real  120.42 user  5.49 sys
4BSD(-p 7):  31.55 real  119.16 user  5.59 sys
4BSD(-p 8):  31.92 real  120.29 user  5.81 sys

ULE(-p 4):   33.73 real  114.60 user  4.51 sys
ULE(-p 5):   31.57 real  116.80 user  5.18 sys
ULE(-p 6):   31.74 real  118.00 user  5.21 sys
ULE(-p 7):   32.04 real  118.32 user  5.39 sys
ULE(-p 8):   32.35 real  120.22 user  6.05 sys

ULE is slightly slower here with -p 4 (9.12%) and -p 8 (1.35%), but
about the same for 5-7.

Hope this helps,
Josh
|
|
| Re: ULE vs. 4BSD in RELENG_7 |

|
2007-10-23 20:06:39 |
I decided to do some testing of concurrent processes (rather than a
single process that's multi-threaded). Specifically, I ran 4 ffmpeg
commands (without the -threads option) at the same time. The
difference was less than a percent:

4bsd: 439.92 real 1755.91 user 1.08 sys
ule: 442.10 real 1754.65 user 1.34 sys

The difference in user/sys is slight, but there. Not sure if that's
pertinent, though, given it is such a small percentage.

I also ran the same scenario with mencoder, with similar results:

4bsd: 377.96 real 1501.58 user 2.04 sys
ule: 377.50 real 1501.68 user 1.93 sys

I think this is important, as it shows an N-process workload on an
N-processor system is the same between ULE and 4BSD, while a single
process (N-threads) workload on an N-processor system seems to favor
4BSD (at least for media encoding). I'm still unsure why MySQL is so
much better with ULE, given these results.

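For anyone who wants to repeat the N-process test, here is a hedged sketch of a small harness in Python that starts several copies of a command at once and reports the wall-clock time until all of them have exited. It is not the script used for the numbers above; the function name is hypothetical and the command is a placeholder to be replaced with the real ffmpeg or mencoder invocation.

```python
import subprocess
import sys
import time


def time_concurrent(cmd, copies):
    """Start `copies` instances of cmd at once and return the
    wall-clock seconds elapsed until every instance has exited."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command that exits immediately; substitute the
    # actual encoder command line (an assumption left to the reader).
    placeholder = [sys.executable, "-c", "pass"]
    elapsed = time_concurrent(placeholder, copies=4)
    print(f"4 concurrent processes: {elapsed:.2f} s real")
```

Unlike time(1), this measures only real time; user and sys totals for the children would need something like os.times() sampled before and after the run.
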
Again, hope this information is useful!
Josh
|
|
| Re: ULE vs. 4BSD in RELENG_7 |
  United States |
2007-10-23 20:25:53 |
Kris Kennaway wrote:
> One major difference is that your workload is 100% user. Also, you
> were reporting ULE had more idle time, which looks like a bug, since
> I would expect it to be basically 0% idle on such a workload.
>
> Kris

We cannot ignore this performance bug. I have also found that ULE is
slower than 4BSD when testing super-smack's update benchmark on my
dual-core machine.

Regards,
David Xu
|
|
| Re: ULE vs. 4BSD in RELENG_7 |

|
2007-10-23 21:10:25 |
> We cannot ignore this performance bug. I have also found that ULE is
> slower than 4BSD when testing super-smack's update benchmark on my
> dual-core machine.

I actually saw improved performance with ULE over 4BSD for
super-smack. What were the parameters you used for your testing? These
were mine:

super-smack ./select-key.smack 10 10000
super-smack ./update-select.smack 10 10000
I ran them again to confirm (10 runs each, averaged):
4BSD:
super-smack ./select-key.smack 10 10000 : 55235.3
super-smack ./update-select.smack 10 10000 : 17029
ULE:
super-smack ./select-key.smack 10 10000 : 65758.5
super-smack ./update-select.smack 10 10000 : 17366.7
So select-key is 19% faster!

The numbers I had from 6.2 (4BSD, with libmap.conf set up to map
libpthread to libthr):

select-key: 50177.34
update-select: 14598.61

So either way, RELENG_7 is faster than 6.2 for super-smack, at least
for me. And ULE here is quite a bit faster for select-key.

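The relative gains can be checked directly from the averaged figures. A small sketch in Python, assuming the super-smack numbers are throughput (higher is better), which matches how they are described above; the dictionary layout and function names are illustrative only.

```python
# Averaged super-smack results copied from this message.
results = {
    "6.2 4BSD": {"select-key": 50177.34, "update-select": 14598.61},
    "RELENG_7 4BSD": {"select-key": 55235.3, "update-select": 17029.0},
    "RELENG_7 ULE": {"select-key": 65758.5, "update-select": 17366.7},
}


def gain_pct(baseline, candidate):
    """Percentage by which candidate throughput exceeds baseline."""
    return (candidate / baseline - 1.0) * 100.0


def compare(base_name, cand_name):
    """Per-test gain of one configuration over another."""
    base, cand = results[base_name], results[cand_name]
    return {test: gain_pct(base[test], cand[test]) for test in base}


ule_vs_4bsd = compare("RELENG_7 4BSD", "RELENG_7 ULE")
r7_vs_62 = compare("6.2 4BSD", "RELENG_7 4BSD")

for label, gains in (("ULE vs 4BSD (RELENG_7)", ule_vs_4bsd),
                     ("RELENG_7 vs 6.2 (4BSD)", r7_vs_62)):
    for test, pct in gains.items():
        print(f"{label} {test}: {pct:+.2f} %")
```

On these figures select-key gains about 19% under ULE, while update-select gains only about 2%, so the ULE advantage is concentrated in the read-only test.
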
Josh
|
|
|
|