On Wed, 2 Jan 2008, Bruce Evans wrote:
> On Tue, 1 Jan 2008, Jeff Roberson wrote:
>
>> On Tue, 1 Jan 2008, Gergely CZUCZY wrote:
>
>>> There's this SYSCALL CPU extension with the SYSENTER/SYSEXIT features.
>>> IIRC Linux takes advantage of this, while FreeBSD doesn't. I might be
>>> wrong here, of course.
>>
>> This is true on 32-bit x86 and not true on amd64/x86_64. On 32-bit x86
>> platforms our syscalls cost about 750 cycles more due to using int 0x80.
>> Various patches have been around for a while to implement sysenter/sysexit
>> support, but it's difficult to get compatibility right and probably not
>> worth it now that everyone is moving to 64-bit.
>
> No, syscalls on i386 UP take about 65 cycles _less_ than on amd64, due
> mainly to 64-bit code and data being larger. A syscall takes about
> 385 cycles on an A64 running i386 UP (0.17us @ 2.205GHz), so it can't
> possibly take 750 cycles more than on the same A64 running amd64 UP
> (0.20us @ 2.205GHz). I think SYSENTER/SYSEXIT saves more like 7.5 or
> 75 cycles and thus compensates for some of the 64-bit overhead, else
> amd64 would be even slower. I don't have documents or measurements
> for current int 0x80 or SYS* times -- on i486, int 0x80 takes about 80
> cycles and iret takes about the same, so the total overhead from the
> bad hardware interface is about half of the total syscall overhead.

It is true that we are slower on i386 by not using sysenter and on par on
64-bit amd64. I have not benchmarked since the P4 days, so my data must be
grossly out of date. At the time I had a small operating system that I
used for benchmarking processor features. I also tested call gates, task
gates, etc. I might be confusing the results of one of these tests.

Thanks,
Jeff
>
> The times 0.17us and 0.20us are from lmbench2 doing a COMPAT_43 getppid().
> As is well known, getppid() is a better benchmark than getpid() since it
> is much harder for libraries to cache (since the parent may change to
> init at any time). In FreeBSD, it always does proc locking, while getpid()
> only does proc locking if COMPAT_43. But the overhead for uncontested
> locking on UP is in the noise -- it is about 5-10 cycles on this hardware.
>
> lmbench2 is not up to date enough to report things with nanosecond
> resolution. I have more accurate measurements for clock_gettime().
> After some optimizations, clock_gettime() timing itself takes an average
> of 233ns in my version of 5.2 and 250-260ns in -current, both on i386 UP
> @ 2.205GHz.
>
> Linux-2.6.10 i386 UP takes 0.13us for getpid() on slightly different
> hardware (AXP 2.223GHz) where FreeBSD i386 UP takes slightly longer
> than on the A64 (0.17-0.18us). Not a big difference. The difference
> is more interesting for the even-more-bogus "null I/O" micro-benchmark.
> This writes 1 byte to /dev/null. Linux used to be 4-5 times faster on
> this (on the AXP, in 0.16us in Linux-2.3.99 vs 0.90us in FreeBSD-~5.2),
> but Linux has been speeded down (0.19us in Linux-2.6.10) and FreeBSD
> has been speeded up (0.33us on the A64 in -current). I consider the
> speedups bogus since they consist of combining/avoiding vfs layers for
> devices only. The usual case of (cached) file i/o remains unnecessarily
> slow. (For most devices, and for uncached file i/o, the hardware part is
> necessarily slow, so optimization of the software hardly matters.)
>
> Bruce
> _______________________________________________
> freebsd-performance@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-performance
> To unsubscribe, send any mail to
> "freebsd-performance-unsubscribe@freebsd.org"
>