
Thread: Re: faster memset




2008-05-22 16:54:52
Aaron J. Grier <aaron <at> frye.com> writes:

> 
> On Thu, May 22, 2008 at 04:56:54PM +0000, Eric Blake wrote:
> > My patched assembly is no longer sensitive to alignment, and always
> > gets the speed of 8-byte alignment.  This clinches it - for memset,
> > x86 assembly is noticeably faster than C.
> 
> have you done comparisons with the builtin memset() in recent versions
> of gcc?
> 

I was testing with gcc 3.4.4, which does have __builtin_memset.  But my
understanding is that __builtin_memset defers to the library function in
cases it cannot optimize at compile time?  At any rate, my test app
called the library function via a function pointer - does
__builtin_memset even have an address to be used via a function pointer?
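
For reference, here is a minimal sketch of the kind of indirect call
described above.  The names memset_fn and fill_via_pointer are
hypothetical (the actual test app is not shown in this thread), and the
volatile qualifier is an added assumption to keep an optimizer from
working out which function the pointer refers to:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type for any memset-compatible routine under test. */
typedef void *(*memset_fn)(void *, int, size_t);

/* Call the routine through a pointer, so the compiler has to emit a
 * real call instead of expanding a builtin at the call site. */
void *fill_via_pointer(memset_fn fn, void *dst, int c, size_t n)
{
    /* volatile: the pointer value must be reloaded at run time, which
     * hides the callee's identity from the optimizer. */
    memset_fn volatile target = fn;
    return target(dst, c, n);
}
```

A harness could pass either the library memset or a replacement routine
as fn and time both through the same call path.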

If I understand it correctly, __builtin_memset(ptr,0,8) is a good
example of where the compiler optimization helps (it is faster to
open-code two 32-bit writes than to call a function), in which case that
is faster than anything I can code in assembly.  But
__builtin_memset(ptr,0,1000), even though 1000 is constant, starts to be
such a large amount of open-coded assignments that the compiler probably
falls back to the library routine anyway, probably trusting that the
library knows more architecture tricks for efficiency than what you can
represent generically in gcc's builtin definition table.  Finally,
__builtin_memset(ptr,0,len) cannot be optimized, since len is not known
at compile time, so the compiler must fall back on the library.
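
The three cases above can be written out as follows.  This is only a
sketch: the function names are made up for illustration, and what the
compiler actually emits for each one depends on the gcc version, target,
and optimization flags, so the comments describe the expectation stated
above rather than a guarantee.  Compiling with something like
"gcc -O2 -S" and reading the assembly is the way to check.

```c
#include <stddef.h>

/* Case 1: small constant size.  Expected to be open-coded as a couple
 * of stores rather than a function call. */
void clear_small(void *ptr)
{
    __builtin_memset(ptr, 0, 8);
}

/* Case 2: large constant size.  The compiler may decide a call to the
 * library memset is the better choice here. */
void clear_large(void *ptr)
{
    __builtin_memset(ptr, 0, 1000);
}

/* Case 3: length only known at run time. */
void clear_runtime(void *ptr, size_t len)
{
    __builtin_memset(ptr, 0, len);
}
```

Whichever code is generated, all three have the same observable effect
as calling memset with the same arguments.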

In other words, by comparing against __builtin_memset, wouldn't I merely
be comparing against my own implementation for most of the interesting
cases?

-- 
Eric Blake


