> The other oddity is, when I build with i586 assembly, the checks
> run _slower_ than in i386 mode.
> I get 1min 19sec vs. 2min 14sec on a MacBook CoreDuo 1.83GHz with
> 1GB RAM.
> Even when using aggressive optimisation (CFLAGS="-arch i586
> -march=yonah -O3 -ffast-math -mfpmath=sse -msse -msse2"), I still
> only get 1min 47secs. For i386, I didn't use any special compiler
> flags.
>
> What are me and my Mac messing up here?

I think I've found the problem.

In mpi/config.links, there's a rule for i586-* that sets the macro
ELF_SYNTAX in asm-syntax.h. This in turn causes the assembler to see
the line

    .align (1<<3)

in front of the Loop: label in mpih-sub1-asm.S and mpih-add1-asm.S.
At least with the Apple assembler, this will be interpreted as "align
the next instruction on a 2^(1<<3) boundary", which is BSD syntax.
I'm not quite sure, but I thought I read somewhere that this
2^(align size) type syntax is even used in recent gas versions? In
any case, the 1<<(1<<3) = 0x100 = 256 byte alignment produces 200+
nops, which slow the routine down considerably.
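
To make the arithmetic concrete, here is a small shell sketch of the two readings of the operand in `.align (1<<3)`. The function names and the 0x31 label offset are made up for illustration. They are not taken from libgcrypt or from the actual object code, and the padding is counted in bytes, which need not equal the number of nop instructions emitted.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only; not part of libgcrypt.

# ELF-style reading: the operand is the alignment in bytes.
elf_align_bytes() {
    echo "$(( $1 ))"
}

# BSD-style reading: the operand is a power-of-two exponent.
bsd_align_bytes() {
    echo "$(( 1 << $1 ))"
}

# Padding bytes needed to move offset $1 up to a multiple of alignment $2.
padding_for() {
    echo "$(( ($2 - $1 % $2) % $2 ))"
}

operand=$(( 1 << 3 ))   # the value the assembler sees: 8

echo "ELF reading: $(elf_align_bytes "$operand") bytes"
echo "BSD reading: $(bsd_align_bytes "$operand") bytes"

# Assumed example offset 0x31 (49); the real offset of Loop: is not known here.
echo "padding at 0x31, ELF reading: $(padding_for 49 8) bytes"
echo "padding at 0x31, BSD reading: $(padding_for 49 256) bytes"
```

With the BSD reading, any label that does not already sit on a 256-byte boundary gets up to 255 bytes of filler in front of it, which is in line with the 200+ nops seen above.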

I fixed this by adding the darwin triplets to the djgpp triplets in
config.links:

    i[3467]86*-msdosdjgpp* | i[34]86*-apple-darwin*)
        echo '#define BSD_SYNTAX' >>./mpi/asm-syntax.h
        cat $srcdir/mpi/i386/syntax.h >>./mpi/asm-syntax.h
        path="i386"
        ;;
    i586*-msdosdjgpp* | i[567]86*-apple-darwin*)
        echo '#define BSD_SYNTAX' >>./mpi/asm-syntax.h
        cat $srcdir/mpi/i386/syntax.h >>./mpi/asm-syntax.h
        path="i586 i386"
        ;;
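
As a quick sanity check of which host triplets these two patterns catch, the same globs can be tried in a standalone script. This is only a sketch: the function name, the "other" fallback and the darwin version suffixes are assumptions for illustration, and in the real config.links the result also depends on these arms appearing before the generic i586-* rule, since `case` takes the first arm that matches.

```shell
#!/bin/sh
# Illustration only: mirrors the two patterns from the patch to show which
# asm path a given host triplet would select. asm_path_for and the "other"
# fallback are hypothetical and do not exist in config.links.
asm_path_for() {
    case "$1" in
        i[3467]86*-msdosdjgpp* | i[34]86*-apple-darwin*)
            echo "i386"
            ;;
        i586*-msdosdjgpp* | i[567]86*-apple-darwin*)
            echo "i586 i386"
            ;;
        *)
            echo "other"
            ;;
    esac
}

# Example triplets; the version suffixes are made up.
for host in i386-apple-darwin8.8.1 i586-apple-darwin8.8.1 \
            i686-apple-darwin8.8.1 i586-pc-msdosdjgpp; do
    echo "$host -> $(asm_path_for "$host")"
done
```

Of these four, only the i386 darwin triplet selects path "i386"; the other three select "i586 i386".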

This takes out the nops, but it's still slower.
Using the aggressive optimisation flags mentioned earlier, the
benchmark runs in 49secs with i386 assembly and in 68secs with i586
assembly. Disabling assembly yields 65secs, by the way.

I think I'll give up on this for now. It's fast enough, and I'm happy
that gcrypt builds with a little bit of speed improvement on OSX.

Thanks for all your work,
Gregor
_______________________________________________
Gcrypt-devel mailing list
Gcrypt-devel gnupg.org
http://lists.gnupg.org/mailman/listinfo/gcrypt-devel