Thread: OpenSSL: openssl/crypto/sha/asm/ sha512-sparcv9.pl




2007-09-26 07:16:33
  OpenSSL CVS Repository
  http://cvs.openssl.org/
 
  ____________________________________________________________________________

  Server: cvs.openssl.org                  Name:   Andy Polyakov
  Root:   /v/openssl/cvs                   Email:  appro@openssl.org
  Module: openssl                          Date:   26-Sep-2007 14:16:33
  Branch: HEAD                             Handle: 2007092613163200

  Modified files:
    openssl/crypto/sha/asm  sha512-sparcv9.pl

  Log:
    Clarify commentary in sha512-sparcv9.pl.

  Summary:
    Revision    Changes     Path
    1.3         +14 -6      openssl/crypto/sha/asm/sha512-sparcv9.pl
  ____________________________________________________________________________

  patch -p0 <<' .'
  Index: openssl/crypto/sha/asm/sha512-sparcv9.pl
 
  ============================================================================
  $ cvs diff -u -r1.2 -r1.3 sha512-sparcv9.pl
  --- openssl/crypto/sha/asm/sha512-sparcv9.pl	10 May 2007 06:48:28 -0000	1.2
  +++ openssl/crypto/sha/asm/sha512-sparcv9.pl	26 Sep 2007 12:16:32 -0000	1.3
  @@ -17,7 +17,7 @@
   # Performance is >75% better than 64-bit code generated by Sun C and
   # over 2x than 32-bit code. X[16] resides on stack, but access to it
   # is scheduled for L2 latency and staged through 32 least significant
  -# bits of %l0-%l7. The latter is done to achieve 32-/64-bit bit ABI
  +# bits of %l0-%l7. The latter is done to achieve 32-/64-bit ABI
   # duality. Nevetheless it's ~40% faster than SHA256, which is pretty
   # good [optimal coefficient is 50%].
   #
  @@ -25,14 +25,22 @@
   #
   # It's not any faster than 64-bit code generated by Sun C 5.8. This is
   # because 64-bit code generator has the advantage of using 64-bit
  -# loads to access X[16], which I consciously traded for 32-/64-bit ABI
  -# duality [as per above]. But it surpasses 32-bit Sun C generated code
  -# by 60%, not to mention that it doesn't suffer from severe decay when
  -# running 4 times physical cores threads and that it leaves gcc [3.4]
  -# behind by over 4x factor! If compared to SHA256, single thread
  +# loads(*) to access X[16], which I consciously traded for 32-/64-bit
  +# ABI duality [as per above]. But it surpasses 32-bit Sun C generated
  +# code by 60%, not to mention that it doesn't suffer from severe decay
  +# when running 4 times physical cores threads and that it leaves gcc
  +# [3.4] behind by over 4x factor! If compared to SHA256, single thread
   # performance is only 10% better, but overall throughput for maximum
   # amount of threads for given CPU exceeds corresponding one of SHA256
   # by 30% [again, optimal coefficient is 50%].
  +#
  +# (*)	Unlike pre-T1 UltraSPARC loads on T1 are executed strictly
  +#	in-order, i.e. load instruction has to complete prior next
  +#	instruction in given thread is executed, even if the latter is
  +#	not dependent on load result! This means that on T1 two 32-bit
  +#	loads are always slower than one 64-bit load. Once again this
  +#	is unlike pre-T1 UltraSPARC, where, if scheduled appropriately,
  +#	2x32-bit loads can be as fast as 1x64-bit ones.
   
   $bits=32;
   for (@ARGV)	{ $bits=64 if (/-m64/ || /-xarch=v9/); }
   .
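
For readers unfamiliar with the trade-off the footnote describes, here is a
minimal C sketch of the two access patterns. It is an illustration only, not
code from sha512-sparcv9.pl (which emits SPARC assembly). The names load_2x32
and load_1x64 are hypothetical, and the most-significant-half-first layout is
an assumption chosen to mirror big-endian SPARC.

```c
#include <stdint.h>

/*
 * Hypothetical helper: read one 64-bit word of X[] as two 32-bit halves
 * and join them. Every intermediate load is only 32 bits wide, which is
 * the access pattern the commentary says was chosen for 32-/64-bit ABI
 * duality.
 * Assumption: half[0] holds the most significant 32 bits.
 */
static uint64_t load_2x32(const uint32_t half[2])
{
    uint64_t hi = half[0];   /* first 32-bit load  */
    uint64_t lo = half[1];   /* second 32-bit load */
    return (hi << 32) | lo;
}

/*
 * Hypothetical helper: the same word fetched with a single 64-bit load,
 * the form the commentary attributes to a 64-bit code generator.
 */
static uint64_t load_1x64(const uint64_t *word)
{
    return *word;
}
```

Both helpers return the same value; the footnote is about cost, not result.
Per the commentary, on T1 the two-load form is always slower, while on pre-T1
UltraSPARC it can be scheduled to match the single load.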
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
CVS Repository Commit List                     openssl-cvs@openssl.org
Automated List Manager                           majordomo@openssl.org
