OpenSSL CVS Repository
http://cvs.openssl.org/
____________________________________________________________________________
Server: cvs.openssl.org                  Name:   Andy Polyakov
Root:   /v/openssl/cvs                   Email:  appro@openssl.org
Module: openssl                          Date:   26-Sep-2007 14:16:33
Branch: HEAD                             Handle: 2007092613163200
Modified files:
openssl/crypto/sha/asm sha512-sparcv9.pl
Log:
Clarify commentary in sha512-sparcv9.pl.
Summary:
Revision    Changes     Path
1.3         +14 -6      openssl/crypto/sha/asm/sha512-sparcv9.pl
____________________________________________________________________________
patch -p0 <<'@@ .'
Index: openssl/crypto/sha/asm/sha512-sparcv9.pl
============================================================================
$ cvs diff -u -r1.2 -r1.3 sha512-sparcv9.pl
--- openssl/crypto/sha/asm/sha512-sparcv9.pl	10 May 2007 06:48:28 -0000	1.2
+++ openssl/crypto/sha/asm/sha512-sparcv9.pl	26 Sep 2007 12:16:32 -0000	1.3
@@ -17,7 +17,7 @@
 # Performance is >75% better than 64-bit code generated by Sun C and
 # over 2x than 32-bit code. X[16] resides on stack, but access to it
 # is scheduled for L2 latency and staged through 32 least significant
-# bits of %l0-%l7. The latter is done to achieve 32-/64-bit bit ABI
+# bits of %l0-%l7. The latter is done to achieve 32-/64-bit ABI
 # duality. Nevetheless it's ~40% faster than SHA256, which is pretty
 # good [optimal coefficient is 50%].
 #
@@ -25,14 +25,22 @@
 #
 # It's not any faster than 64-bit code generated by Sun C 5.8. This is
 # because 64-bit code generator has the advantage of using 64-bit
-# loads to access X[16], which I consciously traded for 32-/64-bit ABI
-# duality [as per above]. But it surpasses 32-bit Sun C generated code
-# by 60%, not to mention that it doesn't suffer from severe decay when
-# running 4 times physical cores threads and that it leaves gcc [3.4]
-# behind by over 4x factor! If compared to SHA256, single thread
+# loads(*) to access X[16], which I consciously traded for 32-/64-bit
+# ABI duality [as per above]. But it surpasses 32-bit Sun C generated
+# code by 60%, not to mention that it doesn't suffer from severe decay
+# when running 4 times physical cores threads and that it leaves gcc
+# [3.4] behind by over 4x factor! If compared to SHA256, single thread
 # performance is only 10% better, but overall throughput for maximum
 # amount of threads for given CPU exceeds corresponding one of SHA256
 # by 30% [again, optimal coefficient is 50%].
+#
+# (*) Unlike pre-T1 UltraSPARC loads on T1 are executed strictly
+# in-order, i.e. load instruction has to complete prior next
+# instruction in given thread is executed, even if the latter is
+# not dependent on load result! This means that on T1 two 32-bit
+# loads are always slower than one 64-bit load. Once again this
+# is unlike pre-T1 UltraSPARC, where, if scheduled appropriately,
+# 2x32-bit loads can be as fast as 1x64-bit ones.
 
 $bits=32;
 for (@ARGV) { $bits=64 if (/-m64/ || /-xarch=v9/); }
@@ .
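The footnote contrasts fetching a SHA-512 message word X[i] with one 64-bit load against fetching it as two 32-bit halves. The latter is the form the commentary accepts for 32-/64-bit ABI duality. The Python sketch below models X[16] as a big-endian byte image of the stack, since SPARC is big-endian. It shows that the two access patterns yield the same 64-bit value. It does not model timing. The function names and the stack image are hypothetical illustrations, not code from sha512-sparcv9.pl.

```python
import struct

# SPARC is big-endian: X[i] occupies 8 bytes on the stack, with its
# most significant 32 bits at the lower address.


def fetch_x_1x64(stack: bytes, i: int) -> int:
    """Fetch X[i] with a single 64-bit load (what 64-bit Sun C code can use)."""
    (word,) = struct.unpack_from(">Q", stack, 8 * i)
    return word


def fetch_x_2x32(stack: bytes, i: int) -> int:
    """Fetch X[i] as two 32-bit loads (high half, then low half) and recombine.

    On T1 the second load would not start until the first completed, even
    though it does not depend on it; this model only checks the values agree.
    """
    (hi,) = struct.unpack_from(">I", stack, 8 * i)
    (lo,) = struct.unpack_from(">I", stack, 8 * i + 4)
    return (hi << 32) | lo


if __name__ == "__main__":
    # Hypothetical stack image holding X[0..15] (16 words x 8 bytes).
    stack = bytes((k * 37 + 11) & 0xFF for k in range(16 * 8))
    same = all(fetch_x_1x64(stack, i) == fetch_x_2x32(stack, i) for i in range(16))
    print("1x64 and 2x32 agree:", same)
```

Both paths produce the same value. According to the commentary, the cost difference comes from the hardware. On pre-T1 UltraSPARC a well-scheduled pair can match one 64-bit load, whereas on T1 the pair is strictly serialized.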
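The last two lines of the patch select the target width. The default is 32-bit, and the script switches to 64-bit if any command-line argument (presumably the compiler flags it is invoked with) matches `-m64` or `-xarch=v9`. Below is a minimal Python rendering of that same loop; the name `detect_bits` is hypothetical. Perl's `/-m64/` and `/-xarch=v9/` are unanchored and contain no regex metacharacters, so a substring test matches the same arguments.

```python
import sys
from typing import Iterable


def detect_bits(argv: Iterable[str]) -> int:
    """Return 64 if any argument contains '-m64' or '-xarch=v9', else 32.

    Substring matching mirrors the unanchored Perl regexes, so e.g.
    '-xarch=v9b' also selects 64-bit.
    """
    bits = 32
    for arg in argv:
        if "-m64" in arg or "-xarch=v9" in arg:
            bits = 64
    return bits


if __name__ == "__main__":
    print(detect_bits(sys.argv[1:]))
```

Because the match is unanchored, variants such as `-xarch=v9a` or `-xarch=v9b` also select the 64-bit path.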
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
CVS Repository Commit List                     openssl-cvs@openssl.org
Automated List Manager                           majordomo@openssl.org