Hi,
I've proceeded with further development of John the Ripper after the 1.7
release. A new development version is out - numbered 1.7.1:
http://www.openwall.com/john/
JtR 1.7.1 adds bitslice DES code for x86 with SSE2 for better performance
at DES-based crypt(3) hashes on Pentium 4 and SSE2-capable AMD processors,
as well as assorted high-level changes to improve performance on current
x86-64 processors (both AMD and Intel).
On a related note, the SecurityFocus interview with me on John the Ripper
1.7 is now also available off the Openwall website:
http://www.openwall.com/john/interviews/SF-20060222-p1
For those who are interested in some benchmarks of the new code, here they
are. I've used two systems, one with an Intel P4 Xeon (3.2 GHz) and the
other with an AMD Athlon 64 ("3200+", 2.0 GHz). Although the Xeon is
capable of Hyper-Threading, I only ran one process, thereby not taking
advantage of HT for these benchmarks. Both CPUs are SSE2 and 64-bit
capable. The OS on both systems was Linux and the same builds of John were
used (I copied my pre-compiled executables to both systems).
I've omitted the "BSDI DES" and "Kerberos AFS DES" benchmarks to make it
easier to see the really important ones. The "BSDI DES" results are in all
cases proportional to the "Traditional DES" ones (as expected) and the
"Kerberos AFS DES" implementation is suboptimal and unimportant to most
users of John.
I'll start with the Xeon:
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 3
Native 64-bit (pure C, built on Owl-current for x86-64, gcc 3.4.5):
Benchmarking: Traditional DES [64/64 BS]... DONE
Many salts: 949593 c/s real, 949593 c/s virtual
Only one salt: 875699 c/s real, 877454 c/s virtual
Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw: 10106 c/s real, 10106 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw: 450 c/s real, 450 c/s virtual
Benchmarking: NT LM DES [64/64 BS]... DONE
Raw: 8848K c/s real, 8848K c/s virtual
The DES performance is rather good and Blowfish is OK, but it's the
performance at FreeBSD-style MD5-based crypt(3) that stands out. Most CPUs
don't cross 10k c/s at this benchmark. This one does due to the high clock
rate and the availability of 16 registers with x86-64, which enables John
to do two MD5 hashes in parallel, even with pure C code.
32-bit with SSE2 build on the Xeon:
Benchmarking: Traditional DES [128/128 BS SSE2]... DONE
Many salts: 924518 c/s real, 924518 c/s virtual
Only one salt: 814592 c/s real, 814592 c/s virtual
Benchmarking: NT LM DES [128/128 BS SSE2]... DONE
Raw: 7069K c/s real, 7069K c/s virtual
Although SSE2 is effectively 128-bit, this is a little bit slower than the
native 64-bit build, but it has the advantage of not requiring a 64-bit
capable CPU or OS. Similar performance is expected on non-Xeon P4s and on
P4 Celerons that are not 64-bit capable.
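To illustrate what the register width means for bitslice code, here is a toy sketch in Python. It is not John's DES code: the gate network is a made-up three-input function, and the helper names are hypothetical. It assumes that the [64/64 BS] and [128/128 BS SSE2] labels correspond to 64 and 128 independent instances handled per logical operation. Each "plane" holds the same bit position of every instance, so one AND or XOR acts on all lanes at once.

```python
def to_bitplanes(values, nbits):
    """Transpose a list of small integers into `nbits` bit planes.

    Plane i is an integer whose bit k equals bit i of values[k], so a
    single plane carries one bit position of every instance ("lane").
    """
    planes = []
    for i in range(nbits):
        plane = 0
        for lane, v in enumerate(values):
            plane |= ((v >> i) & 1) << lane
        planes.append(plane)
    return planes


def toy_gate_network(a, b, c):
    """Made-up Boolean function (a AND b) XOR c, applied to whole planes."""
    return (a & b) ^ c


def evaluate_bitsliced(values):
    """Evaluate the toy function for every 3-bit input in `values` at once."""
    a, b, c = to_bitplanes(values, 3)
    out = toy_gate_network(a, b, c)
    return [(out >> lane) & 1 for lane in range(len(values))]


def evaluate_one(v):
    """Reference: the same function computed for a single 3-bit input."""
    a, b, c = v & 1, (v >> 1) & 1, (v >> 2) & 1
    return (a & b) ^ c


inputs_64 = [i % 8 for i in range(64)]    # 64 lanes per operation
inputs_128 = [i % 8 for i in range(128)]  # 128 lanes per operation

# The bitsliced result matches the one-at-a-time reference for both widths.
assert evaluate_bitsliced(inputs_64) == [evaluate_one(v) for v in inputs_64]
assert evaluate_bitsliced(inputs_128) == [evaluate_one(v) for v in inputs_128]
```

The number of gate operations is the same for 64 and 128 lanes, which is why a wider register can help, although, as the figures above show, the actual gain depends on how fast the CPU executes the wider instructions.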
32-bit with MMX build on the Xeon:
Benchmarking: Traditional DES [64/64 BS MMX]... DONE
Many salts: 654080 c/s real, 654080 c/s virtual
Only one salt: 599385 c/s real, 599385 c/s virtual
Benchmarking: NT LM DES [64/64 BS MMX]... DONE
Raw: 6521K c/s real, 6521K c/s virtual
As you can see, both DES-based hashes were faster with SSE2. In the case of
the traditional DES-based crypt(3), the difference is roughly 35% to 40% in
favor of the new SSE2 implementation. (On older Pentium 4 CPUs, the MMX
code is faster than the above per-MHz, so the advantage of using SSE2 may
be smaller.)
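The SSE2-versus-MMX difference can be re-derived from the raw c/s figures quoted for the Xeon. The following is a minimal sketch; the variable and function names are only illustrative, and the numbers are copied from the "Traditional DES" results above.

```python
# "Traditional DES" throughput (c/s, real) on the Xeon, 32-bit builds.
xeon_sse2 = {"many salts": 924518, "one salt": 814592}
xeon_mmx = {"many salts": 654080, "one salt": 599385}


def speedup_percent(new, old):
    """Return how much faster `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


for case in xeon_sse2:
    gain = speedup_percent(xeon_sse2[case], xeon_mmx[case])
    print(f"{case}: SSE2 is {gain:.1f}% faster than MMX")
```

The same helper can be applied to the NT LM DES figures (7069K vs. 6521K c/s) to see that the gain there is considerably smaller.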
For the sake of completeness, the other two benchmarks from the 32-bit
builds (they are the same since these use neither SSE2 nor MMX):
Benchmarking: FreeBSD MD5 [32/32]... DONE
Raw: 9159 c/s real, 9159 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/32]... DONE
Raw: 453 c/s real, 454 c/s virtual
Here MD5 became a little bit slower compared to the 64-bit build because
there are only 8 registers available in 32-bit mode and only one hash is
being computed at a time.
Now the Athlon 64:
vendor_id : AuthenticAMD
cpu family : 15
model : 47
model name : AMD Athlon(tm) 64 Processor 3200+
stepping : 2
Native 64-bit:
Benchmarking: Traditional DES [64/64 BS]... DONE
Many salts: 791219 c/s real, 791219 c/s virtual
Only one salt: 720435 c/s real, 720435 c/s virtual
Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw: 7419 c/s real, 7419 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw: 330 c/s real, 330 c/s virtual
Benchmarking: NT LM DES [64/64 BS]... DONE
Raw: 6638K c/s real, 6638K c/s virtual
This is rather good considering that the real clock rate is only 2.0 GHz,
but it is slower than the Xeon. So the "3200+" rating does not hold for
this benchmark.
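To put the clock-rate remark in numbers, the native 64-bit "Traditional DES, many salts" results can be normalized per GHz. This is a rough sketch using only the figures quoted in this message; the dictionary layout and function name are illustrative.

```python
# (c/s for Traditional DES, many salts, native 64-bit; real clock in GHz)
systems = {
    "Xeon 3.2 GHz": (949593, 3.2),
    "Athlon 64 3200+": (791219, 2.0),
}


def per_ghz(cps, ghz):
    """Throughput normalized by the real clock rate."""
    return cps / ghz


for name, (cps, ghz) in systems.items():
    print(f"{name}: {per_ghz(cps, ghz):,.0f} c/s per GHz")

# Raw throughput of the Athlon 64 relative to the Xeon.
ratio = systems["Athlon 64 3200+"][0] / systems["Xeon 3.2 GHz"][0]
print(f"Athlon 64 / Xeon raw throughput: {ratio:.2f}")
```

Per clock the Athlon 64 does more work, but in absolute terms it reaches only a fraction of the Xeon's speed with this build.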
However, with SSE2 things are better:
Benchmarking: Traditional DES [128/128 BS SSE2]... DONE
Many salts: 951193 c/s real, 951193 c/s virtual
Only one salt: 827776 c/s real, 827776 c/s virtual
Benchmarking: NT LM DES [128/128 BS SSE2]... DONE
Raw: 6474K c/s real, 6474K c/s virtual
Now we're at the same level of performance that the Xeon provides for
DES-based crypt(3).
For comparison against previous versions of John, the MMX build:
Benchmarking: Traditional DES [64/64 BS MMX]... DONE
Many salts: 785318 c/s real, 785318 c/s virtual
Only one salt: 703667 c/s real, 703667 c/s virtual
Benchmarking: NT LM DES [64/64 BS MMX]... DONE
Raw: 6503K c/s real, 6503K c/s virtual
As you can see, this is roughly 15% to 17% slower than SSE2 at DES-based
crypt(3) (put differently, SSE2 is around 20% faster), achieving about the
same performance that the native 64-bit build does. However, the
performance at LM hashes is similar for all three builds (unlike on the
Xeon).
Finally, for the sake of completeness, the other two benchmarks for the
32-bit builds:
Benchmarking: FreeBSD MD5 [32/32]... DONE
Raw: 5935 c/s real, 5935 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/32]... DONE
Raw: 360 c/s real, 360 c/s virtual
Overall, the new SSE2 code may provide a speedup of up to 40% on current
CPUs for DES-based crypt(3) (both traditional and BSDI-style), but its
effect on LM hashes is not always positive. Future versions of JtR might
provide support for SSE2 with 64-bit builds and improvements for LM hashes.
Comments are welcome on the john-users mailing list.
--
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598 fp: 6429 0D7E F130 C13E C929 6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments