"Dan Oetting" <dan_oetting qwest.net> on Fri, 20 Oct 2006 22:08:03 -0600
writes:
> You either need to compute the L[ ] values to feed into the first
> round of each key schedule stage or save the S[ ] values for each
> iteration between stages. You could generate the L[ ] values by
> running your 3 stages through 3 passes with the same key to generate
> and pass the required values. Alternatively, you could replicate the
> early key schedule stages and feed them with the next 2 keys to be
> processed. You would then have a total of 6 key schedule stages and 1
> decrypt stage but only need 1 pass per key and no S[ ] storage. I
> figure that's about a 40% savings.

From an FPGA/VLSI perspective, I don't see how this is a 40% savings
for a fully unrolled solution.

A "stage" in an FPGA is around 5*32 4-LUTs, and SBox storage is 2*32
4-LUTs. Propagating the round 2 SBox to round 3 would take about 2080
4-LUTs to replace 832 4-LUTs of LUT RAM for the SBox, about 2.5 times
the cost. The numbers are worse for the round 3 to round 4 SBox
propagation: you need about 4160 4-LUTs to regenerate the round 3 SBox
terms, and only 832 to store them.
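
As a quick check of the figures above, here is a minimal Python sketch
that computes the regenerate-versus-store ratios. The constant and
function names are hypothetical, and the "equivalent stages" count is
only my reading of the 5*32 figure, not a statement about any
particular design.

```python
# LUT counts quoted in this message (all in 4-LUTs).
STAGE_LUTS = 5 * 32        # one key schedule "stage"
STORE_SBOX_LUTS = 832      # LUT RAM needed to store the SBox
REGEN_R2_TO_R3_LUTS = 2080 # logic to regenerate round 2 SBox terms
REGEN_R3_TO_R4_LUTS = 4160 # logic to regenerate round 3 SBox terms


def cost_ratio(regen_luts, store_luts):
    """Return how many times more LUTs regeneration costs than storage."""
    return regen_luts / store_luts


def equivalent_stages(regen_luts, stage_luts=STAGE_LUTS):
    """Return the regeneration cost expressed in whole 'stages'."""
    return regen_luts // stage_luts


if __name__ == "__main__":
    for label, regen in (("round 2 -> 3", REGEN_R2_TO_R3_LUTS),
                         ("round 3 -> 4", REGEN_R3_TO_R4_LUTS)):
        print(label,
              "ratio:", cost_ratio(regen, STORE_SBOX_LUTS),
              "stages:", equivalent_stages(regen))
```

On these numbers, regeneration costs 2.5 times as much as storage for
the round 2 to round 3 case and 5 times as much for round 3 to round 4.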

The "trick" works for a processor solution when the storage is more
expensive than the cycles, such as on a small microprocessor. It's
expensive in nearly every other case.

However, it could be cheaper for a fully looped design, just as it is
for small processors, and it might be useful for running many small
looped engines in the FPGA rather than one large unrolled engine.
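
For readers following the S[ ]/L[ ] dependency under discussion, here is
a rough Python sketch of a looped key schedule. It assumes the cipher
is RC5 with 32-bit words, 12 rounds and a 72-bit key (26 S[ ] words, 3
L[ ] words), which this thread does not state explicitly. The function
names and the per-pass snapshot list are hypothetical and exist only to
show what each later pass has to receive from the one before it.

```python
MASK = 0xFFFFFFFF                   # 32-bit word mask
P32, Q32 = 0xB7E15163, 0x9E3779B9   # RC5 magic constants for w = 32
T, C = 26, 3                        # S[] words (2*(12+1)), L[] words (72-bit key)


def rotl(x, n):
    """Rotate a 32-bit word left by n bits (n taken modulo 32)."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK


def key_schedule(key):
    """Run the three mixing passes; return final S and per-pass snapshots."""
    assert len(key) == 9, "assumes a 72-bit (9-byte) key"
    # Load the key into L[] as little-endian 32-bit words.
    L = [int.from_bytes(key[4 * i:4 * i + 4], "little") for i in range(C)]
    # Initialise S[] from the magic constants.
    S = [(P32 + i * Q32) & MASK for i in range(T)]
    A = B = 0
    snapshots = []
    for step in range(3 * T):
        i, j = step % T, step % C
        # Each step reads the S[i] and L[j] written earlier, so a later
        # pass needs either stored values or a re-run of earlier steps.
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        if i == T - 1:
            # End of a pass: this is the state the next pass consumes.
            snapshots.append((list(S), list(L)))
    return S, snapshots


if __name__ == "__main__":
    S, snaps = key_schedule(bytes(9))
    print(len(S), len(snaps))
```

A looped engine keeps one copy of this loop body and the S[ ] array in
memory, which is the case where trading storage for recomputation might
pay off. An unrolled engine instead instantiates every step as logic.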
_______________________________________________
Hardware mailing list
Hardware lists.distributed.net
http://lists.distributed.net/mailman/listinfo/hardware