List Info

Thread: In all the talk of super computers there is not...




In all the talk of super computers there is not...
country flaguser name
United States
2007-09-03 22:55:55
There does not seem to be much consideration about what is 
computationally infeasible, even with rainbow tables.

If I remember correctly an 8 character 94 key space table is

about 300 MB. How big would it be if it was covering 12 
characters? How long would it take to compute assuming 1,000
3 
GHz CPUs on a bot net?

Now take the phrase "Mary had a lamb, and its fleece
was as white 
as snow." Not counting the quotes it is 52 characters
and has 
both upper and lower case characters, spaces and two
specials or 
a total of 55 key space. How big would the rainbow table be
to 
contain that? How long would it take to compute with 1,000 3
GHz 
CPUs?

Of course one could not assume that the pass phrase would
only 
have the 55 above, so what about the 94 key space table for
52 
characters? How big? How long to compute?

 From the spreadsheet I have it runs out of space to
calculate 
and just renders errors.

I'm guessing that even the botnets in current use couldn't
do it 
in any reasonable time frame nor is the storage space
available 
at an affordable price for any but three letter agencies.

Am I correct?

Allen

------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography"
to majordomometzdowd.com

Re: In all the talk of super computers there is not...
country flaguser name
Czech Republic
2007-09-05 04:36:02
Allen wrote:
> Now take the phrase "Mary had a lamb, and its
fleece was as white as
> snow." Not counting the quotes it is 52 characters
and has both upper
> and lower case characters, spaces and two specials or a
total of 55 key
> space. How big would the rainbow table be to contain
that? How long
> would it take to compute with 1,000 3 GHz CPUs?

You have given english sentence of 12 words as passphrase.
If we count
about 2.5bits of information per word and hash without
adding salt, it
results in about 2^30 combinations. When we divide it over
botnet of
1000 computers, each must try about 10^6 hashes. I guess you
can
calculate the rest of the anwers.

-- 
Martin Tomasek

------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography"
to majordomometzdowd.com

Re: In all the talk of super computers there is not...
country flaguser name
United States
2007-09-05 23:01:45
Hi Martin,

I did forget to say that it would be salted so that throws
it off 
by 2^12

A couple of questions. How did you come up with the ~2.5
bits per 
word? Would a longer word have more bits?

Why are you using entropy rather than 2^(keyspace)? With 55

possible characters per each individual character space, a
12 
character password would have 766,217,865,410,400,000,000
possible combinations without a salt.

Tom Sullivan's Excel spreadsheet for calculating Rainbow
Tables, 
as corrected by Philippe Oechlin, and based on Philippe's 
optimization in the following reference:

http://lasecwww.epfl.ch/php_code/publications
/search.php?ref=Oech03

says that it would take 180.341 years to crack with an 86%
chance 
of success at a hash calculation rate of 100,000,000/sec.

Based on the same speed it would take only 247,465.463 years
to 
calculate the Rainbow Tables.

So what it boils down to is what is the calculation rate of
a 
1000 CPU botnet in reality? I chose the 100,000,000 rate
sort of 
arbitrarily, making assumptions about the hours of real use
and 
the % of CPU time that would be devoted to creating the
Rainbow 
Tables.

Even if one were to assume that the real rate would be 1,000

times faster, it still would take nearly 25 years to create
the 
tables for a twelve character password. If you go to 15 
characters then it says it is a mere 4 million years to
generate 
the tables.

Best,

Allen




mtd wrote:
> Allen wrote:
>> Now take the phrase "Mary had a lamb, and its
fleece was as white as
>> snow." Not counting the quotes it is 52
characters and has both upper
>> and lower case characters, spaces and two specials
or a total of 55 key
>> space. How big would the rainbow table be to
contain that? How long
>> would it take to compute with 1,000 3 GHz CPUs?
> 
> You have given english sentence of 12 words as
passphrase. If we count
> about 2.5bits of information per word and hash without
adding salt, it
> results in about 2^30 combinations. When we divide it
over botnet of
> 1000 computers, each must try about 10^6 hashes. I
guess you can
> calculate the rest of the anwers.
> 

------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography"
to majordomometzdowd.com

Re: In all the talk of super computers there is not...
user name
2007-09-06 08:28:40
| Hi Martin,
| 
| I did forget to say that it would be salted so that throws
it off by
| 2^12
| 
| A couple of questions. How did you come up with the ~2.5
bits per
| word? Would a longer word have more bits?
He misapplied an incorrect estimate!   The usual
estimate - going
back to Shannon's original papers on information theory,
actually - is
that natural English text has about 2.5 (I think it's
usually given as
2.4) bits of entropy per *character*.  There are several
problems here:

	- The major one is that the estimate should be for
*characters*,
		not *words*.  So the number of bits of entropy in
		a 55-character phrase is about 137 (132, if you use
		2.4 bits/character), not 30.

	- The minor one is that the English entropy estimate looks
just
		at letters and spaces, not punctuation and
capitalization.
		So it's probably low anyway.  However, this is a much
		smaller effect.

							-- Jerry

------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography"
to majordomometzdowd.com

Re: In all the talk of super computers there is not...
country flaguser name
United States
2007-09-06 12:43:29
On Thu, 6 Sep 2007 09:28:40 -0400 (EDT)
"Leichter, Jerry" <leichter_jerroldemc.com> wrote:

> | Hi Martin,
> | 
> | I did forget to say that it would be salted so that
throws it off by
> | 2^12
> | 
> | A couple of questions. How did you come up with the
~2.5 bits per
> | word? Would a longer word have more bits?
> He misapplied an incorrect estimate!   The usual
estimate - going
> back to Shannon's original papers on information
theory, actually - is
> that natural English text has about 2.5 (I think it's
usually given as
> 2.4) bits of entropy per *character*.  There are
several problems
> here:

It's less than that.  See, for example, the bottom of the
first page of
http://www.cs.brown.edu/courses/cs195-5/extras/sh
annon-1951.pdf :

	From this analysis it appears that, in ordinary literary
	English, the long range statistical effects (up to 100
letters)
	reduce the entropy to something of the order of one bit
per
	letter, with a corresponding redundancy of roughly 75%.
The
	redundancy may be still higher when structure extending
over
	paragraphs, chapters, etc. is included.

> 
> 	- The major one is that the estimate should be for
> *characters*, not *words*.  So the number of bits of
entropy in
> 		a 55-character phrase is about 137 (132, if you use
> 		2.4 bits/character), not 30.
> 
> 	- The minor one is that the English entropy estimate
looks
> just at letters and spaces, not punctuation and
capitalization.
> 		So it's probably low anyway.  However, this is a
much
> 		smaller effect.

The interesting question is whether or not one can
effectively
enumerate candidate phrases for a guessing program.  For
that problem,
punctuation and capitalization are important.

		--Steve Bellovin, http://www.cs.columbi
a.edu/~smb

------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography"
to majordomometzdowd.com

Re: In all the talk of super computers there is not...
user name
2007-09-06 13:24:11
| > | Hi Martin,
| > | 
| > | I did forget to say that it would be salted so that
throws it off by
| > | 2^12
| > | 
| > | A couple of questions. How did you come up with the
~2.5 bits per
| > | word? Would a longer word have more bits?
| > He misapplied an incorrect estimate!   The usual
estimate - going
| > back to Shannon's original papers on information
theory, actually - is
| > that natural English text has about 2.5 (I think it's
usually given as
| > 2.4) bits of entropy per *character*.  There are
several problems
| > here:
| 
| It's less than that.  See, for example, the bottom of the
first page of
| http://www.cs.brown.edu/courses/cs195-5/extras/sh
annon-1951.pdf :
Interesting paper - I hadn't seen that one, only the earlier
one that
got an estimate - cited in this one - for 2.3 (not 2.4) bits
per
character for samples of length 8 (*very* roughly).

| 	From this analysis it appears that, in ordinary literary
| 	English, the long range statistical effects (up to 100
letters)
| 	reduce the entropy to something of the order of one bit
per
| 	letter, with a corresponding redundancy of roughly 75%.
The
| 	redundancy may be still higher when structure extending
over
| 	paragraphs, chapters, etc. is included.
| 
| > 
| > 	- The major one is that the estimate should be for
| > *characters*, not *words*.  So the number of bits of
entropy in
| > 		a 55-character phrase is about 137 (132, if you
use
| > 		2.4 bits/character), not 30.
| > 
| > 	- The minor one is that the English entropy estimate
looks
| > just at letters and spaces, not punctuation and
capitalization.
| > 		So it's probably low anyway.  However, this is a
much
| > 		smaller effect.
| 
| The interesting question is whether or not one can
effectively
| enumerate candidate phrases for a guessing program.  For
that problem,
| punctuation and capitalization are important.
Well, for *general purpose* algorithms, you can get a rough
idea by
looking at how well the best compressors do.  zip deflate on
a random
selection of English text I used managed to reduce the text
to about 31%
of its original size.  You can't easily compare this to
Shannon's 25%
estimate because zup had an easy job:  The input was 7-bit
ASCII, the
top bit of every byte was always 0; and of the remaining 128
possible
bytes, at least 30 (probably more) never occur.  If we
assume the
input text had only 70 possible characters in it, then there
are
"really" only a little more than 6 bits of true
entropy per byte
of input.  This brings the effective compression from the
"smart"
parts of the algorithm down to about from 69% to 60%.

zip deflate isn't the state of the art in compression
algorithms, but
nothing does all *that* much better.

I suspect the best first-try algorithm for generating
attacks would be
an analogue of using a dictionary to guess passwords: 
Extract phrases
of the appropriate length from the huge volume of data that
is now
readily available on line.  This is likely to catch many
pass phrases.

The example in the original message shows how to avoid such
an attack:
Don't use "Mary had a little lamb, it's fleece was
white as snow";
use a semantic equivalant "Mary had one tiny lamb, with
fleece that
were white as snow".  One can probably generate many
such variants
algorithmically with little trouble, though.  (What's hard
is
eliminating the ones no human would likely use for deep
semantic
reasons, but for an attack like this, generating extra ones
only
cost you time.)

Probably out of reach today for reasonably long phrases, but
I
wouldn't give it very much time.

(It would be interesting to do a detailed analysis for the
often-
recommended approach of picking a phrase and using the first
letters
of successive words.  Just the distribution of first letters
of
words is probably biased, and what the correlation of
successive
first letters looks like is anyone's guess - though given
the
ready availability of data, it's trivially easy to
compute.)

							-- Jerry

------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography"
to majordomometzdowd.com

Re: In all the talk of super computers there is not...
country flaguser name
Czech Republic
2007-09-07 07:41:44
Leichter, Jerry wrote:
> > | A couple of questions. How did you come up with
the ~2.5 bits per
> > | word? Would a longer word have more bits?
> > He misapplied an incorrect estimate!     The usual
estimate - going
> > back to Shannon's original papers on information
theory, actually - is
> > that natural English text has about 2.5 (I think
it's usually given as
> > 2.4) bits of entropy per *character*.  There are
several problems here:
> >
> > 	- The major one is that the estimate should be
for *characters*,
> > 		not *words*.  So the number of bits of entropy
in
> > 		a 55-character phrase is about 137 (132, if you
use
> > 		2.4 bits/character), not 30.


I think in weird ways.    The
rationale behind it follows:

I assume that the passphrase is in syntactically correct
English. So,
number of possible combinations is reduced by the great
amount. Also, I
 want to reduce the number of combinations, so I focus on
the most
probable sentences.

It seems ideal to use some stochastic grammar to describe
this problem.
This type of grammar can be used to produce:

1) probabilities for the sentences so:
	a) total count (state space) can be reduced by threshold
	b) sentences could be sorted by probability
2) estimate of shannon entropy (which allows me to estimate
bits per
sentence or per word and possibly to craft more effective
algorithm to
walk through the space)

At this point I did a little test for one phrase and played
with it a
little. I wanted to know, how likely is that using
stochastic grammar
description, someone can infer that passphrase. I asked
google for
aproximate count on phrases (results sorted by count):

"had a look" 2100000
"had a car"   591000
"had a little lamb" 590000
"had a drink" 562000
"mary had a little lamb" 522000
"had a fight" 466000
mary had a little fleece white snow 322000 //not a phrase
"had a president" 80200
"had a snow"   42400
"had a lamb"   27300

and also:

"I have been there"  947000
"to rescue"         2190000

"had" 1.2E9
"is"  3.68E9
"the" 5E9
"a"  7.2E9

>From this I assumed that google indexes about 5*109
pages. It can bee
seen clearly, that "had a little lamb" is common
phrase (relatively,
between similar phrases). It can be also seen, that the
whole rhyme has
about an half count of phrase "had a little
lamb".

At this point I decided not to continue further and assumed,
that this
passphrase has very low information content, so I used value
of about
2.5bits/word (which don't seem unreasonable when looking at
the numbers
above). I didn't calculated the actual value, it can be
higher or lower.
If the passphrase is "The car looked at me with a
telescope.", I would
estimate it higher (unusual combination of words).

Thinking about that original passphrase at character level
shannon way
is incorrect. It overestimates information in that sentence.
Word level
is better, but still not good enough. Information content
is
overestimated by many, especially for political speeches. 


-- Martin Tomasek

------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography"
to majordomometzdowd.com

Re: In all the talk of super computers there is not...
country flaguser name
United States
2007-09-07 13:15:29
This is an interesting analysis, and the right way to
proceed, I think,
when dealing with passphrases that contain sequences of
natural language
words. I I think the 2.5 bits per word estimate, however, is
a massive
leap.

The problem is that, when it comes to word sequences, the
perplexity of
each contiguous word is much higher than for each contiguous
letter in the
same text.  That is, there are many more possible symbols
that can follow
the current one.  You identified several alternatives that
have the "had
a" prefix, and they are are fairly likely.  Given the
"had a" bigram, it
might be the case the the conditional entropy of the
following token is
fairly small, compared to the general entropy of English
unigrams.  If
"had a little" isn't right, though, the number of
possible words that
might come next is massive.

I think the proper way to continue with this analysis would
be to look
into  the research that has been done on n-gram language
models.  I think
you'll find that even the best models will never get the
conditional
entropy of an arbitrary word down to 2.5 bits.  That would
mean that
basically had the next word narrowed down to less than 8
choices!  This
may occur in some very  common combinations, but in general
the
conditional entropy will be much higher.  In addition, the
phrase-initial
word will always have a fairly high perplexity, because the
only thing to
condition the distribution over possible words for this case
on is the
fact that it is phrase-initial.

That being said, it seems like the idea that the passprases
are much less
secure than traditional character-lever analysis would
suggest is spot on.
 Google recently published DVDs with their counts of the
frequencies of
all n-grams up to 5-grams in their web corpus
(http://googleresearch.blogspot.c
om/2006/08/all-our-n-gram-are-belong-to-you.html)
Armed with that data and enough resources, one could build a
language
model that would make passphrase guessing much more
principled and could
reduce the amount of conditional entropy in a passphrase
significantly. 
In fact, for passphrases up to 5 words in length, the entire
phrase is
probably already in the Google data, it's just a matter of
having the
resources to be able to get through them all.

--dan

> Leichter, Jerry wrote:
>> > | A couple of questions. How did you come up
with the ~2.5 bits per
>> > | word? Would a longer word have more bits?
>> > He misapplied an incorrect estimate!     The usual
estimate - going
>> > back to Shannon's original papers on
information theory, actually - is
>> > that natural English text has about 2.5 (I
think it's usually given as
>> > 2.4) bits of entropy per *character*.  There
are several problems
>> here:
>> >
>> > 	- The major one is that the estimate should
be for *characters*,
>> > 		not *words*.  So the number of bits of
entropy in
>> > 		a 55-character phrase is about 137 (132, if
you use
>> > 		2.4 bits/character), not 30.
>
>
> I think in weird ways.    The
rationale behind it follows:
>
> I assume that the passphrase is in syntactically
correct English. So,
> number of possible combinations is reduced by the great
amount. Also, I
>  want to reduce the number of combinations, so I focus
on the most
> probable sentences.
>
> It seems ideal to use some stochastic grammar to
describe this problem.
> This type of grammar can be used to produce:
>
> 1) probabilities for the sentences so:
> 	a) total count (state space) can be reduced by
threshold
> 	b) sentences could be sorted by probability
> 2) estimate of shannon entropy (which allows me to
estimate bits per
> sentence or per word and possibly to craft more
effective algorithm to
> walk through the space)
>
> At this point I did a little test for one phrase and
played with it a
> little. I wanted to know, how likely is that using
stochastic grammar
> description, someone can infer that passphrase. I asked
google for
> aproximate count on phrases (results sorted by count):
>
> "had a look" 2100000
> "had a car"   591000
> "had a little lamb" 590000
> "had a drink" 562000
> "mary had a little lamb" 522000
> "had a fight" 466000
> mary had a little fleece white snow 322000 //not a
phrase
> "had a president" 80200
> "had a snow"   42400
> "had a lamb"   27300
>
> and also:
>
> "I have been there"  947000
> "to rescue"         2190000
>
> "had" 1.2E9
> "is"  3.68E9
> "the" 5E9
> "a"  7.2E9
>
>>From this I assumed that google indexes about 5*109
pages. It can bee
> seen clearly, that "had a little lamb" is
common phrase (relatively,
> between similar phrases). It can be also seen, that the
whole rhyme has
> about an half count of phrase "had a little
lamb".
>
> At this point I decided not to continue further and
assumed, that this
> passphrase has very low information content, so I used
value of about
> 2.5bits/word (which don't seem unreasonable when
looking at the numbers
> above). I didn't calculated the actual value, it can be
higher or lower.
> If the passphrase is "The car looked at me with a
telescope.", I would
> estimate it higher (unusual combination of words).
>
> Thinking about that original passphrase at character
level shannon way
> is incorrect. It overestimates information in that
sentence. Word level
> is better, but still not good enough. Information
content is
> overestimated by many, especially for political
speeches.  
>
> -- Martin Tomasek
>
>
------------------------------------------------------------
---------
> The Cryptography Mailing List
> Unsubscribe by sending "unsubscribe
cryptography" to
> majordomometzdowd.com
>

------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography"
to majordomometzdowd.com

RE: In all the talk of super computers there is not...
country flaguser name
United States
2007-09-10 14:26:45
True, the contestants are given extra information, though. 
They know
ahead of time that the words make up the name of a place, or
a common
saying, for example.  That helps decrease the entropy
considerably.  They
also know the exact number of characters in the final answer
and are able
to probe multiple characters in the phrase simultaneously.

If a system is setup correctly, you should never be able to
get a hint as
to whether you have guessed any portion of a password
correctly, and you
probably don't know what sort of phrase the target has
chosen, so it would
seem like most of the entropy-reducing information the Wheel
of Fortune
contestant is able to take advantage of are not available to
a password
cracking algorithm.

--dan

> While 2.5 bits/word seems low, the TV game show Wheel
Of Fortune is
> evidence that
> people can correctly guess phrases even when a large
proportion of the
> letters
> are missing.
>
> Peter Trei


------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography"
to majordomometzdowd.com

[1-9]

about | contact  Other archives ( Real Estate discussion Medical topics )