List Info

Thread: compressing randomly-generated numbers




compressing randomly-generated numbers
user name
2006-08-10 03:44:19
Hey,

I was mulling over some old emails about randomly-generated
numbers
and realized that if I had an imperfectly random source
(something
less than 100% unpredictable), that compressing the output
would
compress it to the point where it was nearly so.  Would
there be any
reason to choose one algorithm over another for this
application?

I recall talking to a CS prof once who said that LZW
compression was
"optimal", which seemed and still seems really
odd to me because
optimal compression would generate a null output file.  So
surely he
meant asymptotically optimal, or e-close to optimal, or
something like
that... anyone know?

Obviously I will avoid any fixed-content headers and
"magic numbers"
by using a "raw" implementation of the
algorithm, not reusing, say,
gzip.  Plus, I will be running as though the RNG was
providing me with
an infinitely long string, not reading everything into
memory and
trying to compress.

It seems as though the Burroughs-Wheeler Transform (bzip2
et. al.)
gets the best compression of the standard utilities... is it
suitable
for infinite length strings?  Is there anything better?
-- 
"If you're not part of the solution, you're part of
the precipitate."
Unix "guru" for rent or hire -><- http://www.li
ghtconsulting.com/~travis/
GPG fingerprint: 9D3F 395A DAC5 5CCC 9066  151D 0A6B 4098
0C55 1484

------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe
cryptography" to majordomometzdowd.com
compressing randomly-generated numbers
user name
2006-08-11 00:06:54
On Aug 9, 2006, at 8:44 PM, Travis H. wrote:

> Hey,
>
> I was mulling over some old emails about
randomly-generated numbers
> and realized that if I had an imperfectly random source
(something
> less than 100% unpredictable), that compressing the
output would
> compress it to the point where it was nearly so.  Would
there be any
> reason to choose one algorithm over another for this
application?

This is neither correct nor a good idea.

Taking almost random information and compressing it will
lead to less  
random results.

Specifically, I will give the general LZW case and then go
to the BZ2  
case.

1) For LZW (even ignoring the magic numbers), if a byte does
not  
match anything in the dictionary (it starts with a
dictionary of all  
0s, so the probability of a match on the first byte is low)
then that  
byte will get a header of a 0 bit. That byte now becomes 9
bits. The  
next byte will have a dictionary of the previous byte and
all zeros.  
The chance of a match will still be small and putting that
into the  
dictionary will be a 9 bit field with 0s. So in the first 2
bytes, I  
can almost guarantee I know where 2 zero bits are.

2) BZ2 transforms the data and then uses LZW. See 1)....

The correct way to "improve" almost random data
is to process it with  
a hash like function with compression.

Jim


------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe
cryptography" to majordomometzdowd.com
compressing randomly-generated numbers
user name
2006-08-11 05:43:01
On Wed, 9 Aug 2006, Travis H. wrote:

> Hey,
> 
> I was mulling over some old emails about
randomly-generated numbers
> and realized that if I had an imperfectly random source
(something
> less than 100% unpredictable), that compressing the
output would
> compress it to the point where it was nearly so.  Would
there be any
> reason to choose one algorithm over another for this
application?

What are you trying to achieve?

An ARC4 keystream derived from fixed key is 100% predictable
and yet
cannot be compressed by anything you would normally call a
"compression
algorithm".

-d


------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe
cryptography" to majordomometzdowd.com
[1-3]

about | contact  Other archives ( Real Estate discussion Medical topics )