We are both talking about the same thing
I am not saying there is a finite deterministic algorithm to
compress
every string into "small space", there isn't.
BTW, thanks for "There
is ***NO*** way round the counting theory."
All I wanted to say is:
For a specific structure (e.g. movie, picture, sound) there
is some
"good" compression algorithm.
E.g.: if you take a GIF 65536x65536, all white, with just
one pixel
black, it can be compressed into 35 bytes, see here:
http://i.iinfo.cz/urs-att/gif3_6-115626056883166.gif
If you wanted to compress the same picture using JPEG (i.e.
discrete
cosine transform), then two things would happen:
The compressed jpeg file would be a) much bigger b)
decompressed image
would have "artifacts", because Fourier
transform of a pulse is sync
(infinitely many frequencies). Sure, JPEG is a lossy
compression, but
good enough for photos and images that don't have a lot of
high
frequencies.
Cheers,
OM
On 8/28/06, Dave Korn <dave.korn artimi.com> wrote:
> On 28 August 2006 15:30, Ondrej Mikle wrote:
>
> > Ad. compression algorithm: I conjecture there
exists an algorithm (not
> > necessarily *finite*) that can compress large
numbers
> > (strings/files/...) into "small"
space, more precisely, it can
> > compress number that is N bytes long into O(P(log
N)) bytes, for some
> > polynomial P.
>
> [ My maths isn't quite up to this. Is it
*necessarily* the case that /any/
> polynomial of log N /necessarily/ grows slower than N?
If not, there's
> nothing to discuss, because this is then a conventional
compression scheme in
> which some inputs lead to larger, not smaller, outputs.
Hmm. It would seem
> to me that if P(x)==e^(2x), P(log n) will grow
exponentially faster than N.
> I'm using log to mean natural log here, substitute 10
for e in that formula if
> we're talking about log10, the base isn't important.
However, assuming that
> we're only talking about P that *do* grow more slowly,
I'll discuss the
> problem with this theory anyway. ]
>
>
> There are many, but there are no algorithms that can
compress *all* large
> numbers into small space; for all compression
algorithms, some sets of input
> data must result in *larger* output than input.
>
> There is *no* way round the sheer counting theory
aspect of this. There are
> only 2^N unique files of N bits. If you compress large
files of M bits into
> small files of N bits, and you decompression algorithm
produces deterministic
> output, then you can only possibly generate 2^N files
from the compressed
> ones.
>
> > Take as an example group of Z_p* with p prime (in
another words: DLP).
> > The triplet (Z, p, generator g) is a compression
of a string of p-1
> > numbers, each number about log2(p) bits.
> >
> > (legend: DTM - deterministic Turing machine, NTM -
nondeterministic
> > Turing machine)
> >
> > There exists a way (not necessarily
fast/polynomial with DTM) that a
> > lot of strings can be compressed into the
mentioned triplets. There
> > are \phi(p-1) different strings that can be
compressed with these
> > triplets. Exact number of course depends on
factorization of p-1.
> >
> > Decompression is simple: take generator g and
compute g, g^2, g^3,
> > g^4, ... in Z_p*
>
> This theory has been advanced many times before. The
"Oh, if I can just
> find the right parameters for a PRNG, maybe I can get
it to reconstruct my
> file as if by magic". (Or subsitute FFT, or
wavelet transform, or
> key-expansion algorithm, or ... etc.)
>
> However, there are only as many unique generators as
(2 to the power of the
> number of bits it takes to specify your generator) in
this scheme. And that
> is the maximum number of unique output files you can
generate.
>
> There is ***NO*** way round the counting theory.
>
> > I conjecture that for every permutation on 1..N
there exists a
> > function that compresses the permutation into a
"short"
> > representation.
>
> I'm afraid to tell you that, as always, you will
find the compression stage
> easy.... and the decompression impossible. There are
many many many more
> possible permutations of 1..N than the number of unique
"short
> representations" in the scheme. There is no way
that the smaller number of
> "unique representations" can possibly
produce any more than the same (smaller)
> number of permutations once expanded. There is no way
to represent the other
> (vast majority) of permutations.
>
> > It is possible that only NTM, possibly with
infinite
> > algorithm (e.g. a human) can do it in some
"short finite time".
>
> Then it's not really an algorithm, it's an ad-hoc
collection of different
> schemes. If you're allowed to use a completely
different encryption scheme
> for every file, I can do better than that: for every
file, define an
> encryption scheme where the bit '1' stands for the
content of that file, and
> everything else is represented by a '0' bit followed
by the file itself.
> Sure, most files grow 1 bit bigger, but the only file
we care about is
> compressed to just a single bit! Great!
>
> Unfortunately, all you've done is moved information
around. The amount of
> information you'd have to have in the decompressor to
know which file to
> expand any particular '1' bit into is equal to ....
the amount you've saved by
> compressing the original to a single bit.
>
> There is ***NO*** way round the counting theory.
>
> > Again,
> > I could've/should've proven or disproven the
conjecture, I just don't
> > want to do it - yet
>
> I seriously advise you not to waste much time on it.
Because ...
>
>
>
>
>
>
> There is ***NO*** way round the counting theory.
>
>
>
>
>
>
>
> cheers,
> DaveK
> --
> Can't think of a witty .sigline today....
>
>
------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe
cryptography" to majordomo metzdowd.com
|
On 28 August 2006 17:12, Ondrej Mikle wrote:
> We are both talking about the same thing
Oh!
> I am not saying there is a finite deterministic
algorithm to compress
> every string into "small space", there
isn't. BTW, thanks for "There
> is ***NO*** way round the counting theory."
>
> All I wanted to say is:
> For a specific structure (e.g. movie, picture, sound)
there is some
> "good" compression algorithm.
>
> E.g.: if you take a GIF 65536x65536, all white, with
just one pixel
> black, it can be compressed into 35 bytes, see here:
>
http://i.iinfo.cz/urs-att/gif3_6-115626056883166.gif
> If you wanted to compress the same picture using JPEG
(i.e. discrete
> cosine transform), then two things would happen:
>
> The compressed jpeg file would be a) much bigger b)
decompressed image
> would have "artifacts", because Fourier
transform of a pulse is sync
> (infinitely many frequencies). Sure, JPEG is a lossy
compression, but
> good enough for photos and images that don't have a
lot of high
> frequencies.
Absolutely, absolutely!
A compression algorithm achieves the best results if it is
designed with
statistical knowledge of the target domain taken into
account. In any
compression scheme you're balancing the set of inputs that
grow smaller on
compression against the necessary counterpart of inputs that
grow larger.
Whatever you gain in the first set, you lose in the second.
The secret is to
arrange that the inputs that tend to grow larger are the
ones that are
less-common in real-world usage, and thus that the ones that
are more common
tend to grow smaller. In practice, this means eliminating
'redundancy', where
'redundancy' is defined as 'whatever similar properties
you can tease out from
the more-common-real-world cases'.
Of course, I could point out that there is precisely *1*
bit of information
in that huge GIF, so even compressing it to 35 bytes isn't
a great
achievement... it's one of the set of less-common inputs
that grow bigger as a
compromise so that real pictures, which tend to have at
least *some*
variation, grow smaller.
cheers,
DaveK
--
Can't think of a witty .sigline today....
------------------------------------------------------------
---------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe
cryptography" to majordomo metzdowd.com
|