
Thread: Re: analysis and implementation of LRW




Re: analysis and implementation of LRW
2007-01-23 20:23:50
Thanks to everyone who responded with more information about IEEE P1619. Here are some of the additional links, with my reactions:

Andrea Pasquinucci points to:
> http://en.wikipedia.org/wiki/IEEE_P1619#LRW_issue

Ben Laurie points to:
> http://grouper.ieee.org/groups/1619/email/msg00558.html

Wikipedia points to two concerns with LRW: (1) LRW isn't secure if you use it to encrypt part of the key; (2) something having to do with collisions. For these reasons, Wikipedia says that IEEE P1619 is moving to XEX-AES.

I think (1) is a valid concern and a legitimate reason for IEEE P1619 to move to another mode. XEX-AES is a great mode and this seems like a solid move for IEEE P1619. XEX-AES rests on solid foundations, and there are good grounds for confidence in its design. I would add one caveat, though. I am not aware of any proof that XEX-AES -- or any other mode, for that matter -- is secure when used to encrypt its own key. This is not a flaw in XEX-AES, but rather a generic property of standard models of security for symmetric-key encryption. So I wouldn't be inclined to get too comfortable with the idea of encrypting the key under itself.

I'm not 100% certain I follow what (2) is trying to get at, but it sounds to me like a non-issue. One interpretation of (2) is that the concern is that if part of the key is chosen in a non-uniform way (say, as a password), then LRW is insecure. Of course, you should not use any mode in that way, and I don't know of anyone who suggests otherwise. The remedy is straightforward: crypto keys should be truly uniform. This is standard advice that applies to all modes of operation.

Another possible interpretation of (2) is that if you use LRW to encrypt close to 2^64 blocks of plaintext, and if you are using a 128-bit block cipher, then you have a significant chance of a birthday collision, which may leak partial information about the plaintext or key. That's absolutely true, though it is pretty much a standard feature of any mode of operation based on 128-bit block ciphers. Standard advice is to change keys long before that happens, and that advice doesn't seem terribly hard to follow. (See, e.g., my prior post on this subject for evidence that this doesn't seem likely to be a serious problem for current disk encryption applications. That's fortunate for narrow-block cryptography, because otherwise none of the solutions would be acceptable.)

So it sounds like concern (2) is a bit of a red herring, and LRW is still ok for applications that won't be used to encrypt the key or any material derived from the key.

The good news out of IEEE P1619 is that a number of excellent modes of operation are coming out of that effort, and other applications should be able to take advantage of the good work that P1619 is doing. This is good stuff.

Disclaimer: Of course, LRW is of personal interest to me, so I'm sure I'm biased. Form your own opinions accordingly.

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomometzdowd.com
Re: analysis and implementation of LRW
2007-01-24 17:28:50
David Wagner wrote:
[snip]
> Another possible interpretation of (2) is that if you use LRW to encrypt close to 2^64 blocks of plaintext, and if you are using a 128-bit block cipher, then you have a significant chance of a birthday collision,

Am I doing the math correctly that 2^64 blocks of 128 bits is 2^32 bytes or about 4 gigs of data? Or am I looking at this the wrong way?

If 4 gigs is right, would it then be records to look for to break the code via birthday attacks would be things like seismic data, which tend to be very large. Feed a known file in and look at the output and use that to find the key for the unknown files? As you can tell, my interests are often the vectors, not the exact details of how to achieve the crack.

Currently I'm dealing with very large - though not as large as 4 gig - x-ray, MRI, and similar files that have to be protected for the lifespan of the person, which could be 70+ years after the medical record is created. Think of the MRI of a kid to scan for some condition that may be genetic in origin and has to be monitored and compared with more recent results their whole life.

Thanks,

Allen
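A quick check of Allen's arithmetic, as a minimal Python sketch (the helper names are illustrative, not from any library). 2^64 blocks of 16 bytes is 2^68 bytes, about 256 EiB, rather than 4 GB. The usual birthday approximation p = 1 - exp(-q^2 / 2^(n+1)) for q blocks of n bits then gives a negligible collision chance at 4 GiB. The approximation assumes the cipher outputs behave like independent uniform random blocks.

```python
import math

BLOCK_BITS = 128                 # block size of AES and other 128-bit ciphers
BLOCK_BYTES = BLOCK_BITS // 8    # 16 bytes per block

def bytes_for_blocks(num_blocks: int) -> int:
    """Total plaintext size, in bytes, of num_blocks cipher blocks."""
    return num_blocks * BLOCK_BYTES

def birthday_collision_probability(num_blocks: int, block_bits: int = BLOCK_BITS) -> float:
    """Approximate chance that num_blocks random block_bits-bit values contain a repeat.

    Uses p ~= 1 - exp(-q^2 / 2^(n+1)); expm1 keeps tiny probabilities accurate.
    """
    exponent = num_blocks * num_blocks / 2.0 ** (block_bits + 1)
    return -math.expm1(-exponent)

volume = bytes_for_blocks(2 ** 64)
print(f"2^64 blocks = 2^{volume.bit_length() - 1} bytes = {volume / 2**60:.0f} EiB")

four_gib_blocks = (4 * 2 ** 30) // BLOCK_BYTES   # 2^28 blocks
print(f"P(collision) at 4 GiB:       {birthday_collision_probability(four_gib_blocks):.3e}")
print(f"P(collision) at 2^64 blocks: {birthday_collision_probability(2 ** 64):.3f}")
```

Run as-is, it should report 2^68 bytes (256 EiB), a collision probability around 1e-22 for 4 GiB, and about 0.39 at 2^64 blocks, which is the "significant chance" David refers to.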
data under one key, was Re: analysis and implementation of LRW
2007-01-27 21:08:34
On Wed, Jan 24, 2007 at 03:28:50PM -0800, Allen wrote:
> If 4 gigs is right, would it then be records to look for to break the code via birthday attacks would be things like seismic data,

In case anyone else couldn't parse this, he means "the amount of encrypted material necessary to break the key would be large" or "the size of a lookup table would be large" or something like that.

> Currently I'm dealing with very large - though not as large as 4 gig - x-ray, MRI, and similar files that have to be protected for the lifespan of the person, which could be 70+ years after the medical record is created. Think of the MRI of a kid to scan for some condition that may be genetic in origin and has to be monitored and compared with more recent results their whole life.

That's longer than computers have been available, and also longer than modern cryptography has existed. The only way I would propose to be able to stay secure that long is either:
1) use a random key as large as the plaintext (one-time-pad)
2) prevent the ciphertext from leaking (quantum crypto, spread-spectrum communication, steganography)

Even then, I doubt Lloyd's would insure it. Anyone who claims to know what the state of the art will be like in 70+ years is a fool. I would be cautious about extrapolating more than five years.

The problem is not the amount of data under one key; that's easy enough: generate random keys for every n bits of plaintext and encrypt them with a meta-key, creating a two-level hierarchy. You calculate an information-theoretic bound on n by computing the entropy of the plaintext and the unicity distance of the cipher. Note that the data (keys) encrypted directly with the meta-key is completely random, so the unicity distance is infinite. Furthermore, one can't easily brute-force the meta-key by trying the decrypted normal keys on the ciphertext, because all the plaintext under one key equivocates, being shorter than the unicity distance. I'm not sure how it compounds when the meta-key encrypts multiple keys; I'd have to look into that. In any case, you can create a deeper and deeper hierarchy as you go along.
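A minimal sketch of the two-level hierarchy described above, assuming Python with only the standard library. Because the standard library has no block cipher, it uses a toy stream cipher built from HMAC-SHA256 in counter mode (a stand-in, not a recommendation); a real system would use a vetted cipher and a standard key wrap such as AES Key Wrap (RFC 3394). `n_bytes` plays the role of n (in bytes rather than bits); the function names are hypothetical, and choosing `n_bytes` from the unicity distance is left out.

```python
import hashlib
import hmac
import secrets

def xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with HMAC-SHA256(key, nonce || counter) blocks.

    Illustration only: no integrity protection, and a nonce must never repeat under a key.
    """
    out = bytearray()
    for counter, start in enumerate(range(0, len(data), 32)):
        pad = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)

def encrypt_two_level(meta_key: bytes, plaintext: bytes, n_bytes: int) -> list:
    """Encrypt each n_bytes chunk under a fresh random data key, wrapped by meta_key."""
    records = []
    for start in range(0, len(plaintext), n_bytes):
        data_key = secrets.token_bytes(32)                 # uniformly random, so no redundancy
        wrap_nonce, data_nonce = secrets.token_bytes(16), secrets.token_bytes(16)
        wrapped_key = xor_keystream(meta_key, wrap_nonce, data_key)
        chunk_ct = xor_keystream(data_key, data_nonce, plaintext[start:start + n_bytes])
        records.append((wrap_nonce, wrapped_key, data_nonce, chunk_ct))
    return records

def decrypt_two_level(meta_key: bytes, records: list) -> bytes:
    """Unwrap each data key with meta_key, then decrypt its chunk."""
    out = bytearray()
    for wrap_nonce, wrapped_key, data_nonce, chunk_ct in records:
        data_key = xor_keystream(meta_key, wrap_nonce, wrapped_key)
        out.extend(xor_keystream(data_key, data_nonce, chunk_ct))
    return bytes(out)

# Usage: a 560-byte record split into 64-byte chunks, each under its own key.
meta_key = secrets.token_bytes(32)
record = b"MRI series for patient 0001 " * 20
records = encrypt_two_level(meta_key, record, n_bytes=64)
assert decrypt_two_level(meta_key, records) == record
```

The meta-key only ever encrypts random key material, which is the property the paragraph relies on; adding a further level would wrap the meta-key the same way.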

This bound is the limit for information-theoretic, or unconditional, security. Shannon proved that a system with these characteristics is unbreakable. If you don't know what the entropy of the plaintext is, you have to use a one-time pad. The unicity distance of DES, last time I looked, was so low that one might as well use a one-time pad, since re-keying that often consumes nearly as much key material as plaintext.
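Shannon's unicity distance is U = H(K)/D, where H(K) is the key entropy in bits and D the redundancy per plaintext symbol. A small Python illustration, under an assumed figure of about 1.5 bits of entropy per English letter (within the commonly quoted 1 to 1.5 range) out of log2(26), about 4.7 bits, so D is roughly 3.2 bits per letter. With those numbers a 56-bit DES key gives U of roughly 17 to 18 letters. Random keys have D = 0, matching the "infinite" unicity distance for the meta-key layer.

```python
import math

def unicity_distance(key_bits: float, redundancy_bits_per_symbol: float) -> float:
    """Shannon's unicity distance U = H(K) / D, in plaintext symbols.

    With no redundancy (D = 0), e.g. random keys encrypted under a meta-key,
    U is unbounded.
    """
    if redundancy_bits_per_symbol <= 0:
        return math.inf
    return key_bits / redundancy_bits_per_symbol

# Assumption: about 1.5 bits of entropy per English letter, out of
# log2(26) ~= 4.70 bits of raw capacity per letter.
ASSUMED_ENGLISH_BITS_PER_LETTER = 1.5
english_redundancy = math.log2(26) - ASSUMED_ENGLISH_BITS_PER_LETTER   # ~3.2 bits

for name, key_bits in [("DES", 56), ("AES-128", 128), ("AES-256", 256)]:
    print(f"{name:8s} U ~= {unicity_distance(key_bits, english_redundancy):5.1f} letters")
print("random keys:", unicity_distance(128, 0))
```

The estimate is only as good as the assumed redundancy; compressing the plaintext first lowers D and stretches U, while added structure (headers, encodings) raises D and shortens it.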

With computational security, you can fudge a little by trying to calculate how much data you can safely encrypt under one key. However, I believe this value can only go down over time, as new cryptanalytic attacks are developed against the cipher.

Another method is to derive many data keys from bits of a larger meta-key in such a way that recovering the meta-key from the data keys is computationally infeasible. However, every time you hear "computationally infeasible", remember that it is an argument from ignorance; we don't know an efficient way to break it, yet, or if someone does they aren't talking.
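As a sketch of deriving many data keys from one meta-key, the snippet below uses HMAC-SHA256 as a pseudorandom function over a per-record label, in the spirit of HKDF's expand step or NIST SP 800-108 counter-mode derivation (it is not a conforming implementation of either). The label format and function name are hypothetical. Its security rests entirely on HMAC-SHA256 behaving as a PRF, which is exactly the kind of "computationally infeasible" assumption the paragraph warns about.

```python
import hashlib
import hmac
import secrets

def derive_data_key(meta_key: bytes, record_id: int) -> bytes:
    """Derive a 32-byte data key for one record: HMAC-SHA256(meta_key, label || record_id).

    The b"data-key" label and 8-byte big-endian record number are a hypothetical
    encoding; any unambiguous, never-reused label works the same way.
    """
    label = b"data-key" + record_id.to_bytes(8, "big")
    return hmac.new(meta_key, label, hashlib.sha256).digest()

meta_key = secrets.token_bytes(32)   # the only secret that must be stored
data_keys = [derive_data_key(meta_key, i) for i in range(5)]
assert len(set(data_keys)) == 5                      # distinct per record
assert derive_data_key(meta_key, 3) == data_keys[3]  # re-derivable on demand
```

Unlike the wrapped-key hierarchy, nothing per record has to be stored, but a compromise of the meta-key exposes every record at once.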

You can also make this argument more "scientific" by extrapolating future attacks and computational advances from trends (Moore's Law et al.) - see "Rules of Thumb in Data Engineering" from Microsoft; it's on the storagemojo.com blog and well worth reading.
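To make the extrapolation concrete, a back-of-the-envelope sketch: if attacker throughput doubles every 18 months (an assumed reading of Moore's Law, not a measured figure), each doubling erases roughly one bit of brute-force margin, where "margin" means work beyond what an attacker can afford today. The function names, the default period, and the example margins are illustrative assumptions.

```python
def doublings(years: float, doubling_period_years: float = 1.5) -> float:
    """Number of times attacker throughput doubles over `years` (assumed steady trend)."""
    return years / doubling_period_years

def remaining_margin_bits(margin_bits_today: float, years: float,
                          doubling_period_years: float = 1.5) -> float:
    """Brute-force margin left after `years`, one bit lost per doubling.

    Ignores new cryptanalytic attacks, which can only lower the margin further.
    """
    return margin_bits_today - doublings(years, doubling_period_years)

# Hypothetical margins today, in bits of work beyond the attacker's current reach.
for margin in (40, 64, 96):
    left = remaining_margin_bits(margin, years=70)
    print(f"{margin:3d}-bit margin today -> {left:6.1f} bits after 70 years")
```

That is about 47 doublings over 70 years, so a hypothetical 40-bit margin is exhausted while 96 bits would leave about 49; the trend assumption itself is the weak point.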

Furthermore, you should provide a mechanism for the crypto to be changed transparently as technology progresses; an installed base is forever, but computational security is not. Permitting multiple security configurations is complex, but I don't think anything short of OTP can give an absolute assurance of confidentiality when the opponent has access to the ciphertext.

Another simple solution, the belt-and-suspenders method, is to superencrypt the ciphertext with a structurally different cipher. This basically makes the plaintext fed to the top-level cipher computationally indistinguishable from random data, and so the unicity distance of the top-level cipher is effectively infinite, to the extent that the lower-level cipher is computationally secure. I'm mixing terminology here, but the net result is that you're guaranteed that the combination is as secure as either alone, and in most cases a weakness in one cipher will not be a weakness in the other (this is because of the "structurally independent" assumption).
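A minimal sketch of superencryption with two structurally different primitives and independent keys, limited to the Python standard library. The inner layer is a toy HMAC-SHA256 counter-mode stream cipher and the outer layer a toy keyed-BLAKE2b counter-mode stream cipher; both are illustrative stand-ins for real ciphers, and the function names are hypothetical. Because both layers here are XOR stream ciphers, the combined keystream is the XOR of two independent keystreams. The keys must be generated independently for the argument above to apply, and neither layer provides integrity.

```python
import hashlib
import hmac
import secrets

def hmac_sha256_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Inner toy cipher: HMAC-SHA256(key, nonce || counter) blocks."""
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def blake2b_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Outer toy cipher: keyed BLAKE2b(nonce || counter) blocks, a different design."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.blake2b(nonce + counter.to_bytes(8, "big"), key=key).digest()
        counter += 1
    return out[:length]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def superencrypt(inner_key: bytes, outer_key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    inner_ct = xor_bytes(plaintext, hmac_sha256_keystream(inner_key, nonce, len(plaintext)))
    return xor_bytes(inner_ct, blake2b_keystream(outer_key, nonce, len(inner_ct)))

def superdecrypt(inner_key: bytes, outer_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    inner_ct = xor_bytes(ciphertext, blake2b_keystream(outer_key, nonce, len(ciphertext)))
    return xor_bytes(inner_ct, hmac_sha256_keystream(inner_key, nonce, len(inner_ct)))

# Usage: the two keys are generated independently.
inner_key, outer_key = secrets.token_bytes(32), secrets.token_bytes(32)
nonce = secrets.token_bytes(16)
ct = superencrypt(inner_key, outer_key, nonce, b"patient record")
assert superdecrypt(inner_key, outer_key, nonce, ct) == b"patient record"
```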

You get the same effect by sending an encrypted file over an encrypted network connection (unless the file is converted to base64 or something prior to transmission), assuming that the opponent is not an insider with access to decrypted network traffic.




Some assumptions to consider are:

What problem(s) are we trying to solve, and why?

Can we set up a secure distribution network for key material?

Who is the opponent? How many years will they remain interested in a captured record?

What is the budget?

Who are the users? How competent are they? How much can we educate them?

How will we fix bugs or update it?

What are the security priorities? (usability, confidentiality, authenticity, integrity, availability, identification, authorization, non-repudiation)

When the system fails, does it fail safe or fail unsafe? What is the backup method when it fails? Can the main system be forced to fail by the opponent?

What are the legal and economic considerations? That is, are the people who can best secure the system also the ones with the financial liability? If the patient is the beneficiary of the confidentiality, do they have any control over which system they use, or do they have any ability to make sure the system is secure? Who bears the cost of the system? Ultimately, a patient should be able to choose between priorities; if my drug allergies were stored on the system, and I had a heart attack and passed out, confidentiality and self-authorization would not be my primary concern. It's easy to imagine other scenarios with other priorities.

In my experience, the last consideration (legal and economic) is the most common reason why systems fail. Compare the security of voting systems versus the security of slot machines and you'll see exactly how economics and self-interest trump everything else.

What you want to do is terribly difficult to do with any assurance, unless you're willing to spend a lot of money on it. I would recommend figuring out what you're trying to do, then hiring some consultants: cryptographers, systems analysts, penetration testers, professional engineers on reliability, and so on. Brainstorm. Propose ideas and shoot them down. If your funds are limited, publication on the web, trade magazines, and a public email list plus referendum may be the best way to get free design critiques. Just as "all bugs are shallow to one set of eyes", so too most weaknesses are obvious to some brain, and most failures are predictable by someone. The closer they are to the problem domain, the more valuable they are, but even stopped clocks are right twice a day.

I'm thinking about unconditional security, and will write up a proposed design soon. I'll send it around when it's ready for public vetting.
-- 
``Unthinking respect for authority is the greatest enemy of truth.''
-- Albert Einstein -><- <URL:http://www.subspacefield.org/~travis/>
For a good time on my UBE blacklist, email johnsubspacefield.org.
Re: data under one key, was Re: analysis and implementation of LRW
2007-01-30 13:02:19

Travis H. wrote:
> On Wed, Jan 24, 2007 at 03:28:50PM -0800, Allen wrote:
>> If 4 gigs is right, would it then be records to look for to break the code via birthday attacks would be things like seismic data,
> 
> In case anyone else couldn't parse this, he means "the amount of encrypted material necessary to break the key would be large" or "the size of a lookup table would be large" or something like that.

Thanks for attempting to fix my badly worded post. What I think I really meant is that the data quantity is so large there would be key re-use, allowing an attack that way.

> 
>> Currently I'm dealing with very large - though not as large as 4 gig - x-ray, MRI, and similar files that have to be protected for the lifespan of the person, which could be 70+ years after the medical record is created. Think of the MRI of a kid to scan for some condition that may be genetic in origin and has to be monitored and compared with more recent results their whole life.
> 
> That's longer than computers have been available, and also longer than modern cryptography has existed. The only way I would propose to be able to stay secure that long is either:
> 1) use a random key as large as the plaintext (one-time-pad)

I can't imagine any way of managing the number of one-time pads that would be needed for 70+ years of medical records of 6+ million patients.

> 2) prevent the ciphertext from leaking (quantum crypto, spread-spectrum communication, steganography)

Alas, still not practical in large real-world scenarios, if I understand what I've seen so far. Maybe in 20 years.
> 
> Even then, I doubt Lloyd's would insure it. Anyone who claims to know what the state of the art will be like in 70+ years is a fool. I would be cautious about extrapolating more than five years.

[snip]

I'll skip the rest of your excellent, and thought provoking post as it is future and I'm looking at now.

From what you've written and other material I've read, it is clear that even if the horizon isn't as short as five years, it is certainly shorter than 70. Given that it appears what has to be done is the same as the audio industry has had to do with 30 year old master tapes when they discovered that the binder that held the oxide to the backing was becoming gummy and shedding the music as the tape was playing - reconstruct the data and re-encode it using more up to date technology.

I guess we will have grunt jobs for a long time to come.


Best,

Allen


Re: data under one key, was Re: analysis and implementation of LRW
Russian Federation
2007-02-04 02:33:30
Allen wrote on 31.01.2007 01:02:
> I'll skip the rest of your excellent, and thought provoking post as it is future and I'm looking at now.
>
> From what you've written and other material I've read, it is clear that even if the horizon isn't as short as five years, it is certainly shorter than 70. Given that it appears what has to be done is the same as the audio industry has had to do with 30 year old master tapes when they discovered that the binder that held the oxide to the backing was becoming gummy and shedding the music as the tape was playing - reconstruct the data and re-encode it using more up to date technology.
>
> I guess we will have grunt jobs for a long time to come.

I think you underestimate what Travis said about insurance on long-term encrypted data. If an attacker can obtain your ciphertext now (and it is very likely that he can), encrypted with a scheme that isn't strong in a 70-year perspective, he will be able to break the scheme in the future when technology and science allow it, effectively compromising [part of] your clients' private data, despite your efforts to re-encrypt it later with an improved scheme.

The point is that an encryption scheme for long-term secrets must stay strong from the beginning to the end of the period during which the data needs to stay secret.

-- 
SATtva
www.vladmiller.info
www.pgpru.com


Re: data under one key, was Re: analysis and implementation of LRW
2007-02-04 17:40:11

Vlad "SATtva" Miller wrote:
> Allen wrote on 31.01.2007 01:02:
>> I'll skip the rest of your excellent, and thought provoking post as it is future and I'm looking at now.
>>
>> From what you've written and other material I've read, it is clear that even if the horizon isn't as short as five years, it is certainly shorter than 70. Given that it appears what has to be done is the same as the audio industry has had to do with 30 year old master tapes when they discovered that the binder that held the oxide to the backing was becoming gummy and shedding the music as the tape was playing - reconstruct the data and re-encode it using more up to date technology.
>>
>> I guess we will have grunt jobs for a long time to come.
> 
> I think you underestimate what Travis said about insurance on long-term encrypted data. If an attacker can obtain your ciphertext now (and it is very likely that he can), encrypted with a scheme that isn't strong in a 70-year perspective, he will be able to break the scheme in the future when technology and science allow it, effectively compromising [part of] your clients' private data, despite your efforts to re-encrypt it later with an improved scheme.
> 
> The point is that an encryption scheme for long-term secrets must stay strong from the beginning to the end of the period during which the data needs to stay secret.

Imagine this, if you will. You have a disk with encrypted data and the key to decrypt it. You can take two paths that I can see:

1. Encrypt the old data and its key with the new, more robust, encryption algorithm and key as you migrate it from the now aged HD which is nearing the end of its lifespan. Then use the then-current disk wiping technology of choice to destroy the old data. I think a blast furnace might be a great choice for a long time to come.

2. Decrypt the data using the key and re-encrypt it with the new algorithm using a new key, then migrate it to a new HD. Afterward destroy the old drive/data by your favorite method at the time. I still like the blast furnace as the tool of choice.
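The two migration paths can be sketched side by side. This is a hypothetical Python illustration: `OLD` and `NEW` are stand-in hash names driving a toy HMAC counter-mode stream cipher, not real disk-encryption algorithms, and all function names are illustrative. It shows the practical difference raised below: reading data migrated by path 1 still needs the old algorithm, while path 2 needs only the new one. Neither path changes the fact that any leaked copy of the old ciphertext is still protected only by the old scheme.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-ins for "the aging algorithm" and "the new, more robust one".
OLD, NEW = "sha1", "sha3_256"

def stream_cipher(hash_name: str, key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (HMAC-<hash_name> in counter mode); encrypt == decrypt."""
    pad, counter = b"", 0
    while len(pad) < len(data):
        pad += hmac.new(key, nonce + counter.to_bytes(8, "big"), hash_name).digest()
        counter += 1
    return bytes(d ^ p for d, p in zip(data, pad))

def fresh_key_and_nonce():
    return secrets.token_bytes(32), secrets.token_bytes(16)

def migrate_by_wrapping(old_ct, old_key, old_nonce):
    """Path 1: encrypt the old ciphertext *and* its key/nonce under a new key."""
    new_key, new_nonce = fresh_key_and_nonce()
    bundle = old_key + old_nonce + old_ct        # 32-byte key, 16-byte nonce, then data
    return new_key, new_nonce, stream_cipher(NEW, new_key, new_nonce, bundle)

def read_wrapped(new_key, new_nonce, wrapped):
    bundle = stream_cipher(NEW, new_key, new_nonce, wrapped)
    old_key, old_nonce, old_ct = bundle[:32], bundle[32:48], bundle[48:]
    return stream_cipher(OLD, old_key, old_nonce, old_ct)   # still needs the OLD code

def migrate_by_reencrypting(old_ct, old_key, old_nonce):
    """Path 2: decrypt with the old key, then encrypt under a new key with NEW."""
    plaintext = stream_cipher(OLD, old_key, old_nonce, old_ct)
    new_key, new_nonce = fresh_key_and_nonce()
    return new_key, new_nonce, stream_cipher(NEW, new_key, new_nonce, plaintext)

def read_reencrypted(new_key, new_nonce, ct):
    return stream_cipher(NEW, new_key, new_nonce, ct)      # only the NEW code is needed

# Usage: one record encrypted years ago under the old scheme.
record = b"MRI series, patient 0001"
old_key, old_nonce = fresh_key_and_nonce()
old_ct = stream_cipher(OLD, old_key, old_nonce, record)
assert read_wrapped(*migrate_by_wrapping(old_ct, old_key, old_nonce)) == record
assert read_reencrypted(*migrate_by_reencrypting(old_ct, old_key, old_nonce)) == record
```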

Both approaches suffer from one defect in common - they assume that the old disk you have the data on is the only copy in existence, which is clearly a *bad* idea if you should have a catastrophic failure of the HD or other storage device. So then it boils down to finding all known and unknown copies of the encrypted data and securely destroying them as well. That is not a safe assumption, as we know from the history of papers dug up hundreds of years after the originals appeared to be lost forever.

Approach 1 also suffers from the problem that we may not have the software readily available waaay down the road to decrypt the many layers of the onion. And that will surely bring tears to our eyes.

Since we know that we cannot protect against future developments in cryptanalysis - just look at both linear and differential cryptanalysis versus earlier tools - how do we create an algorithm that is proof against the future? Frankly I don't think it is possible, and storing all those one-time pads is too much of a headache, as well as risky, to bother with. So what do we do?

This is where I think we need to set our sights on "...good enough given what we know now...." This does not mean sloppy thinking, just that at some point you have done the best humanly possible to assess and mitigate risks.

Anyone got better ideas?

Best,

Allen


Entropy of other languages
2007-02-04 17:46:41
Hi gang,

An idle question. English has a relatively low entropy as a language. Don't recall the exact figure, but if you look at words that start with "q" it is very low indeed.

What about other languages? Does anyone know the relative entropy of other alphabetic languages? What about the entropy of ideographic languages? Pictographic? Hieroglyphic?

Thanks,

Allen


Re: data under one key, was Re: analysis and implementation of LRW
2007-02-04 22:27:00
| > Currently I'm dealing with very large - though not as large as 4 gig - x-ray, MRI, and similar files that have to be protected for the lifespan of the person, which could be 70+ years after the medical record is created. Think of the MRI of a kid to scan for some condition that may be genetic in origin and has to be monitored and compared with more recent results their whole life.
| 
| That's longer than computers have been available, and also longer than modern cryptography has existed. The only way I would propose to be able to stay secure that long is either:
| 1) use a random key as large as the plaintext (one-time-pad)
...thus illustrating once again both the allure and the uselessness (in almost all situations) of one-time pads. Consider: I have 4 GB of data that must remain secure. I'm afraid it may leak out. So I generate 4 GB of random bits, XOR them, and now have 4 GB of data that's fully secure. I can release it to the world. The only problem is ... what do I do with this 4 GB of random pad? I need to store *it* securely. But if I can do that ... why couldn't I store the 4 GB of original data securely to begin with?

*At most*, if I use different, but as secure as I can make them, methods for storing *both* 4 GB datasets, then someone would have to get *both* to make any sense of the data. In effect, I've broken my secret into two shares, and only someone who can get both can read it. I can break it into more shares if I want to - though if I want information-theoretic security (presumably the goal here, since I'm worried about future attacks against any technique that relies on something weaker) each share will end up being the same size as the data.
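The share-splitting described here is n-of-n XOR secret sharing. A minimal Python sketch (function names are illustrative): every share is as long as the data, any n-1 shares are uniformly random and reveal nothing, and all n together XOR back to the original. With two shares it is exactly "OTP ciphertext plus pad".

```python
import secrets
from functools import reduce
from typing import List

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split_secret(data: bytes, num_shares: int = 2) -> List[bytes]:
    """Split data into num_shares shares; all are needed to recover it.

    The first num_shares - 1 shares are fresh random pads; the last is the data
    XORed with all of them. Each share is exactly as long as the data.
    """
    if num_shares < 2:
        raise ValueError("need at least two shares")
    pads = [secrets.token_bytes(len(data)) for _ in range(num_shares - 1)]
    return pads + [reduce(xor_bytes, pads, data)]

def combine_shares(shares: List[bytes]) -> bytes:
    """XOR all shares together to recover the original data."""
    return reduce(xor_bytes, shares)

# Usage: the 2-share case.
scan = b"4 GB of X-ray data would go here"
pad, ciphertext = split_secret(scan, 2)
assert combine_shares([pad, ciphertext]) == scan
```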

Of course, the same argument can be made for *any* cryptographic technique! The difference is that it seems somewhat easier to protect a 128-bit key (or some other reasonable length; anything beyond 256 bits is just silly due to fundamental limits on computation: at 256 bits, either there is an analytic attack - which is just as likely at 2560 bits - or running the entire universe as a computer to do brute-force attacks won't give you the answer soon enough to matter) than a 4 GB one. It's not easy to make really solid sense of such a comparison, however, as our ability to store more and more data in less and less space will continue to grow for a couple of generations more. When CDs first came out, 600 MB seemed like more than anyone could imagine using as raw data. These days, that's not enough RAM to make a reasonable PC.

I would suggest that we look at how such data has traditionally been kept safe. We have thousands of years of experience in maintaining physical security. That's what we rely on to protect the *last* 70 years' worth of X-ray plates. In fact, the security on those is pretty poor - up until a short while ago, when this stuff started to be digital "from birth", at least the last couple of years' worth of X-rays were sitting in a room in the basement of the hospital. The room was certainly locked, but it was hardly a bank vault. Granted, in digital form, this stuff is much easier to search, copy, etc. - but I doubt that a determined individual would really have much trouble getting copies of most people's medical records. If nothing else, the combination of strict hierarchies in hospitals - where the doctor is at the top - with the genuine need to deal with emergencies makes social engineering particularly easy.

Anyway ... while the question "how can we keep information secure for 70 years" has some theoretical interest, we have enough trouble knowing how to keep digital information *accessible* for even 20 years that it's hard to know where to reasonably start. In fact, if you really want to be sure those X-rays will be readable in 70 years, you're probably best off today putting them on microfiche or using some similar technology. Then put the 'fiche in a vault....

							-- Jerry


OTP, was Re: data under one key, was Re: analysis and implementation of LRW
United States
2007-02-05 06:39:35
On Sun, Feb 04, 2007 at 11:27:00PM -0500, Leichter, Jerry wrote:
> | 1) use a random key as large as the plaintext (one-time-pad)
> ...thus illustrating once again both the allure and the uselessness (in almost all situations) of one-time pads.

For long-term storage, you are correct: OTP at best gives you secret splitting. However, if people can get at your stored data, you have an insider or poor security (network or OS). Either way, this is not necessarily a crypto problem. The system should use conventional crypto to deal with the data remanence problem, but others have alleged this is bad or unnecessary or both; I haven't seen it proven either way. In any case, keeping the opponent off your systems is less of a crypto problem than a simple access control problem.

It was my inference that this data must be transmitted around via some not-very-secure channels, and so the link could be primed by exchanging key material via registered mail, courier, or whatever method they felt comfortable with for communicating paper documents _now_, or whatever system they would use with key material in any other proposed system. The advantage isn't magical so much as practical; you don't have to transmit the pad material every time you wish to send a message. You do have to store it securely (see above). You should compose it with a conventional system, for the best of both worlds.

Of course any system can be used incorrectly; disclosing a key or choosing a bad one can break security in most systems. So you already have a requirement for unpredictability and secure storage and confidential transmission of key material (in the case of symmetric crypto). The OTP is the only "cipher" I know of that hasn't had any cryptanalytic success against it for over 70 years, and offers a proof. [1]

As an aside, it would be interesting to compare data capacity/density and networking speeds to see if it is getting harder or easier to use OTP to secure a network link.

[1] Cipher meaning discrete symbol-to-symbol encoding. OTP's proof does rely on a good RNG. I am fully aware that unpredictability is just as slippery a topic as resistance to cryptanalysis, both being universal statements that can only be disproved by a counterexample, but that is an engineering or philosophical problem. By securely combining it with a CSPRNG you get the least predictable of the pair.
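For the aside about OTP on network links, the pad budget is simple arithmetic: a one-time pad consumes one pad bit per plaintext bit sent. The sketch below uses hypothetical link speeds and an assumed 10% average utilization; comparing the daily figure against the capacity of the media used to ship the pad is the comparison suggested in the aside. The function name is illustrative.

```python
SECONDS_PER_DAY = 86_400

def pad_bytes_per_day(link_bits_per_second: float, utilization: float = 1.0) -> float:
    """One-time-pad material consumed per day: one pad bit per plaintext bit sent."""
    return link_bits_per_second * utilization * SECONDS_PER_DAY / 8

# Hypothetical link speeds and an assumed 10% average utilization.
for label, bps in [("10 Mbit/s", 10e6), ("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    daily = pad_bytes_per_day(bps, utilization=0.10)
    print(f"{label:>10}: {daily / 1e9:8.1f} GB of pad per day")
```

At full utilization a 1 Gbit/s link would use about 10.8 TB of pad per day; at the assumed 10% it is about 1.08 TB, to be stored securely at both ends.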

Everyone in reliable computing understands that you don't want single points of failure. If someone proposed that they were going to deploy a system - any system - that could stay up for 70 years, and it didn't have any form of backup or redundancy, and no proof that it wouldn't wear down over 70 years (e.g. it has moving parts, transistors, etc.), they'd be ridiculed.

And yet every time OTP comes up among cryptographers, the opposite happens.

When it comes to analysis, absence of evidence is not evidence of absence.

> Anyway ... while the question "how can we keep information secure for 70 years" has some theoretical interest, we have enough trouble knowing how to keep digital information *accessible* for even 20 years that it's hard to know where to reasonably start.

I think that any long-term data storage solution would have to accept three things:

1) The shelf life is a complete unknown. By the time we know it, we will be using different media, so don't hold your breath.

2) The best way to assure being able to read the data is to seal up a separate instance of the hardware, and to use documented formats so you know how to interpret them. Use some redundancy, too, with tolerance of the kind of errors the media is expected to see.

3) Institutionalize a data refresh policy; have a procedure for reading the old data off old media, correcting errors, and writing it to new media (see below).
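Items 2 and 3 can be combined into a simple refresh pass. The sketch below uses hypothetical names, with plain dicts standing in for old, replica, and new media. It copies each record to new media only after checking it against a SHA-256 digest recorded at write time, falling back to the redundant copy when the primary is damaged. "Correcting errors" here means replacing a bad copy with a verified good one; a real archive might also use error-correcting codes.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def refresh(old_media: dict, replica_media: dict, catalog: dict):
    """Copy every cataloged record to new media after verifying its SHA-256 digest.

    Tries the primary copy first, then the redundant replica.
    Returns (new_media, unrecoverable_record_ids).
    """
    new_media, unrecoverable = {}, []
    for record_id, expected_digest in catalog.items():
        for source in (old_media, replica_media):
            data = source.get(record_id)
            if data is not None and sha256_hex(data) == expected_digest:
                new_media[record_id] = data
                break
        else:                      # no copy passed verification
            unrecoverable.append(record_id)
    return new_media, unrecoverable

# Usage: "b" is corrupted on the primary but intact on the replica; "c" is lost.
catalog = {rid: sha256_hex(data) for rid, data in [("a", b"one"), ("b", b"two"), ("c", b"three")]}
new_media, lost = refresh({"a": b"one", "b": b"tw0"}, {"b": b"two"}, catalog)
print(sorted(new_media), lost)
```

The unrecoverable list is what a refresh procedure would escalate, before the remaining copies age out too.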

The trend seems to be that storage capacity is going up much faster than I/O bandwidth, and there doesn't seem to be a fundamental limitation in the near future, so the data is "cooling" rapidly and will continue to do so (in storage jargon, temperature is related to how often the data is read or written).

Further, tape is virtually dead, and it looks like disk-to-disk is the most pragmatic replacement. That actually simplifies things; you can migrate the data off disks before they near their lifespan in an automated way (plug in new computer, transfer data over direct network connection, drink coffee). Or even more simply, stagger your primary and backup storage machines, so that 1/2 way through the MTTF of the drive, you have a new machine with a new set of drives as the backup, do one backup and swap roles. Now your data refresh and backup are handled with the same mechanism.
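The staggered primary/backup scheme implies a fixed refresh cadence. A tiny sketch, assuming a hypothetical 5-year drive MTTF and a 70-year retention horizon; the function name is illustrative and the MTTF figure is not taken from any drive specification.

```python
import math

def rotation_years(horizon_years: float, drive_mttf_years: float) -> list:
    """Years at which new drives join as the backup and primary/backup swap roles.

    Swaps happen every half MTTF, as in the staggered scheme; events at or after
    the horizon are omitted.
    """
    interval = drive_mttf_years / 2
    count = math.ceil(horizon_years / interval)
    return [k * interval for k in range(1, count)]

# Hypothetical figures: 70-year retention, 5-year drive MTTF.
schedule = rotation_years(70, 5)
print(len(schedule), "swaps; first few:", schedule[:3], "last:", schedule[-1])
```

With those assumptions that is 27 hardware generations over the retention period, each one doubling as a data refresh.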

At least, that's what I'm doing.  YMMV.
-- 
The driving force behind innovation is sublimation.
-><- <URL:http://www.subspacefield.org/~travis/>
For a good time on my UBE blacklist, email johnsubspacefield.org.
Re: Entropy of other languages
United States
2007-02-05 15:48:16
On Sun, 04 Feb 2007 15:46:41 -0800
Allen <netsecuritysound-by-design.com> wrote:

> Hi gang,
> 
> An idle question. English has a relatively low entropy as a language. Don't recall the exact figure, but if you look at words that start with "q" it is very low indeed.
> 
> What about other languages? Does anyone know the relative entropy of other alphabetic languages? What about the entropy of ideographic languages? Pictographic? Hieroglyphic?
> 
It should be pretty easy to do at least some experiments today -- there's a lot of online text in many different languages. Have a look at http://www.gutenberg.org/catalog/ for freely-available books that one could mine for statistics.


		--Steve Bellovin, http://www.cs.columbia.edu/~smb
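Following Steve's suggestion, here is a minimal Python sketch of the statistics one might mine from Project Gutenberg texts. It computes single-character (unigram) entropy, the conditional entropy of the next character given the current one, and the entropy of whatever follows a given letter, which captures Allen's "q" observation. These are only crude upper bounds on a language's true entropy rate, since they ignore longer context, and per-character figures for ideographic scripts are not directly comparable with per-letter figures for alphabetic ones. The function names and the file name in the usage comment are hypothetical.

```python
import math
from collections import Counter

def _entropy(counts: Counter) -> float:
    """Shannon entropy in bits of an empirical distribution given as counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def unigram_entropy(text: str) -> float:
    """Bits per character from single-character frequencies."""
    return _entropy(Counter(text))

def conditional_entropy(text: str) -> float:
    """H(next | current), estimated from adjacent character pairs."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(a for a, _ in zip(text, text[1:]))
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / firsts[a]) for (a, _), c in pairs.items())

def entropy_after(text: str, letter: str) -> float:
    """Entropy of the character that follows `letter` (near 0 after "q" in English)."""
    return _entropy(Counter(b for a, b in zip(text, text[1:]) if a == letter))

def letters_only(raw: str) -> str:
    """Lowercase and keep alphabetic characters (str.isalpha is Unicode-aware)."""
    return "".join(ch for ch in raw.lower() if ch.isalpha())

# Hypothetical usage with a downloaded plain-text book:
# with open("book.txt", encoding="utf-8") as f:
#     text = letters_only(f.read())
# print(unigram_entropy(text), conditional_entropy(text), entropy_after(text, "q"))
```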


