
Thread: Starting to think about sha-256?




Starting to think about sha-256?
user name
2006-08-27 17:56:07
Recent press[1] is talking about sha-1 collisions again. Even though
the reported attack was against a weakened variant of sha-1 (64, not
80, passes), it serves as a useful point to start talking about the
future.

I argue that sha-256 is better suited to git's purposes, and to modern
machines, than sha-1.

Upsides to sha-256:
* not just a bit increase, but a stronger algorithm.  there is more
  mixing, doing a more-than-incrementally better job at avoiding
  collisions.
* the bit increase itself provides more hash space, theoretically
  reducing collisions.
* properly aligned, a set of 32-byte hashes won't straddle CPU
  cachelines.
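
To put a rough number on the "more hash space" point, here is a minimal sketch using the standard birthday-bound approximation. It assumes an ideal hash and purely accidental collisions, so it says nothing about deliberate attacks; the object count of one billion is an arbitrary illustration, not a figure from this thread.

```python
import math

def collision_probability(num_objects, hash_bits):
    """Birthday-bound estimate: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    pairs = num_objects * (num_objects - 1) / 2
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-pairs / 2.0 ** hash_bits)

# 160 bits corresponds to sha-1, 256 bits to sha-256.
for bits in (160, 256):
    p = collision_probability(10**9, bits)
    print(f"{bits}-bit hash, 1e9 objects: p ~= {p:.3e}")
```

Both probabilities come out vanishingly small; the wider hash only makes an already negligible accidental risk smaller still.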

Downsides to sha-256:
* git protocol/storage format change implications.
* increase in storage size (20 to 32 bytes per hash).
* fewer hand-optimized algorithm variants have been implemented.
* likely more CPU cycles per hash, though I haven't measured.
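
The size and CPU-cost points are easy to check informally. The following is only a sketch using Python's hashlib, not git's own hash implementations, so the timings are indicative at best; the 1 MiB payload and the repeat count are arbitrary assumptions.

```python
import hashlib
import timeit

def measure(name, data, repeats=20):
    """Return (digest size in bytes, seconds to hash `data` `repeats` times)."""
    size = hashlib.new(name).digest_size
    seconds = timeit.timeit(lambda: hashlib.new(name, data).digest(),
                            number=repeats)
    return size, seconds

payload = b"x" * (1 << 20)  # 1 MiB of arbitrary bytes, chosen for illustration
for name in ("sha1", "sha256"):
    size, seconds = measure(name, payload)
    print(f"{name}: {size} bytes per hash, {seconds:.4f}s for 20 x 1 MiB")
```

The relative timing depends heavily on the CPU and on how the underlying library was built, so no expected numbers are given here.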

Wikipedia page has lotsa info:
http://en.wikipedia.org/wiki/Secure_Hash_Algorithm

Maybe sha-256 could be considered for the next major-rev of git?

	Jeff


[1] http://www.heise-security.co.uk/news/77244

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Starting to think about sha-256?
user name
2006-08-27 20:30:12
Jeff Garzik <jeff@garzik.org> writes:

> Downsides to sha-256:
> * git protocol/storage format change implications.

The only one which really matters, I think.

> Maybe sha-256 could be considered for the next major-rev of git?

Not sure, but _if_ we want it we should do it sooner rather than
later.
-- 
Krzysztof Halasa
Starting to think about sha-256?
user name
2006-08-27 20:46:57

On Sun, 27 Aug 2006, Krzysztof Halasa wrote:
> 
> > Maybe sha-256 could be considered for the next major-rev of git?
>
> Not sure, but _if_ we want it we should do it sooner rather than
> later.

Modifying git-convert-objects.c to rewrite the regular sha1 into a
sha256 should be fairly straightforward. It's never been used since the
early days (and has limits like a maximum of a million objects etc that
can need fixing), but it shouldn't be "fundamentally hard" per se.

		Linus
Starting to think about sha-256?
user name
2006-08-27 21:14:34
Linus Torvalds <torvalds@osdl.org> writes:

> Modifying git-convert-objects.c to rewrite the regular sha1 into a
> sha256 should be fairly straightforward. It's never been used since
> the early days (and has limits like a maximum of a million objects
> etc that can need fixing), but it shouldn't be "fundamentally hard"
> per se.

Sure. I was rather thinking of the rapidly increasing number of git
repositories, each with growing history.
-- 
Krzysztof Halasa
Starting to think about sha-256?
user name
2006-08-27 22:02:39
Hi,

On Sun, 27 Aug 2006, Linus Torvalds wrote:

> On Sun, 27 Aug 2006, Krzysztof Halasa wrote:
> > 
> > > Maybe sha-256 could be considered for the next major-rev of git?
> >
> > Not sure, but _if_ we want it we should do it sooner rather than
> > later.
>
> Modifying git-convert-objects.c to rewrite the regular sha1 into a
> sha256 should be fairly straightforward. It's never been used since
> the early days (and has limits like a maximum of a million objects
> etc that can need fixing), but it shouldn't be "fundamentally hard"
> per se.

But what about signed tags? (This issue has come up before, but has
never been addressed.)

I also thought about supporting hybrid hashes, i.e. that older objects
can still be hashed with SHA-1. Alas, a simple thought experiment
demonstrates how silly that idea is: most of the objects will not
change between two revisions, and they'd have to be rehashed with
SHA-256 (or whatever we decide upon) anyway, so hybrids would do no
good.

A better idea would be to increment the repository version, and expect
SHA-1 for version 1, SHA-256 for version >= 2.
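
As a rough illustration of the versioning idea, the sketch below picks the hash from a repository version number. Everything here is hypothetical: the names `HASH_FOR_REPO_VERSION`, `hash_name_for` and `object_id` are invented for this example and do not correspond to anything in git.

```python
import hashlib

# Hypothetical mapping: version 1 keeps SHA-1, anything newer uses SHA-256.
HASH_FOR_REPO_VERSION = {1: "sha1"}
DEFAULT_HASH = "sha256"  # assumed choice for version >= 2

def hash_name_for(repo_version):
    """Return the hash algorithm name expected for a repository version."""
    if repo_version < 1:
        raise ValueError("repository versions start at 1")
    return HASH_FOR_REPO_VERSION.get(repo_version, DEFAULT_HASH)

def object_id(repo_version, payload):
    """Name a payload with the hash that the repository version implies."""
    return hashlib.new(hash_name_for(repo_version), payload).hexdigest()

print(len(object_id(1, b"example")))  # hex length of a SHA-1 name
print(len(object_id(2, b"example")))  # hex length of a SHA-256 name
```

The point of keying on a single repository-wide version is that every object in one repository is named by exactly one algorithm, which avoids the hybrid scheme dismissed above.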

However, I could imagine that we do not need this huge change (it would
break _many_ setups). The breakthrough was announced last Tuesday, and
it involved 75% payload, i.e. to fake a new -- say -- git.c, one would
need to enlarge git.c by a factor of 4, and you would see a lot of
gibberish inside some comment. (Note that I did not listen to the talk
myself, this is all deduced from the scarce information which is
available via the 'net.)

Even if the breakthrough really comes to full SHA-1, you still have to
add _at least_ 20 bytes of gibberish. Which would be harder to spot,
but it would be spotted.

This made me think about the use of hashes in git. Why do we need a
hash here (in no particular order):

1) integrity checking,
2) fast lookup,
3) identifying objects (related to (2)),
4) trust.

Except for (4), I do not see why SHA-1 -- even if broken -- should not
be adequate. It is not like somebody found out that all JPGs tend to
have similar hashes so that collisions are more likely.

And thinking about trust: The hash is augmented by thinking persons. It
is not like you blindly trust a person forever. You build up trust, and
once somebody has failed you, the trust is lost, and very hard to build
up again. So, you would just try to get all objects again from somebody
you still trust, and never pull from the loser^H^H^H^H^Huntrusted
person again. Ever.

Besides, as has been pointed out several times, a dishonest person
could try to sneak bad code into your repository _regardless_ of a
secure hash.

So: Do we really need a secure hash, or do we need an adequate hash,
and just happen to use one which was intended as a secure hash, but no
longer is?

Ciao,
Dscho

Starting to think about sha-256?
user name
2006-08-27 22:35:20

On Mon, 28 Aug 2006, Johannes Schindelin wrote:
> > 
> > Modifying git-convert-objects.c to rewrite the regular sha1 into a
> > sha256 should be fairly straightforward. It's never been used since
> > the early days (and has limits like a maximum of a million objects
> > etc that can need fixing), but it shouldn't be "fundamentally hard"
> > per se.
>
> But what about signed tags? (This issue has come up before, but has
> never been addressed.)

Signed tags fundamentally have to be re-signed. That's by design: if
somebody could rewrite an archive and signed tags would still be
accepted to have the right signature, that would be a _serious_ sign of
a totally broken security model.

The git security model isn't broken.

> I also thought about supporting hybrid hashes, i.e. that older
> objects can still be hashed with SHA-1. Alas, a simple thought
> experiment demonstrates how silly that idea is: most of the objects
> will not change between two revisions, and they'd have to be rehashed
> with SHA-256 (or whatever we decide upon) anyway, so hybrids would do
> no good.

Indeed. Hybrids would not only do no good, but they would actually
_actively_ hurt things, because they'd fundamentally break the notion
that the hash being identical means that the object (blob, tree,
subtree) is the same.

So allowing two names for the same object is very fundamentally wrong
in git-speak.
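
The "same hash means same object" notion can be shown with a minimal sketch of how a blob gets its name. It assumes the usual blob naming scheme (SHA-1 over a "blob <size>\0" header followed by the contents); the function name `git_blob_id` is made up for this example.

```python
import hashlib

def git_blob_id(content):
    """Name a blob by hashing a 'blob <size>\\0' header plus its contents."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

first = git_blob_id(b"int main(void) { return 0; }\n")
second = git_blob_id(b"int main(void) { return 0; }\n")
other = git_blob_id(b"int main(void) { return 1; }\n")

print(first == second)  # identical contents always get the identical name
print(first == other)   # different contents are expected to get another name
```

A hybrid scheme would give one and the same blob two names, one per algorithm, so comparing names would no longer be enough to tell whether two objects are identical.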

> A better idea would be to increment the repository version, and
> expect SHA-1 for version 1, SHA-256 for version >= 2.

Yes. It would be reasonably painful for users, though (as Krzysztof
correctly points out). Every client would have to convert when a
repository they track is converted.

> Even if the breakthrough really comes to full SHA-1, you still have
> to add _at least_ 20 bytes of gibberish. Which would be harder to
> spot, but it would be spotted.

Yeah, I don't think this is at all critical, especially since git
really on a security level doesn't _depend_ on the hashes being
cryptographically secure. As I explained early on (ie over a year ago,
back when the whole design of git was being discussed), the _security_
of git actually depends not on cryptographic hashes, but simply on
everybody being able to secure their own _private_ repository.

So the only thing git really _requires_ is a hash that is _unique_ for
the developer (and there we are talking not of an _attacker_, but a
benign participant).

That said, the cryptographic security of SHA-1 is obviously a real
bonus. So I'd be disappointed if SHA-1 can be broken more easily (and I
obviously already argued against using MD5, exactly because generating
duplicates of that is fairly easy). But it's not "fundamentally
required" in git per se.

[ The one exception: the "signed tags" security does depend on the
  hashes being cryptographically strong. So again, breaking SHA-1 would
  not mean that git stops working, but it _would_ potentially mean that
  if you don't trust your own _private_ repository, the signed tag may
  no longer protect you entirely ]

> This made me think about the use of hashes in git. Why do we need a
> hash here (in no particular order):
>
> 1) integrity checking,
> 2) fast lookup,
> 3) identifying objects (related to (2)),
> 4) trust.
>
> Except for (4), I do not see why SHA-1 -- even if broken -- should
> not be adequate. It is not like somebody found out that all JPGs tend
> to have similar hashes so that collisions are more likely.

Correct. I'm pretty sure we had exactly this discussion around May
2005, but I'm too lazy to search ;)

		Linus
Starting to think about sha-256?
user name
2006-08-28 17:27:29
--On Sunday, August 27, 2006 03:35:20 PM -0700 Linus Torvalds
<torvalds@osdl.org> wrote:
>
> On Mon, 28 Aug 2006, Johannes Schindelin wrote:
>> Even if the breakthrough really comes to full SHA-1, you still have
>> to add _at least_ 20 bytes of gibberish. Which would be harder to
>> spot, but it would be spotted.
>
> Yeah, I don't think this is at all critical, especially since git
> really on a security level doesn't _depend_ on the hashes being
> cryptographically secure. As I explained early on (ie over a year
> ago, back when the whole design of git was being discussed), the
> _security_ of git actually depends not on cryptographic hashes, but
> simply on everybody being able to secure their own _private_
> repository.
>
> So the only thing git really _requires_ is a hash that is _unique_
> for the developer (and there we are talking not of an _attacker_, but
> a benign participant).
>
> That said, the cryptographic security of SHA-1 is obviously a real
> bonus. So I'd be disappointed if SHA-1 can be broken more easily (and
> I obviously already argued against using MD5, exactly because
> generating duplicates of that is fairly easy). But it's not
> "fundamentally required" in git per se.


>> This made me think about the use of hashes in git. Why do we need a
>> hash here (in no particular order):
>>
>> 1) integrity checking,
>> 2) fast lookup,
>> 3) identifying objects (related to (2)),
>> 4) trust.
>>
>> Except for (4), I do not see why SHA-1 -- even if broken -- should
>> not be adequate. It is not like somebody found out that all JPGs
>> tend to have similar hashes so that collisions are more likely.
>
> Correct. I'm pretty sure we had exactly this discussion around May
> 2005, but I'm too lazy to search ;)

just to double check.

if you already have a file A in git with hash X is there any condition
where a remote file with hash X (but different contents) would
overwrite the local version?

what would happen if you ended up with two packs that both contained a
file with hash X but with different contents and then did a repack on
them? (either packs from different sources, or packs downloaded through
some mechanism other than the git protocol are two ways this could
happen that I can think of)

David Lang
Starting to think about sha-256?
user name
2006-08-28 17:56:01

On Mon, 28 Aug 2006, David Lang wrote:
> 
> just to double check.
> 
> if you already have a file A in git with hash X is there any
> condition where a remote file with hash X (but different contents)
> would overwrite the local version?

Nope. If it has the same SHA1, it means that when we receive the object
from the other end, we will _not_ overwrite the object we already have.

So what happens is that if we ever see a collision, the "earlier"
object in any particular repository will always end up overriding. But
note that "earlier" is obviously per-repository, in the sense that the
git object network generates a DAG that is not fully ordered, so while
different repositories will agree about what is "earlier" in the case
of direct ancestry, if the object came through separate and not
directly related branches, two different repos may obviously have
gotten the two objects in different order.

However, the "earlier will override" is very much what you want from a
security standpoint: remember that the git model is that you should
primarily trust only your _own_ repository. So if you do a "git pull",
the new incoming objects are by definition less trustworthy than the
objects you already have, and as such it would be wrong to allow a new
object to replace an old one.
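
The "earlier will override" behaviour can be sketched with a toy object store. This is not git's implementation: the `ObjectStore` class and `find_collision` helper are invented for illustration, and object names are deliberately truncated to one byte so that a collision can be found by brute force in a few hundred tries.

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store in which an existing object always wins."""

    def __init__(self, id_bytes=1):
        self.id_bytes = id_bytes  # truncated on purpose to force collisions
        self.objects = {}

    def object_id(self, content):
        return hashlib.sha1(content).digest()[:self.id_bytes].hex()

    def add(self, content):
        oid = self.object_id(content)
        # setdefault stores the content only if the name is not yet taken.
        self.objects.setdefault(oid, content)
        return oid

def find_collision(store, original):
    """Brute-force different content that gets the same truncated name."""
    target = store.object_id(original)
    counter = 0
    while True:
        candidate = b"incoming-%d" % counter
        if candidate != original and store.object_id(candidate) == target:
            return candidate
        counter += 1

store = ObjectStore()
local_id = store.add(b"local contents")
colliding = find_collision(store, b"local contents")
incoming_id = store.add(colliding)

print(incoming_id == local_id)   # the incoming object has the same name
print(store.objects[local_id])   # but the local contents are still in place
```

With a full-length hash the brute-force search would of course never finish; the truncation exists only to make the store's behaviour on a collision observable.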

So you have two cases of collision:

 - the inadvertent kind, where you somehow are very very unlucky, and
   two files end up having the same SHA1. At that point, what happens
   is that when you commit that file (or do a "git-update-index" to
   move it into the index, but not committed yet), the SHA1 of the new
   contents will be computed, but since it matches an old object, a new
   object won't be created, and the commit-or-index ends up pointing to
   the _old_ object.

   You won't notice immediately (since the index will match the old
   object SHA1, and that means that something like "git diff" will use
   the checked-out copy), but if you ever do a tree-level diff (or you
   do a clone or pull, or force a checkout) you'll suddenly notice that
   that file has changed to something _completely_ different than what
   you expected. So you would generally notice this kind of collision
   fairly quickly.

   In related news, the question is what to do about the inadvertent
   collision. First off, let me remind people that the inadvertent kind
   of collision is really really _really_ damn unlikely, so we'll quite
   likely never ever see it in the full history of the universe. But
   _if_ it happens, it's not the end of the world: what you'd most
   likely have to do is just change the file that collided slightly,
   and just force a new commit with the changed contents (add a comment
   saying "/* This line added to avoid collision */") and then teach
   git about the magic SHA1 that has been shown to be dangerous.

   So over a couple of million years, maybe we'll have to add one or
   two "poisoned" SHA1 values to git. It's very unlikely to be a
   maintenance problem ;)

 - The attacker kind of collision because somebody broke (or
   brute-forced) SHA1.

   This one is clearly a _lot_ more likely than the inadvertent kind,
   but by definition it's always a "remote" repository. If the attacker
   had access to the local repository, he'd have much easier ways to
   screw you up.

   So in this case, the collision is entirely a non-issue: you'll get a
   "bad" repository that is different from what the attacker intended,
   but since you'll never actually use his colliding object, it's
   _literally_ no different from the attacker just not having found a
   collision at all, but just using the object you already had (ie it's
   100% equivalent to the "trivial" collision of the identical file
   generating the same SHA1).

> what would happen if you ended up with two packs that both contained
> a file with hash X but with different contents and then did a repack
> on them? (either packs from different sources, or packs downloaded
> through some mechanism other than the git protocol are two ways this
> could happen that I can think of)

See above. The only _dangerous_ kind of collision is the inadvertent
kind, but that's obviously also the very very unlikely kind.

			Linus
Starting to think about sha-256?
user name
2006-08-28 18:06:27

On Mon, 28 Aug 2006, Linus Torvalds wrote:
> 
>  - The attacker kind of collision because somebody broke (or
>    brute-forced) SHA1.
>
>    This one is clearly a _lot_ more likely than the inadvertent kind,
>    but by definition it's always a "remote" repository. If the
>    attacker had access to the local repository, he'd have much easier
>    ways to screw you up.
>
>    So in this case, the collision is entirely a non-issue: you'll get
>    a "bad" repository that is different from what the attacker
>    intended, but since you'll never actually use his colliding
>    object, it's _literally_ no different from the attacker just not
>    having found a collision at all, but just using the object you
>    already had (ie it's 100% equivalent to the "trivial" collision of
>    the identical file generating the same SHA1).

Btw, this is obviously only true for the native git protocol itself.

If the attacker can fool you into generating the new file _yourself_,
he can cause your checked-out copy to not match the git object database
any more.

In other words, one "interesting" attack vector is to feed you the
colliding SHA1 not through a git-to-git transfer, but by generating a
_patch_ that when applied will generate the collision, so that when you
then commit that patch, you get something else than you expected.

And _this_ is where it's important that the hash that git uses be a
non-trivial one - ie we don't want people to be able to generate two
files that look superficially "ok".

So here's the rule: If you ever get a patch that looks like line-noise,
especially from somebody you don't trust, DON'T APPLY IT!

Now, that is obviously something you should never do _regardless_ of
any git issues, so I don't think this is really a problem either. If
you apply patches from people you don't have a good reason to trust
without sanity-checking them, you deserve whatever you get, and quite
frankly, a SHA1 hash collision is the _least_ of your problems ;)

(This ends up boiling down to one common issue: it's generally _much_
easier to attack a project through _other_ means than through a hash
collision. And I pretty much guarantee that that is the case even if we
were to use a much weaker hash, like MD5. Hash collisions fundamentally
just aren't good attack vectors, and it's a hell of a lot easier to try
to insert bad code by other means)

			Linus
Starting to think about sha-256?
user name
2006-08-28 18:32:52
On Mon, Aug 28, 2006 at 10:56:01AM -0700, Linus Torvalds wrote:

> However, the "earlier will override" is very much what you want from
> a security standpoint: remember that the git model is that you should
> primarily trust only your _own_ repository. So if you do a "git
> pull", the

This concept breaks down somewhat if you are pulling from two
repositories (one good and one evil). If I pull from the evil repo
first, that will become my "earlier" object, and I will never get the
colliding object from the good repo.

Executing such an attack might not be that hard, either (once we get
over that little hump of creating collisions at will!). The owner of
'evil' has to know a SHA1 that will be in 'good' before it makes it to
'good'. However, I imagine we frequently see SHA1s migrate from more
central repos (like .../torvalds/linux-2.6.git) to less central ones
(subsystem / port maintainers, etc).
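
The order dependence in this scenario can be shown with a very small sketch. It assumes a collision already exists, so the object name "X" is just a label rather than a computed hash, and `pull` is a made-up stand-in for fetching objects, not the real git transfer logic.

```python
def pull(local, remote):
    """Copy remote objects into local, keeping any name local already has."""
    for oid, content in remote.items():
        local.setdefault(oid, content)
    return local

good = {"X": "genuine contents"}
evil = {"X": "colliding contents"}  # same name, different bytes (assumed)

evil_first = pull(pull({}, evil), good)
good_first = pull(pull({}, good), evil)

print(evil_first["X"])  # whichever repository was pulled first wins
print(good_first["X"])
```

Both runs end with exactly one object named "X"; which contents survive depends only on the pull order, and that is the weakness being pointed out.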

-Peff