Thread: Starting to think about sha-256?

2006-08-27 17:56:07
Recent press[1] is talking about sha-1 collisions again. Even though the reported attack was against a weakened variant of sha-1 (64 passes, not 80), it serves as a useful point to start talking about the future.
I argue that sha-256 is better suited to git's purposes, and to modern machines, than sha-1.
Upsides to sha-256:
* not just a bit increase, but a stronger algorithm. There is more mixing, doing a more-than-incrementally better job at avoiding collisions.
* the bit increase itself provides more hash space, theoretically reducing collisions.
* properly aligned, a set of 32-byte hashes won't straddle CPU cachelines.
Downsides to sha-256:
* git protocol/storage format change implications.
* increase in storage size (20 to 32 bytes per hash).
* fewer hand-optimized algorithm variants have been implemented.
* likely more CPU cycles per hash, though I haven't measured.
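The size and alignment points can be made concrete with Python's standard `hashlib`. This is a sketch under two assumptions: that a SHA-256 git would keep the existing `blob <size>` plus NUL header and only swap the hash function, and that cache lines are 64 bytes (typical of x86, but not stated above). `object_id`, `straddles` and `count_straddling` are illustrative names, not git functions.

```python
import hashlib

CACHE_LINE = 64  # assumed cache-line size in bytes (typical of x86 CPUs)


def object_id(data: bytes, algo: str = "sha1") -> str:
    """Name a blob the way git does: hash "blob <size>", a NUL byte, then the data."""
    header = b"blob %d\0" % len(data)
    return hashlib.new(algo, header + data).hexdigest()


def straddles(offset: int, size: int, line: int = CACHE_LINE) -> bool:
    """True if a hash stored at byte `offset` spans two cache lines."""
    return offset // line != (offset + size - 1) // line


def count_straddling(size: int, n: int = 64) -> int:
    """Count how many of n back-to-back hashes in a line-aligned array straddle."""
    return sum(straddles(i * size, size) for i in range(n))


for algo in ("sha1", "sha256"):
    size = hashlib.new(algo).digest_size
    print(f"{algo}: {size}-byte digest, "
          f"{count_straddling(size)} of 64 array entries straddle a cache line")
```

Run as-is, it reports 16 of the 64 SHA-1 entries crossing a line boundary and none of the SHA-256 entries, which is the alignment argument in miniature; the price is 12 extra bytes per hash.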
The Wikipedia page has lotsa info:
http://en.wikipedia.org/wiki/Secure_Hash_Algorithm
Maybe sha-256 could be considered for the next major-rev of git?
Jeff
[1] http://www.heise-security.co.uk/news/77244
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

2006-08-27 20:30:12
Jeff Garzik <jeff garzik.org> writes:
> Downsides to sha-256:
> * git protocol/storage format change implications.
The only one that really matters, I think.
> Maybe sha-256 could be considered for the next major-rev of git?
Not sure, but _if_ we want it we should do it sooner rather than later.
--
Krzysztof Halasa

2006-08-27 20:46:57
On Sun, 27 Aug 2006, Krzysztof Halasa wrote:
>
> > Maybe sha-256 could be considered for the next major-rev of git?
>
> Not sure, but _if_ we want it we should do it sooner rather than later.
Modifying git-convert-objects.c to rewrite the regular sha1 into a sha256 should be fairly straightforward. It's never been used since the early days (and has limits like a maximum of a million objects etc that can need fixing), but it shouldn't be "fundamentally hard" per se.
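To illustrate why such a conversion is mechanical but touches every object, here is a toy model in Python. It is not git-convert-objects.c: the `Obj` type, its flat `ref <id>` encoding and the helper names are hypothetical simplifications of git's real blob, tree and commit formats. What it does show is the ordering constraint: a tree or commit embeds the ids of the objects it points to, so children must be rehashed before their parents.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    kind: str            # "blob", "tree" or "commit" in this toy model
    payload: bytes       # file contents, tree entry name, or commit message
    refs: tuple = ()     # ids of the objects this one points to


def serialize(obj: Obj) -> bytes:
    # Hypothetical flat encoding; real git trees and commits differ.
    body = b"".join(b"ref %s\n" % r.encode() for r in obj.refs) + obj.payload
    return b"%s %d\0" % (obj.kind.encode(), len(body)) + body


def hash_obj(obj: Obj, algo: str) -> str:
    return hashlib.new(algo, serialize(obj)).hexdigest()


def put(store: dict, obj: Obj, algo: str = "sha1") -> str:
    oid = hash_obj(obj, algo)
    store[oid] = obj
    return oid


def convert(store: dict, root: str, algo: str = "sha256"):
    """Rehash everything reachable from root; return (old->new map, new store)."""
    mapping, new_store = {}, {}

    def visit(old_id: str) -> str:
        if old_id in mapping:
            return mapping[old_id]
        obj = store[old_id]
        # Children first: a parent's bytes embed its children's ids,
        # so its new id can only be computed after theirs.
        new_refs = tuple(visit(r) for r in obj.refs)
        mapping[old_id] = put(new_store, Obj(obj.kind, obj.payload, new_refs), algo)
        return mapping[old_id]

    visit(root)
    return mapping, new_store


old = {}
blob = put(old, Obj("blob", b"hello\n"))
tree = put(old, Obj("tree", b"hello.txt", (blob,)))
commit = put(old, Obj("commit", b"initial import", (tree,)))
mapping, new = convert(old, commit)
print(mapping[commit])  # 64 hex digits: the commit's new SHA-256 name
```

Because every parent's bytes change when its children are renamed, nothing reachable from the converted commit keeps its old name.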
Linus

2006-08-27 21:14:34
Linus Torvalds <torvalds osdl.org> writes:
> Modifying git-convert-objects.c to rewrite the regular sha1 into a sha256 should be fairly straightforward. It's never been used since the early days (and has limits like a maximum of a million objects etc that can need fixing), but it shouldn't be "fundamentally hard" per se.
Sure. I was thinking more of the rapidly increasing number of git repositories, each with a growing history.
--
Krzysztof Halasa

2006-08-27 22:02:39
Hi,
On Sun, 27 Aug 2006, Linus Torvalds wrote:
> On Sun, 27 Aug 2006, Krzysztof Halasa wrote:
> >
> > > Maybe sha-256 could be considered for the next major-rev of git?
> >
> > Not sure, but _if_ we want it we should do it sooner rather than later.
>
> Modifying git-convert-objects.c to rewrite the regular sha1 into a sha256 should be fairly straightforward. It's never been used since the early days (and has limits like a maximum of a million objects etc that can need fixing), but it shouldn't be "fundamentally hard" per se.
But what about signed tags? (This issue has come up before, but never has been addressed.)
I also thought about supporting hybrid hashes, i.e. that older objects still can be hashed with SHA-1. Alas, a simple thought experiment demonstrates how silly that idea is: most of the objects will not change between two revisions, and they'd have to be rehashed with SHA-256 (or whatever we decide upon) anyway, so hybrids would do no good.
A better idea would be to increment the repository version, and expect SHA-1 for version 1, SHA-256 for version >= 2.
However, I could imagine that we do not need this huge change (it would break _many_ setups). The breakthrough was announced last Tuesday, and it involved 75% payload, i.e. to fake a new -- say -- git.c, one would need to enlarge git.c by a factor of 4, and you would see a lot of gibberish inside some comment. (Note that I did not listen to the talk myself; this is all deduced from the scarce information which is available via the 'net.)
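A quick check of the "factor 4" figure, assuming "75% payload" means that 75% of the forged file must be attacker-chosen gibberish (an interpretation, since the talk itself is not quoted here). With S the original size and p the gibberish fraction of the forged file:

```latex
S_{\text{forged}} = \frac{S}{1 - p} = \frac{S}{1 - 0.75} = 4S
```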
Even if the breakthrough really comes to full SHA-1, you still have to add _at least_ 20 bytes of gibberish. Which would be harder to spot, but it would be spotted.
This made me think about the use of hashes in git. Why do we need a hash here (in no particular order):
1) integrity checking,
2) fast lookup,
3) identifying objects (related to (2)),
4) trust.
Except for (4), I do not see why SHA-1 -- even if broken -- should not be adequate. It is not like somebody found out that all JPGs tend to have similar hashes so that collisions are more likely.
And thinking about trust: The hash is augmented by thinking people. It is not like you blindly trust a person forever. You build up trust, and once someone has failed you, the trust is lost, and very hard to build up again. So you would just try to get all objects again from somebody you still trust, and never pull from the loser^H^H^H^H^Huntrusted person again. Ever.
Besides, as has been pointed out several times, a dishonest person could try to sneak bad code into your repository _regardless_ of a secure hash.
So: Do we really need a secure hash, or do we need an adequate hash, and just happen to use one which was intended as a secure hash, but no longer is?
Ciao,
Dscho

2006-08-27 22:35:20
On Mon, 28 Aug 2006, Johannes Schindelin wrote:
> >
> > Modifying git-convert-objects.c to rewrite the regular sha1 into a sha256 should be fairly straightforward. It's never been used since the early days (and has limits like a maximum of a million objects etc that can need fixing), but it shouldn't be "fundamentally hard" per se.
>
> But what about signed tags? (This issue has come up before, but never has been addressed.)
Signed tags fundamentally have to be re-signed. That's by design: if somebody could rewrite an archive and signed tags would still be accepted to have the right signature, that would be a _serious_ sign of a totally broken security model.
The git security model isn't broken.
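The re-signing requirement follows directly from what a signed tag covers: its text names the target object by hash, so renaming the object changes the signed bytes. In this Python sketch an HMAC stands in for the GPG signature git actually uses, the tag layout is simplified, and `SIGNING_KEY`, `tag_body`, `sign` and `verify` are hypothetical names.

```python
import hashlib
import hmac

SIGNING_KEY = b"maintainer-secret"  # stand-in for a GPG key (assumption)


def commit_id(data: bytes, algo: str) -> str:
    return hashlib.new(algo, b"commit %d\0" % len(data) + data).hexdigest()


def tag_body(target: str) -> bytes:
    # Simplified tag text; real git tags also carry a tagger line.
    return f"object {target}\ntype commit\ntag v1.0\n\nRelease v1.0\n".encode()


def sign(body: bytes) -> str:
    # HMAC-SHA256 stands in for the detached GPG signature git uses.
    return hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()


def verify(body: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(body), signature)


commit = b"initial commit\n"  # placeholder commit contents
old_tag = tag_body(commit_id(commit, "sha1"))
signature = sign(old_tag)

# After converting the repository, the same commit has a new name,
# so the tag text changes and the old signature no longer matches.
new_tag = tag_body(commit_id(commit, "sha256"))
print(verify(old_tag, signature), verify(new_tag, signature))  # True False
```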
> I also thought about supporting hybrid hashes, i.e. that older objects still can be hashed with SHA-1. Alas, a simple thought experiment demonstrates how silly that idea is: most of the objects will not change between two revisions, and they'd have to be rehashed with SHA-256 (or whatever we decide upon) anyway, so hybrids would do no good.
Indeed. Hybrids would not only do no good, but they would actually _actively_ hurt things, because they'd fundamentally break the notion that the hash being identical means that the object (blob, tree, subtree) is the same.
So allowing two names for the same object is very fundamentally wrong in git-speak.
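The identity problem with hybrids is easy to demonstrate: the same bytes hashed under SHA-1 and SHA-256 get two unrelated names. The store below is a hypothetical hybrid that keeps each object under whichever hash was current when it was written; it assumes git's `blob <size>` plus NUL header for both hash functions.

```python
import hashlib


def blob_id(data: bytes, algo: str) -> str:
    return hashlib.new(algo, b"blob %d\0" % len(data) + data).hexdigest()


# Hypothetical hybrid store: objects keep whatever name they were first
# written under, SHA-1 before the switch and SHA-256 after it.
store = {}
content = b"int main(void) { return 0; }\n"
store[blob_id(content, "sha1")] = content    # written before the switch
store[blob_id(content, "sha256")] = content  # identical bytes, written after

# The same blob now has two names, so equal content no longer implies an
# equal id, and a tree-level comparison by id would report a change.
print(len(store))  # 2
```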
> A better idea would be to increment the repository version, and expect SHA-1 for version 1, SHA-256 for version >= 2.
Yes. It would be reasonably painful for users, though (as Krzysztof correctly points out). Every client would have to convert when a repository they track is converted.
> Even if the breakthrough really comes to full SHA-1, you still have to add _at least_ 20 bytes of gibberish. Which would be harder to spot, but it would be spotted.
Yeah, I don't think this is at all critical, especially since git really on a security level doesn't _depend_ on the hashes being cryptographically secure. As I explained early on (ie over a year ago, back when the whole design of git was being discussed), the _security_ of git actually depends not on cryptographic hashes, but simply on everybody being able to secure their own _private_ repository.
So the only thing git really _requires_ is a hash that is _unique_ for the developer (and there we are talking not of an _attacker_, but a benign participant).
That said, the cryptographic security of SHA-1 is obviously a real bonus. So I'd be disappointed if SHA-1 can be broken more easily (and I obviously already argued against using MD5, exactly because generating duplicates of that is fairly easy). But it's not "fundamentally required" in git per se.
[ The one exception: the "signed tags" security does depend on the hashes being cryptographically strong. So again, breaking SHA-1 would not mean that git stops working, but it _would_ potentially mean that if you don't trust your own _private_ repository, the signed tag may no longer protect you entirely ]
> This made me think about the use of hashes in git. Why do we need a hash here (in no particular order):
>
> 1) integrity checking,
> 2) fast lookup,
> 3) identifying objects (related to (2)),
> 4) trust.
>
> Except for (4), I do not see why SHA-1 -- even if broken -- should not be adequate. It is not like somebody found out that all JPGs tend to have similar hashes so that collisions are more likely.
Correct. I'm pretty sure we had exactly this discussion around May 2005, but I'm too lazy to search ;)
Linus

2006-08-28 17:27:29
--On Sunday, August 27, 2006 03:35:20 PM -0700 Linus Torvalds <torvalds osdl.org> wrote:
>
> On Mon, 28 Aug 2006, Johannes Schindelin wrote:
>> Even if the breakthrough really comes to full SHA-1, you still have to add _at least_ 20 bytes of gibberish. Which would be harder to spot, but it would be spotted.
>
> Yeah, I don't think this is at all critical, especially since git really on a security level doesn't _depend_ on the hashes being cryptographically secure. As I explained early on (ie over a year ago, back when the whole design of git was being discussed), the _security_ of git actually depends not on cryptographic hashes, but simply on everybody being able to secure their own _private_ repository.
>
> So the only thing git really _requires_ is a hash that is _unique_ for the developer (and there we are talking not of an _attacker_, but a benign participant).
>
> That said, the cryptographic security of SHA-1 is obviously a real bonus. So I'd be disappointed if SHA-1 can be broken more easily (and I obviously already argued against using MD5, exactly because generating duplicates of that is fairly easy). But it's not "fundamentally required" in git per se.
>> This made me think about the use of hashes in git. Why do we need a hash here (in no particular order):
>>
>> 1) integrity checking,
>> 2) fast lookup,
>> 3) identifying objects (related to (2)),
>> 4) trust.
>>
>> Except for (4), I do not see why SHA-1 -- even if broken -- should not be adequate. It is not like somebody found out that all JPGs tend to have similar hashes so that collisions are more likely.
>
> Correct. I'm pretty sure we had exactly this discussion around May 2005, but I'm too lazy to search ;)
just to double check.
if you already have a file A in git with hash X is there any condition where a remote file with hash X (but different contents) would overwrite the local version?
what would happen if you ended up with two packs that both contained a file with hash X but with different contents and then did a repack on them? (either packs from different sources, or packs downloaded through some mechanism other than the git protocol are two ways this could happen that I can think of)
David Lang

2006-08-28 17:56:01
On Mon, 28 Aug 2006, David Lang wrote:
>
> just to double check.
>
> if you already have a file A in git with hash X is there any condition where a remote file with hash X (but different contents) would overwrite the local version?
Nope. If it has the same SHA1, it means that when we receive the object from the other end, we will _not_ overwrite the object we already have.
So what happens is that if we ever see a collision, the "earlier" object in any particular repository will always end up overriding. But note that "earlier" is obviously per-repository, in the sense that the git object network generates a DAG that is not fully ordered, so while different repositories will agree about what is "earlier" in the case of direct ancestry, if the object came through separate and not directly related branches, two different repos may obviously have gotten the two objects in a different order.
However, the "earlier will override" is very much what you want from a security standpoint: remember that the git model is that you should primarily trust only your _own_ repository. So if you do a "git pull", the new incoming objects are by definition less trustworthy than the objects you already have, and as such it would be wrong to allow a new object to replace an old one.
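The "earlier will override" rule amounts to a content-addressed store whose insert is a no-op when the id already exists. A minimal Python sketch, assuming a plain in-memory dict in place of git's object database; `ObjectStore` and `blob_id` are illustrative names, and the collision is simulated by reusing an id because no real SHA-1 collision is at hand.

```python
import hashlib


def blob_id(data: bytes) -> str:
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


class ObjectStore:
    """Toy content-addressed store in which an existing id is never replaced."""

    def __init__(self) -> None:
        self.objects = {}  # id -> raw contents

    def add(self, oid: str, data: bytes) -> bool:
        """Store data under oid; return False if oid was already present."""
        if oid in self.objects:
            return False  # the earlier (local, trusted) object wins
        self.objects[oid] = data
        return True


local = ObjectStore()
oid = blob_id(b"trusted local contents\n")
local.add(oid, b"trusted local contents\n")

# No real SHA-1 collision is available here, so simulate one by offering
# different bytes under the same id, as a fetched object would be.
accepted = local.add(oid, b"incoming colliding contents\n")
print(accepted, local.objects[oid])  # False b'trusted local contents\n'
```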
So you have two cases of collision:
- the inadvertent kind, where you somehow are very very unlucky, and two files end up having the same SHA1. At that point, what happens is that when you commit that file (or do a "git-update-index" to move it into the index, but not committed yet), the SHA1 of the new contents will be computed, but since it matches an old object, a new object won't be created, and the commit-or-index ends up pointing to the _old_ object.
  You won't notice immediately (since the index will match the old object SHA1, and that means that something like "git diff" will use the checked-out copy), but if you ever do a tree-level diff (or you do a clone or pull, or force a checkout) you'll suddenly notice that that file has changed to something _completely_ different than what you expected. So you would generally notice this kind of collision fairly quickly.
  In related news, the question is what to do about the inadvertent collision. First off, let me remind people that the inadvertent kind of collision is really really _really_ damn unlikely, so we'll quite likely never ever see it in the full history of the universe. But _if_ it happens, it's not the end of the world: what you'd most likely have to do is just change the file that collided slightly, and just force a new commit with the changed contents (add a comment saying "/* This line added to avoid collision */") and then teach git about the magic SHA1 that has been shown to be dangerous.
  So over a couple of million years, maybe we'll have to add one or two "poisoned" SHA1 values to git. It's very unlikely to be a maintenance problem ;)
- The attacker kind of collision because somebody broke (or brute-forced) SHA1.
  This one is clearly a _lot_ more likely than the inadvertent kind, but by definition it's always a "remote" repository. If the attacker had access to the local repository, he'd have much easier ways to screw you up.
  So in this case, the collision is entirely a non-issue: you'll get a "bad" repository that is different from what the attacker intended, but since you'll never actually use his colliding object, it's _literally_ no different from the attacker just not having found a collision at all, but just using the object you already had (ie it's 100% equivalent to the "trivial" collision of the identical file generating the same SHA1).
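To put "really really _really_ damn unlikely" into numbers, the standard birthday-bound approximation estimates the chance that any two of n objects share a name. This sketch assumes SHA-1 behaves like a random 160-bit function on ordinary, non-adversarial content, so it models only the inadvertent case and says nothing about a deliberate attack. `collision_probability` is an illustrative name.

```python
import math


def collision_probability(n_objects: int, bits: int = 160) -> float:
    """Birthday-bound estimate that any two of n random `bits`-bit ids collide."""
    pairs = n_objects * (n_objects - 1) / 2
    # 1 - exp(-x), written with expm1 so tiny probabilities don't round to 0.
    return -math.expm1(-pairs / 2.0 ** bits)


for n in (10**6, 10**9, 10**12):
    print(f"{n:>17,} objects: SHA-1 {collision_probability(n):.1e}, "
          f"SHA-256 {collision_probability(n, 256):.1e}")
```

Even at a trillion objects the SHA-1 estimate stays vanishingly small; under this model the risk only becomes significant somewhere around 2^80 objects.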
> what would happen if you ended up with two packs that both contained a file with hash X but with different contents and then did a repack on them? (either packs from different sources, or packs downloaded through some mechanism other than the git protocol are two ways this could happen that I can think of)
See above. The only _dangerous_ kind of collision is the inadvertent kind, but that's obviously also the very very unlikely kind.
Linus

2006-08-28 18:06:27
On Mon, 28 Aug 2006, Linus Torvalds wrote:
>
> - The attacker kind of collision because somebody broke (or brute-forced) SHA1.
>
>   This one is clearly a _lot_ more likely than the inadvertent kind, but by definition it's always a "remote" repository. If the attacker had access to the local repository, he'd have much easier ways to screw you up.
>
>   So in this case, the collision is entirely a non-issue: you'll get a "bad" repository that is different from what the attacker intended, but since you'll never actually use his colliding object, it's _literally_ no different from the attacker just not having found a collision at all, but just using the object you already had (ie it's 100% equivalent to the "trivial" collision of the identical file generating the same SHA1).
Btw, this is obviously only true for the native git protocol itself.
If the attacker can fool you into generating the new file _yourself_, he can cause your checked-out copy to not match the git object database any more.
In other words, one "interesting" attack vector is to feed you the colliding SHA1 not through a git-to-git transfer, but by generating a _patch_ that when applied will generate the collision, so that when you then commit that patch, you get something other than what you expected.
And _this_ is where it's important that the hash that git uses be a non-trivial one - ie we don't want people to be able to generate two files that look superficially "ok".
So here's the rule: If you ever get a patch that looks like line-noise, especially from somebody you don't trust, DON'T APPLY IT!
Now, that is obviously something you should never do _regardless_ of any git issues, so I don't think this is really a problem either. If you apply patches from people you don't have a good reason to trust without sanity-checking them, you deserve whatever you get, and quite frankly, a SHA1 hash collision is the _least_ of your problems ;)
(This ends up boiling down to one common issue: it's generally _much_ easier to attack a project through _other_ means than through a hash collision. And I pretty much guarantee that that is the case even if we were to use a much weaker hash, like MD5. Hash collisions fundamentally just aren't good attack vectors, and it's a hell of a lot easier to try to insert bad code by other means)
Linus

2006-08-28 18:32:52
On Mon, Aug 28, 2006 at 10:56:01AM -0700, Linus Torvalds wrote:
> However, the "earlier will override" is very much what you want from a security standpoint: remember that the git model is that you should primarily trust only your _own_ repository. So if you do a "git pull", the
This concept breaks down somewhat if you are pulling from two repositories (one good and one evil). If I pull from the evil repo first, that will become my "earlier" object, and I will never get the colliding object from the good repo.
Executing such an attack might not be that hard, either (once we get over that little hump of creating collisions at will!). The owner of 'evil' has to know a SHA1 that will be in 'good' before it makes it to 'good'. However, I imagine we frequently see SHA1s migrate from more central repos (like .../torvalds/linux-2.6.git) to less central ones (subsystem / port maintainers, etc).
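Peff's ordering concern can be modeled with Linus's "earlier will override" rule: whichever remote delivers an id first decides what the repository keeps. In this hypothetical Python sketch the remotes are plain dicts, and the collision is simulated by giving two different blobs one made-up id; `fetch`, `good`, `evil` and `SHARED_ID` are illustrative names.

```python
def fetch(local: dict, remote: dict) -> None:
    """Copy a remote's objects, never replacing an id already present."""
    for oid, data in remote.items():
        local.setdefault(oid, data)


# Simulated collision: both remotes offer different bytes under one id.
SHARED_ID = "d" * 40  # hypothetical id, not a real SHA-1 of either blob
good = {SHARED_ID: b"real driver fix\n"}
evil = {SHARED_ID: b"backdoored driver\n"}

good_first, evil_first = {}, {}
fetch(good_first, good)
fetch(good_first, evil)
fetch(evil_first, evil)
fetch(evil_first, good)

print(good_first[SHARED_ID])  # b'real driver fix\n'
print(evil_first[SHARED_ID])  # b'backdoored driver\n'
```

The same two fetches, in opposite order, leave the two repositories disagreeing about what the shared id means, which is exactly the case where "trust your own repository" offers no protection.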
-Peff