Thread: Packfile can't be mapped

Packfile can't be mapped
2006-08-28 01:04:01
git-repack can't handle my 1.75GB pack file. I am running x86 with 3GB
address space.

-rw-rw-r-- 1 jonsmirl jonsmirl 47221712 Aug 27 20:29 testme.idx
-rw-rw-r-- 1 jonsmirl jonsmirl 1754317619 Aug 27 20:29 testme.pack

[jonsmirl jonsmirl t1]$ git-repack -a -f --window=50 --depth=5000
Generating pack...
Done counting 1963325 objects.
fatal: packfile .git/objects/pack/testme.pack cannot be mapped.
[jonsmirl jonsmirl t1]$

It is built from Mozilla CVS but it is an intermediate stage of our
work. The fast-import tool isn't diffing directory tree which makes
the pack much bigger than it needs to be. Shawn is working on the
packing code.

---------------------------------------------------
Alloc'd objects: 1968000 ( 1892000 overflow )
Total objects: 1967527 ( 41856 duplicates)
blobs : 633842 ( 0 duplicates)
trees : 1131208 ( 41856 duplicates)
commits: 200921 ( 0 duplicates)
tags : 1556 ( 0 duplicates)
Total branches: 1600 ( 7985 loads )
marks: 1048576 ( 200921 unique )
atoms: 56803
Memory total: 66908 KiB
pools: 5408 KiB
objects: 61500 KiB
Pack remaps: 9501
---------------------------------------------------
Pack size: 1713200 KiB
Index size: 46114 KiB
--
Jon Smirl
jonsmirl gmail.com
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Packfile can't be mapped
2006-08-28 02:47:20
Jon Smirl <jonsmirl gmail.com> wrote:
> git-repack can't handle my 1.75GB pack file. I am running x86 with 3GB
> address space.
>
> -rw-rw-r-- 1 jonsmirl jonsmirl 47221712 Aug 27 20:29 testme.idx
> -rw-rw-r-- 1 jonsmirl jonsmirl 1754317619 Aug 27 20:29 testme.pack
>
> [jonsmirl jonsmirl t1]$ git-repack -a -f --window=50 --depth=5000
> Generating pack...
> Done counting 1963325 objects.
> fatal: packfile .git/objects/pack/testme.pack cannot be mapped.
> [jonsmirl jonsmirl t1]$
>
> It is built from Mozilla CVS but it is an intermediate stage of our
> work. The fast-import tool isn't diffing directory tree which makes
> the pack much bigger than it needs to be. Shawn is working on the
> packing code.

I'm going to try to get tree deltas written to the pack sometime this
week. That should compact this intermediate pack down to something
that git-pack-objects would be able to successfully mmap into a
32 bit address space. A complete repack with no delta reuse will
hopefully generate a pack closer to 400 MB in size. But I know
Jon would like to get that pack even smaller.

I should point out that the input stream to fast-import was 20 GB
(completely decompressed revisions from RCS) plus all commit data.
The original CVS ,v files are around 3 GB. An archive .tar.gz'ing
the ,v files is around 550 MB. Going to only 1.7 GB without tree
or commit deltas is certainly pretty good.

> ---------------------------------------------------
> Alloc'd objects: 1968000 ( 1892000 overflow )
> Total objects: 1967527 ( 41856 duplicates)
> blobs : 633842 ( 0 duplicates)
> trees : 1131208 ( 41856 duplicates)
> commits: 200921 ( 0 duplicates)
> tags : 1556 ( 0 duplicates)
> Total branches: 1600 ( 7985 loads )
> marks: 1048576 ( 200921 unique )
> atoms: 56803
> Memory total: 66908 KiB
> pools: 5408 KiB
> objects: 61500 KiB
> Pack remaps: 9501
> ---------------------------------------------------
> Pack size: 1713200 KiB
> Index size: 46114 KiB

All of that says that aside from the 1.7 GB output file fast-import
ran extremely well. About 1.9 million objects were written into
the output pack file, with 41k duplicate trees (duplicate blobs
were removed by cvs2svn prior to fast-import so they don't appear).
200k commits were created across 1600 branches. And we did it in
only 67 MB of memory.

We also had ~8000 LRU cache misses related to our branch data;
this just means that cvs2svn likes to frequently jump around
between branches rather than import an entire branch at a time.
Boosting the size of the LRU cache (at the expense of needing more
memory) should reduce those cache misses as well as 'Pack remaps'.

I'd also like to clean up that pack remapping code and move it
into sha1_file.c. It's an implementation of partial pack mapping
and it is apparently working quite well for us in fast-import.
It may help GIT deal with very large packs (e.g. 1.7 GB) on smaller
address space systems (e.g. 32 bit).
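
The partial pack mapping idea above can be sketched roughly as follows.
This is only an illustration in Python, not fast-import's actual C code:
the `WindowedFile` class, its `window_size` default and the `remaps`
counter are hypothetical names chosen for the example. It maps one
window of the file at a time and remaps when a read falls outside it.

```python
import mmap
import os


class WindowedFile:
    """Read a large file through one small mmap window that moves on demand."""

    def __init__(self, path, window_size=64 * 1024 * 1024):
        gran = mmap.ALLOCATIONGRANULARITY
        # mmap offsets must be multiples of the allocation granularity,
        # so round the (arbitrary) window size down to such a multiple.
        self.window_size = max(gran, window_size - window_size % gran)
        self.fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
        self.size = os.fstat(self.fd).st_size
        self.map = None      # currently mapped window, if any
        self.start = 0       # file offset where the current window begins
        self.remaps = 0      # how many times a window had to be mapped

    def _ensure(self, offset):
        """Make sure the byte at `offset` lies inside the mapped window."""
        if (self.map is not None
                and self.start <= offset < self.start + len(self.map)):
            return
        if self.map is not None:
            self.map.close()
        self.start = offset - offset % self.window_size
        length = min(self.window_size, self.size - self.start)
        self.map = mmap.mmap(self.fd, length, access=mmap.ACCESS_READ,
                             offset=self.start)
        self.remaps += 1

    def read(self, offset, length):
        """Return `length` bytes starting at `offset`, remapping as needed."""
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("read outside of file")
        out = bytearray()
        while length > 0:
            self._ensure(offset)
            rel = offset - self.start
            chunk = self.map[rel:rel + length]   # may stop at the window end
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
        return bytes(out)

    def close(self):
        if self.map is not None:
            self.map.close()
            self.map = None
        os.close(self.fd)
```

Only one window's worth of address space is in use at any time, at the
cost of a remap whenever access jumps to another part of the file.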

We're not confident that this import is completely valid yet.
We have a few translation issues we're still working on. But now
that we have a complete pack going from start to finish we can start
to focus on those issues. Especially since this entire process
(,v to .pack) is less than half a day to run.

--
Shawn.

Packfile can't be mapped
2006-08-28 04:27:02
On Sun, 27 Aug 2006, Shawn Pearce wrote:

> I'm going to try to get tree deltas written to the pack sometime this
> week. That should compact this intermediate pack down to something
> that git-pack-objects would be able to successfully mmap into a
> 32 bit address space. A complete repack with no delta reuse will
> hopefully generate a pack closer to 400 MB in size. But I know
> Jon would like to get that pack even smaller.

One thing to consider in your code (if you didn't implement that
already) is to _not_ attempt any delta on any object whose size is
smaller than 50 bytes, and then limit the maximum delta size to
object_size/2 - 20 (use that for the last argument to diff-delta() and
store the undeltified object when diff-delta returns NULL). This way
you'll avoid creating delta objects that are most likely to end up being
_larger_ than the undeltified object.
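
A minimal sketch of that rule, in Python for brevity. The names
`max_delta_size`, `choose_representation` and the `make_delta` callback
are hypothetical; `make_delta` merely stands in for diff-delta() and is
assumed to return None when the delta would exceed the given limit.

```python
MIN_DELTA_OBJECT_SIZE = 50   # objects smaller than this are never deltified
DELTA_OVERHEAD = 20          # the constant subtracted in object_size/2 - 20


def max_delta_size(object_size):
    """Return the largest delta worth keeping, or None to skip deltification."""
    if object_size < MIN_DELTA_OBJECT_SIZE:
        return None
    return object_size // 2 - DELTA_OVERHEAD


def choose_representation(obj, make_delta):
    """Return ("delta", data) or ("full", data) for the object bytes `obj`.

    `make_delta(obj, limit)` is a hypothetical stand-in for diff-delta():
    it returns the delta bytes, or None if the delta would exceed `limit`.
    """
    limit = max_delta_size(len(obj))
    if limit is None:
        return ("full", obj)
    delta = make_delta(obj, limit)
    if delta is None:
        return ("full", obj)      # store the undeltified object
    return ("delta", delta)
```

With these constants a 50 byte object gets a limit of 5 bytes, so very
small objects only become deltas when the delta is tiny.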

> I should point out that the input stream to fast-import was 20 GB
> (completely decompressed revisions from RCS) plus all commit data.
> The original CVS ,v files are around 3 GB. An archive .tar.gz'ing
> the ,v files is around 550 MB. Going to only 1.7 GB without tree
> or commit deltas is certainly pretty good.

Good job indeed. Oh and you probably should not bother trying to
deltify commit objects at all since that would be a waste of time.

Nicolas

Packfile can't be mapped
2006-08-28 05:33:01
Nicolas Pitre <nico cam.org> wrote:
> On Sun, 27 Aug 2006, Shawn Pearce wrote:
>
> > I'm going to try to get tree deltas written to the pack sometime this
> > week. That should compact this intermediate pack down to something
> > that git-pack-objects would be able to successfully mmap into a
> > 32 bit address space. A complete repack with no delta reuse will
> > hopefully generate a pack closer to 400 MB in size. But I know
> > Jon would like to get that pack even smaller.
>
> One thing to consider in your code (if you didn't implement that
> already) is to _not_ attempt any delta on any object whose size is
> smaller than 50 bytes, and then limit the maximum delta size to
> object_size/2 - 20 (use that for the last argument to diff-delta() and
> store the undeltified object when diff-delta returns NULL). This way
> you'll avoid creating delta objects that are most likely to end up being
> _larger_ than the undeltified object.

I haven't tried this. Should be trivial to implement. Thanks for
the suggestion.

> > I should point out that the input stream to fast-import was 20 GB
> > (completely decompressed revisions from RCS) plus all commit data.
> > The original CVS ,v files are around 3 GB. An archive .tar.gz'ing
> > the ,v files is around 550 MB. Going to only 1.7 GB without tree
> > or commit deltas is certainly pretty good.
>
> Good job indeed. Oh and you probably should not bother trying to
> deltify commit objects at all since that would be a waste of time.

I wasn't going to bother even trying to delta the commits. In this
import the 200k commits aren't a very large percentage of the data.
As I'm sure you are well aware it's pretty much a waste of time to try
with the commits, especially with an "intermediate" pack such as
this one.

--
Shawn.

Packfile can't be mapped
2006-08-28 16:42:22
Nicolas Pitre <nico cam.org> wrote:
> On Sun, 27 Aug 2006, Shawn Pearce wrote:
>
> > I'm going to try to get tree deltas written to the pack sometime this
> > week. That should compact this intermediate pack down to something
> > that git-pack-objects would be able to successfully mmap into a
> > 32 bit address space. A complete repack with no delta reuse will
> > hopefully generate a pack closer to 400 MB in size. But I know
> > Jon would like to get that pack even smaller.
>
> One thing to consider in your code (if you didn't implement that
> already) is to _not_ attempt any delta on any object whose size is
> smaller than 50 bytes, and then limit the maximum delta size to
> object_size/2 - 20 (use that for the last argument to diff-delta() and
> store the undeltified object when diff-delta returns NULL). This way
> you'll avoid creating delta objects that are most likely to end up being
> _larger_ than the undeltified object.

So I added Nico's suggestions to fast-import and ran it on a small
subset of the Mozilla repository (3424 blobs):

  naive always delta: 6652 KiB
  Nico's suggestion:  6842 KiB

So Nico's suggestion of limiting delta size to (orig_len/2)-20 or
not using deltas on blobs < 50 bytes actually added 190 KB to the
output pack. Since this sample is probably fairly representative
of the rest of the repository's blobs I'm thinking we may see a 2.8%
increase in size over the current 930 MB blob pack. That's another
26 MB in our intermediate pack. I don't think this suggestion is
really worth including in fast-import right now...
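
The estimate above can be reproduced with a few lines of arithmetic.
The variable names are made up for this sketch; the inputs are the
figures quoted in this message, and the extrapolation assumes, as the
text does, that the sample is representative of all blobs.

```python
naive_kib = 6652        # sample pack size, always attempting a delta
limited_kib = 6842      # sample pack size with the size limits applied
blob_pack_mb = 930      # current size of the full blob pack

extra_kib = limited_kib - naive_kib           # growth of the sample pack
growth = extra_kib / naive_kib                # relative growth, about 0.0286
projected_extra_mb = blob_pack_mb * growth    # extrapolated to the blob pack

print(f"sample grew by {extra_kib} KiB ({growth:.2%})")
print(f"projected growth of the blob pack: {projected_extra_mb:.1f} MB")
```

The unrounded ratio is slightly above the 2.8% used in the text, so the
projection lands between 26 and 27 MB.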
--
Shawn.

Packfile can't be mapped
2006-08-28 17:19:31
On Mon, 28 Aug 2006, Shawn Pearce wrote:

> Nicolas Pitre <nico cam.org> wrote:
> > On Sun, 27 Aug 2006, Shawn Pearce wrote:
> >
> > > I'm going to try to get tree deltas written to the pack sometime this
> > > week. That should compact this intermediate pack down to something
> > > that git-pack-objects would be able to successfully mmap into a
> > > 32 bit address space. A complete repack with no delta reuse will
> > > hopefully generate a pack closer to 400 MB in size. But I know
> > > Jon would like to get that pack even smaller.
> >
> > One thing to consider in your code (if you didn't implement that
> > already) is to _not_ attempt any delta on any object whose size is
> > smaller than 50 bytes, and then limit the maximum delta size to
> > object_size/2 - 20 (use that for the last argument to diff-delta() and
> > store the undeltified object when diff-delta returns NULL). This way
> > you'll avoid creating delta objects that are most likely to end up being
> > _larger_ than the undeltified object.
>
> So I added Nico's suggestions to fast-import and ran it on a small
> subset of the Mozilla repository (3424 blobs):
>
> naive always delta: 6652 KiB
> Nico's suggestion: 6842 KiB

Hmmm...

> So Nico's suggestion of limiting delta size to (orig_len/2)-20 or
> not using deltas on blobs < 50 bytes actually added 190 KB to the
> output pack. Since this sample is probably fairly representative
> of the rest of the repository's blobs I'm thinking we may see a 2.8%
> increase in size over the current 930 MB blob pack. That's another
> 26 MB in our intermediate pack. I don't think this suggestion is
> really worth including in fast-import right now...

The above is based on the assumption that undeltified blobs usually
deflate to 50% of the undeflated size or more, and that pure object data
deflates better than delta data. Then there is the 20 byte base object
reference overhead for any deltas. The 20 bytes is a hard fact. The
50% factor is a wild guess. What I forgot to consider in the above
formula is the fact that delta data gets deflated as well so the /2
divisor is probably a bit too much (you could try orig_len*2/3 - 20, or
orig_len - 20, and adjust the initial threshold so the limit value
doesn't go negative).

If you are IO bound (I recall Jon mentioning something to that effect)
then you could probably use some CPU cycles to always deflate the
object, deflate the resulting delta, and pick the smallest between the
two (don't forget the additional 20 bytes in the delta case). Maybe the
increased CPU usage won't justify this solution though.
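
A rough sketch of that "deflate both and keep the smaller" approach,
using Python's zlib for illustration. The function name `pick_smaller`
is hypothetical, and the delta bytes are assumed to come from some
delta generator that is not shown here.

```python
import zlib

BASE_REF_OVERHEAD = 20   # bytes for the base object reference in a delta


def pick_smaller(obj, delta):
    """Deflate both forms and return (kind, deflated_bytes) for the smaller.

    `obj` is the full object, `delta` is its delta against some base
    (or None if no delta is available).
    """
    full = zlib.compress(obj)
    if delta is None:
        return ("full", full)
    packed_delta = zlib.compress(delta)
    # The delta form also has to store the 20 byte base reference.
    if len(packed_delta) + BASE_REF_OVERHEAD < len(full):
        return ("delta", packed_delta)
    return ("full", full)
```

This costs one extra deflate per object compared with deciding from the
uncompressed sizes alone.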
Nicolas

Packfile can't be mapped
2006-08-29 04:52:39
Shawn Pearce <spearce spearce.org> wrote:
> I'm going to try to get tree deltas written to the pack sometime this
> week.

I was able to implement and with Jon Smirl's help debug the tree
delta code in fast-import.

Earlier this evening Jon sent me the following:

> git-fast-import statistics:
> ---------------------------------------------------------------------
> Alloc'd objects:    1980000 (         0 overflow  )
> Total objects:      1967527 (     41856 duplicates                  )
>       blobs  :       633842 (         0 duplicates     576219 deltas)
>       trees  :      1131208 (     41856 duplicates    1019741 deltas)
>       commits:       200921 (         0 duplicates          0 deltas)
>       tags   :         1556 (         0 duplicates          0 deltas)
> Total branches:        1600 (      2228 loads     )
>       marks:        1048576 (    200921 unique    )
>       atoms:          56803
> Memory total:         75213 KiB
>        pools:         13338 KiB
>      objects:         61875 KiB
> Pack remaps:            658
> Pack size:           895983 KiB
> Index size:           46114 KiB
> ---------------------------------------------------------------------

Compared to our last attempt:

> > Pack size: 1713200 KiB
> > Index size: 46114 KiB

This tree delta version came out pretty good. The pack with tree
deltas is 874 MiB. Quite a reduction in size. fast-import takes
about 20 minutes to convert its 20 GiB input file into this 874 MiB
pack. Producing the 20 GiB input file from the 3 GiB CVS ,v
files takes about 4 hours with Jon's modified cvs2svn.

Jon has started a `git-repack -a -f` with aggressive depth and
window sizes. He estimated it may need another 2.5 hours to process.
Hopefully I'll hear more details tomorrow.
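
For reference, the size figures above work out as follows. This is plain
arithmetic on the numbers in the statistics; the variable names are
made up for the sketch.

```python
old_pack_kib = 1713200     # pack size without tree deltas
new_pack_kib = 895983      # pack size with tree deltas

new_pack_mib = new_pack_kib / 1024               # just under 875 MiB
reduction = 1 - new_pack_kib / old_pack_kib      # fraction of size removed

blob_delta_ratio = 576219 / 633842               # share of blobs stored as deltas
tree_delta_ratio = 1019741 / 1131208             # share of trees stored as deltas

print(f"new pack: {new_pack_mib:.1f} MiB, {reduction:.1%} smaller")
print(f"blobs as deltas: {blob_delta_ratio:.1%}, "
      f"trees as deltas: {tree_delta_ratio:.1%}")
```

The 874 MiB quoted in the text is this value with the fraction dropped.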
--
Shawn.

Packfile can't be mapped
2006-08-29 05:33:12
Shawn Pearce <spearce spearce.org> wrote:
> This tree delta version came out pretty good. The pack with tree
> deltas is 874 MiB. Quite a reduction in size. fast-import takes
> about 20 minutes to convert its 20 GiB input file into this 874 MiB
> pack. Producing the 20 GiB input file from the 3 GiB CVS ,v
> files takes about 4 hours with Jon's modified cvs2svn.
>
> Jon has started a `git-repack -a -f` with aggressive depth and
> window sizes. He estimated it may need another 2.5 hours to process.
> Hopefully I'll hear more details tomorrow.

I just heard from Jon:

> git-repack -a -f --window=50 --depth=5000
> 100% CPU for 60 minutes
> 1.2GB resident memory
> Final pack size is 451,203,363 bytes.

So with very aggressive delta depth and window sizes git-repack took
a while to run but came very close to the best packed size from
previous Mozilla CVS import attempts. I think we'd still like to
make the final historical pack smaller than that.

--
Shawn.