Thread: efficient cloning

efficient cloning
user name
2006-03-19 23:18:49
James Cloos <cloosjhcloos.com> writes:

> I presume I need to clone -s -l the local alternate, re-parent it to
> the new URL and grab anything missing, but how can I assure that it
> results in exactly the same repo as this script?

"The same repo as this script" is a very poor way to define what
you really want.  What is "git-repack -a -d -s"?

Guessing what you perhaps are trying to do:

 - You have /gits/linux-2.6/.git on your local disk that is a
   reasonably recent copy of the upstream Linux 2.6 repository.

 - You want to clone from whatever $1 is (maybe a subsystem
   tree, but we cannot tell from your question) to a new
   directory $2.

 - Presumably you know that whatever $1 is, it is related to
   Linus's repository, and you want to take advantage of the fact
   that it shares many objects with /gits/linux-2.6/.git.

It might be worth adding a --reference flag to git-clone like
this patch does.
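Accepting both "--reference=<dir>" and "--reference <dir>" is ordinary option parsing. The following is a hypothetical stand-alone sketch in POSIX sh ("parse_reference" is an illustrative name, not a git-clone.sh function); it only scans leading --reference options and prints the directory it finds. The detail to notice is that the two-argument form has to shift before it reads the value.

```sh
# Hypothetical helper, not part of git-clone.sh: pull the
# --reference value out of the leading options and print it.
parse_reference () {
	reference=
	while test $# -gt 0
	do
		case "$1" in
		--reference=*)
			# Strip the "--reference=" prefix to get the path.
			reference=${1#--reference=} ;;
		--reference)
			# The path is the next argument, so it must exist.
			if test $# -lt 2
			then
				echo >&2 "missing argument to --reference"
				return 1
			fi
			shift
			reference=$1 ;;
		*)
			# First non-option argument: stop scanning.
			break ;;
		esac
		shift
	done
	printf '%s\n' "$reference"
}
```

For example, "parse_reference --reference /gits/linux-2.6 repo" and "parse_reference --reference=/gits/linux-2.6 repo" both print /gits/linux-2.6.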

However, this patch alone does not reduce the amount of data
transferred during cloning if you are fetching from the "$1"
repository over the git native transport (including a local
repository), because the current clone-pack does not look at
existing refs (it was written assuming that there is _nothing_ in
the cloned repository at the beginning).  That needs a separate
enhancement.  Maybe it would be a good idea to deprecate
clone-pack altogether, use fetch-pack -k, and implement the
"copy upstream refs to our refs" logic in git-clone.sh.  We need
to do something like that if/when we switch to using
$GIT_DIR/refs/remotes/ to store tracking branches outside
refs/heads anyway.
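To make the clone-pack limitation concrete: a fetch that looks at existing refs can tell the other side which commits it already has, starting from the object names its refs record. A rough, hypothetical sketch of gathering those starting points (the helper name is made up; it reads loose ref files only and assumes POSIX find, cat and sort):

```sh
# Hypothetical helper: print the distinct object names recorded in
# loose ref files under "$1/refs" (one name per line, sorted).
# A ref-aware fetch could offer these as commits it already has.
list_ref_tips () {
	git_dir=$1
	# Each loose ref file holds a single object name.
	find "$git_dir/refs" -type f -exec cat {} + | sort -u
}
```

Refs stashed under refs/reference-tmp are picked up too, which is exactly why copying the reference repository's refs there can help a fetch that honors them.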

The rsync transport has been deprecated for some time, and it
does not handle alternates correctly anyway, so this patch does
not have any impact on that.

But if you are going to "$1" over http
transport, this patch
would help because we stash away the existing refs obtained
from
the reference repository under $GIT_DIR/refs/reference-tmp
while
we run the fetch.
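The reference setup in the patch boils down to two file-system steps: write the reference repository's object directory into objects/info/alternates, and copy its refs under refs/reference-tmp so they look like refs we already have. A minimal sketch of just those steps, wrapped in hypothetical helpers (setup_reference and cleanup_reference are not part of git-clone.sh):

```sh
# Hypothetical helper mirroring the --reference steps in the patch:
# borrow objects from a local reference repository through the
# alternates file and stash its refs under refs/reference-tmp.
setup_reference () {
	reference=$1 git_dir=$2
	if ! test -d "$reference"
	then
		echo >&2 "$reference: not a local directory"
		return 1
	fi
	reference=$(cd "$reference" && pwd) || return 1
	# Accept a working tree as well as the .git directory itself.
	if test -d "$reference/.git/objects"
	then
		reference="$reference/.git"
	fi
	mkdir -p "$git_dir/objects/info" "$git_dir/refs/reference-tmp"
	echo "$reference/objects" >"$git_dir/objects/info/alternates"
	# Copy the reference refs so they look like refs we already have.
	(cd "$reference" && tar cf - refs) |
	(cd "$git_dir/refs/reference-tmp" && tar xf -)
}

# Remove the stashed refs once the fetch is over.
cleanup_reference () {
	rm -fr "$1/refs/reference-tmp"
}
```

The cleanup matters because the stashed refs exist only to influence the fetch; they are not meant to survive in the new repository.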

---
diff --git a/git-clone.sh b/git-clone.sh
index 4ed861d..73fb03c 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -9,7 +9,7 @@
 unset CDPATH
 
 usage() {
-	echo >&2 "Usage: $0 [--bare] [-l [-s]] [-q] [-u <upload-pack>] [-o <name>] [-n] <repo> [<dir>]"
+	echo >&2 "Usage: $0 [--reference <reference-repo>] [--bare] [-l [-s]] [-q] [-u <upload-pack>] [-o <name>] [-n] <repo> [<dir>]"
 	exit 1
 }
 
@@ -56,6 +56,7 @@ upload_pack=
 bare=
 origin=origin
 origin_override=
+reference=
 while
 	case "$#,$1" in
 	0,*) break ;;
@@ -68,6 +69,11 @@ while
         *,-s|*,--s|*,--sh|*,--sha|*,--shar|*,--share|*,--shared) 
           local_shared=yes; use_local=yes ;;
 	*,-q|*,--quiet) quiet=-q ;;
+	*,--reference=*)
+	  reference=`expr "$1" : '-[^=]*=\(.*\)'` ;;
+	*,--reference)
+	  case "$#" in 1) usage ;; esac
+	  shift; reference="$1" ;;
 	1,-o) usage;;
 	*,-o)
 		git-check-ref-format "$2" || {
@@ -130,6 +136,23 @@ yes)
 	GIT_DIR="$D/.git" ;;
 esac
 
+# If given a reference we would first add that one; it has to name a
+# local repository that resembles the one being cloned.
+if test -d "$reference"
+then
+	reference=$(cd "$reference" && pwd)
+	if test -d "$reference/.git/objects"
+	then
+		reference="$reference/.git"
+	fi
+	echo "$reference/objects" >"$GIT_DIR/objects/info/alternates"
+	# Pretend we know about these heads - clone-pack does not
+	# honor them currently, but that can be rectified later.
+	mkdir "$GIT_DIR/refs/reference-tmp" 
+	(cd "$reference" && tar cf - refs) |
+	(cd "$GIT_DIR/refs/reference-tmp" && tar xf -)
+fi
+
 # We do local magic only when the user tells us to.
 case "$local,$use_local" in
 yes,yes)
@@ -229,6 +252,7 @@ yes,yes)
 esac
 
 cd "$D" || exit
+test -d "$GIT_DIR/refs/reference-tmp" && rm -fr "$GIT_DIR/refs/reference-tmp"
 
 if test -f "$GIT_DIR/HEAD" && test -z "$bare"
 then

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
efficient cloning
user name
2006-03-20 00:32:22
>>>>> "Junio" == Junio C Hamano <junkiocox.net> writes:

Junio> "The same repo as this script" is a very poor way to define what
Junio> you really want. 

I don't think so.  Getting the same values in files like FETCH_HEAD,
ORIG_HEAD, branches/*, remotes/*, info/* et al. is not obvious,
especially, e.g., all of the same Push/Pull lines.
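For illustration only: the remotes/<name> file is generated rather than copied, which is part of why matching it exactly is hard. This hypothetical helper writes one in the "URL:" / "Pull:" line format; the branch mapping it uses (remote HEAD branch tracked as <name>, other branches under their own names) is an assumption made for the example, not necessarily what git-clone.sh writes.

```sh
# Hypothetical helper: write $GIT_DIR/remotes/<name> using the
# "URL:" and "Pull:" line format.  Assumed mapping: the remote HEAD
# branch is tracked as refs/heads/<name>; every other branch keeps
# its own name.
# Usage: write_remotes_file GIT_DIR NAME URL HEAD_BRANCH BRANCH...
write_remotes_file () {
	git_dir=$1 remote_name=$2 url=$3 head_branch=$4
	shift 4
	mkdir -p "$git_dir/remotes"
	{
		echo "URL: $url"
		echo "Pull: refs/heads/$head_branch:refs/heads/$remote_name"
		for branch
		do
			# The HEAD branch already has its own Pull: line.
			if test "$branch" != "$head_branch"
			then
				echo "Pull: refs/heads/$branch:refs/heads/$branch"
			fi
		done
	} >"$git_dir/remotes/$remote_name"
}
```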

Junio> What is "git-repack -a -d -s"?

A typo.  I of course meant -a -d -l.

Junio> It might be worth adding a --reference flag to git-clone like
Junio> this patch does.

That is essentially what I tried (except for the name of the flag; I
prefer your choice).  I didn't include the reference-tmp logic, but
otherwise it looks about the same.

Junio> However, this patch alone does not reduce the transferred data
Junio> during cloning any smaller if you are using the "$1" repository
Junio> over git native transport (including a local repository),
Junio> because the current clone-pack does not look at existing refs

Exactly the wall I ran into.  And I really only need it for git://.

Junio> Maybe it would be a good idea to deprecate
Junio> clone-pack altogether, use fetch-pack -k, and implement the
Junio> "copy upstream refs to our refs" logic in git-clone.sh.  We need
Junio> to do something like that if/when we are switching to use
Junio> $GIT_DIR/refs/remotes/ to store tracking branches outside
Junio> refs/heads anyway.

And it looks like you've shown me the door in that wall.

I'll have to read up on fetch-pack as opposed to clone-pack.

-JimC
-- 
James H. Cloos, Jr. <cloosjhcloos.com>
efficient cloning
user name
2006-03-20 01:55:56
James Cloos <cloosjhcloos.com> writes:

> Junio> Maybe it would be a good idea to deprecate
> Junio> clone-pack altogether, use fetch-pack -k, and implement the
> Junio> "copy upstream refs to our refs" logic in git-clone.sh.  We need
> Junio> to do something like that if/when we are switching to use
> Junio> $GIT_DIR/refs/remotes/ to store tracking branches outside
> Junio> refs/heads anyway.
>
> And it looks like you've shown me the door in that wall.

I was going to write that myself, but unfortunately I will be
offline for the rest of the evening -- interrupted by a surprise
visitor from India who is only visiting for a few days.

So in case you are really in a rush, and in a mood to build on
top of my WIP, here it is.

* fetch-pack.c is modified so that you can say:

	git fetch-pack --all -k $1

  to get output equivalent to "git ls-remote $1" while
  fetching everything from the remote.

* Change git-clone.sh to use git-fetch-pack --all -k instead of
  git-clone-pack; the output from fetch-pack is munged further
  by a script that implements "copy the refs to the same
  location while figuring out where the HEAD is".  The latter
  part in my WIP is incomplete, so the --use-separate-remote
  option probably would not work right now.
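The "copy the refs to the same location while figuring out where the HEAD is" step can also be pictured in shell. This is a hypothetical variant, not the Perl in the patch below (whose HEAD handling differs): it assumes each input line is "<sha1> <refname>", skips peeled "^{}" entries, and guesses HEAD's branch by comparing object names, preferring master. That guess is only a heuristic, since several branches can point at the same commit.

```sh
# Hypothetical sketch of "copy the refs to the same location while
# figuring out where the HEAD is".  Input: lines of the form
# "<sha1> <refname>" (assumed to match fetch-pack -k output).
# Prints the branch HEAD appears to point at, if any.
copy_refs () {
	refs_file=$1 git_dir=$2 use_separate_remote=$3
	if test -n "$use_separate_remote"
	then
		branch_top=remotes
	else
		branch_top=heads
	fi
	head_sha1=
	while read sha1 name
	do
		case "$name" in
		HEAD)		head_sha1=$sha1; continue ;;
		*'^{}')		continue ;;	# peeled tag entry, not a ref
		refs/heads/*)	dest="refs/$branch_top/${name#refs/heads/}" ;;
		refs/tags/*)	dest="refs/tags/${name#refs/tags/}" ;;
		*)		continue ;;
		esac
		mkdir -p "$git_dir/$(dirname "$dest")"
		echo "$sha1" >"$git_dir/$dest"
	done <"$refs_file"

	# HEAD only gives us an object name, so guess the branch: prefer
	# master if it matches, else the first matching branch.
	head_points_at=
	while read sha1 name
	do
		case "$name" in
		refs/heads/*) ;;
		*) continue ;;
		esac
		test "$sha1" = "$head_sha1" || continue
		branch=${name#refs/heads/}
		if test "$branch" = master
		then
			head_points_at=master
			break
		fi
		test -n "$head_points_at" || head_points_at=$branch
	done <"$refs_file"
	printf '%s\n' "$head_points_at"
}
```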

---
diff --git a/fetch-pack.c b/fetch-pack.c
index 535de10..2d0a626 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -7,8 +7,9 @@
 static int keep_pack;
 static int quiet;
 static int verbose;
+static int fetch_all;
 static const char fetch_pack_usage[] =
-"git-fetch-pack [-q] [-v] [-k] [--thin] [--exec=upload-pack] [host:]directory <refs>...";
+"git-fetch-pack [--all] [-q] [-v] [-k] [--thin] [--exec=upload-pack] [host:]directory <refs>...";
 static const char *exec = "git-upload-pack";
 
 #define COMPLETE	(1U << 0)
@@ -266,8 +267,9 @@ static void filter_refs(struct ref **ref
 	for (prev = NULL, current = *refs; current; current = next) {
 		next = current->next;
 		if ((!memcmp(current->name, "refs/", 5) &&
-					check_ref_format(current->name + 5)) ||
-				!path_match(current->name, nr_match, match)) {
+		     check_ref_format(current->name + 5)) ||
+		    (!fetch_all &&
+		     !path_match(current->name, nr_match, match))) {
 			if (prev == NULL)
 				*refs = next;
 			else
@@ -426,6 +428,10 @@ int main(int argc, char **argv)
 				use_thin_pack = 1;
 				continue;
 			}
+			if (!strcmp("--all", arg)) {
+				fetch_all = 1;
+				continue;
+			}
 			if (!strcmp("-v", arg)) {
 				verbose = 1;
 				continue;
diff --git a/git-clone.sh b/git-clone.sh
index 4ed861d..718029b 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -9,7 +9,7 @@
 unset CDPATH
 
 usage() {
-	echo >&2 "Usage: $0 [--bare] [-l [-s]] [-q] [-u <upload-pack>] [-o <name>] [-n] <repo> [<dir>]"
+	echo >&2 "Usage: $0 [--reference <reference-repo>] [--bare] [-l [-s]] [-q] [-u <upload-pack>] [-o <name>] [-n] <repo> [<dir>]"
 	exit 1
 }
 
@@ -40,22 +40,74 @@ Perhaps git-update-server-info needs to 
 	do
 		name=`expr "$refname" : 'refs/\(.*\)'` &&
 		case "$name" in
-		*^*)	;;
-		*)
-			git-http-fetch -v -a -w "$name" "$name" "$1/" || exit 1
+		*^*)	continue;;
 		esac
+		if test -n "$use_separate_remote" &&
+		   branch_name=`expr "$name" : 'heads/\(.*\)'`
+		then
+			tname="remotes/$branch_name"
+		else
+			tname=$name
+		fi
+		git-http-fetch -v -a -w "$tname" "$name" "$1/" || exit 1
 	done <"$clone_tmp/refs"
 	rm -fr "$clone_tmp"
 }
 
+# A Perl script to read git-fetch-pack -k output and store the
+# remote branches.
+copy_refs='
+use File::Path qw(mkpath);
+my $refs_file = $ARGV[0];
+my $use_separate_remote = $ARGV[1];
+my $git_dir = $ARGV[2];
+
+my $branch_top = ($use_separate_remote ? "remotes" : "heads");
+my $tag_top = "tags";
+my $head = undef;
+
+sub store {
+	my ($sha1, $name, $top) = @_;
+	$name = "$git_dir/refs/$top/$name";
+	mkpath($1) if ($name =~ m|^(.*)/|);
+	open O, ">", "$name";
+	print O "$sha1\n";
+	close O;
+}
+
+open FH, "<", $refs_file;
+while (<FH>) {
+	my ($sha1, $name) = /^([0-9a-f]{40}) (.*)$/;
+	if ($name eq "HEAD") {
+		$head = $sha1;
+		next;
+	}
+	if ($name =~ s/^refs\/heads\///) {
+		if (!defined $head && $name eq "master") {
+			$head = $sha1;
+		}
+		store($sha1, $name, $branch_top);
+		next;
+	}
+	if ($name =~ s/^refs\/tags\///) {
+		store($sha1, $name, $tag_top);
+		next;
+	}
+}
+close FH;
+'
+
+
 quiet=
 use_local=no
 local_shared=no
 no_checkout=
 upload_pack=
 bare=
+reference=
 origin=origin
 origin_override=
+use_separate_remote=
 while
 	case "$#,$1" in
 	0,*) break ;;
@@ -68,7 +120,14 @@ while
         *,-s|*,--s|*,--sh|*,--sha|*,--shar|*,--share|*,--shared) 
           local_shared=yes; use_local=yes ;;
 	*,-q|*,--quiet) quiet=-q ;;
+	*,--use-separate-remote)
+		use_separate_remote=t ;;
 	1,-o) usage;;
+	1,--reference) usage ;;
+	*,--reference)
+		shift; reference="$1" ;;
+	*,--reference=*)
+		reference=`expr "$1" : '--reference=\(.*\)'` ;;
 	*,-o)
 		git-check-ref-format "$2" || {
 		    echo >&2 "'$2' is not suitable for a branch name"
@@ -130,6 +189,26 @@ yes)
 	GIT_DIR="$D/.git" ;;
 esac
 
+if test -n "$reference"
+then
+	if test -d "$reference"
+	then
+		if test -d "$reference/.git/objects"
+		then
+			reference="$reference/.git"
+		fi
+		reference=$(cd "$reference" && pwd)
+		echo "$reference/objects" >"$GIT_DIR/objects/info/alternates"
+		(cd "$reference" && tar cf - refs) |
+		(cd "$GIT_DIR/refs" &&
+		 mkdir reference-tmp &&
+		 cd reference-tmp &&
+		 tar xf -)
+	else
+		echo >&2 "$reference: not a local directory." && usage
+	fi
+fi
+
 # We do local magic only when the user tells us to.
 case "$local,$use_local" in
 yes,yes)
@@ -217,17 +296,22 @@ yes,yes)
 		;;
 	*)
 		cd "$D" && case "$upload_pack" in
-		'') git-clone-pack $quiet "$repo" ;;
-		*) git-clone-pack $quiet "$upload_pack" "$repo" ;;
-		esac || {
+		'') git-fetch-pack --all -k $quiet "$repo" ;;
+		*) git-fetch-pack --all -k $quiet "$upload_pack" "$repo" ;;
+		esac >"$GIT_DIR/FETCH_HEAD" || {
 			echo >&2 "clone-pack from '$repo' failed."
 			exit 1
 		}
+		# Now figure out where the remote HEAD points at.
+		perl -e "$copy_refs" "$GIT_DIR/FETCH_HEAD" \
+			"$use_separate_remote" "$GIT_DIR"
 		;;
 	esac
 	;;
 esac
 
+test -d "$GIT_DIR/refs/reference-tmp" && rm -fr "$GIT_DIR/refs/reference-tmp"
+
 cd "$D" || exit
 
 if test -f "$GIT_DIR/HEAD" && test -z "$bare"
