Thread: Question on cvs2svn derived Author data
|
|
| Question on cvs2svn derived Author data |

|
2007-10-05 21:06:20 |
Folks,

I'm in the process of converting the Samba svn repositories
to git. As part of the conversion, I'm importing the
historical cvs repos to git as well using cvs2svn.

I have a question about how cvs2svn generates the "Author"
data in the git output. Right now, I'm seeing

    commit f0f8c7730c39601c082b78e0ff3f5514e3e4ce32
    Author: jerry <jerry>
    Date: Wed Jan 13 00:40:04 1999 +0000
    ....

Is there a way to use an authors file like git-svnimport?
Entries in the file (for those not familiar) are in the form

    jerry = Gerald Carter <jerry samba.org>

So all cvs commits by "jerry" are attributed in the git repo
to "Gerald Carter <jerry samba.org>".
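
A minimal Python 3 sketch of the mapping described above may make the format concrete. It is only an illustration, not code from cvs2svn or git-svnimport: the helper name `load_authors`, the skipping of blank and `#` lines, and the placeholder address `jerry@example.org` are all assumptions made for the example.

```python
import io


def load_authors(lines):
    """Parse 'shortname = Full Name <email>' lines into a dict.

    Hypothetical helper for illustration only. Skipping blank lines and
    lines starting with '#' is an assumption, not a documented rule.
    """
    authors = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so '=' may appear in the expansion.
        shortname, sep, expansion = line.partition("=")
        if not sep:
            continue  # not a mapping line
        authors[shortname.strip()] = expansion.strip()
    return authors


# Placeholder data; the address is made up for the example.
sample = io.StringIO(
    "# CVS login = git identity\n"
    "jerry = Gerald Carter <jerry@example.org>\n"
)
authors = load_authors(sample)

# A known login is expanded; an unknown one falls back to itself.
print(authors.get("jerry", "jerry"))
print(authors.get("nobody", "nobody"))
```

Falling back to the unmapped login keeps the conversion running when the file is incomplete; a stricter tool could instead stop and report the missing entry.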

I can't find anything in the options for cvs2svn-trunk. Nor can
I find specifically in the code where the Author information
is generated for the commit.

Any pointers are appreciated and please CC me since I'm not
subscribed (although I can follow the list archives if
necessary).

Thanks.

cheers, jerry
--
=====================================================================
Samba ------- http://www.samba.org
Centeris ----------- http://www.centeris.com
"What man is a man who does not make the world better?" --Balian
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe cvs2svn.tigris.org
For additional commands, e-mail: dev-help cvs2svn.tigris.org
|
|
| Re: Question on cvs2svn derived Author data |

|
2007-10-05 21:17:17 |
Gerald (Jerry) Carter wrote:
> Is there a way to use an authors file like git-svnimport?
> Entries in the file (for those not familiar) are in the form
>
>     jerry = Gerald Carter <jerry samba.org>
>
> So all cvs commits by "jerry" are attributed in the git repo
> to "Gerald Carter <jerry samba.org>".

I probably should have read up on the git-fast-import format
before posting, but everyone knows the faster way to find an
answer is to send the question to a public ml.

So it appears the easiest way to fix this right now is to
just edit the git-dump.dat file directly. But support for
an authors file would be a big plus wrt ease of use.

Sorry for the noise.

cheers, jerry
|
|
| Re: Question on cvs2svn derived Author data |

|
2007-10-05 21:56:22 |
"Gerald (Jerry) Carter" <jerry samba.org> writes:
> I probably should have read up on the git-fast-import format
> before posting, but everyone knows the faster way to find an
> answer is to send the question to a public ml.
>
> So it appears the easiest way to fix this right now is to
> just edit the git-dump.dat file directly. But support for
> an authors file would be a big plus wrt ease of use.

I haven't done cvs2svn development in a long time, and Michael
Haggerty has really cleaned up the code since then.

So the following incomplete, untested patch is *definitely* not meant
to be committed to the repository. It's just an attempt to lower the
barrier for someone else who might have time to do this the Right Way
(perhaps you, Gerald?) Sheesh, supporting an authors mapping file
ought to be pretty easy. And if anyone does clean this up, make
it more object-oriented, etc, and commit it, I'll learn a lot about
the new architecture from seeing the diff.

Here's a stab in the dark:

[[[
Support mapping commit author shortnames to expansions via a map file.

Suggested by: Gerald (Jerry) Carter <jerry samba.org>

* cvs2svn_lib/context.py
  (Ctx.set_defaults): New context parameter 'authors_file'.

* cvs2svn_lib/run_options.py
  (usage_message_template): Document new --authors option.
  (RunOptions.__init__): Accept new --authors option.
  (RunOptions.process_remaining_options): New variable authors_file.

* cvs2svn_lib/git_output_option.py
  (GitOutputOption.__init__): Read in authors map if available.
  (GitOutputOption._get_author): Try to expand author name.
]]]

===================================================================
--- cvs2svn_lib/run_options.py (revision 4194)
+++ cvs2svn_lib/run_options.py (working copy)
@@ -130,6 +130,9 @@
       --cvs-revnums          record CVS revision numbers as file properties
       --mime-types=FILE      specify an apache-style mime.types file for
                              setting svn:mime-type
+      --authors=FILE         expand commit authors according to FILE
+                             (each line is "shortname = EXPANSION", e.g.:
+                             "jrandom = J. Random <jrandom example.com>)"
       --eol-from-mime-type   set svn:eol-style from mime type if known
       --auto-props=FILE      set file properties from the auto-props section
                              of a file in svn config format
@@ -215,6 +218,7 @@
           "username=",
           "cvs-revnums",
           "mime-types=",
+          "authors=",
           "auto-props=",
           "eol-from-mime-type", "default-eol=",
           "keywords-off",
@@ -343,6 +347,7 @@
     use_internal_co = False
     symbol_strategy_default = 'strict'
     mime_types_file = None
+    authors_file = None
     auto_props_file = None
     auto_props_ignore_case = True
     eol_from_mime_type = False
@@ -422,6 +427,8 @@
         ctx.svn_property_setters.append(CVSRevisionNumberSetter())
       elif opt == '--mime-types':
         mime_types_file = value
+      elif opt == '--authors':
+        authors_file = value
       elif opt == '--auto-props':
         auto_props_file = value
       elif opt == '--auto-props-ignore-case':
@@ -594,6 +601,8 @@
     if mime_types_file:
       ctx.svn_property_setters.append(MimeMapper(mime_types_file))
 
+    ctx.authors_file = authors_file
+
     ctx.svn_property_setters.append(CVSBinaryFileEOLStyleSetter())
 
     ctx.svn_property_setters.append(CVSBinaryFileDefaultMimeTypeSetter())
Index: cvs2svn_lib/git_output_option.py
===================================================================
--- cvs2svn_lib/git_output_option.py (revision 4194)
+++ cvs2svn_lib/git_output_option.py (working copy)
@@ -50,6 +50,21 @@
   def __init__(self, dump_filename):
     # The file to which to write the git-fast-import commands:
     self.dump_filename = dump_filename
+    self.authors_map = { }
+    if Ctx().authors_file:
+      import re
+      authors_line_re = re.compile("^\s*(\S+)\s*=\s*(.*)$")
+      for line in file(Ctx().authors_file):
+        if line.startswith("#"):
+          continue
+        # format of a line is something like:
+        # jrandom = J. Random ...rest of expansion, maybe email address...
+        m = authors_line_re.match(line)
+        if m is None:
+          continue
+        shortname = m.group(1)
+        expansion = m.group(2)
+        self.authors_map[shortname] = expansion
 
   def register_artifacts(self, which_pass):
     # These artifacts are needed for SymbolingsReader:
@@ -99,6 +114,8 @@
 
   def _get_author(svn_commit):
     author = svn_commit.get_author()
+    # ###TODO: Expand before or after UTF8 conversion? Tricky question...
+    author = self.authors_map.get(author, author)
     try:
       author = Ctx().utf8_encoder(author)
     except UnicodeError:
Index: cvs2svn_lib/context.py
===================================================================
--- cvs2svn_lib/context.py (revision 4194)
+++ cvs2svn_lib/context.py (working copy)
@@ -55,6 +55,7 @@
     self.symbol_strategy_rules = []
     self.symbol_info_filename = None
     self.username = None
+    self.authors_file = None
     self.svn_property_setters = []
     self.tmpdir = 'cvs2svn-tmp'
     self.skip_cleanup = False
|
|
| Re: Question on cvs2svn derived Author data |

|
2007-10-05 22:25:38 |
Karl Fogel wrote:
> "Gerald (Jerry) Carter" <jerry samba.org> writes:
>>
>> So it appears the easiest way to fix this right now is to
>> just edit the git-dump.dat file directly. But support for
>> an authors file would be a big plus wrt ease of use.
>
> I haven't done cvs2svn development in a long time, and Michael
> Haggerty has really cleaned up the code since then.
>
> So the following incomplete, untested patch is *definitely* not meant
> to be committed to the repository. It's just an attempt to lower the
> barrier for someone else who might have time to do this the Right Way
> (perhaps you, Gerald?) Sheesh, supporting an authors mapping file
> ought to be pretty easy. And if anyone does clean this up, make
> it more object-oriented, etc, and commit it, I'll learn a lot about
> the new architecture from seeing the diff.
>
> Here's a stab in the dark:

Thanks Karl. I'll look at this. It does help me to understand the
cvs2svn code some.

For the archives, I was able to get past the immediate need by editing
the git-dump.dat file generated for git-fast-import using a quick
(and ugly) perl script:

cheers, jerry

----
#!/usr/bin/perl -w

# $ARGV[0]: authors file, $ARGV[1]: git-fast-import dump.
# The rewritten dump is printed to stdout.
my %authors;

# Load the "shortname = Real Name <email>" entries.
open(AUTHORS, "< $ARGV[0]") || die $!;
while (<AUTHORS>) {
    chomp($_);
    $_ =~ s/\s+=\s+/=/;
    ($name, $userinfo) = split(/=/, $_);
    $authors{$name} = $userinfo;
}
close(AUTHORS);

# Rewrite every line that starts with "committer ".
open(INDATA, "< $ARGV[1]") || die $!;
while (<INDATA>) {
    chomp($_);
    if ($_ =~ /^committer /) {
        @lineinfo = split(/ /, $_);
        if (!defined($authors{$lineinfo[1]})) {
            die "No entry for \"$lineinfo[1]\" in $ARGV[0]!\n";
        }
        ($realname, $emailaddr) = split(/</, $authors{$lineinfo[1]});
        $realname =~ s/\s+$//g;
        $emailaddr =~ s/^\s+//;
        # $emailaddr still carries the closing ">" from the authors file.
        print "committer $realname <$emailaddr $lineinfo[3] $lineinfo[4]\n"
    } else {
        print "$_\n";
    }
}
close(INDATA);
----
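
For comparison, here is a rough Python 3 sketch of the same rewrite. It is not the script above and has not been run against a real cvs2svn dump. It assumes the committer lines have the shape `committer <login> <<login>> <timestamp> <tz>` and that payloads use the counted `data <n>` form of git-fast-import. The name `rewrite_committers` and the sample identity are made up for illustration. Unlike a line-by-line filter, it copies `data` payloads untouched, so a log message that happens to start with "committer " is not altered.

```python
import io
import re

# Assumed line shape: committer <login> <<login>> <timestamp> <tz>
COMMITTER_RE = re.compile(rb"^committer (\S+) <[^>]*> (\d+ [+-]\d{4})\n$")


def rewrite_committers(src, dst, authors):
    """Copy a fast-import stream from src to dst, expanding committers.

    `authors` maps a CVS login (bytes) to b"Full Name <email>".
    """
    while True:
        line = src.readline()
        if not line:
            break
        count = line[5:].strip()
        if line.startswith(b"data ") and count.isdigit():
            dst.write(line)
            dst.write(src.read(int(count)))  # raw payload, copied verbatim
            continue
        m = COMMITTER_RE.match(line)
        if m:
            login, when = m.group(1), m.group(2)
            if login not in authors:
                raise KeyError("no authors entry for %r" % login)
            line = b"committer " + authors[login] + b" " + when + b"\n"
        dst.write(line)


# Placeholder identity and a tiny hand-written stream for the demo.
AUTHORS = {b"jerry": b"Gerald Carter <jerry@example.org>"}
SAMPLE = (
    b"commit refs/heads/master\n"
    b"committer jerry <jerry> 916188004 +0000\n"
    b"data 16\n"
    b"committer x y z\n"  # 16-byte log message, must stay as-is
    b"\n"
)

out = io.BytesIO()
rewrite_committers(io.BytesIO(SAMPLE), out, AUTHORS)
print(out.getvalue().decode("utf-8"))
```

Working on bytes avoids guessing the encoding of log messages and file contents in the dump.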
|
|
| Re: Question on cvs2svn derived Author data |

|
2007-10-06 06:57:37 |
Karl Fogel wrote:
> "Gerald (Jerry) Carter" <jerry samba.org> writes:
>> I probably should have read up on the git-fast-import format
>> before posting, but everyone knows the faster way to find an
>> answer is to send the question to a public ml.
>>
>> So it appears the easiest way to fix this right now is to
>> just edit the git-dump.dat file directly. But support for
>> an authors file would be a big plus wrt ease of use.
>
> I haven't done cvs2svn development in a long time, and Michael
> Haggerty has really cleaned up the code since then.
>
> So the following incomplete, untested patch is *definitely* not meant
> to be committed to the repository. It's just an attempt to lower the
> barrier for someone else who might have time to do this the Right Way
> (perhaps you, Gerald?) Sheesh, supporting an authors mapping file
> ought to be pretty easy. And if anyone does clean this up, make
> it more object-oriented, etc, and commit it, I'll learn a lot about
> the new architecture from seeing the diff.

Good to have you back! Your patch looks pretty sensible. I have some
comments below.

It might make sense to offer this option for "2svn" conversions, too.
Obviously people would have to convert author names to something
compatible with SVN, but it would allow old CVS usernames to be updated
to whatever is current. If so, the 2svn functionality would either be
implemented in SVNOutputOptions, or possibly both versions could be
implemented by changing SVNCommit.get_author() (depending on encoding
issues).

> Here's a stab in the dark:
>
> [[[
> Support mapping commit author shortnames to expansions via a map file.
>
> Suggested by: Gerald (Jerry) Carter <jerry samba.org>
>
> * cvs2svn_lib/context.py
>   (Ctx.set_defaults): New context parameter 'authors_file'.
>
> * cvs2svn_lib/run_options.py
>   (usage_message_template): Document new --authors option.
>   (RunOptions.__init__): Accept new --authors option.
>   (RunOptions.process_remaining_options): New variable authors_file.
>
> * cvs2svn_lib/git_output_option.py
>   (GitOutputOption.__init__): Read in authors map if available.
>   (GitOutputOption._get_author): Try to expand author name.
> ]]]

So far, git conversions can only be started by the --options method.
Therefore it is not very useful to include an --authors option (unless
support is added for this feature for 2svn conversions, too).

In any case, a commented example should be added to
test-data/main-cvsrepos/cvs2svn-git.options.

If support is added for SVN, then a commented example should be added to
./cvs2svn-example.options. In this case, the option would also need to
be documented in www/cvs2svn.html and cvs2svn.1. (Oh the misery of
duplicated documentation!)

> ===================================================================
> --- cvs2svn_lib/run_options.py (revision 4194)
> +++ cvs2svn_lib/run_options.py (working copy)
> @@ -130,6 +130,9 @@
>        --cvs-revnums          record CVS revision numbers as file properties
>        --mime-types=FILE      specify an apache-style mime.types file for
>                               setting svn:mime-type
> +      --authors=FILE         expand commit authors according to FILE
> +                             (each line is "shortname = EXPANSION", e.g.:
> +                             "jrandom = J. Random <jrandom example.com>)"
>        --eol-from-mime-type   set svn:eol-style from mime type if known
>        --auto-props=FILE      set file properties from the auto-props section
>                               of a file in svn config format
> @@ -215,6 +218,7 @@
>            "username=",
>            "cvs-revnums",
>            "mime-types=",
> +          "authors=",
>            "auto-props=",
>            "eol-from-mime-type", "default-eol=",
>            "keywords-off",
> @@ -343,6 +347,7 @@
>      use_internal_co = False
>      symbol_strategy_default = 'strict'
>      mime_types_file = None
> +    authors_file = None
>      auto_props_file = None
>      auto_props_ignore_case = True
>      eol_from_mime_type = False
> @@ -422,6 +427,8 @@
>          ctx.svn_property_setters.append(CVSRevisionNumberSetter())
>        elif opt == '--mime-types':
>          mime_types_file = value
> +      elif opt == '--authors':
> +        authors_file = value
>        elif opt == '--auto-props':
>          auto_props_file = value
>        elif opt == '--auto-props-ignore-case':
> @@ -594,6 +601,8 @@
>      if mime_types_file:
>        ctx.svn_property_setters.append(MimeMapper(mime_types_file))
>
> +    ctx.authors_file = authors_file
> +
>      ctx.svn_property_setters.append(CVSBinaryFileEOLStyleSetter())
>
>      ctx.svn_property_setters.append(CVSBinaryFileDefaultMimeTypeSetter())

It should be verified either that the option is not set more than once,
or that multiple authors files are supported.
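
One way to do the first of these checks is sketched below in Python 3, using the standard `getopt` module on its own. It does not reproduce the real option loop in RunOptions; the function name `parse_authors_option` and the error text are hypothetical.

```python
import getopt


def parse_authors_option(argv):
    """Return the --authors value, or None; reject repeated use.

    Hypothetical standalone helper, not part of cvs2svn.
    """
    opts, _args = getopt.getopt(argv, "", ["authors=", "mime-types="])
    authors_file = None
    for opt, value in opts:
        if opt == "--authors":
            if authors_file is not None:
                # A second --authors would silently override the first.
                raise ValueError("--authors may only be specified once")
            authors_file = value
    return authors_file


print(parse_authors_option(["--authors=authors.txt"]))
```

Supporting several files instead would mean collecting the values in a list and merging the resulting maps, with a rule for conflicting entries.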

> Index: cvs2svn_lib/git_output_option.py
> ===================================================================
> --- cvs2svn_lib/git_output_option.py (revision 4194)
> +++ cvs2svn_lib/git_output_option.py (working copy)
> @@ -50,6 +50,21 @@
>    def __init__(self, dump_filename):
>      # The file to which to write the git-fast-import commands:
>      self.dump_filename = dump_filename
> +    self.authors_map = { }
> +    if Ctx().authors_file:
> +      import re
> +      authors_line_re = re.compile("^\s*(\S+)\s*=\s*(.*)$")
> +      for line in file(Ctx().authors_file):
> +        if line.startswith("#"):
> +          continue
> +        # format of a line is something like:
> +        # jrandom = J. Random ...rest of expansion, maybe email address...
> +        m = authors_line_re.match(line)
> +        if m is None:
> +          continue
> +        shortname = m.group(1)
> +        expansion = m.group(2)
> +        self.authors_map[shortname] = expansion
>

You shouldn't be shy about importing "re" at the top of the file. It is
used in enough other places in the code that it is sure to be imported
anyway, so the extra "import" only costs a dictionary lookup.

The question of what regexp to support for author names is more
complicated. git requires an author name of the form "Some Name
<email>"; it is a requirement (at least of git-fast-import) that the
angled brackets are there. Please note that the callers of
_get_author() do things like

    self.f.write(
        'committer %s <%s> %d +0000\n' % (author, author, svn_commit.date,)
        )

It would probably make more sense that the expansion

    author -> "%s <%s>" % (author,author,)

be done within _get_author(), for author names that are not translated
via the authors file. For author names that are in the file, we should
probably require that the replacement string be in the correct "* <*>"
format. In this case the regular expression should probably be changed
to something like

    authors_line_re = re.compile("^\s*(\S+)\s*=\s*(.*\S <\S+>)\s*$")

This regular expression also strips off trailing whitespace. The
git-fast-import documentation has details about what exactly is allowed.
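
The two ideas in this part of the review, a stricter pattern for file entries and a "login <login>" fallback for unmapped authors, could look roughly like this in Python 3. It is a sketch under assumptions: the helper names `parse_authors_line` and `git_ident` are invented, `example.com` is a placeholder domain, and the pattern is only the one suggested above written as a raw string, not a full check of what git-fast-import accepts.

```python
import re

# The stricter pattern suggested above, as a raw string.
AUTHORS_LINE_RE = re.compile(r"^\s*(\S+)\s*=\s*(.*\S <\S+>)\s*$")


def parse_authors_line(line):
    """Return (shortname, 'Name <email>'), or None if the line is malformed."""
    m = AUTHORS_LINE_RE.match(line)
    return (m.group(1), m.group(2)) if m else None


def git_ident(author, authors_map):
    """Expand a CVS login to a git identity string.

    Unmapped logins fall back to 'login <login>', the same text the
    callers currently build with '%s <%s>'.
    """
    return authors_map.get(author, "%s <%s>" % (author, author))


entry = parse_authors_line("jrandom = J. Random <jrandom@example.com>  \n")
authors_map = dict([entry]) if entry else {}

print(git_ident("jrandom", authors_map))
print(git_ident("jerry", authors_map))
print(parse_authors_line("jrandom = J. Random"))  # no <email>: rejected
```

With the fallback inside one function, callers would write the returned identity as-is instead of formatting `author` twice themselves.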

>    def register_artifacts(self, which_pass):
>      # These artifacts are needed for SymbolingsReader:
> @@ -99,6 +114,8 @@
>
>    def _get_author(svn_commit):
>      author = svn_commit.get_author()
> +    # ###TODO: Expand before or after UTF8 conversion? Tricky question...
> +    author = self.authors_map.get(author, author)
>      try:
>        author = Ctx().utf8_encoder(author)
>      except UnicodeError:

Yes, indeed tricky. Doing it before UTF8 translation means that the
replacement value goes through our somewhat ambiguous utf8_encoder()
(where UTF8 is not necessarily even one of the accepted input formats).
Doing it afterwards means that the authors file has to be in UTF8, and
introduces the possibility of failed matches because strings don't have
unique UTF8 encodings.

Perhaps it should be done *instead of* the UTF8 translation? Something like

>      author = svn_commit.get_author()
>      if author in self.authors_map:
>        author = self.authors_map[author]
>      else:
>        try:
>          author = Ctx().utf8_encoder(author)
>        except UnicodeError:

In this case when populating the author table it should be verified that
the replacement values are all valid UTF8.
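
A minimal Python 3 sketch of such a check follows, assuming the authors file is read in binary mode so that logins and replacements are byte strings. The function name `check_utf8` and the sample entries are hypothetical.

```python
def check_utf8(authors_map):
    """Raise ValueError if any replacement value is not valid UTF-8.

    `authors_map` maps login (bytes) to replacement (bytes).
    """
    for login, replacement in authors_map.items():
        try:
            replacement.decode("utf-8")
        except UnicodeDecodeError as e:
            raise ValueError(
                "authors entry for %r is not valid UTF-8: %s" % (login, e)
            )


# Valid: "ö" encoded as UTF-8 (two bytes).
check_utf8({b"jb": "Jörg B <jb@example.com>".encode("utf-8")})

# Invalid: the same name in Latin-1, where "ö" is the single byte 0xF6.
try:
    check_utf8({b"jb": b"J\xf6rg B <jb@example.com>"})
except ValueError as e:
    print(e)
```

Running the check once while the table is loaded reports a badly encoded file before any commits are written.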
Michael