Thread: RFC -- enhancing cvs2svn to have a notion of spans of mergeable commits

RFC -- enhancing cvs2svn to have a notion of spans of mergeable commits
2007-07-14 20:48:01

(This is a repost of a report from 2004 that seems to be missing from
the cvs2svn archives. The problem is not unique to my projects.)

This is a request for comment and cooperation on a proposed new cvs2svn
feature, which will involve some substantial reworking of its
infrastructure.

First let me explain the use case. I'm looking at converting my
RCS-managed projects to Subversion. I've previously explained how I
use C-x v s in Emacs to put a release tag on an entire collection of
files. This is not just an idiosyncrasy of mine; it's a natural thing
to do given the limitations of RCS, and (as I noted previously) I
wrote C-x v s originally with that in mind. The *very first person* I
mentioned that I was doing RCS-to-SVN conversions to, though not an
Emacs user, turns out to have been doing the same thing for the same
reasons and to have exactly the same use case I do.

Here is a fairly typical RCS log header for one of my projects:

RCS file: RCS/main.c,v
Working file: main.c
head: 1.89
branch:
locks: strict
access list:
symbolic names:
    2-5: 1.89
    2-4: 1.86
    2-3: 1.85
    2-2: 1.84
    2-1: 1.84
    2-0: 1.79
    1-6bis: 1.63
    1-6: 1.61
    1-5: 1.53
    1-4: 1.47
    1-3: 1.36
    1-2: 1.26
    1-1: 1.22
    1-0: 1.22
keyword substitution: kv
total revisions: 89
=============================================================================

Now, when I convert this to Subversion using cvs2svn it does no fewer
than 420 commits. I understand why -- I tend to do lots of little
commits, and in a system where every commit bumps the revision number
on the whole tree that number is going to get large -- but it still
seems ridiculous. I don't want that much noise in my version history,
not for an archive!

What I want, for archival purposes, is for cvs2svn to do one commit
per *symbol*, with each commit getting a property that is the value of
the original RCS release tag (after symbol transforms). The commit
message should contain the concatenation of the RCS log entries,
each suitably decorated with the right file/version pairs.

So. I want to add a policy option to cvs2svn, called something like
--commit-on-symbol, to do this thing. I dived into the cvs2svn design
notes and code to figure out how, and --- it's not going to be simple.

The problem, at least as it appears to me, is that the middle passes of
cvs2svn have wired deep into them the invariant of a one-to-one
correspondence between Subversion commits and RCS cliques of *identical*
commits (that is, same author and same log message digest on same branch).

But that's OK. For my purposes in writing --commit-on-symbol, I could
step in at a later phase after the Subversion commits have been
collected and associated with branches. Here is a sketch of how I'd do
--commit-on-symbol:

1. After collecting all symbols' file-to-version association pairs,
   build a dictionary mapping each symbol to the latest date of any
   revision attached to the symbol.

2. Now ruthlessly nuke all symbol information so it doesn't create tags
   in passes 1-7.

3. Proceed to the end of pass 7, collecting a list of 'pure'
   Subversion commits associated with the trunk and branches.

4. Now, introduce a new "commit-merge" pass 7a. In this pass, 'pure'
   Subversion commits are grouped into spans if they are mergeable.
   Under the default policy, no spans are ever mergeable and the list
   of commits turns into a list of singleton spans. Under
   --commit-on-symbol, two spans can merge if:

   a) They're adjacent on the same branch
   b) They have the same author
   c) They fall in the same date range as defined by the dictionary
      we built in step 1.

5. Now pass 8 runs. Pass 8 needs to be enhanced so that it knows how
   to do a single commit per *span*. If the span is a singleton,
   then pass 8 does the only pure commit in it. If the span has
   several pure commits merged into it, their deltas have to be applied
   in order and a commit message has to be generated that includes each
   one of the RCS commit messages for the pure commits, decorated with
   filenames and a timestamp.
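
The dictionary from step 1 and the merge criteria (a) through (c) might be sketched roughly as follows. This is only an illustration in Python, not cvs2svn code: the Commit record, the function names, and the use of plain integers for dates are all assumptions. It also assumes that a commit dated exactly at a symbol's date belongs to that symbol's release, and it compares two single commits rather than two spans.

```python
from bisect import bisect_left
from collections import namedtuple

# Hypothetical record type; cvs2svn's real data structures differ.
Commit = namedtuple("Commit", "branch author date")


def symbol_dates(symbol_revisions):
    """Step 1: map each symbol to the latest date of any tagged revision.

    symbol_revisions maps a symbol name to a list of revision dates.
    """
    return {sym: max(dates) for sym, dates in symbol_revisions.items() if dates}


def date_range_index(boundaries, date):
    """Return the index of the release window that a date falls in.

    boundaries is the sorted list of symbol dates. A date at or before
    boundaries[i] (and after boundaries[i - 1]) falls in window i.
    """
    return bisect_left(boundaries, date)


def mergeable(a, b, boundaries):
    """Criteria (a)-(c) for two commits the caller knows to be adjacent."""
    return (
        a.branch == b.branch
        and a.author == b.author
        and date_range_index(boundaries, a.date)
        == date_range_index(boundaries, b.date)
    )
```

Adjacency itself (the first half of criterion a) is left to the caller, which would only ever offer neighbouring commits to the predicate.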

The reason I've gone through all this argument is to show that the
problem of implementing --commit-on-symbol really needs to be broken
into an upper half and a lower half.

The lower half consists of introducing pass 7a and modifying pass
8 to be aware of spans produced by merging. The lower half need
not actually care what policy is being implemented by merging;
all it ever does is call a predicate to tell it whether two spans
of pure commits should be merged.
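
As an illustration of how policy-agnostic the lower half could be, here is a minimal Python sketch of a span-building loop driven entirely by a predicate. The names group_into_spans and never are hypothetical, and spans are modelled as plain lists. "Adjacent" is simplified here to "next to each other in the input list", which glosses over commits from different branches being interleaved in a repository-wide ordering.

```python
def group_into_spans(commits, mergeable):
    """Fold an ordered list of commits into spans (lists of commits).

    mergeable(span_a, span_b) is the policy predicate. This function
    neither knows nor cares which policy the predicate implements.
    """
    spans = []
    for commit in commits:
        candidate = [commit]
        if spans and mergeable(spans[-1], candidate):
            # Grow the most recent span instead of starting a new one.
            spans[-1].extend(candidate)
        else:
            spans.append(candidate)
    return spans


def never(span_a, span_b):
    """Default policy: nothing merges, so every span is a singleton."""
    return False
```

With the never predicate the result is a list of singleton spans, one per input commit, which matches the default behaviour described above.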

The upper half of the problem consists of writing the particular
predicate that implements --commit-on-symbol behavior.

I now suggest (and this is the real point of this RFC) that
implementing the lower half would be quite valuable even if, after
experimentation, --commit-on-symbol turns out to be too blunt an
instrument. The lower half would support trying out a large class of
different and potentially useful conversion strategies.

In fact, the Subversion team's favorite Borg design pattern has a
place here. Each conversion strategy could be a Borg class with two
methods -- a mergeability test taking two spans as arguments, and a
commit-message generator taking a span as an argument. The default
Borg would have a mergeability test always returning False, and a
commit-message generator that simply concatenates the RCS log messages
of each commit in its argument span (since it only ever gets called on
singleton spans, this gives the behavior you'd expect).
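
A rough Python sketch of what such strategy classes might look like. Everything here is hypothetical: the class and method names, the configure() hook, and the representation of a commit as a dict with "file", "rev", "log", "author", "branch" and "date" keys are illustration only, not cvs2svn's API. Dates are plain integers, and a commit dated exactly at a symbol's date is assumed to belong to that symbol's release.

```python
from bisect import bisect_left


class Borg:
    """Borg idiom: every instance of a class shares one attribute dict."""

    _shared_state = {}

    def __init__(self):
        self.__dict__ = self._shared_state


class DefaultStrategy(Borg):
    _shared_state = {}  # one shared dict per strategy class

    def mergeable(self, span_a, span_b):
        return False

    def commit_message(self, span):
        # Only ever called on singleton spans, so this is the plain log.
        return "\n".join(commit["log"] for commit in span)


class CommitOnSymbolStrategy(Borg):
    _shared_state = {}

    def configure(self, symbol_dates):
        # symbol_dates: the symbol -> latest-date dictionary from step 1.
        self.boundaries = sorted(set(symbol_dates.values()))

    def _window(self, commit):
        return bisect_left(self.boundaries, commit["date"])

    def mergeable(self, span_a, span_b):
        # Compare the commits on either side of the seam between spans.
        last, first = span_a[-1], span_b[0]
        return (
            last["branch"] == first["branch"]
            and last["author"] == first["author"]
            and self._window(last) == self._window(first)
        )

    def commit_message(self, span):
        return "\n".join(
            "%s %s: %s" % (c["file"], c["rev"], c["log"]) for c in span
        )
```

Because of the shared state, a strategy configured once (say, right after the symbol dates are known) can be instantiated again anywhere in a later pass and will see the same boundaries.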

I'm ready to write the --commit-on-symbol Borg. Does this design seem
good? Can I interest anyone who knows the cvs2svn code better than me
in doing the span infrastructure so it will work?
--
Eric S. Raymond <http://www.catb.org/~esr/>

"What, then is law [government]? It is the collective organization of
the individual right to lawful defense."
    -- Frederic Bastiat, "The Law"
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe cvs2svn.tigris.org
For additional commands, e-mail: dev-help cvs2svn.tigris.org

Re: RFC -- enhancing cvs2svn to have a notion of spans of mergeable commits
2007-07-15 06:28:13

Eric S. Raymond wrote:
> Here is a fairly typical RCS log header for one of my projects:
> [...]
> total revisions: 89
>
> Now, when I convert this to Subversion using cvs2svn it does no fewer
> than 420 commits. I understand why -- I tend to do lots of little
> commits, and in a system where every commit bumps the revision number
> on the whole tree that number is going to get large -- but it still
> seems ridiculous. I don't want that much noise in my version history,
> not for an archive!

But the revision number is not really significant. Integers are cheap.

> What I want, for archival purposes, is for cvs2svn to do one commit
> per *symbol*, with each commit getting a property that is the value of
> the original RCS release tag (after symbol transforms). The commit
> message should contain the concatenation of the RCS log entries,
> each suitably decorated with the right file/version pairs.

With the current cvs2svn, it would be utterly trivial to discard
intermediate CVS revisions. But creating joined log entries is a bit
trickier.

> So. I want to add a policy option to cvs2svn, called something like
> --commit-on-symbol, to do this thing. I dived into the cvs2svn design
> notes and code to figure out how, and --- it's not going to be simple.
> The problem, at least as it appears to me, is that the middle passes of
> cvs2svn have wired deep into them the invariant of a one-to-one
> correspondence between Subversion commits and RCS cliques of *identical*
> commits (that is, same author and same log message digest on same branch).

I think you will find that cvs2svn 2.0 is much more hackable than
earlier versions. It works less by "magic" and more by keeping track of
information in well-defined data structures that can be mutated to
affect the output. In fact, several mutations already occur in
CollectRevsPass and FilterSymbolsPass.

But your changes would have to occur later, after the CVS revisions have
been collected into repository-wide commits. I would suggest
implementing them either in SortRevisionSummaryPass or in an extra pass
inserted directly after that one. You would probably want this pass to
also create a new MetadataDatabase containing the merged log messages,
and store the metadata_id directly in the OrderedChangesets.

> [...]
> The reason I've gone through all this argument is to show that the
> problem of implementing --commit-on-symbol really needs to be broken
> into an upper half and a lower half.
>
> The lower half consists of introducing pass 7a and modifying pass
> 8 to be aware of spans produced by merging. The lower half need
> not actually care what policy is being implemented by merging;
> all it ever does is call a predicate to tell it whether two spans
> of pure commits should be merged.
>
> The upper half of the problem consists of writing the particular
> predicate that implements --commit-on-symbol behavior.
>
> I now suggest (and this is the real point of this RFC) that
> implementing the lower half would be quite valuable even if, after
> experimentation, --commit-on-symbol turns out to be too blunt an
> instrument. The lower half would support trying out a large class of
> different and potentially useful conversion strategies.

I think that the "lower half" would be pretty easy to add to cvs2svn 2.0.
In fact, if you just glue the changesets together, create a new
MetadataDatabase with the glued-together log messages, and discard all
but the last CVSRevision on each affected file, it might Just Work (TM).
(But I'm sure I've forgotten something...)
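
The gluing idea can be sketched in a few lines of Python. This is an illustration only and does not use cvs2svn's actual Changeset, CVSRevision or MetadataDatabase classes: the glue_changesets name and the dict layout (a "log" string plus a "revisions" list of (path, revision) pairs) are assumptions, and the sketch does not attempt to handle file deletions or symbol changesets.

```python
def glue_changesets(changesets):
    """Merge an ordered list of changesets (oldest first) into one.

    Each changeset is a dict with a "log" string and a "revisions" list
    of (path, revision) pairs. Only the last revision seen for each path
    survives; the log messages are joined in order.
    """
    last_revision = {}
    for changeset in changesets:
        for path, rev in changeset["revisions"]:
            # A later changeset overwrites any earlier revision of a file.
            last_revision[path] = rev
    log = "\n\n".join(changeset["log"] for changeset in changesets)
    return {"log": log, "revisions": sorted(last_revision.items())}
```

For example, gluing a changeset that touches main.c 1.85 to a later one that touches main.c 1.86 and util.c 1.12 leaves a single changeset holding main.c 1.86 and util.c 1.12, with both log messages in order.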

> I'm ready to write the --commit-on-symbol Borg. Does this design seem
> good? Can I interest anyone who knows the cvs2svn code better than me
> in doing the span infrastructure so it will work?

I'd be happy to share information with you and review your design and
patches, but I'm afraid there are other projects that I'd prefer to work
on in the (very limited) time I can spend on cvs2svn.

Feel free to drop by our IRC channel if you want some live discussion.

Michael

Re: RFC -- enhancing cvs2svn to have a notion of spans of mergeable commits
2007-07-15 08:40:28

Michael Haggerty <mhagger alum.mit.edu>:
> But the revision number is not really significant. Integers are cheap.

*Archival* purposes, Michael.

> I think that the "lower half" would be pretty easy to add to cvs2svn 2.0.

I'll try to budget some time to do that, then.
--
Eric S. Raymond <http://www.catb.org/~esr/>