
Thread: Source of unlabeled branches




Source of unlabeled branches
user name
2006-08-23 21:54:02
This thread is getting very developy, so please direct followup emails
to dev@cvs2svn.tigris.org.

Jon Smirl wrote:
> On 8/23/06, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> I don't understand how this could work.  When a branch is created, the
>> only change is in the header:
>>
>>     symbols mybranch:1.13.2;
>>
>> with no timestamp (and for your unlabeled branches, this is not
>> present).  When commits are made on the branch, then timestamped
>> revisions with numbers 1.13.2.1, 1.13.2.2, etc. appear; but these
>> timestamps can be much later than when the branch was actually created.
>
> The idea is to look at the commits being done on the unlabeled branches.
> If the commits all belong to a single change set, then the unlabeled
> branches those commits occurred on should be combined into a single
> unlabeled branch. This won't completely fix unlabeled branches, but it
> should significantly reduce the number of them.
>
>> If a second unlabeled branch exists in the same file, then there might
>> be additional revisions 1.16.4.1, 1.16.4.2 etc. (for example).
>>
>> How do I know which of these branches should go together with unlabeled
>> branch "1.64.2.1, 1.64.2.2" in another file?  There simply isn't enough
>> information available.
>
> If CVS commits 1.13.2.1 and 1.64.2.1 end up grouped into a single SVN
> change set, there should have only been one unlabeled branch, not two.

Ah, I see your idea.  This could definitely resolve a lot of the ambiguity.

There are some complications though:

1. The first commits on two separate branches are not necessarily part
of the same CVS commit.  (For example if I create a branch, then on
Monday change file1, on Tuesday file2, then on Wednesday file1 and file2
at the same time.)  But if any two commits on two unlabeled branches are
in the same commitset, then that would be a good indication that the
branches should have the same name.

2. These associations should be transitive: if commits on an unlabeled
branch in file1 and file2 are together in one commitset, and on file2
and file3 in another commitset, then all three files can probably be put
in the same unlabeled branch.

3. These associations are not airtight, because:

   a. It is possible that cvs2svn groups CVS commits together in
      commitsets even though they weren't really made in the same CVS
      commit.

   b. It is possible that a CVS user made a single CVS commit that
      spanned more than one branch.

Because of 3, there has to be an extra mechanism to prevent associative
"loops" like the following: revision 1.2.2.1 of file1 is grouped with
revision 1.2.2.1 of file2, but 1.2.2.2 of file2 is grouped with 1.2.4.1
of file1, which would falsely imply that branches 1.2.2 and 1.2.4 of
file1 should get the same branch name.

How could this be implemented?

Naively, one could run AggregateRevsPass once to get a preliminary
grouping of CVSRevisions into SVNRevisions, then look for commits on
unlabeled branches that are grouped together into single SVNRevisions.
The corresponding pair of CVSBranch objects (essentially
(file, branch_rev) tuples) would be considered equal, and thus the
CVSBranches would be formed into equivalence classes.  Each equivalence
class would be assigned a new branch name, the branch tags renamed, and
then AggregateRevsPass would be run again.  Any equivalence classes that
contained more than one CVSBranch from a single file could simply be
ignored with a warning.
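
The equivalence-class step described in the paragraph above maps naturally onto a union-find (disjoint-set) structure. The sketch below is an illustration, not cvs2svn code: it assumes the preliminary AggregateRevsPass output is available as a list of commitsets, each a list of hypothetical (filename, revision) pairs already restricted to unlabeled branches. Unions make the associations transitive (point 2), and any class that ends up holding two branches of the same file (the "loop" case from point 3) is reported with a warning and its branches are left separate, which is one possible reading of "ignored".

```python
import warnings
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def branch_of(revision):
    """'1.13.2.1' -> '1.13.2' (assumes a branch revision)."""
    return revision.rsplit(".", 1)[0]


def group_unlabeled_branches(commitsets):
    """Return equivalence classes of (filename, branch) pairs.

    `commitsets` (hypothetical input shape) is an iterable of lists of
    (filename, revision) pairs, one list per preliminary SVN revision,
    restricted to commits on unlabeled branches.
    """
    uf = UnionFind()
    for commitset in commitsets:
        branches = [(f, branch_of(r)) for f, r in commitset]
        for b in branches:
            uf.find(b)  # register every branch, even if never unioned
        for other in branches[1:]:
            uf.union(branches[0], other)  # union-find makes this transitive

    classes = defaultdict(set)
    for b in list(uf.parent):
        classes[uf.find(b)].add(b)

    result = []
    for members in classes.values():
        files = [f for f, _ in members]
        if len(files) != len(set(files)):
            # Two branches of one file ended up together: an associative loop.
            warnings.warn(f"ignoring inconsistent class {sorted(members)}")
            result.extend({m} for m in members)  # keep each branch separate
        else:
            result.append(members)
    return result


# Transitive case from point 2: file1~file2 and file2~file3 give one class.
print(group_unlabeled_branches([
    [("file1", "1.5.2.1"), ("file2", "1.7.2.1")],
    [("file2", "1.7.2.2"), ("file3", "1.3.4.1")],
]))
```

Feeding it the loop example from point 3 (1.2.2.1 of file1 with 1.2.2.1 of file2, then 1.2.2.2 of file2 with 1.2.4.1 of file1) yields one warning and three singleton classes.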

An even more efficient association mechanism would be tags or
sub-branches on unlabeled branches, because such things would typically
appear in all or most of the files on the unlabeled branch.
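
A rough sketch of using tags and sub-branches as the association signal. The names and input shapes are hypothetical (a per-file dict of symbol name to number, and a per-file set of unlabeled branch numbers): the code works out which branch each symbol sits on, then groups (file, branch) pairs by symbol name. Groups spanning several files are candidates for one shared branch name and could be merged into the same equivalence classes as the commit-based associations.

```python
from collections import defaultdict


def branch_of_symbol(number):
    """Branch that a tag or sub-branch symbol sits on, or None for trunk.

    Tag '1.5.2.3' sits on branch '1.5.2'; the sub-branch '1.5.2.3.0.2'
    (CVS magic form) sprouts from revision 1.5.2.3, also on '1.5.2'.
    """
    parts = number.split(".")
    if len(parts) >= 4 and parts[-2] == "0":  # CVS magic branch number
        parts = parts[:-2]                      # -> its branch-point revision
    elif len(parts) % 2:                        # RCS-style branch number
        parts = parts[:-1]                      # -> its branch-point revision
    # `parts` is now a revision; a branch revision has at least 4 components.
    return ".".join(parts[:-1]) if len(parts) >= 4 else None


def group_by_shared_symbols(file_symbols, unlabeled):
    """Group (filename, unlabeled branch) pairs by the symbols sitting on them.

    file_symbols: {filename: {symbol_name: number}} from each RCS header.
    unlabeled:    {filename: set of unlabeled branch numbers}.
    """
    groups = defaultdict(set)
    for filename, symbols in file_symbols.items():
        for name, number in symbols.items():
            branch = branch_of_symbol(number)
            if branch is not None and branch in unlabeled.get(filename, set()):
                groups[name].add((filename, branch))
    # Only symbols seen in two or more files carry cross-file information.
    return {name: pairs for name, pairs in groups.items() if len(pairs) > 1}


file_symbols = {
    "a.c": {"REL_1_0": "1.5.2.3", "mybranch": "1.5.0.4"},
    "b.c": {"REL_1_0": "1.9.4.1"},
}
unlabeled = {"a.c": {"1.5.2"}, "b.c": {"1.9.4"}}
print(group_by_shared_symbols(file_symbols, unlabeled))
```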

I think that you couldn't get away with running AggregateRevsPass only
once, because the first pass would generate too many commits as a result
of unlabeled branches being fragmented into many different names.
Therefore the number of SVN commits would be smaller after the second
pass.

This all sounds fairly reasonable.  Even if the resynthesis of unlabeled
commits made mistakes, it wouldn't really be worse than the current
situation where arbitrary unlabeled branch commits are grouped together.
Often there wouldn't be enough filewise associations to bind all of the
corresponding unlabeled branches together, but again, that's not worse
than now.  And AggregateRevsPass is not terribly expensive.

That being said, I'm still skeptical that it would be worth all this
effort.  I expect that unlabeled branches are almost always junk that
somebody tried unsuccessfully to erase from the CVS repository.

Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cvs2svn.tigris.org
For additional commands, e-mail: dev-help@cvs2svn.tigris.org

Source of unlabeled branches
user name
2006-08-23 23:00:31
On 8/23/06, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> That being said, I'm still skeptical that it would be worth all this
> effort.  I expect that unlabeled branches are almost always junk that
> somebody tried unsuccessfully to erase from the CVS repository.

As a quick test, could all of the commits to the unlabeled branches be
combined into a single branch, which would then have its changesets
grouped? I could then manually inspect those changesets and assess how
big the problem really is. The way things are now, I can't tell if a
base tag has been deleted once or 700 times. If I end up with 3
changesets on the unlabeled branch, it is a small problem; it's a big
one if I end up with 500 changesets there.
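
One way the quick test above could be run, sketched under assumptions: every unlabeled-branch commit is treated as if it were on a single branch, and commits are grouped with a simplified author/log-message/time-window heuristic. The `Commit` record and the 300-second window are illustrative assumptions, not cvs2svn's data model or settings. The length of the returned list answers the "3 or 500" question, and each entry can be inspected by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    # Hypothetical record for one CVS file revision on an unlabeled branch.
    filename: str
    revision: str
    author: str
    log: str
    timestamp: int  # seconds since the epoch


WINDOW = 300  # assumed grouping threshold in seconds


def group_unlabeled_changesets(commits, window=WINDOW):
    """Pretend all unlabeled-branch commits are on ONE branch and group them.

    Commits join a changeset when they share author and log message and
    each is at most `window` seconds after the previous one in that group.
    """
    by_key = {}
    for c in commits:
        by_key.setdefault((c.author, c.log), []).append(c)

    changesets = []
    for group in by_key.values():
        group.sort(key=lambda c: c.timestamp)
        current = [group[0]]
        for c in group[1:]:
            if c.timestamp - current[-1].timestamp <= window:
                current.append(c)
            else:
                changesets.append(current)
                current = [c]
        changesets.append(current)
    return changesets


commits = [
    Commit("a.c", "1.5.2.1", "jon", "fix build", 1000),
    Commit("b.c", "1.7.2.1", "jon", "fix build", 1100),
    Commit("a.c", "1.5.2.2", "jon", "fix build", 5000),
    Commit("c.c", "1.3.2.1", "ossi", "cleanup", 1000),
]
print(len(group_unlabeled_changesets(commits)))  # 3
```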


>
> Michael
>


-- 
Jon Smirl
jonsmirl@gmail.com


Source of unlabeled branches
user name
2006-08-24 05:56:51
fwiw, in the kde repo we had quite some of those unnamed branches. i
found that from many of those, properly named symbols sprout. so i
created a quite simple symbol name derivator that names a branch
__KDE_1_0_RELEASE if it finds a KDE_1_0_RELEASE tag on it as the first
one.
mike: that code is in the cvs2svn-kde tar-ball i sent you.
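
The naming rule can be sketched roughly as follows. This is not the code from the cvs2svn-kde tarball, which isn't shown here; it assumes that "first" means the tag attached to the lowest-numbered revision on the branch, and that a file's tags are given as a dict of tag name to revision number. `derive_branch_name` and `revision_key` are hypothetical helpers.

```python
def revision_key(revision):
    """Numeric sort key, so that 1.5.2.10 sorts after 1.5.2.9."""
    return tuple(int(p) for p in revision.split("."))


def derive_branch_name(branch, tags):
    """Name an unlabeled branch after the first tag found on it (hypothetical).

    branch: RCS branch number such as '1.5.2'.
    tags:   {tag_name: revision} for one file.
    Returns '__' + tag name, or None if no tag sits directly on the branch.
    """
    depth = branch.count(".") + 1  # a revision on the branch has one more dot
    on_branch = [
        (revision_key(rev), name)
        for name, rev in tags.items()
        if rev.startswith(branch + ".") and rev.count(".") == depth
    ]
    if not on_branch:
        return None
    _, first = min(on_branch)  # lowest revision wins; ties go alphabetically
    return "__" + first


tags = {"KDE_1_0_RELEASE": "1.5.2.1", "KDE_1_0_1": "1.5.2.4", "OTHER": "1.6"}
print(derive_branch_name("1.5.2", tags))  # __KDE_1_0_RELEASE
print(derive_branch_name("1.7.4", tags))  # None
```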


-- 
Hi! I'm a .signature virus! Copy me into your ~/.signature, please!
--
Chaos, panic, and disorder - my work here is done.


Source of unlabeled branches
user name
2006-08-24 13:59:24
On 8/24/06, Oswald Buddenhagen <ossi@kde.org> wrote:
> fwiw, in the kde repo we had quite some of those unnamed branches. i
> found that from many of those, properly named symbols sprout. so i
> created a quite simple symbol name derivator that names a branch
> __KDE_1_0_RELEASE if it finds a KDE_1_0_RELEASE tag on it as the first
> one.
> mike: that code is in the cvs2svn-kde tar-ball i sent you.

I have the same situation in Mozilla CVS. There are proper-looking
symbols on my unnamed branches also.

How many times has this happened in KDE? Could it be the result of
someone deleting a symbol, or is it more widespread (maybe a broken
tool at some point)?



-- 
Jon Smirl
jonsmirl@gmail.com


Source of unlabeled branches
user name
2006-08-24 16:04:56
On Thu, Aug 24, 2006 at 09:59:24AM -0400, Jon Smirl wrote:
> On 8/24/06, Oswald Buddenhagen <ossi@kde.org> wrote:
> >fwiw, in the kde repo we had quite some of those unnamed branches. i
> >found that from many of those, properly named symbols sprout. so i
> >created a quite simple symbol name derivator that names a branch
> >__KDE_1_0_RELEASE if it finds a KDE_1_0_RELEASE tag on it as the first
> >one.
>
> I have the same situation in Mozilla CVS. There are proper-looking
> symbols on my unnamed branches also.
>
> How many times has this happened in KDE?
>
lots of times.

> Could it be the result of someone deleting a symbol, or is it more
> widespread (maybe a broken tool at some point)?
>
i think it's all due to manual interaction.



