Thread: RE: Question regarding Microsoft patent on file synchronization

user name
2007-04-22 12:58:34
This thread leads me to another question.

I was in the process of setting up a Microsoft Distributed File System
with Replication at the same time that I've been working on setting up
Subversion for source control.

I was thinking of the two as completely different things, but now that
I've been learning about setting up Subversion with Apache for WebDAV,
I'm starting to question that logic.

I'm beginning to think that Subversion might be able to handle both
situations for me.

My primary reasons for wanting to set up MS DFS with replication were:

1) Location redundancy for critical files
2) Centralization of files for backup purposes
3) Better "performance" when accessing files over the WAN

So, now I'm thinking of setting up the following in Subversion:

1) Set up a repository to use as the shared "drive"
2) Configure Apache to serve that repository as a WebDAV drive
3) Check out the repository from step 1 to a WC on a server at our
   remote office, with a script on that server to perform an update
   every few minutes
4) Grant only read-level access to all users for the WC created in
   step 3
5) Set up a job that dumps the repository from step 1 to our backup
   server just before the scheduled tape backups of that server
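
For illustration only, here is a rough Python sketch of what the scripts
in steps 3 and 5 might look like. The three paths are hypothetical, and
the scheduling itself (every few minutes for the update, just before the
tape run for the dump) is assumed to be handled by cron or the Windows
Task Scheduler rather than by this code. It relies only on the standard
`svn update` and `svnadmin dump` commands.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Hypothetical locations, chosen only for this example.
WORKING_COPY = Path("/srv/shared-wc")
REPOSITORY = Path("/srv/svn/shared")
DUMP_FILE = Path("/backup/shared.svndump")


def update_command(wc: Path) -> list[str]:
    """Step 3: command that brings the read-only working copy up to date."""
    return ["svn", "update", "--non-interactive", str(wc)]


def dump_command(repo: Path) -> list[str]:
    """Step 5: command that writes the full repository history to stdout."""
    return ["svnadmin", "dump", "--quiet", str(repo)]


def run_update(wc: Path = WORKING_COPY) -> None:
    # check=True raises CalledProcessError if svn exits with an error.
    subprocess.run(update_command(wc), check=True)


def run_dump(repo: Path = REPOSITORY, out: Path = DUMP_FILE) -> None:
    # svnadmin dump writes the dump stream to stdout, so redirect it to a file.
    with out.open("wb") as fh:
        subprocess.run(dump_command(repo), stdout=fh, check=True)
```

The scheduler would call run_update() on the remote server and run_dump()
on the repository host. Note that the dump has to run on the machine that
holds the repository, since svnadmin works on a local repository path.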

My thinking here is that I should be able to meet most of my objectives
with such a configuration.

Does anyone have any comments about whether such a configuration sounds
like a good idea or not, and how it might compare to implementing
Microsoft DFS with Replication?

Regards,
Tom Malia


-----Original Message-----
From: Karl Fogel [mailto:kfogelred-bean.com] 
Sent: Saturday, April 21, 2007 3:10 PM
To: Sean McCarthy
Cc: userssubversion.tigris.org; devsubversion.tigris.org
Subject: Re: Question regarding Microsoft patent on file synchronization

Sean McCarthy <smccarthyintegraas.com> writes:
> I'm not sure if it is the right list to post, but reviewing some
> patent documents our company found this patent document from Microsoft
> relating to file synchronization:
>
> http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=75&f=G&l=50&d=PTXT&s1=microsoft.ASNM.&p=2&OS=AN/microsoft&RS=AN/microsoft
>
> While regarding the binary file synchronization (possibly for file
> systems) this patent shares a lot of points in common with the way
> subversion makes files synchronized.
>
> The patent was filed on November 2003 and granted as a patent on April
> 10, 2007. It is clear that the claims were made after Subversion
> conception and that it shares the same concepts with other open source
> projects as 'rsync' that dates back to the filing (even 1999).
>
> We are just wondering if this affects in any way to Subversion and a
> possible patent infringement retaliation from Microsoft.

Thank you for posting about this, Sean.  I've CC'd dev.

I am fairly sure this patent would not withstand a challenge from an
even mildly competent garden slug.  It describes techniques that have
been known and practiced in the field of version control for decades.
I am not too worried about Subversion being subject to an infringement
suit based on this patent.

For entertainment value, here's a translation of Claim #1 (of 20) into
plain English, or at least into English that's usual for this field:

   A method of maintaining an updated file, comprising:

   - storing copies A and B of a base file at client1 and at client2

   - receiving change C1 to client1:A
           and change C2 to client2:A

   - determining DIFF1 == client1:A<->client1:B
             and DIFF2 == client2:A<->client2:B

   - transmitting DIFF1 and DIFF2 to a server

   - receiving either DIFF1 or DIFF2 at the server first in time

   - iff the base file on the server is the same as the base file
     stored at the client associated with the diff received first,
     server accepts that diff; otherwise server rejects the diff

   - server rejects the diff received second in time [here, they did
     not spell out the fact that the server should reject the diff
     received second whether or not it rejected the one it received
     first, because either way the server's file is now different:
     either it applied the change from the diff it received first, or
     it did not apply that change because the client's base copy was
     out of date anyway]

   - transmitting a third diff from server to the client that sent the
     diff that was received second [this "third diff" is simply the
     diff needed to bring the other client up to date with the change
     that the server accepted]

   - applying the third diff to the second copy of the base file
     stored at the client [also known as "merging upstream changes
     into a locally modified file"]
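
To make the accept/reject rule above concrete, here is a toy Python
model of it. This is only a sketch under stated assumptions, not
Subversion's code and not the patent's wording: the names Diff,
make_diff and Server are made up for the example, a diff is simplified
to a single replaced byte range, and "the base file is the same" is
checked by comparing revision numbers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Diff:
    """Replace base[start:end] with data (a single-range binary diff)."""
    start: int
    end: int
    data: bytes

    def apply(self, base: bytes) -> bytes:
        return base[:self.start] + self.data + base[self.end:]


def make_diff(old: bytes, new: bytes) -> Diff:
    """Trim the common prefix and suffix; what is left is the change."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return Diff(prefix, len(old) - suffix, new[prefix:len(new) - suffix])


class Server:
    def __init__(self, content: bytes) -> None:
        self.revisions = [content]  # revisions[n] is the file at revision n

    @property
    def head(self) -> int:
        return len(self.revisions) - 1

    def commit(self, base_rev: int, diff: Diff) -> bool:
        """Accept the diff only if the sender's base is the server's file."""
        if base_rev != self.head:
            return False
        self.revisions.append(diff.apply(self.revisions[-1]))
        return True

    def update_diff(self, base_rev: int) -> Diff:
        """The "third diff": brings an out-of-date base copy up to head."""
        return make_diff(self.revisions[base_rev], self.revisions[-1])


base = b"alpha\nbeta\n"
server = Server(base)
diff1 = make_diff(base, b"ALPHA\nbeta\n")  # change C1 at client1
diff2 = make_diff(base, b"alpha\nBETA\n")  # change C2 at client2

first_accepted = server.commit(0, diff1)   # received first: accepted
second_accepted = server.commit(0, diff2)  # received second: rejected
third = server.update_diff(0)              # sent back to client2
client2_base = third.apply(base)           # client2's base copy, now current
```

Whichever diff arrives first wins; the loser gets the "third diff" and
has to bring its base copy up to date before trying again.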

As you can see, Claim 1 simply describes the standard commit/update
algorithm in Subversion.  Note that Subversion didn't invent this; we
took it -- with only trivial optimization changes -- from CVS, which
has been using it since 1986 IIRC, and CVS didn't invent it either.
I'm not sure the patent is claiming that part is original, though.

The remaining claims go on to describe how to handle the out-of-date
case, by keeping multiple base copies (text-bases, we might call them)
at the clients, using them to reconstruct the latest server file,
reconstructing the diff that expresses the local change using the new
data, and retransmitting that to the server.  Very roughly:

   1. client-working-copy-1 starts out same as client-text-base-1

   2. client-working-copy-1 gets a local change, now differs from
      client-text-base-1

   3. client sends diff to server, but server notices that
      client-working-copy-1 is out-of-date.

   4. server transmits a diff to bring client up-to-date

   5. client creates client-text-base-2, then applies that diff to it
      to create updated-client-text-base-2

   6. client can now retransmit its local change, by creating
      client-working-copy-2 as a copy of updated-client-text-base-2,
      taking the diff from client-text-base-1 to client-working-copy-1
      and applying it to client-working-copy-2, and then transmitting
      (to the server) the diff from updated-client-text-base-2 to
      client-working-copy-2.
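
The six steps above can be walked through in a few lines of Python.
Again this is a toy under loud assumptions: Diff, make_diff and rebase
are hypothetical names, a diff is a single replaced byte range, and
"applying" the old local diff to the new base is done by plain offset
shifting that gives up when the two changes overlap. Neither Subversion
nor the patent is claimed to work exactly this way.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Diff:
    """Replace base[start:end] with data (a single-range binary diff)."""
    start: int
    end: int
    data: bytes

    def apply(self, base: bytes) -> bytes:
        return base[:self.start] + self.data + base[self.end:]


def make_diff(old: bytes, new: bytes) -> Diff:
    """Trim the common prefix and suffix; what is left is the change."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return Diff(prefix, len(old) - suffix, new[prefix:len(new) - suffix])


def rebase(local: Diff, upstream: Diff) -> Diff:
    """Re-express `local` (made against the old base) against the new base."""
    if local.end <= upstream.start:   # local change lies before upstream's
        return local
    if local.start >= upstream.end:   # local change lies after upstream's
        shift = len(upstream.data) - (upstream.end - upstream.start)
        return Diff(local.start + shift, local.end + shift, local.data)
    raise ValueError("conflict: local and upstream changes overlap")


text_base_1 = b"alpha\nbeta\ngamma\n"                # step 1
working_copy_1 = b"alpha\nbeta\nGAMMA\n"             # step 2: local change
server_head = b"ALPHA!\nbeta\ngamma\n"               # step 3: we are out of date
upstream = make_diff(text_base_1, server_head)       # step 4: diff from server
updated_text_base_2 = upstream.apply(text_base_1)    # step 5

# Step 6: carry the local change over to the new base and re-diff it.
local = make_diff(text_base_1, working_copy_1)
working_copy_2 = rebase(local, upstream).apply(updated_text_base_2)
retransmit = make_diff(updated_text_base_2, working_copy_2)
```

Here retransmit is the diff the client would send to the server on its
second attempt; it now applies cleanly to the server's current file.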

Yawn.  The method described is not even as clever as what rsync does.

The distinctiveness claimed for "binary diffs" is spurious.  All diffs
are binary diffs; textual diffs are just binary diffs that treat
certain character sequences (LF, CRLF, etc) specially, using them as
anchor points for finding range boundaries.  But you can find range
boundaries without any anchors at all (rsync does it, so do we), and
yes, you can do fuzzy application of diffs without anchors too.
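
As a rough illustration of finding range boundaries without anchors,
here is a small Python sketch in the style of rsync's rolling checksum.
It is a simplification built on assumptions: the function names, the
tiny block size and the ("copy", offset) / ("literal", bytes) delta
format are invented for the example, and candidate matches are confirmed
by comparing bytes directly because both files are local here (rsync
itself confirms with a strong hash, since the files live on different
machines). It is not Subversion's delta code.

```python
from __future__ import annotations

M = 1 << 16  # modulus for both halves of the weak checksum


def weak_checksum(block: bytes) -> tuple[int, int]:
    """a = plain byte sum, b = position-weighted byte sum (both mod M)."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, size: int) -> tuple[int, int]:
    """Slide the window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - size * out_byte + a) % M
    return a, b


def make_delta(old: bytes, new: bytes, block_size: int = 4) -> list[tuple[str, object]]:
    # Index every full block of the old file by its weak checksum.
    index: dict[tuple[int, int], list[int]] = {}
    for i in range(0, len(old) - block_size + 1, block_size):
        index.setdefault(weak_checksum(old[i:i + block_size]), []).append(i)

    delta: list[tuple[str, object]] = []
    literal = bytearray()
    pos = 0
    weak = None
    while pos + block_size <= len(new):
        if weak is None:
            weak = weak_checksum(new[pos:pos + block_size])
        window = new[pos:pos + block_size]
        # A weak match is only a hint; confirm it byte for byte.
        match = next((i for i in index.get(weak, ())
                      if old[i:i + block_size] == window), None)
        if match is not None:
            if literal:
                delta.append(("literal", bytes(literal)))
                literal.clear()
            delta.append(("copy", match))
            pos += block_size
            weak = None
        else:
            literal.append(new[pos])
            if pos + block_size < len(new):
                weak = roll(*weak, new[pos], new[pos + block_size], block_size)
            pos += 1
    literal.extend(new[pos:])  # tail shorter than one block
    if literal:
        delta.append(("literal", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list[tuple[str, object]], block_size: int = 4) -> bytes:
    out = bytearray()
    for op, arg in delta:
        out += old[arg:arg + block_size] if op == "copy" else arg
    return bytes(out)


old = b"The quick brown fox jumps over the lazy dog"
new = b"A quick brown fox leaps over the lazy dog!!"
delta = make_delta(old, new)
rebuilt = apply_delta(old, delta)
```

The window slides one byte at a time over the new file, so blocks of the
old file are found at any offset, with no line endings or other markers
needed to line the two files up.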

In my professional opinion, this patent should not have been granted.

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribesubversion.tigris.org
For additional commands, e-mail: users-helpsubversion.tigris.org


