Thread: Re: cvs2svn on steroides




Re: cvs2svn on steroides
user name
2007-05-13 07:05:30
On Sat, May 12, 2007 at 11:49:00PM +0200, Michael Haggerty wrote:
> I'm not happy that the choice between an anydb and a FSDatabase is
> made based on whether the filename contains a dot.  Encoding behavior
> based on magic encoding of parameters is not a very good idea.
>
still, given traditional data file naming conventions, it's sort of
natural. no, i don't insist.

> It would be cleaner either to have an explicit parameter telling what
> kind of DB to use, or to change Database to delegate to another
> "SimpleDatabase" that can be either an AbstractDatabase or an
> FSDatabase.
>
AbstractDatabase is essentially our DictMixin replacement, so it is
needed for both backends.
making the backend choice in the ADb c'tor explicit is certainly an
option.
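
a minimal sketch of what an explicit backend choice could look like,
not the code from the patch. the class names DictBackend and
FileBackend, the BACKENDS table and the "backend" argument are all
made up for illustration; the in-memory dict only stands in for the
dbm-style backend.

```python
import os
import tempfile


class DictBackend:
    """Hypothetical stand-in for the dbm-style backend (in-memory here)."""

    def __init__(self, filename):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value


class FileBackend:
    """Hypothetical stand-in for FSDatabase: one file per key."""

    def __init__(self, filename):
        self._dir = filename
        os.makedirs(self._dir, exist_ok=True)

    def __getitem__(self, key):
        try:
            with open(os.path.join(self._dir, key), 'rb') as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key)

    def __setitem__(self, key, value):
        with open(os.path.join(self._dir, key), 'wb') as f:
            f.write(value)


# Assumed registry of backend names; the names are illustrative only.
BACKENDS = {'dbm': DictBackend, 'fs': FileBackend}


class Database:
    """Delegates to a backend chosen by an explicit argument,
    never by inspecting the filename."""

    def __init__(self, filename, backend):
        try:
            backend_class = BACKENDS[backend]
        except KeyError:
            raise ValueError('unknown backend: %r' % (backend,))
        self._db = backend_class(filename)

    def __getitem__(self, key):
        return self._db[key]

    def __setitem__(self, key, value):
        self._db[key] = value


if __name__ == '__main__':
    root = tempfile.mkdtemp()
    # The dot in the name no longer influences which backend is used.
    db = Database(os.path.join(root, 'revs.db'), backend='fs')
    db['42'] = b'some record'
    print(db['42'])
```

with this shape the filename is only a location, and an unknown
backend name fails loudly instead of silently picking one.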

> 1. Storing hundreds of thousands of files in a single directory will be
> absolutely fatal on some (most?) filesystems.
>
the comment already says this. ;)
the only still used file system that would actually have a problem is
(v)fat. somehow i don't think anybody will do a bigger conversion on
such a platform ...

btw, ever looked into an fsfs repo? :=)

> 2. Won't FSDatabase have a large space overhead on filesystems that
> allocate space in block-sized chunks?  (On the order of 2k for each entry.)
>
yes, the average slack is 2k on most FSs. assuming an average file size
of 6k (*), that makes 33% overhead. surprise surprise - that's the same
as BDB (as measured with a pretty big part of the kde repo).

(*) this number is pulled out of thin air. a *very* quick scan indicates
an average around 8-10k, which is obviously more favorable towards fsdb.
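
the arithmetic above, spelled out as a small sketch. the helper name
slack_overhead is made up; the 2k slack and the 6k, 8k and 10k
average sizes are the figures quoted in this mail, not new
measurements.

```python
def slack_overhead(average_file_size_kb, average_slack_kb=2.0):
    """Return wasted space as a fraction of the payload size."""
    return average_slack_kb / average_file_size_kb


if __name__ == '__main__':
    for size_kb in (6, 8, 10):
        percent = 100 * slack_overhead(size_kb)
        print('%2dk average entry: %.0f%% overhead' % (size_kb, percent))
```

this gives roughly 33% at 6k, 25% at 8k and 20% at 10k.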

> On the other hand, FSDatabase has been written generally enough to
> accept arbitrary paths as keys.  If a path is used (i.e., a string
> containing '/' or maybe os.sep?) then the value of that key is stored
> in a subdirectory.  This code seems overkill if the only keys that
> will be used are stringified integers.
>
yes, obviously it was written for a different usage scenario.
*if* one'd go for key hashing, directory creation would be needed again.
for now, i ripped it out.
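
for reference, a rough sketch of what key hashing with on-demand
directory creation might look like. this is not part of the patch:
hashed_path, write_entry, the md5 digest and the two-hex-digit bucket
width are all assumptions picked for illustration.

```python
import hashlib
import os
import tempfile


def hashed_path(root, key, width=2):
    """Map a key to root/<bucket>/<key>, where the bucket is the first
    `width` hex digits of the key's md5 digest (256 buckets for width=2)."""
    digest = hashlib.md5(key.encode('ascii')).hexdigest()
    return os.path.join(root, digest[:width], key)


def write_entry(root, key, data):
    """Store data under its hashed path, creating the bucket if needed."""
    path = hashed_path(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'wb') as f:
        f.write(data)
    return path


if __name__ == '__main__':
    root = tempfile.mkdtemp()
    for key in ('1', '2', '12345'):
        print(write_entry(root, key, b'record for ' + key.encode('ascii')))
```

hashing rather than using the leading digits of the key spreads
sequential integer keys evenly over the buckets.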

> Anyway, this code hasn't been written in a platform-neutral way (for
> example, a Windows user might write key 'a/b' but the iterator will
> return r'a\b').
>
based on usage scenarios so far, that would be yet another thing to
document, not to fix. anyway, not relevant any more.

> You should read and write the files in binary mode.
> 
indeed.
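
a small sketch of the point, with made-up helper names store_bytes
and load_bytes. in text mode, windows translates line endings on the
way in and out, so a record containing newline bytes would not
survive a round trip; 'wb' and 'rb' pass the bytes through untouched
on every platform.

```python
import os
import tempfile


def store_bytes(path, data):
    """Write a record exactly as given; 'wb' disables newline translation."""
    with open(path, 'wb') as f:
        f.write(data)


def load_bytes(path):
    """Read a record back exactly as stored."""
    with open(path, 'rb') as f:
        return f.read()


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), '1')
    payload = b'line one\nline two\r\n\x00\x1a binary tail'
    store_bytes(path, payload)
    print(load_bytes(path) == payload)
```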

the new patch includes all changes except "unmagifying" the
instantiation.

-- 
Hi! I'm a .signature virus! Copy me into your ~/.signature, please!
--
Chaos, panic, and disorder - my work here is done.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cvs2svn.tigris.org
For additional commands, e-mail: dev-help@cvs2svn.tigris.org
  
Re: cvs2svn on steroides
user name
2007-05-13 08:37:24
Oswald Buddenhagen wrote:
> On Sat, May 12, 2007 at 11:49:00PM +0200, Michael Haggerty wrote:
>> 1. Storing hundreds of thousands of files in a single directory will be
>> absolutely fatal on some (most?) filesystems.
>>
> the comment already says this. ;)
> the only still used file system that would actually have a problem is
> (v)fat. somehow i don't think anybody will do a bigger conversion on
> such a platform ...
> 
> btw, ever looked into an fsfs repo? :=)

Hmmm, I always thought that ext2 and ext3 have problems with lots of
files in a directory.  Indeed, subversion 1.5 seems to be switching to a
deeper directory structure, though mostly to help out FAT and NTFS.
See, for example,

    http://www.farside.org.uk/200704/tree_structured_fsfs

However, ext2 seems to have problems *writing* such directories.  Note
the final comment: "I can only assume that ext2 has a really big problem
with inserting new files into large directories" associated with a time
increase from 10 minutes to 10 hours when writing huge numbers of files
in hierarchical vs. one big directory.

Michael
