Thread: Tie::File w/ in-memory file support




Tie::File w/ in-memory file support
user name
2006-12-22 15:00:06
MJD and Porters,

Since Tie::File is now in the Perl core, I am mailing both MJD
and p5p.

Many Tie modules that tie Perl variables to files, such as DB_File,
also support in-memory storage.  Unfortunately Tie::File is not one of
them, even after its incorporation into the Perl 5.8 core.  This is
doubly unfortunate because Perl 5.8 has in-memory file support of its
own, thanks to the late Nick Ing-Simmons.
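
For readers unfamiliar with the in-memory file support mentioned above: since Perl 5.8, open accepts a reference to a scalar in place of a filename. The following is only a minimal sketch; the variable names are illustrative and have nothing to do with the patch itself.

```perl
use strict;
use warnings;

# Perl 5.8+: passing a scalar reference to open() creates a filehandle
# that reads from and writes to that scalar instead of a file on disk.
my $buffer = '';
open my $out, '>', \$buffer or die "cannot open in-memory handle: $!";
print {$out} "record $_\n" for 1 .. 3;
close $out;

# Reopen the same scalar for reading and pull the lines back out.
open my $in, '<', \$buffer or die "cannot reopen in-memory handle: $!";
my @lines = <$in>;
close $in;

printf "%d lines, %d bytes\n", scalar(@lines), length($buffer);
```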

Below is an attempt to change that.  Since Perl now ships with
Tie::File 0.97 while the version on CPAN is 0.96, I have made two
tarballs and two diffs available so that Tie::File can be supported
on both ends.

http://www.dan.co.jp/~dankogai/cpan/Tie-File-0.96-0.97.diff
http://www.dan.co.jp/~dankogai/cpan/Tie-File-0.97.tar.gz

http://www.dan.co.jp/~dankogai/cpan/Tie-File-0.97-0.98.diff
http://www.dan.co.jp/~dankogai/cpan/Tie-File-0.98.tar.gz


Here is a little benchmark on my MacBook Pro.  Though not as fast as
DB_File, it is more portable thanks to being pure Perl, and more
versatile than my humble Tie::Array::Pack, which can handle
fixed-length records only.

n=1000
           Rate    T::F  T::F-m DB_File T::A::P  native
T::F    3.84/s      --    -79%    -97%    -98%   -100%
T::F-m  18.3/s    376%      --    -88%    -89%    -99%
DB_File  153/s   3890%    739%      --     -6%    -93%
T::A::P  163/s   4153%    794%      7%      --    -92%
native  2047/s  53189%  11106%   1236%   1153%      --
n=10000
           Rate    T::F  T::F-m DB_File T::A::P  native
T::F    1.05/s      --    -41%    -93%    -94%    -99%
T::F-m  1.78/s     70%      --    -89%    -89%    -99%
DB_File 15.6/s   1392%    780%      --     -3%    -91%
T::A::P 16.2/s   1445%    812%      4%      --    -91%
native   176/s  16712%   9817%   1027%    988%      --

I wonder what will happen next, but I would appreciate it if Tie::File
were dually supported like many other core modules.  I have checked
that both the CPAN-ready 0.97.tar.gz (same source as core, including
t/*) and 0.98.tar.gz work on both ends, so the easiest solution is for
MJD to upload and update CPAN again.  What do you say, MJD?

Dan the Tied
Tie::File w/ in-memory file support
user name
2006-12-22 21:33:35
Dan Kogai wrote:
> MJD and Porters,
>
> Since Tie-File is now in perl core, I am mailing both mjd and p5p.
>
> Many Tie modules that tie perl variables to files also support in-memory
> storage like DB_File.  Unfortunately Tie::File was not one of those even
> after it is incorporated to Perl 5.8 core  This is doubly unfortunate
> because Perl 5.8 has in-memory file support of its own, thanks to late
> Nick-Ing Simmons.

I have an easier implementation of in-memory Tie::File.  Here it is.  Ready?

  @array;

What am I missing here?

Tie::File w/ in-memory file support
user name
2006-12-23 02:39:36
> Many Tie modules that tie perl variables to files also support in-memory
> storage like DB_File.  Unfortunately Tie::File was not one of those even
> after it is incorporated to Perl 5.8 core  This is doubly unfortunate
> because Perl 5.8 has in-memory file support of its own, thanks to late
> Nick-Ing Simmons.

I don't understand the point.  What is it good for?

> I wonder what's gonna happen next but I would appreciate if Tie::File
> is dually supported like many other core modules.  I have checked
> that both CPAN-abled 0.97.tar.gz (same source as core, including t/*)
> and 0.98.tar.gz work on both ends so the easiest solution is for MJD
> to upload and update CPAN again.  What do you say, MJD?

Yes.  I didn't realize that the core and CPAN versions were not the
same.  Thanks for pointing this out.

Tie::File w/ in-memory file support
user name
2006-12-23 21:44:58
On 12/22/06, Michael G Schwern <schwerngmail.com> wrote:
> I have an easier implementation of in-memory Tie::File.  Here it is.  Ready?
>
>   @array;
>
> What am I missing here?

After mulling this question for a day and a half, powered by faith
that the OP is not that blind, my working theory is that he wants to
be able to access his data as a scalar or an array interchangeably,
without having to make all the scalar accesses of the form
{ local $"=$delimiter; "@array" } and all the array accesses of the
form [split /$delimiter/, $scalar]->[indexing expression] and so on.
My theory is that the OP wants to bind a scalar and an array to each
other with a specifiable delimiter and have all accesses to either
Just Work.

Setting that up with an invocation syntax something like

   use UnifyScalarArray;
   unify $scalar, @array; # optional $delimiter arg
   ... # all accesses to either are immediately reflected in the other

is trivial with Tie and is trivial with Overload and is almost trivial
with source filtering.

Proposed semantics:

    unify has a ($;$) prototype

    when only one arg is presented, that arg is taken to be the name of a
    package variable that will get both its scalar and array unified.

    no more than one of the arguments may have anything in it before
    unification

    both args get tied to classes referring to the same underlying object,
    containing fields for the array, the scalar, and the delimiter

    when there is a write to one, the other gets cleared

    when there is a read on one, it is constructed if it is undefined, and
    the constructed scalar or array is cached

If anyone wants to see this I'll implement it with Tie.
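
The proposed semantics might be sketched with tie roughly as follows. This is not David's implementation: the UnifySketch::* package names are made up for illustration, unify here takes explicit references instead of using the ($;$) prototype, and the one-argument package-variable form is left out.

```perl
use strict;
use warnings;

# Hypothetical sketch only: the UnifySketch::* packages and the calling
# convention (references instead of a prototype) are illustrative.

package UnifySketch::Scalar;

sub TIESCALAR { my ($class, $state) = @_; return bless { s => $state }, $class }

sub FETCH {
    my $st = $_[0]{s};
    # Rebuild the scalar from the array on demand, then cache it.
    $st->{scalar} = join $st->{delim}, @{ $st->{array} }
        unless defined $st->{scalar};
    return $st->{scalar};
}

sub STORE {
    my ($self, $value) = @_;
    $self->{s}{scalar} = defined $value ? $value : '';
    $self->{s}{array}  = undef;    # a write to one side clears the other
}

package UnifySketch::Array;
use Tie::Array;
our @ISA = ('Tie::Array');         # supplies PUSH, POP, CLEAR, SPLICE, ...

sub TIEARRAY { my ($class, $state) = @_; return bless { s => $state }, $class }

sub _array {
    my $st = $_[0]{s};
    unless (defined $st->{array}) {
        # Rebuild the array from the scalar on demand, then cache it.
        my $delim = quotemeta $st->{delim};
        $st->{array} = [ split /$delim/, $st->{scalar}, -1 ];
    }
    return $st->{array};
}

sub _dirty { $_[0]{s}{scalar} = undef }    # array changed: drop cached scalar

sub FETCHSIZE { scalar @{ $_[0]->_array } }
sub FETCH     { $_[0]->_array->[ $_[1] ] }
sub STORE     { my ($self, $i, $v) = @_; $self->_array->[$i] = $v; $self->_dirty }
sub STORESIZE { my ($self, $n) = @_; $#{ $self->_array } = $n - 1; $self->_dirty }

package main;

sub unify {
    my ($sref, $aref, $delim) = @_;
    my %state = (delim => defined $delim ? $delim : ',');
    if (defined $$sref) { $state{scalar} = $$sref }      # start from the scalar
    else                { $state{array}  = [@$aref] }    # or from the array
    tie $$sref, 'UnifySketch::Scalar', \%state;
    tie @$aref, 'UnifySketch::Array',  \%state;
    return;
}

my ($text, @fields);
unify(\$text, \@fields, ':');
@fields = qw(a b c);
my $joined = $text;          # built from the array on first read
$text = 'x:y';
my $count = @fields;         # rebuilt from the scalar on first read
```

Both tied classes share one state hash, so whichever side was written last is the source of truth and the other side is rebuilt lazily on its next read.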

Please pitch improvements to the proposed nomenclature off this list,
either directly or possibly on module-authors.

David Nicol
Tie::File w/ in-memory file support
user name
2006-12-23 23:48:40
On Dec 23, 2006, at 11:39 , Mark Jason Dominus wrote:
>> Many Tie modules that tie perl variables to files also support in-
>> memory storage like DB_File.  Unfortunately Tie::File was not one of
>> those even after it is incorporated to Perl 5.8 core  This is doubly
>> unfortunate because Perl 5.8 has in-memory file support of its own,
>> thanks to late Nick-Ing Simmons.
>
> I don't understand the point.  What is it good for?

Practical?  Doubtful, I admit, but that does not mean pointless.


On 12/22/06, Michael G Schwern <schwerngmail.com> wrote:
> I have an easier implementation of in-memory Tie::File.  Here it is.  Ready?
>
>   @array;
>
> What am I missing here?

You may have missed a lot of memory.  Each entry in the array takes 20
bytes or more on a typical platform (16 bytes + overhead), which is
not much for most cases.  But when it comes to storing lots of
numbers, it is a huge memory hog.  Try these.
(Sorry, the example here uses DB_File for speed.)

% perl -e '$a[$_]=$_ for(1..1e6); system "ps", "v$$"'
   PID STAT      TIME  SL  RE PAGEIN   VSZ   RSS   LIM TSIZ %CPU %MEM COMMAND
  6948 S+     0:00.37   0   0      0 26352 25560     -    8  0.0  1.2 perl -e $a

% perl -MDB_File -e 'tie @a, "DB_File", undef,undef,undef,$DB_RECNO; $a[$_] = $_ for(1..1e6); system "ps", "v$$"'
   PID STAT      TIME  SL  RE PAGEIN   VSZ   RSS   LIM TSIZ %CPU %MEM COMMAND
  6962 S+     0:05.11   0   5      2  3204  2676     -    8 82.5  0.1 perl -MDB_

DB_File is unsupported on some platforms, but Tie::File is supported.

On Dec 24, 2006, at 06:44 , David Nicol wrote:
> After mulling this question for a day and a half, powered by faith
> that the OP is not that blind, my working theory is that he wants to
> be able to access his data as a scalar or an array interchangeably,
> without having to make all the scalar accesses of the form
> { local $"=$delimiter; "@array" } and all the array acceses of the
> form [split /$delimter/, $scalar]->[indexing expression] and so on.
> My theory is that the OP wants to bind a scalar and an array to each
> other with a specifiable delimiter and have all accesses to either
> Just Work.

I didn't think about that but that is also a point.

At any rate, I started using open FH, ">", \$scalar occasionally to
save memory, so I thought it would be nice if Tie::File also supported
that.  In any case, it doesn't hurt to support it as long as it
doesn't break the current version.

Dan the Tied

Tie::File w/ in-memory file support
user name
2006-12-24 02:20:02
> On Dec 23, 2006, at 11:39 , Mark Jason Dominus wrote:
> >> Many Tie modules that tie perl variables to files also support in-
> >> memory storage like DB_File.  Unfortunately Tie::File was not one of
> >> those even after it is incorporated to Perl 5.8 core  This is doubly
> >> unfortunate because Perl 5.8 has in-memory file support of its own,
> >> thanks to late Nick-Ing Simmons.
> >
> > I don't understand the point.  What is it good for?
>
> Practical?  Doubtful, I admit but that does not mean pointless.

Okay, so what's the point?  Was your discussion below supposed to
make it clear?  If so, I still don't get it.

> DB_File is unsupported in some platforms but Tie::File is.

Tie::File is not going to save you any memory there.

> At any rate,  I started using open FH, ">", \$scalar occasionally to
> save memory.  So I thought it would be nice if Tie::File also
> supports that.

It already does:

        open FH, ">", \$scalar;
        tie @rec, 'Tie::File', \*FH;

So now there are two major outstanding questions:

1. Why would anyone want to do this, and
2. What does your patch provide that is not provided by the sample above?
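
For anyone who wants to try the approach in the sample, here is a slightly fuller sketch. It makes two assumptions that are not in the message above: it uses a lexical handle rather than the FH bareword, and it opens the scalar read/write ("+<") because the Tie::File documentation says write-only handles will not work. Behaviour on in-memory handles may vary between Perl and Tie::File versions.

```perl
use strict;
use warnings;
use Tie::File;

# Open a read/write handle on a scalar; Tie::File needs a seekable handle
# that it can both read from and write to.
my $buffer = '';
open my $fh, '+<', \$buffer or die "cannot open in-memory handle: $!";

tie my @rec, 'Tie::File', $fh or die "cannot tie: $!";
$rec[0] = 'alpha';
$rec[1] = 'beta';
my $second = $rec[1];     # fetched back through Tie::File
my $count  = @rec;

untie @rec;               # flush anything Tie::File still holds
close $fh;

print "records: $count, bytes in buffer: ", length($buffer), "\n";
```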

Tie::File w/ in-memory file support
user name
2006-12-24 20:26:06
Dan Kogai wrote:
> On 12/22/06, Michael G Schwern <schwerngmail.com> wrote:
>> I have an easier implementation of in-memory Tie::File.  Here it is.
>> Ready?
>>
>>   @array;
>>
>> What am I missing here?
>
> You may have missed a lot of memory; Each entry in the array takes 20
> bytes or more on a typical platform (16 bytes + overhead), which is not
> much for most cases.  But when it comes to storing lots of numbers it is
> a huge memory hog.

May I suggest the oft-maligned vec()?  Or Tie::VecArray if you want an
array interface?  Or Bit::Vector?  Or PDL?
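
As a rough illustration of the vec() suggestion (a sketch only; the element count and width are arbitrary choices, not taken from the measurements in this message), 32-bit integers can be packed four bytes apiece into a single string:

```perl
use strict;
use warnings;

# Store unsigned 32-bit integers side by side in one string.
my $packed = '';
vec($packed, $_, 32) = $_ for 0 .. 9_999;

my $bytes  = length $packed;          # 4 bytes per element
my $sample = vec($packed, 1234, 32);  # read element 1234 back
print "$bytes bytes for 10000 integers; element 1234 is $sample\n";
```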

$ perl -MTie::VecArray -wle 'tie @a, 'Tie::VecArray', 32, "";  $a[$_] = $_ for 1..1e6; system "ps", "v$$"'
  PID STAT      TIME  SL  RE PAGEIN      VSZ    RSS   LIM    TSIZ %CPU %MEM COMMAND
 2091 S+     0:04.60   0 1944      0    32748   3280     -       0  97.5 -0.2 pe

$ perl -MDB_File -wle 'tie @a, 'DB_File', undef, undef, undef, $DB_RECNO;  $a[$_] = $_ for 1..1e6; system "ps", "v$$"'
  PID STAT      TIME  SL  RE PAGEIN      VSZ    RSS   LIM    TSIZ %CPU %MEM COMMAND
 2217 S+     0:04.38   0 1968      0    28600   2060     -       0  99.0 -0.1 pe

$ perl -wle '$a[$_] = $_ for 1..1e6; system "ps", "v$$"'
Name "main::a" used only once: possible typo at -e line 1.
  PID STAT      TIME  SL  RE PAGEIN      VSZ    RSS   LIM    TSIZ %CPU %MEM COMMAND
 2280 R+     0:00.31   0 1848      0    44832  20840     -       0  58.2 -1.0 pe


Now what about speed?  Here are some performance numbers for Tie::File.

$ time perl -MTie::File -le 'open FILE, "+>", \$foo;  tie @a, "Tie::File", \*FILE or die;  $a[$_] = $_ for 1..1e3; system "ps", "v$$"'
  PID STAT      TIME  SL  RE PAGEIN      VSZ    RSS   LIM    TSIZ %CPU %MEM COMMAND
 2558 S+     0:00.57   0 2048      0    28776   2972     -       0  85.8 -0.1 pe

real    0m0.580s
user    0m0.559s
sys     0m0.018s

That's 1e3, now 1e4.

$ time perl -MTie::File -wle 'open FILE, "+>", \$foo;  tie @a, "Tie::File", \*FILE or die;  $a[$_] = $_ for 1..1e4; system "ps", "v$$"'
Name "main::foo" used only once: possible typo at -e line 1.
Use of uninitialized value in <HANDLE> at /usr/local/perl/5.8.8/lib/Tie/File.pm line 917.
Use of uninitialized value in <HANDLE> at /usr/local/perl/5.8.8/lib/Tie/File.pm line 917.
  PID STAT      TIME  SL  RE PAGEIN      VSZ    RSS   LIM    TSIZ %CPU %MEM COMMAND
 2560 S+     0:45.10   0 2048      0    30300   6152     -       0  98.1 -0.3 pe

real    0m45.683s
user    0m44.966s
sys     0m0.158s

Here's the table.

1e3	  .580s
2e3	 1.985s	
4e3	 7.497s
8e3	29.224s
1e4	45.683s

That's O(n**2) time and that sucks.  Tie::VecArray and DB_File are both
O(n) and complete 1e6 in less than 5 seconds (loop assignment overhead
is about half a second).
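
One way to check the quadratic claim against the table above is to estimate the growth exponent from each doubling of n. This is only a back-of-the-envelope sketch; it uses the four rows where n doubles, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Timings copied from the table above: [records, seconds].
my @runs = ([1e3, 0.580], [2e3, 1.985], [4e3, 7.497], [8e3, 29.224]);

# If t grows like n**k, then k = log(t2/t1) / log(n2/n1).
my @exponents;
for my $i (1 .. $#runs) {
    my ($n1, $t1) = @{ $runs[$i - 1] };
    my ($n2, $t2) = @{ $runs[$i] };
    push @exponents, log($t2 / $t1) / log($n2 / $n1);
    printf "%5d -> %5d records: exponent %.2f\n", $n1, $n2, $exponents[-1];
}
```

An exponent close to 2 at each step is what quadratic growth would look like; an O(n) implementation would stay close to 1.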

Adding in-memory access to Tie::File just because it can be made
efficient for numbers doesn't make sense.  Normally TMTOWTDI is good,
but Tie::File is for files and is optimized for files.  Special-casing
for efficient in-memory access will just complicate things (efficient
array-to-file access is hard enough).  There are already better ways to
do it, and they can all have the same interface (an array).

It's a nice, focused, single-purpose module.