Thread: Tie::File w/ in-memory file support
Tie::File w/ in-memory file support
2006-12-22 15:00:06
MJD and Porters,
Since Tie-File is now in the perl core, I am mailing both mjd and p5p.

Many Tie modules that tie perl variables to files also support
in-memory storage, like DB_File. Unfortunately Tie::File was not one of
them, even after it was incorporated into the Perl 5.8 core. This is
doubly unfortunate because Perl 5.8 has in-memory file support of its
own, thanks to the late Nick Ing-Simmons.
Below is an attempt to change that. Since Perl now ships with
Tie::File 0.97 while the version on CPAN is 0.96, I have made two
tarballs and two diffs available so that Tie::File can be supported at
both ends:

http://www.dan.co.jp/~dankogai/cpan/Tie-File-0.96-0.97.diff
http://www.dan.co.jp/~dankogai/cpan/Tie-File-0.97.tar.gz
http://www.dan.co.jp/~dankogai/cpan/Tie-File-0.97-0.98.diff
http://www.dan.co.jp/~dankogai/cpan/Tie-File-0.98.tar.gz
Here is a little benchmark on my MacBook Pro. Though not as fast as
DB_File, it is more portable, thanks to being pure Perl, and more
versatile than my humble Tie::Array::Pack, which can handle only
fixed-length records.
n=1000
            Rate    T::F  T::F-m DB_File T::A::P  native
T::F      3.84/s      --    -79%    -97%    -98%   -100%
T::F-m    18.3/s    376%      --    -88%    -89%    -99%
DB_File    153/s   3890%    739%      --     -6%    -93%
T::A::P    163/s   4153%    794%      7%      --    -92%
native    2047/s  53189%  11106%   1236%   1153%      --

n=10000
            Rate    T::F  T::F-m DB_File T::A::P  native
T::F      1.05/s      --    -41%    -93%    -94%    -99%
T::F-m    1.78/s     70%      --    -89%    -89%    -99%
DB_File   15.6/s   1392%    780%      --     -3%    -91%
T::A::P   16.2/s   1445%    812%      4%      --    -91%
native     176/s  16712%   9817%   1027%    988%      --
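The post does not include the benchmark script. As a rough sketch only, a comparison of this shape can be set up with the core Benchmark module's cmpthese(). The fill loop, the temporary-file handling and the choice of contenders below are assumptions: only a plain array and file-backed Tie::File are compared (the patched in-memory T::F-m mode, DB_File and Tie::Array::Pack are left out), so it will not reproduce the figures above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);
use Tie::File;

my $n = 1000;    # number of records, as in the first table above

# Fill an ordinary Perl array; returns the final size (indices 0..$n).
sub fill_native {
    my @a;
    $a[$_] = $_ for 1 .. $n;
    return scalar @a;
}

# Fill an array tied to a fresh temporary file; returns the final size.
sub fill_tie_file {
    my ($fh, $path) = tempfile(UNLINK => 1);    # removed at program exit
    close $fh;                                   # Tie::File reopens it by name
    tie my @a, 'Tie::File', $path or die "Cannot tie $path: $!";
    $a[$_] = $_ for 1 .. $n;
    my $size = scalar @a;
    untie @a;                                    # flush and close the file
    return $size;
}

# Run each sub for at least one CPU second and print a rate/percentage table.
cmpthese(-1, { 'T::F' => \&fill_tie_file, native => \&fill_native });
```

The percentages cmpthese prints are relative rates, which is how the tables above read: each cell compares the row's rate to the column's.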
I wonder what will happen next, but I would appreciate it if Tie::File
were dually supported, like many other core modules. I have checked that
both the CPAN-ready 0.97.tar.gz (same source as core, including t/*) and
0.98.tar.gz work at both ends, so the easiest solution is for MJD to
upload them and update CPAN again. What do you say, MJD?
Dan the Tied

2006-12-22 21:33:35
Dan Kogai wrote:
> MJD and Porters,
>
> Since Tie-File is now in the perl core, I am mailing both mjd and p5p.
>
> Many Tie modules that tie perl variables to files also support
> in-memory storage, like DB_File. Unfortunately Tie::File was not one of
> them, even after it was incorporated into the Perl 5.8 core. This is
> doubly unfortunate because Perl 5.8 has in-memory file support of its
> own, thanks to the late Nick Ing-Simmons.
I have an easier implementation of in-memory Tie::File. Here it is. Ready?

@array;

What am I missing here?

2006-12-23 02:39:36
> Many Tie modules that tie perl variables to files also support
> in-memory storage, like DB_File. Unfortunately Tie::File was not one of
> them, even after it was incorporated into the Perl 5.8 core. This is
> doubly unfortunate because Perl 5.8 has in-memory file support of its
> own, thanks to the late Nick Ing-Simmons.
I don't understand the point. What is it good for?
> I wonder what will happen next, but I would appreciate it if Tie::File
> were dually supported, like many other core modules. I have checked that
> both the CPAN-ready 0.97.tar.gz (same source as core, including t/*) and
> 0.98.tar.gz work at both ends, so the easiest solution is for MJD to
> upload them and update CPAN again. What do you say, MJD?
Yes. I didn't realize that the core and CPAN versions were not the same.
Thanks for pointing this out.

2006-12-23 21:44:58
On 12/22/06, Michael G Schwern <schwern gmail.com> wrote:
> I have an easier implementation of in-memory Tie::File. Here it is. Ready?
>
> @array;
>
> What am I missing here?
After mulling this question for a day and a half, powered by faith that
the OP is not that blind, my working theory is that he wants to be able
to access his data as a scalar or an array interchangeably, without
having to write all the scalar accesses in the form
{ local $" = $delimiter; "@array" } and all the array accesses in the
form [split /$delimiter/, $scalar]->[indexing expression], and so on.
My theory is that the OP wants to bind a scalar and an array to each
other with a specifiable delimiter and have all accesses to either Just
Work.
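For concreteness, here is a small sketch of the two hand-written conversions described above, assuming a comma as the delimiter (the variable names are illustrative). Two small liberties: the scalar form is wrapped in do { } so the block yields a string rather than an anonymous hash, and the delimiter is passed through \Q...\E so that a delimiter such as "|" or "." is not treated as a regex metacharacter.

```perl
use strict;
use warnings;

my $delimiter = ',';                  # illustrative delimiter
my @array     = qw(alpha beta gamma);

# Array -> scalar: interpolating an array joins its elements with $".
my $scalar = do { local $" = $delimiter; "@array" };

# Scalar -> array: split on the quoted delimiter, then index the result.
my $second = [ split /\Q$delimiter\E/, $scalar ]->[1];

print "$scalar\n";    # alpha,beta,gamma
print "$second\n";    # beta
```

Every access site has to repeat one of these two idioms, which is the boilerplate a unified scalar/array binding would hide.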
Setting that up with an invocation syntax something like

use UnifyScalarArray;
unify $scalar, @array;   # optional $delimiter arg
...                      # all accesses to either are immediately reflected in the other

is trivial with Tie, trivial with Overload, and almost trivial with
source filtering.
Proposed semantics:

  - unify has a ($;$) prototype
  - when only one arg is presented, that arg is taken to be the name of
    a package variable that will get both its scalar and array unified
  - no more than one of the arguments may have anything in it before
    unification
  - both args get tied to classes referring to the same underlying
    object, containing fields for the array, the scalar, and the
    delimiter
  - when there is a write to one, the other gets cleared
  - when there is a read on one, it is constructed if it is undefined,
    and the constructed scalar or array is cached
If anyone wants to see this, I'll implement it with Tie. Please pitch
improvements to the proposed nomenclature off this list, either directly
or possibly on module-authors.
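A minimal sketch of the proposed semantics using tie, assuming the core Tie::Array module as the base class for the array side. The class names UnifyState, UnifyScalar and UnifyArray and the unify() helper are hypothetical, and the sketch departs from the proposal in one respect: unify() takes references (\$scalar, \@array, $delimiter) instead of using a ($;$) prototype or a package-variable name.

```perl
use strict;
use warnings;

# Shared state: a cached scalar view, a cached array view, and the delimiter.
# A write to one view clears (undefs) the other; a read rebuilds and caches it.
package UnifyState;
sub new {
    my ($class, $delim) = @_;
    return bless { scalar => undef, array => [], delim => $delim }, $class;
}
sub as_array {
    my $self = shift;
    unless (defined $self->{array}) {
        my $s = $self->{scalar};
        my $d = quotemeta $self->{delim};
        $self->{array} = [ defined $s ? split(/$d/, $s, -1) : () ];
    }
    return $self->{array};
}
sub as_string {
    my $self = shift;
    $self->{scalar} = join $self->{delim}, @{ $self->as_array }
        unless defined $self->{scalar};
    return $self->{scalar};
}

package UnifyScalar;
sub TIESCALAR { my ($class, $state) = @_; return bless { state => $state }, $class }
sub FETCH     { return $_[0]{state}->as_string }
sub STORE {
    my ($self, $value) = @_;
    @{ $self->{state} }{qw(scalar array)} = ($value, undef);   # clear array view
}

package UnifyArray;
use Tie::Array;    # supplies PUSH, POP, SPLICE, ... on top of the methods below
our @ISA = ('Tie::Array');
sub TIEARRAY  { my ($class, $state) = @_; return bless { state => $state }, $class }
sub FETCH     { my ($self, $i) = @_; return $self->{state}->as_array->[$i] }
sub FETCHSIZE { return scalar @{ $_[0]{state}->as_array } }
sub STORE {
    my ($self, $i, $value) = @_;
    $self->{state}->as_array->[$i] = $value;
    $self->{state}{scalar} = undef;                            # clear scalar view
}
sub STORESIZE {
    my ($self, $n) = @_;
    $#{ $self->{state}->as_array } = $n - 1;
    $self->{state}{scalar} = undef;
}

package main;

# Hypothetical front end: unify(\$scalar, \@array, $delimiter).
sub unify {
    my ($sref, $aref, $delim) = @_;
    $delim = ',' unless defined $delim;
    die "unify: only one side may be non-empty\n" if defined $$sref && @$aref;
    my $state = UnifyState->new($delim);
    if    (defined $$sref) { @$state{qw(scalar array)} = ($$sref, undef) }
    elsif (@$aref)         { $state->{array} = [@$aref] }
    tie $$sref, 'UnifyScalar', $state;
    tie @$aref, 'UnifyArray',  $state;
    return $state;
}
```

Because a write only invalidates the other view, a long run of array stores never rebuilds the string until the scalar is actually read, which is the caching behavior the list above describes.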
David Nicol

2006-12-23 23:48:40
On Dec 23, 2006, at 11:39 , Mark Jason Dominus wrote:
>> Many Tie modules that tie perl variables to files also support
>> in-memory storage, like DB_File. Unfortunately Tie::File was not one
>> of them, even after it was incorporated into the Perl 5.8 core. This
>> is doubly unfortunate because Perl 5.8 has in-memory file support of
>> its own, thanks to the late Nick Ing-Simmons.
>
> I don't understand the point. What is it good for?
Practical? Doubtful, I admit, but that does not mean pointless.
On 12/22/06, Michael G Schwern <schwern gmail.com> wrote:
> I have an easier implementation of in-memory Tie::File. Here it
> is. Ready?
>
> @array;
>
> What am I missing here?
You may have missed a lot of memory; each entry in the array takes 20
bytes or more on a typical platform (16 bytes + overhead), which is not
much in most cases. But when it comes to storing lots of numbers, it is
a huge memory hog. Try these. (Sorry, the example here uses DB_File for
speed.)
% perl -e '$a[$_]=$_ for(1..1e6); system "ps", "v$$"'
  PID STAT     TIME  SL    RE PAGEIN    VSZ    RSS LIM TSIZ  %CPU  %MEM COMMAND
 6948 S+    0:00.37   0     0      0  26352  25560   -    8   0.0   1.2 perl -e $a
% perl -MDB_File -e 'tie @a, "DB_File", undef,undef,undef,$DB_RECNO; $a[$_] = $_ for(1..1e6); system "ps", "v$$"'
  PID STAT     TIME  SL    RE PAGEIN    VSZ    RSS LIM TSIZ  %CPU  %MEM COMMAND
 6962 S+    0:05.11   0     5      2   3204   2676   -    8  82.5   0.1 perl -MDB_
DB_File is not supported on some platforms, but Tie::File is.
On Dec 24, 2006, at 06:44 , David Nicol wrote:
> After mulling this question for a day and a half, powered by faith
> that the OP is not that blind, my working theory is that he wants to
> be able to access his data as a scalar or an array interchangeably,
> without having to write all the scalar accesses in the form
> { local $" = $delimiter; "@array" } and all the array accesses in the
> form [split /$delimiter/, $scalar]->[indexing expression], and so on.
> My theory is that the OP wants to bind a scalar and an array to each
> other with a specifiable delimiter and have all accesses to either
> Just Work.
I hadn't thought of that, but that is also a point.

At any rate, I started occasionally using open FH, ">", \$scalar to
save memory. So I thought it would be nice if Tie::File also supported
that. In any case, it doesn't hurt to support it as long as it doesn't
break the current version.
Dan the Tied

2006-12-24 02:20:02
> On Dec 23, 2006, at 11:39 , Mark Jason Dominus wrote:
> >> Many Tie modules that tie perl variables to files also support
> >> in-memory storage, like DB_File. Unfortunately Tie::File was not one
> >> of them, even after it was incorporated into the Perl 5.8 core. This
> >> is doubly unfortunate because Perl 5.8 has in-memory file support of
> >> its own, thanks to the late Nick Ing-Simmons.
> >
> > I don't understand the point. What is it good for?
>
> Practical? Doubtful, I admit, but that does not mean pointless.
Okay, so what's the point? Was your discussion below supposed to make
it clear? If so, I still don't get it.
> DB_File is not supported on some platforms, but Tie::File is.

Tie::File is not going to save you any memory there.
> At any rate, I started occasionally using open FH, ">", \$scalar to
> save memory. So I thought it would be nice if Tie::File also
> supported that.
It already does:
open FH, "+<", \$scalar or die $!;
tie @rec, 'Tie::File', \*FH;
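The in-memory filehandle that the sample relies on is core Perl (5.8 and later): opening a reference to a scalar yields a seekable handle whose contents live in that scalar. Tie::File's documentation notes that handles opened write-only won't work and that the handle must be seekable, which is why a read/write mode is used here. A minimal stand-alone sketch with illustrative names; it exercises only the handle, not Tie::File itself:

```perl
use strict;
use warnings;

my $buffer = '';

# "+<" on a scalar reference: read/write without truncation; data lives in $buffer.
open my $fh, '+<', \$buffer or die "Cannot open in-memory handle: $!";
print {$fh} "$_\n" for qw(first second third);

# The writes are visible in the scalar itself...
print length($buffer), " bytes in memory\n";    # 19 bytes in memory

# ...and the handle is seekable, so the records can be read back.
seek $fh, 0, 0 or die "seek failed: $!";
my @records = <$fh>;
chomp @records;
print "$records[1]\n";    # second
close $fh;
```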
So now there are two major outstanding questions:
1. Why would anyone want to do this, and
2. What does your patch provide that is not provided by the sample above?

2006-12-24 20:26:06
Dan Kogai wrote:
> On 12/22/06, Michael G Schwern <schwern gmail.com> wrote:
>> I have an easier implementation of in-memory Tie::File. Here it is.
>> Ready?
>>
>> @array;
>>
>> What am I missing here?
>
> You may have missed a lot of memory; each entry in the array takes 20
> bytes or more on a typical platform (16 bytes + overhead), which is not
> much in most cases. But when it comes to storing lots of numbers, it is
> a huge memory hog.
May I suggest the oft-maligned vec()? Or Tie::VecArray if you want an
array interface? Or Bit::Vector? Or PDL?
$ perl -MTie::VecArray -wle 'tie @a, 'Tie::VecArray', 32, ""; $a[$_] = $_ for 1..1e6; system "ps", "v$$"'
  PID STAT     TIME  SL    RE PAGEIN    VSZ    RSS LIM TSIZ  %CPU  %MEM COMMAND
 2091 S+    0:04.60   0  1944      0  32748   3280   -    0  97.5  -0.2 pe
$ perl -MDB_File -wle 'tie @a, 'DB_File', undef, undef, undef, $DB_RECNO; $a[$_] = $_ for 1..1e6; system "ps", "v$$"'
  PID STAT     TIME  SL    RE PAGEIN    VSZ    RSS LIM TSIZ  %CPU  %MEM COMMAND
 2217 S+    0:04.38   0  1968      0  28600   2060   -    0  99.0  -0.1 pe
$ perl -wle '$a[$_] = $_ for 1..1e6; system "ps", "v$$"'
Name "main::a" used only once: possible typo at -e line 1.
  PID STAT     TIME  SL    RE PAGEIN    VSZ    RSS LIM TSIZ  %CPU  %MEM COMMAND
 2280 R+    0:00.31   0  1848      0  44832  20840   -    0  58.2  -1.0 pe
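To make the vec() suggestion concrete, here is a sketch of a tied array that stores each element as a 32-bit unsigned integer inside one string, so it costs 4 bytes per element instead of a full Perl scalar. PackedU32Array is a hypothetical class written for illustration; it is not Tie::VecArray and does not reproduce that module's interface. The remaining array operations are inherited from the core Tie::Array base class.

```perl
use strict;
use warnings;

# Hypothetical packed array: element $i lives in bytes 4*$i .. 4*$i+3 of one string.
package PackedU32Array;
use Tie::Array;
our @ISA = ('Tie::Array');    # PUSH, POP, SPLICE, ... come from Tie::Array

sub TIEARRAY { my ($class) = @_; return bless { buf => '', size => 0 }, $class }

sub FETCH {
    my ($self, $i) = @_;
    # Slots that were never written read back as 0 (vec beyond the string is 0).
    return $i < $self->{size} ? vec($self->{buf}, $i, 32) : undef;
}

sub STORE {
    my ($self, $i, $value) = @_;
    vec($self->{buf}, $i, 32) = $value;    # stored modulo 2**32, string grows as needed
    $self->{size} = $i + 1 if $i >= $self->{size};
}

sub FETCHSIZE { return $_[0]{size} }

sub STORESIZE {
    my ($self, $n) = @_;
    # Shrinking drops the tail of the buffer; growing needs no work.
    substr($self->{buf}, 4 * $n) = '' if length($self->{buf}) > 4 * $n;
    $self->{size} = $n;
}

package main;

tie my @a, 'PackedU32Array';
$a[$_] = $_ for 1 .. 1e6;
printf "%d elements in %d bytes\n", scalar(@a), length(tied(@a)->{buf});
# prints: 1000001 elements in 4000004 bytes
```

The trade-off is the one discussed in the thread: every access goes through a tied method call, so this saves memory at the cost of speed, and it only works for fixed-width numeric elements.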
Now what about speed? Here are some performance numbers for Tie::File.
$ time perl -MTie::File -le 'open FILE, "+>", $foo; tie @a, "Tie::File", *FILE or die; $a[$_] = $_ for 1..1e3; system "ps", "v$$"'
  PID STAT     TIME  SL    RE PAGEIN    VSZ    RSS LIM TSIZ  %CPU  %MEM COMMAND
 2558 S+    0:00.57   0  2048      0  28776   2972   -    0  85.8  -0.1 pe
real    0m0.580s
user    0m0.559s
sys     0m0.018s
That's 1e3, now 1e4.
$ time perl -MTie::File -wle 'open FILE, "+>", $foo; tie @a, "Tie::File", *FILE or die; $a[$_] = $_ for 1..1e4; system "ps", "v$$"'
Name "main::foo" used only once: possible typo at -e line 1.
Use of uninitialized value in <HANDLE> at /usr/local/perl/5.8.8/lib/Tie/File.pm line 917.
Use of uninitialized value in <HANDLE> at /usr/local/perl/5.8.8/lib/Tie/File.pm line 917.
  PID STAT     TIME  SL    RE PAGEIN    VSZ    RSS LIM TSIZ  %CPU  %MEM COMMAND
 2560 S+    0:45.10   0  2048      0  30300   6152   -    0  98.1  -0.3 pe
real    0m45.683s
user    0m44.966s
sys     0m0.158s
Here's the table.

n        time
1e3    0.580s
2e3    1.985s
4e3    7.497s
8e3   29.224s
1e4   45.683s
That's O(n**2) time, and that sucks. Tie::VecArray and DB_File are both
O(n) and complete 1e6 in less than 5 seconds (loop assignment overhead
is about half a second).
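A quick way to check the O(n**2) reading against the table: if the time grows like n**k, then for two consecutive rows k = log(t2/t1) / log(n2/n1). The sketch below only does that arithmetic on the timings quoted above; it adds no new measurements.

```perl
use strict;
use warnings;

# (n, seconds) pairs copied from the table above.
my @runs = ([1e3, 0.580], [2e3, 1.985], [4e3, 7.497], [8e3, 29.224], [1e4, 45.683]);

my @exponents;
for my $i (1 .. $#runs) {
    my ($n1, $t1) = @{ $runs[$i - 1] };
    my ($n2, $t2) = @{ $runs[$i] };
    my $k = log($t2 / $t1) / log($n2 / $n1);    # local growth exponent
    push @exponents, $k;
    printf "%5d -> %5d: k = %.2f\n", $n1, $n2, $k;
}
```

The exponents come out at roughly 1.8, 1.9, 2.0 and 2.0: doubling n roughly quadruples the time, consistent with quadratic behavior, whereas an O(n) structure would give k close to 1.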
Adding in-memory access to Tie::File just because it can be made
efficient for numbers doesn't make sense. Normally TMTOWTDI is good, but
Tie::File is for files and it's optimized for files; special-casing
efficient in-memory access will just complicate things (efficient
array-to-file access is hard enough), and there are already better ways
to do it, which can all share the same interface (an array).

It's a nice, focused, single-purpose module.