
Thread: archzoom




archzoom
2006-09-29 18:22:18
On an Arch-related topic, the Savannah people disabled Archzoom (the Arch archive web browser) because it "takes too much CPU."  I wonder if the CPU load is due to Archzoom's exec'ing tla to carry out queries...

It would have been good to propose a more lightweight Arch web browser as a Google Summer of Code project... anyway, maybe next year...


--
Andy Tai, ataiatai.org
archzoom
2006-10-09 12:16:51
Hi,

"Andy Tai" <ataignu.org> writes:

> On an Arch-related topic, the Savannah people disabled Archzoom (the Arch
> archive web browser) because it "takes too much CPU."  I wonder if the CPU
> load is due to Archzoom's exec'ing tla to carry out queries...

Yes, it seems to be the case from what I heard about it.  But I think it
would be nice if people with experience on that topic (i.e., the
Savannah and Gna! folks) could share it with us, and tell us exactly
what they think goes wrong (it is true that `tla' is not lightning fast,
especially without a revlib).

Thanks,
Ludovic.


_______________________________________________
Gnu-arch-users mailing list
Gnu-arch-usersgnu.org

http://lists.gnu.org/mailman/listinfo/gnu-arch-users

GNU arch home page:
http://savannah.gnu.org/projects/gnu-arch/
archzoom
2006-10-10 02:30:23
ludovic.courteslaas.fr (Ludovic Courtès) writes:
> what they think goes wrong (it is true that `tla' is not lightning fast,
> especially without a revlib).

More accurately, tla is dog-slow and consumes cpu/disk-io like crazy for
most operations (it's slightly better about network-io in terms of bytes
transferred, but goes to town with the worst latency ever).  I still use
tla, mind you, but my number one complaint is its insane inefficiency;
maybe darcs is slower, I dunno.

-Miles
-- 
When you empty yourself and open your heart, the way opens.


archzoom
2006-10-10 13:32:08
Hi,

Miles Bader <miles.badernecel.com> writes:

> ludovic.courteslaas.fr (Ludovic Courtès) writes:
>> what they think goes wrong (it is true that `tla' is not lightning fast,
>> especially without a revlib).
>
> More accurately, tla is dog-slow and consumes cpu/disk-io like crazy for
> most operations (it's slightly better about network-io in terms of bytes
> transferred, but goes to town with the worst latency ever).  I still use
> tla, mind you, but my number one complaint is its insane inefficiency;
> maybe darcs is slower, I dunno.

I think you already mentioned that, in your view, this inefficiency was
more an implementation issue than a design issue; is that correct?

I did the following experiment:

  $ strace -o ,,s -e stat,stat64,open tla changes
  [...]
  $ wc -l ,,s 
  7881 ,,s

  $ tla inventory --source |wc -l
  1038
  $ tla inventory --all |wc -l
  2794
  $ find . -name '*' |wc -l
  6433

(To be fair, the revision in question was already in the revlib;
otherwise the number of `open ()' calls made by `tla changes' amounts
to ~17000, since it has to feed my greedy library.)

In the end, it looks like there is not *so* much I/O inefficiency due to
the implementation itself.  The inventory mechanism implies that all
files in the tree must be scanned, and the ID-tagging mechanism (I'm
using `tagline' here) implies that all the `.arch-ids' directories plus
all the source files must be scanned (roughly).  Although more flexible,
Arch's ID-tagging mechanism probably yields more I/O than "manifests".
Thus, it looks like high disk I/O consumption may be due to the design
rather than the implementation.

Now, it may be that the real performance bottleneck is CPU consumption
rather than disk I/O; I don't know.

Thanks,
Ludovic.


archzoom
2006-10-12 02:20:34
ludovic.courteslaas.fr (Ludovic Courtès) writes:
> I think you already mentioned that, according to you, this inefficiency
> was more an implementation issue rather than a design issue, is that
> correct?

I'm not really talking about one specific inefficiency, just my
impression as a heavy user of tla.  As far as I can figure, it's really
the product of many different inefficiencies together, some due to
implementation issues such as not caching enough, not pipelining http
requests, or not doing backward deltas, and some inherent to arch's
design.

There does seem to be a general "don't care" attitude about doing system
calls in tla, though; during the heyday of arch, many possible
optimizations were discussed which could potentially greatly speed up
operation in "common cases", but they were never implemented for various
reasons (or were implemented and not merged).

As to the specific example you gave, a few thoughts:

The basic pattern of system calls when doing tla changes, using
taglines, is, for each _unchanged_ DIR/FILE in the tree:

   three times (working dir, then twice in revision library):
      lstat64("DIR/FILE")                     = 0
      open("DIR/.arch-ids/FILE.id", O_RDONLY) = -1 ENOENT (No such file or directory)
      open("DIR/FILE", O_RDONLY)              = 4
      fstat64(4)                              = 0
      lseek(4, -1026, SEEK_END)               = 529
      read(4, "...", 1025)                    = 1025
      close(4)                                = 0

   in revlib:
      lstat64("ROOT/DIR/FILE") = 0

   in working dir:
      lstat64("ROOT/DIR/FILE") = 0

So basically an "establish file tagging" pass (not sure why it does the
revlib twice, though), and an "actually compare files" pass.
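For readers following the trace, here is a minimal Python sketch of what one "establish file tagging" lookup appears to do, mirroring the traced calls one by one. It is an illustration, not tla's code: the helper name `file_id`, the `arch-tag:` keyword, the one-line format of `.arch-ids/NAME.id`, and reading exactly the last 1025 bytes are all assumptions.

```python
import os

# Assumption for illustration: a tagline is the text after this keyword,
# up to the end of its line.
TAG_KEYWORD = b"arch-tag:"
# The traced read() asks for 1025 bytes near the end of the file; this
# sketch simply reads the last TAIL_BYTES bytes (or the whole file).
TAIL_BYTES = 1025


def file_id(path):
    """Hypothetical helper: return the ID of `path`, or None if untagged.

    An explicit .arch-ids/NAME.id file takes precedence; otherwise the
    tail of the file is searched for a tagline.
    """
    os.lstat(path)  # corresponds to lstat64("DIR/FILE")
    directory, name = os.path.split(path)
    explicit = os.path.join(directory, ".arch-ids", name + ".id")
    try:
        # open("DIR/.arch-ids/FILE.id"): usually fails with ENOENT
        with open(explicit, "rb") as f:
            return f.readline().strip().decode(errors="replace")
    except FileNotFoundError:
        pass

    fd = os.open(path, os.O_RDONLY)  # open("DIR/FILE", O_RDONLY)
    try:
        size = os.fstat(fd).st_size  # fstat64(fd)
        os.lseek(fd, max(0, size - TAIL_BYTES), os.SEEK_SET)  # cf. lseek
        tail = os.read(fd, TAIL_BYTES)  # read(fd, "...", 1025)
    finally:
        os.close(fd)  # close(fd)

    start = tail.find(TAG_KEYWORD)
    if start < 0:
        return None  # no tagline in the tail: file is untagged
    rest = tail[start + len(TAG_KEYWORD):]
    return rest.split(b"\n", 1)[0].strip().decode(errors="replace")
```

Every file therefore costs at least two opens (the usually-missing `.id` file, then the file itself) plus a read, and the trace shows this repeated three times per file.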

The "actually compare files" pass is probably as good as you can get,
but the "establish file tagging" passes are not.  I think, given the
existence of the ,,inode-sigs files, which are basically "guesses" at
the tree state, one could in many cases replace all those
open/reads/etc. with a single stat64 for most files, only using the
open/read to confirm taglines for files which don't match any "guess".
Then of course you could cache the stat results so the "actually compare
files" pass needn't do any system calls at all.  Then "tla changes"
could approach the speed of a simple diff.
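The stat-only idea can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not tla's design: the signature tuple (device, inode, size, mtime), the `guesses` table mapping a signature to a previously known ID, and the names `stat_sig`, `resolve_ids` and `read_id` are hypothetical stand-ins for whatever the real ,,inode-sigs files record.

```python
import os
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical per-file signature obtainable from one lstat() call.  Renaming
# a file keeps all four fields; rewriting it normally changes size or mtime.
Sig = Tuple[int, int, int, int]


def stat_sig(path: str) -> Sig:
    st = os.lstat(path)
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


def resolve_ids(
    paths: Iterable[str],
    guesses: Dict[Sig, str],
    read_id: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, Optional[str]], Dict[str, Sig], int]:
    """Return (ids, sigs, opened) for `paths`.

    `guesses` maps a remembered signature to the ID it had last time (a
    stand-in for the ,,inode-sigs data).  A file is handed to `read_id`,
    i.e. opened and read, only if its signature matches no guess.
    """
    ids: Dict[str, Optional[str]] = {}
    sigs: Dict[str, Sig] = {}
    opened = 0
    for path in paths:
        sig = stat_sig(path)  # the single stat every file still costs
        sigs[path] = sig  # kept so a later compare pass needn't stat again
        if sig in guesses:
            ids[path] = guesses[sig]  # guess confirmed without open/read
        else:
            ids[path] = read_id(path)  # fall back to reading the tagline
            opened += 1
    return ids, sigs, opened
```

Keying the guesses on the signature rather than on the path lets a renamed file (same inode, size and mtime) skip the open/read as well, which fits "files which don't match any guess"; only files whose signature changed pay for the full tagline lookup.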

[Only doing stats for most files is a big win in many cases -- on NFS,
obviously, but also on normal Linux, even with file-content caching
easing the pain of the reads over time: there's always the first time,
which must hit the disk, and the memory consumed for caching file
contents in a large tree is often not trivial at all, leading to
thrashing.]

-Miles
-- 
We live, as we dream -- alone....

