Thread: examples of (large) Gentoo clusters

2006-12-07 13:12:57
From: Bryan Green <bgreen nas.nasa.gov>
Date: Wed, 06 Dec 2006 16:33:12 -0800

"John R. Dunning" writes:
> From: "Daniel van Ham Colchete" <daniel.colchete gmail.com>
> Date: Tue, 5 Dec 2006 19:15:49 -0200
>
> Question: would you use Lustre 1.6 now or would you wait until the
> official version is out?
>
> If I had to ship today, I'd probably ship the 1.6b5 code. I find lustre 1.4
> much more of a headache to configure and manage. Thankfully, I don't have to
> ship today; I expect by the time I do, cfs will have released the real 1.6
> code.

It is encouraging to hear that you are willing to base a product on Lustre
1.6.

There are problems either way, but based on my experience, I believe 1.6 is a
better choice, at least for the kind of situation I'm expecting to see.
That's based partly on the fact that in my testing I've seen a pretty small
quotient of out-and-out bugs (though there are a couple which are pretty
annoying) and partly on the fact that configuration and management-wise, 1.6
is way easier to deal with. Part of what I expect will be happening in
deployments is to be building lustrefs's on the fly, under control of some
kind of configurator thingie. For that kind of task, 1.4 would be much more
difficult to deal with.

We have a test gentoo cluster system which runs with lustre as its rootfs. It
essentially "just works". I've run numerous benchmarks and tests on it,
including bonnie, iozone, ltp, and assorted bits of application code; for the
most part it's been trouble-free, and the performance is generally pretty
good. There are a few areas where, due to the properties of lustre, things
run unexpectedly slow, but for my purposes, they're all things that can be
lived with. What I conclude from all that is that it's good enough for me to
consider shipping it as part of a product while still being able to sleep at
night :-}

Are you by any chance willing to share some of your knowledge about
installing Lustre on Gentoo with others?

Sure.

Are you worrying about the kernel patching and other software installation
issues, or about how to set up the fs itself once you've got the software
together?

Very briefly, the kernel-patching issue is an ongoing headache. Lustre
patches vfs in non-trivial ways. Unfortunately, everybody else does too. It
becomes a fairly ugly patch-merging problem. If you want, I can detail the
process I've settled on for coming up with a kernel patchset, but you won't
like it. There are similar issues around ldiskfs and other bits, but they're
simpler, at least by comparison.
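
[Editor's sketch] One way to see where the patch-merging pain described above
will land is to list the files that more than one patchset touches. The
following is a minimal Python sketch, not the process used by anyone in this
thread. The helper names `files_touched` and `overlapping_files` are
hypothetical, and it assumes the patches are ordinary unified diffs that apply
with `patch -p1`.

```python
from collections import defaultdict


def files_touched(patch_text):
    """Return the set of paths a unified diff modifies.

    Heuristic: only '+++ ' header lines are inspected, and the first path
    component is dropped, as 'patch -p1' would do. An added content line
    that itself begins with '++ ' would be misread as a header.
    """
    touched = set()
    for line in patch_text.splitlines():
        if not line.startswith("+++ "):
            continue
        # Drop the optional tab-separated timestamp after the path.
        path = line[4:].split("\t")[0].strip()
        if path == "/dev/null":
            continue  # file deletion, nothing on the new side
        touched.add(path.split("/", 1)[1] if "/" in path else path)
    return touched


def overlapping_files(patchsets):
    """Map each file touched by two or more patchsets to the set names.

    `patchsets` is a dict of {patchset name: [patch text, ...]}.
    """
    owners = defaultdict(set)
    for name, patches in patchsets.items():
        for text in patches:
            for path in files_touched(text):
                owners[path].add(name)
    return {path: sorted(names)
            for path, names in owners.items() if len(names) > 1}
```

Feeding it the text of each patch from, say, a Lustre series and a
distribution kernel patchset gives the list of files where hand-merging is
likely to be needed. It says nothing about whether the hunks actually
conflict.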

Once the software is installed, configuring the fs goes pretty much by the
book. mkfs.lustre, mount -t lustre, lfs, and lctl are your friends. You'll
have some work to do deciding what your architecture is, in terms of how many
OSTs of what type, what's the interconnect topology which will get you the
best throughput etc, but there aren't really any landmines in there. I've
only worked with the failover stuff a small amount, so can't really say a lot
about that, but the time I did play with it, it seemed to work as advertised.

If you are looking for more detail on something specific, I'm happy to say
what little I know about it.
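
[Editor's sketch] To make the "by the book" sequence concrete, here is a small
Python sketch that only builds the command lines for a minimal 1.6-style
setup (a combined MGS/MDT, some OSTs, one client mount) and runs nothing. It
is an illustration under assumptions, not a configuration from this thread:
the function name, the node roles, the /mnt/... mount points and any
device or NID values passed in are hypothetical, and the exact options should
be checked against the Lustre manual for the release in use.

```python
import shlex


def lustre_setup_commands(fsname, mgs_nid, mdt_device, ost_devices,
                          client_mountpoint):
    """Return (role, command) pairs for a minimal Lustre 1.6-style setup.

    Nothing is executed. Each command is meant to be run on the node named
    by its role: "mds" (combined MGS/MDT), "oss" or "client".
    """
    steps = []
    # Format and mount a combined MGS/MDT on the metadata server.
    steps.append(("mds", ["mkfs.lustre", f"--fsname={fsname}",
                          "--mgs", "--mdt", mdt_device]))
    steps.append(("mds", ["mount", "-t", "lustre", mdt_device,
                          f"/mnt/{fsname}-mdt"]))
    # Format and mount each OST, pointing it at the MGS.
    for index, device in enumerate(ost_devices):
        steps.append(("oss", ["mkfs.lustre", f"--fsname={fsname}", "--ost",
                              f"--mgsnode={mgs_nid}", device]))
        steps.append(("oss", ["mount", "-t", "lustre", device,
                              f"/mnt/{fsname}-ost{index}"]))
    # Clients mount the filesystem by MGS NID and filesystem name.
    steps.append(("client", ["mount", "-t", "lustre",
                             f"{mgs_nid}:/{fsname}", client_mountpoint]))
    return [(role, shlex.join(argv)) for role, argv in steps]


if __name__ == "__main__":
    # All values below are made up for illustration.
    for role, command in lustre_setup_commands(
            "testfs", "mds1@tcp0", "/dev/sdb", ["/dev/sdc"], "/mnt/testfs"):
        print(f"[{role}] {command}")
```

Generating the commands as data rather than running them directly is one way
a "configurator" could build filesystems on the fly while keeping the plan
reviewable. Striping and tuning with lfs and lctl are left out.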

Perhaps I could make self-support an option, if it looked like it would be
reliable.

Well, obviously, you should test the bejeezus out of your configuration before
you declare open season on it. So far I haven't found reason to believe
lustre is substantially worse than any of the other open-source software
packages which are used in production situations. I think that constitutes a
qualified "yes" :-}
--
gentoo-cluster gentoo.org mailing list

2006-12-08 03:56:46
"John R. Dunning" writes:
> From: Bryan Green <bgreen nas.nasa.gov>
>
> It is encouraging to hear that you are willing to base a product on Lustre
> 1.6.
>
> There are problems either way, but based on my experience, I believe 1.6 is a
> better choice, at least for the kind of situation I'm expecting to see.
> That's based partly on the fact that in my testing I've seen a pretty small
> quotient of out-and-out bugs (though there are a couple which are pretty
> annoying) and partly on the fact that configuration and management-wise, 1.6
> is way easier to deal with. Part of what I expect will be happening in
> deployments is to be building lustrefs's on the fly, under control of some
> kind of configurator thingie. For that kind of task, 1.4 would be much more
> difficult to deal with.
>
From my limited experience with 1.6, and even more limited experience with
1.4, I wholeheartedly agree with your assessment. Version 1.4 looks like a
real headache to configure. By comparison, 'mount -t lustre' pretty much
characterizes the simplicity of 1.6.
>
> Are you by any chance willing to share some of your knowledge about
> installing Lustre on Gentoo with others?
>
> Sure.
>
> Are you worrying about the kernel patching and other software installation
> issues, or about how to set up the fs itself once you've got the software
> together?

Kernel patching. For software installation, the lustre ebuild that was put on
this list recently seemed to do the trick for me, and setup was pretty easy.
I was able to patch the kernel, but the server was somewhat unstable.
Actually, my memory is hazy. I used the 'lustre-sources' ebuild, which
effectively packaged up the patches. It was a 2.6.15 kernel. I also tried to
make a custom kernel for lustre 1.4, but ultimately hit too many roadblocks.
I did learn a bit about how to use 'quilt' though.
>
> Very briefly, the kernel-patching issue is an ongoing headache. Lustre
> patches vfs in non-trivial ways. Unfortunately, everybody else does too. It
> becomes a fairly ugly patch-merging problem. If you want, I can detail the
> process I've settled on for coming up with a kernel patchset, but you won't
> like it. There are similar issues around ldiskfs and other bits, but they're
> simpler, at least by comparison.

I'd be interested in some of the details - off-list if that is more
appropriate, though it might be of interest to others on the list as well.
Once you download a 1.6 beta, how do you produce a kernel for Gentoo? Do you
patch a gentoo-sources kernel, a vanilla-sources kernel, or something else?
The ideal would perhaps be to have a 'lustre-sources' ebuild in the
gentoo-science overlay.
>
> Perhaps I could make self-support an option, if it looked like it would be
> reliable.
>
> Well, obviously, you should test the bejeezus out of your configuration before
> you declare open season on it. So far I haven't found reason to believe
> lustre is substantially worse than any of the other open-source software
> packages which are used in production situations. I think that constitutes a
> qualified "yes" :-}

Are you considering getting support from CFS at some point? Sorry, you don't
have to answer if that is a sensitive question. But part of this thread has
been the topic of encouraging CFS to support Gentoo. Interestingly, my
colleague, who is in charge of installing Lustre (1.4) on our test system, is
talking to CFS about supporting a vanilla kernel configuration. The reason?
We can't make the system stable with a SLES kernel. It was stable for a long
time with Gentoo. Now they seem to have gotten it stable with SLES plus a
vanilla 2.6.19 kernel (which of course does not have the Lustre patches). So
they want Suse to provide a newer SLES kernel with the Lustre patches, and CFS
to support that configuration.

-bryan