| GFS, logs, and 50+ servers |

|
2007-06-08 00:43:06 |
|
I've seen some talk about GFS on the list in the past. I'm currently looking at this as a potential solution to deploying Rails apps to 50+ servers. Basically to take advantage of GFS giving you a single disk/file-system across the servers; to help ensure truly one set of files deployed to all servers, faster deploys, etc.
We currently have about 50+ servers, and that will grow. Our application architecture is SOA, so in reality one rails app won't be on all 50 servers, they'll be grouped, say 10-20 servers per service.
I am currently eyeing a GFS setup where we use a server (per group) as a GFS disk, and GNBD across the machines. So, no SAN, no iSCSI, no fiber, etc. It's what I have available, so balancing the advantage of GFS vs. deploying the code to all machines in a more traditional setup.
The servers in this case are 64bit boxes, with dual cores, and GigE (dual, but for this discussion assume a single one, since we split the net on them, etc.). Also, our application file storage is done using a different infrastructure, so it doesn't play into this. Databases are also on different boxes.
I have not used GFS before, so I'm hoping for some input on some of these questions:
- I presume that for the actual Rails application code, since it gets loaded up once in production mode, that say 20 servers pulling that from a single GNBD/GFS file system server would be no biggy. Correct?
- Logs - this seems to be the danger area to me. Assuming we have "high traffic", and that we do quite a bit of logging (we log a lot of info for metrics and ability to follow requests through the SOA architecture, etc.), I worry about 20 servers all writing to a single log on the one GNBD/GFS server. Valid worry, or? Are there alternatives I should look at for logging in such an environment?
- Thoughts, comments, notes on this approach in general?
-- Chris Bailey chris.bailey gmail.com
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Deploying Rails" group. To post to this group, send email to rubyonrails-deployment googlegroups.com To unsubscribe from this group, send email to rubyonrails-deployment-unsubscribe googlegroups.com For more options, visit this group at http://groups.google.com/group/rubyonrails-deployment?hl=en -~----------~----~----~----~------~----~------~--~---
|
| Re: GFS, logs, and 50+ servers |
  United States |
2007-06-08 01:08:03 |
On Jun 7, 2007, at 10:43 PM, Chris Bailey wrote:
> I've seen some talk about GFS on the list in the past. I'm currently looking at this as a potential solution to deploying Rails apps to 50+ servers. Basically to take advantage of GFS giving you a single disk/file-system across the servers; to help ensure truly one set of files deployed to all servers, faster deploys, etc.
>
> We currently have about 50+ servers, and that will grow. Our application architecture is SOA, so in reality one rails app won't be on all 50 servers, they'll be grouped, say 10-20 servers per service.
>
> I am currently eyeing a GFS setup where we use a server (per group) as a GFS disk, and GNBD across the machines. So, no SAN, no iSCSI, no fiber, etc. It's what I have available, so balancing the advantage of GFS vs. deploying the code to all machines in a more traditional setup.
>
> The servers in this case are 64bit boxes, with dual cores, and GigE (dual, but for this discussion assume a single one, since we split the net on them, etc.). Also, our application file storage is done using a different infrastructure, so it doesn't play into this. Databases are also on different boxes.
>
> I have not used GFS before, so I'm hoping for some input on some of these questions:
>
> - I presume that for the actual Rails application code, since it gets loaded up once in production mode, that say 20 servers pulling that from a single GNBD/GFS file system server would be no biggy. Correct?
Yeah it's no biggy.
>
> - Logs - this seems to be the danger area to me. Assuming we have "high traffic", and that we do quite a bit of logging (we log a lot of info for metrics and ability to follow requests through the SOA architecture, etc.), I worry about 20 servers all writing to a single log on the one GNBD/GFS server. Valid worry, or? Are there alternatives I should look at for logging in such an environment?
GFS has something called context-dependent symlinks. These let you make symlinks that resolve to a different path based on things like the hostname. So you set up a set of directories named after all the hostnames in the cluster, then make log a symlink to @hostname. Observe:
ey00-s00070 ~ # cd /data/ey/shared/
ey00-s00070 shared # ls -lsa
total 32
4 drwxrwxr-x 7 ez ez 3864 Dec 3 2006 .
4 drwxr-xr-x 4 ez ez 3864 Dec 7 15:14 ..
4 drwxrwxrwx 2 ez ez 3864 Jun 1 13:30 ey00-s00070
4 drwxrwxrwx 2 ez ez 3864 Jun 4 22:07 ey00-s00071
4 lrwxrwxrwx 1 ez ez 9 Dec 3 2006 log -> @hostname
See how log is a symlink to @hostname? After you make a directory named after each of the hostnames that share a filesystem, you do this to link them:
$ ln -s @hostname log
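The setup described above can be sketched as a small script. This is a minimal illustration, not Ezra's actual procedure: it assumes the GFS context-dependent target is spelled `@hostname`, and the `setup_host_logs` function, the `SHARED` variable, and the default temporary directory are hypothetical names added for the example. On a filesystem other than GFS the link is created but simply dangles, since nothing resolves `@hostname` per node.

```shell
#!/bin/sh
# Sketch: one directory per host plus a single "log" symlink whose target
# GFS resolves to the local node's hostname.
# SHARED and setup_host_logs are hypothetical names for illustration.
set -eu

setup_host_logs() {
    # $1 = shared directory, remaining arguments = hostnames in the cluster
    shared=$1
    shift
    mkdir -p "$shared"
    for host in "$@"; do
        mkdir -p "$shared/$host"
    done
    # -f/-n: replace an existing link instead of creating one inside its target
    ln -sfn @hostname "$shared/log"
}

# Default to a scratch directory so the sketch can run anywhere.
SHARED=${SHARED:-$(mktemp -d)}
setup_host_logs "$SHARED" ey00-s00070 ey00-s00071
ls -l "$SHARED"
```

On a real GFS mount you would point `SHARED` at the shared directory (for example the `/data/ey/shared` path shown in the listing) and pass every hostname in the cluster.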
>
> - Thoughts, comments, notes on this approach in general?
>
I have many, many nodes sharing GFS filesystems and it works great in general, much more robust than NFS. I do it all off of a SAN network though, so I have no experience with the way you are trying to do it with no SAN.
Cheers-
-- Ezra Zygmuntowicz
-- Lead Rails Evangelist
-- ez engineyard.com
-- Engine Yard, Serious Rails Hosting
-- (866) 518-YARD (9273)
|
|
| Re: GFS, logs, and 50+ servers |

|
2007-06-08 01:33:20 |
|
Thanks Ezra. Actually, I believe it was your talk at RailsConf that inspired this. So thanks again! I suspect that, for the short term, with only say 10-20 servers per GFS disk it may be ok. If that all works out and we scale up then I'm sure we'd go SAN. Thanks for the info on the contextual symlinks, that's very cool.
How does GFS handle immense volume, in terms of say you have these 20 servers writing their logs to a single "disk", and let's say you're being slashdotted/dug(digged?), so there's tons of logging (I guess enough to overwhelm GigE, but I haven't calculated to see if that's realistic to overwhelm in such a case), how does GFS behave?
Obviously I'll have to test all this, but hoping to short circuit any insurmountable problems or bad usage, etc.
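The calculation left open above can be roughed out as follows. This is a back-of-envelope sketch only: it ignores protocol and filesystem overhead, and the 500-byte average log line is an assumption chosen for illustration, not a figure from the thread.

```shell
#!/bin/sh
# Back-of-envelope: how much logging does it take for 20 servers to fill
# one GigE link into the GNBD/GFS server?
set -eu

LINK_BITS_PER_SEC=1000000000   # nominal GigE line rate
SERVERS=20                     # servers sharing one GFS disk
BYTES_PER_LINE=500             # assumed average log line size

link_bytes=$((LINK_BITS_PER_SEC / 8))
per_server_bytes=$((link_bytes / SERVERS))
per_server_lines=$((per_server_bytes / BYTES_PER_LINE))

echo "link capacity:    $link_bytes bytes/s"
echo "per-server share: $per_server_bytes bytes/s"
echo "per-server lines: $per_server_lines lines/s"
```

Under these assumptions each server would need to sustain roughly 12,500 log lines per second before the link itself becomes the limit, so contention on the shared file is likely to matter well before raw bandwidth does.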
-- Chris Bailey chris.bailey gmail.com
|
| Re: GFS, logs, and 50+ servers |
  United States |
2007-06-08 04:06:18 |
On Jun 7, 11:08 pm, Ezra Zygmuntowicz <ezmob... gmail.com> wrote:
> On Jun 7, 2007, at 10:43 PM, Chris Bailey wrote:
> > The servers in this case are 64bit boxes, with dual cores, and GigE (dual, but for this discussion assume a single one, since we split the net on them, etc.). Also, our application file storage is done using a different infrastructure, so it doesn't play into this. Databases are also on different boxes.
>
> > I have not used GFS before, so I'm hoping for some input on some of these questions:
>
> > - I presume that for the actual Rails application code, since it gets loaded up once in production mode, that say 20 servers pulling that from a single GNBD/GFS file system server would be no biggy. Correct?
>
> Yeah it's no biggy.
Hate to disagree with one of our own, but you'll find that the RHCS has a practical limit of 16 machines per cluster, unless you're using the GULM, which is no longer recommended.
At Engine Yard we sidestep this limitation by utilizing a two-tiered cluster structure, one for the nodes in the cluster, and one for each customer environment.
> > - Logs - this seems to be the danger area to me. Assuming we have "high traffic", and that we do quite a bit of logging (we log a lot of info for metrics and ability to follow requests through the SOA architecture, etc.), I worry about 20 servers all writing to a single log on the one GNBD/GFS server. Valid worry, or? Are there alternatives I should look at for logging in such an environment?
I'd recommend aggregating the logs via something akin to syslog or something based upon the wonderful but underutilized Spread library.
Far simpler and more robust.
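As a rough illustration of the syslog route, the sketch below tags each line with the originating host and a request id and hands it to the local syslog daemon through the standard `logger` utility. It is not Engine Yard's setup: the `rails-orders` tag, the `local0` facility, the field names, and the `format_line` / `log_request` helpers are all assumptions made up for the example.

```shell
#!/bin/sh
# Sketch only: send tagged log lines to the local syslog daemon.
# TAG, the local0 facility and the field layout are illustrative assumptions.
set -eu

TAG=${TAG:-rails-orders}

# Build one line that stays traceable after logs from many hosts are merged.
format_line() {
    # $1 = hostname, $2 = request id, $3 = message
    printf 'host=%s request_id=%s %s\n' "$1" "$2" "$3"
}

# Hand the line to syslog via logger(1); fall back to stdout if it is absent.
log_request() {
    line=$(format_line "$(uname -n)" "$1" "$2")
    if command -v logger >/dev/null 2>&1; then
        logger -t "$TAG" -p local0.info "$line" || echo "logger failed: $line" >&2
    else
        echo "$line"
    fi
}

log_request req-42 "GET /orders 200"
```

Forwarding those messages to a central log host is then a matter of syslog daemon configuration on each machine rather than anything in the application, which keeps the shared filesystem out of the logging path entirely.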
GFS is great, don't get me wrong, but I can absolutely guarantee you that you'll be disappointed if you put 50 machines into a single RHCS cluster.
--
-- Tom Mornini, CTO
-- Engine Yard, Inc.
|
|
| Re: GFS, logs, and 50+ servers |
  United States |
2007-06-08 04:48:31 |
I was also going to recommend syslog or something similar. There was a post recently about it here:
http://toolmantim.com/article/2007/6/6/logging_rails_to_syslog_with_sysloglogger
|
|
| Re: GFS, logs, and 50+ servers |

|
2007-06-08 18:33:12 |
|
Yes, good point Tom. I had actually meant to switch to syslog logging, but hadn't gotten to it yet. I've heard Spread is excellent, and we're beginning to look at it for a few other things as well, so we'll see.
As for servers, the 16-machine limit is interesting. Or rather, somewhat confounding. I guess they expect you to move to a different system if you are managing a lot more servers as part of a cluster, or that you'd do direct attached storage, etc.?
|