Thread: Mirroring/backing-up a large

2006-04-20 16:31:37
kashani wrote:
> Running through the Dell storage page, you end up spending $20k (list)
> for their 12-SATA-drive NAS device with 3-year NBD support, dual power
> supplies, etc. RAID 6 it up and you've got 5TB usable. I'm sure there
> are cheaper options (feel free to point them out), but I don't think
> you're going to save that much over going directly to an iSCSI/NFS SAN
> with a second- or third-tier vendor, i.e. not NetApp or EMC. And you've
> got to manage x number of boxes, don't get volume management,
> snapshots, etc., and still have to shuffle data around manually for
> backups or at least hot storage.
An example of cheaper NAS boxes:
http://www.linuxdevices.com/articles/AT3184179979.html
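
The 5TB usable figure in the quote follows from RAID 6 spending two drives' worth of capacity on parity, if each of the 12 drives is 500GB. A minimal Python sketch of that arithmetic; the 500GB drive size is an assumption (the quote doesn't state it), and filesystem overhead and TB/TiB differences are ignored:

```python
# Usable capacity of a single RAID 5/6 set: total drives minus the
# drives' worth of space spent on parity. Filesystem overhead is ignored.
PARITY_DRIVES = {5: 1, 6: 2}
MIN_DRIVES = {5: 3, 6: 4}


def raid_usable_gb(drives: int, drive_size_gb: int, level: int) -> int:
    """Return usable capacity in GB for one RAID 5 or RAID 6 set."""
    if level not in PARITY_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    return (drives - PARITY_DRIVES[level]) * drive_size_gb


if __name__ == "__main__":
    # Assumed 500GB drives in the 12-drive NAS discussed above.
    print(raid_usable_gb(12, 500, 6))  # 5000 (GB), i.e. ~5TB
    print(raid_usable_gb(12, 500, 5))  # 5500 (GB) with single parity
```

RAID 6 survives two simultaneous drive failures, at the cost of one more drive of capacity than RAID 5.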

> How about the question, "Is losing 20% of your data any better than
> losing 100% of your data?" IMO data loss is data loss whether it's
> complete or partial. Of course, assuming you have backups, restoring 20%
> is easier, so it's possible I'm wrong here. I'm still not buying the
> scenario where managing nine single points of failure is better than
> managing one. And I think I can eliminate all the single points in a
> single large system more easily than I can rewrite my application to
> round-robin across 15 data stores that contain partial backups of each
> other.

Frankly, if you are attempting to manage 9TB of data for customers and
managing 9 systems scares you, you need to start thinking about a change
of career. Where are these 15 data stores you are talking about?

The point of distributing the load across more devices is not to limit
loss to 20%. It is to make it easier to back up, restore, replicate,
upgrade, maintain and administer, and to provide higher availability,
redundancy, etc. Google "google file system" to bone up on the concept...
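
To make the distribution idea concrete, here is a toy placement function in Python that assigns each file to one of several storage boxes and keeps a copy on a second box. It only illustrates spreading data plus replicas across nodes; it is not how GFS itself works (GFS tracks chunk locations in a central master). The node names, the replica count of 2 and the use of a SHA-256 hash for placement are all assumptions for the sketch:

```python
import hashlib


def place(path: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Pick the storage nodes that should hold `path`.

    The primary node is chosen from a hash of the path; further replicas
    go to the following nodes in the list, wrapping around at the end.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and len(nodes)")
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


if __name__ == "__main__":
    # Nine hypothetical storage boxes, as in the "9 systems" discussion.
    nodes = [f"store{i:02d}" for i in range(1, 10)]
    print(place("/customers/4711/photo.jpg", nodes))
```

Because placement uses `hash % len(nodes)`, adding a tenth box would move most files; a setup that grows often would want consistent hashing or an explicit lookup table instead.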

Other options are investing your entire business in a single point of
failure (a single device), SANs, which can be clustered/RAIDed, or using
a service like Akamai and not dealing with it at all ;)
--
gentoo-server gentoo.org mailing list

2006-04-21 09:17:22
First of all, thank you for the many replies and interesting discussions.

Let me tell you what we concluded: after some tests it was obvious that
10k files per directory was far better than the 50k we use now. The
longest rsync on 10k files took 15 seconds; the average took about 8
seconds. That is with the 9TB system using JFS and the 4TB system using
ReiserFS.

We intend to use rsync in combination with marking the folders dirty.
This method should scale well enough for us; figures indicate we might
have 100TB by the end of the year.
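
One way the "rsync plus dirty folders" approach could look, as a Python sketch: whatever writes a file marks its directory dirty, and a periodic job rsyncs only the dirty directories to the mirror. The class and function names and the `backup01` host are hypothetical, and the dirty set lives only in memory here; a real setup would need to persist it (or use a flag file per directory) so nothing is lost on a restart:

```python
import subprocess
from pathlib import PurePosixPath


class DirtyTracker:
    """In-memory set of directories changed since the last sync."""

    def __init__(self) -> None:
        self._dirty = set()

    def mark(self, file_path: str) -> None:
        # Record the directory containing a file that was written or deleted.
        self._dirty.add(str(PurePosixPath(file_path).parent))

    def mark_dir(self, directory: str) -> None:
        self._dirty.add(directory)

    def drain(self) -> list[str]:
        # Return the dirty directories (sorted) and start a fresh set.
        dirs, self._dirty = sorted(self._dirty), set()
        return dirs


def rsync_command(directory: str, dest_host: str) -> list[str]:
    # -a preserves permissions, times, symlinks etc.; --delete mirrors removals.
    # Trailing slashes copy the directory's contents into the same path on
    # the destination instead of nesting it one level deeper.
    return ["rsync", "-a", "--delete", f"{directory}/", f"{dest_host}:{directory}/"]


def sync_dirty(tracker: DirtyTracker, dest_host: str, execute: bool = False) -> list[list[str]]:
    """Build (and, if execute=True, run) one rsync per dirty directory."""
    commands = []
    for directory in tracker.drain():
        cmd = rsync_command(directory, dest_host)
        commands.append(cmd)
        if execute and subprocess.run(cmd).returncode != 0:
            tracker.mark_dir(directory)  # keep it dirty so the next pass retries
    return commands


if __name__ == "__main__":
    t = DirtyTracker()
    t.mark("/data/000012/123456")
    t.mark("/data/000012/123457")
    t.mark("/data/000013/130001")
    for cmd in sync_dirty(t, "backup01"):  # execute=False: only print
        print(" ".join(cmd))
```

Because each pass only touches directories that changed, its cost grows with the amount of change rather than with the total number of files stored.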

I believe there is no ultimate solution for a company like ours; we are
constantly trying to find better solutions. New website features
sometimes require different hardware setups to perform optimally, and
bottlenecks are common. Therefore, what would be the ultimate solution
now might not be so in a few weeks.
But we will keep looking for better solutions for storage, backup and
all other areas. We will have to do it as problems arise, though;
resources are spread a little thin.
The just-in-time concept has penetrated to system management.

About spreading the storage over several smaller systems: I do not think
this is the way to go in our case. We are already managing 100+ servers,
and storage needs will keep increasing. Where is the end? 20x 5TB
systems by the end of the year?
The point is, it is not just the 20 servers you need to maintain, but
all the clients as well (about half the servers in our case, adding up
to 1000 NFS mounts that could crash and have proven to crash for no
apparent reason over time).
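
The figures above hang together as follows, assuming the 100TB goes onto 5TB boxes and that roughly 50 client servers (half of 100) each mount every storage box. A quick Python check; the full-mesh mounting pattern is an assumption, not something the message states:

```python
import math


def boxes_needed(total_tb: float, per_box_tb: float) -> int:
    """Number of storage boxes needed, rounding up."""
    return math.ceil(total_tb / per_box_tb)


def nfs_mounts(clients: int, boxes: int) -> int:
    # Assumes every client mounts every storage box (full mesh).
    return clients * boxes


if __name__ == "__main__":
    boxes = boxes_needed(100, 5)
    print(boxes)                  # 20
    print(nfs_mounts(50, boxes))  # 1000
```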

I also really liked the overlay idea; I never knew it existed.
With regards,
Jos Houtman

2006-04-21 16:55:10
jos houtman wrote:
> First of all, thank you for the many replies and interesting discussions.
>
> Let me tell you what we concluded: after some tests it was obvious that
> 10k files per directory was far better than the 50k we use now.
> The longest rsync on 10k files took 15 seconds; the average took about
> 8 seconds. That is with the 9TB system using JFS and the 4TB system
> using ReiserFS.
> We intend to use rsync in combination with marking the folders dirty.
>
> This method should scale well enough for us; figures indicate we might
> have 100TB by the end of the year.
>
> I believe there is no ultimate solution for a company like ours; we are
> constantly trying to find better solutions. New website features
> sometimes require different hardware setups to perform optimally, and
> bottlenecks are common. Therefore, what would be the ultimate solution
> now might not be so in a few weeks.
> But we will keep looking for better solutions for storage, backup and
> all other areas. We will have to do it as problems arise, though;
> resources are spread a little thin.
> The just-in-time concept has penetrated to system management.

There are a few things you can try to make what you've got faster, or at
least to get into your plan for the future.

1. Smaller drives have better seek time. Basically, it's the whole
more-spindles-per-data thing. I dealt with a very large mail system in
'01, and the change from 36GB drives to 72GB drives decreased I/O
throughput enough that we had to swap back to the smaller drives. 500GB
SATA drives look great on paper, but 300GB drives might perform better.

2. Cache more at your web layer and keep I/O off your storage. Run all
webservers with the most RAM you can afford; if it's in the local cache,
it's not a storage hit. Reverse-proxy your webservers with Squid and set
up a local Squid cache to serve files directly from a purpose-built
proxy with fast local disk, a dedicated cache layer doing the same thing
that you might redirect to, or a media cluster that doesn't have the
overhead that comes with running PHP, Perl or whatever on the main site.
There are lots of interesting things here, and, well, they all just make
the site faster.

3. If 5% of your content is 90% of your bandwidth, then a content
delivery system makes sense. However, uploading a data set in the TB
range is not cost-effective.

4. Use smaller disk groups on your storage. An EMC engineer explained
this one to me. Say you've got sixteen drives in your array. Rather than
a single RAID 5 set, you make three sets of RAID 5 with a floating hot
spare. Each set has its own data, so when you look for fileA you hit
drives 1-5 rather than all fifteen. The smaller data set means the
random requests hitting each drive are less scattered, each drive is
more likely to get a cache hit since it isn't supporting the whole data
set, and so on.

5. Rumor is that iSCSI is faster and has less overhead. You might want
to test both NFS and iSCSI. Also, don't believe any of the nonsense
about needing TOE cards or dedicated HBA cards for either. Just be able
to dedicate an Ethernet interface to storage.

6. Jumbo frames. Assuming part of your problem is NFS data ops,
switching to jumbo frames would increase packet sizes from 1500 bytes to
9000 bytes and cut your data ops. I just about doubled throughput by
using jumbo packets with an iSCSI-backed video streaming service.
However, this only works if you have a dedicated storage LAN and set all
servers, clients and switch ports to use jumbo frames (MTU 9000). Using
jumbo frames on the "going out to the Internet" side is usually
problematic. Also, some switches don't support jumbo frames.

7. Graph the hell out of everything. MRTG, Cacti, Excel, whatever. I
cannot stress this one enough. It's saved my ass a number of times over
the past ten years. Having graphs of load, RAM usage, storage, local I/O
and network I/O; MySQL queries, scans, table locks, cache hits, full
table scans, etc.; NFS data ops, Apache processes, etc. makes
troubleshooting a million times easier. And it's great for getting more
money out of management when you can prove the storage is doing twice
the work it was doing three months ago.
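
The "more spindles" argument in tip 1 comes down to random I/O scaling with the number of drives rather than their capacity. A rough Python sketch, assuming about 80 random IOPS per 7,200 rpm SATA drive regardless of its size (a ballpark assumption, not a measured figure) and ignoring RAID parity:

```python
import math


def spindles_and_iops(capacity_gb: int, drive_size_gb: int, iops_per_drive: int) -> tuple[int, int]:
    """Drives needed to hold capacity_gb, and the random IOPS they add up to."""
    drives = math.ceil(capacity_gb / drive_size_gb)
    return drives, drives * iops_per_drive


if __name__ == "__main__":
    for size in (300, 500):
        drives, iops = spindles_and_iops(9000, size, 80)
        print(f"{size}GB drives: {drives} spindles, ~{iops} random IOPS")
    # 300GB drives: 30 spindles, ~2400 random IOPS
    # 500GB drives: 18 spindles, ~1440 random IOPS
```

Same 9TB, roughly two thirds more random I/O from the smaller drives, paid for with more slots, power and controller ports.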
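
Tip 2's point that "if it's in the local cache it's not a storage hit" can be illustrated with a small least-recently-used cache in Python. This only models the effect; Squid and the OS page cache have their own replacement policies, and the `fetch` function is a stand-in for a read from NFS:

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    """Small least-recently-used cache that counts reads reaching storage."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch = fetch            # stand-in for a read from NFS/SAN
        self.items = OrderedDict()    # key -> cached bytes, oldest first
        self.storage_hits = 0

    def get(self, key: str) -> bytes:
        if key in self.items:
            self.items.move_to_end(key)     # now the most recently used
            return self.items[key]
        self.storage_hits += 1              # miss: this read goes to storage
        value = self.fetch(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
        return value


if __name__ == "__main__":
    cache = LRUCache(capacity=2, fetch=lambda name: name.encode())
    for name in ["a.jpg", "b.jpg", "a.jpg", "c.jpg", "a.jpg", "b.jpg"]:
        cache.get(name)
    print(cache.storage_hits)  # 4 of 6 requests reached storage
```

More RAM on the webservers corresponds to a larger `capacity`, which turns more of those requests into cache hits.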
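
Whether tip 3's condition holds can be checked against your own traffic. A Python sketch, assuming you have already totalled bytes served per object from your web server logs (the log parsing isn't shown, and the sample numbers are invented):

```python
import math


def top_share(bytes_by_object: dict[str, int], top_fraction: float = 0.05) -> float:
    """Share of all bytes served that comes from the top `top_fraction`
    of objects, ranked by bytes served."""
    counts = sorted(bytes_by_object.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    k = max(1, math.ceil(len(counts) * top_fraction))
    return sum(counts[:k]) / total


if __name__ == "__main__":
    # Made-up numbers: 5 popular objects and 95 rarely requested ones.
    served = {f"hot{i}": 1710 for i in range(5)}
    served.update({f"cold{i}": 10 for i in range(95)})
    print(top_share(served))  # 0.9, i.e. 5% of objects carry 90% of the bytes
```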
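
The layout in tip 4 (sixteen drives as three five-drive RAID 5 sets plus one floating hot spare) can be modelled in a few lines of Python. In a real array a file's group is simply the volume it was written to; the hash in `drives_for` is only a stand-in for that, and both function names are hypothetical:

```python
import hashlib


def raid5_groups(total_drives: int, group_size: int) -> tuple[list[list[int]], list[int]]:
    """Split drives 1..total_drives into RAID 5 groups of group_size;
    drives left over become hot spares."""
    drives = list(range(1, total_drives + 1))
    n_groups = total_drives // group_size
    groups = [drives[i * group_size:(i + 1) * group_size] for i in range(n_groups)]
    spares = drives[n_groups * group_size:]
    return groups, spares


def drives_for(filename: str, groups: list[list[int]]) -> list[int]:
    # Stand-in for "which volume the file lives on": reading this file
    # touches only the drives of one group.
    h = int.from_bytes(hashlib.sha256(filename.encode()).digest()[:8], "big")
    return groups[h % len(groups)]


if __name__ == "__main__":
    groups, spares = raid5_groups(16, 5)
    print(groups)   # [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
    print(spares)   # [16]
    # Each RAID 5 set gives up one drive's worth of space to parity.
    print(sum(len(g) - 1 for g in groups))  # 12
    print(drives_for("fileA", groups))      # one five-drive group
```

The trade-off: a single 15-drive RAID 5 set would give 14 drives of usable space, so the three smaller sets cost two drives of capacity in exchange for isolating the random I/O.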
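
The gain in tip 6 comes from moving the same data in fewer, larger packets, so per-packet overhead (interrupts, header processing) drops. A back-of-the-envelope Python sketch, assuming TCP over IPv4 with 20-byte headers each and no options; Ethernet framing sits outside the MTU and isn't counted:

```python
import math

IP_TCP_HEADER_BYTES = 40  # 20-byte IPv4 header + 20-byte TCP header, no options


def packets_needed(payload_bytes: int, mtu: int) -> int:
    """Packets needed to carry payload_bytes when each packet fills the MTU."""
    per_packet = mtu - IP_TCP_HEADER_BYTES
    return math.ceil(payload_bytes / per_packet)


if __name__ == "__main__":
    one_mib = 1024 * 1024
    for mtu in (1500, 9000):
        print(mtu, packets_needed(one_mib, mtu))
    # 1500 719
    # 9000 118
```

That is roughly six times fewer packets per mebibyte; how much throughput it buys depends on whether per-packet overhead was the bottleneck in the first place.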
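
MRTG and Cacti are the right tools for tip 7, but the core idea is simply recording numbers on a schedule so they can be graphed and compared later. A minimal Python sketch of that idea for one metric, the system load average via `os.getloadavg()` (Unix only); the CSV format and the example file path are illustrative choices:

```python
import csv
import os
import time


def sample_load(csv_path: str) -> list[str]:
    """Append one row (Unix time, 1/5/15-minute load averages) to csv_path."""
    one, five, fifteen = os.getloadavg()  # Unix only; raises OSError if unavailable
    row = [str(int(time.time())), f"{one:.2f}", f"{five:.2f}", f"{fifteen:.2f}"]
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow(row)
    return row
```

Called from cron every few minutes, for example as `sample_load("/var/tmp/loadavg.csv")` (a hypothetical path), this builds up a history that a spreadsheet or plotting tool can turn into the "twice the work it was doing three months ago" graph.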
kashani