Thread: Re: ANNOUNCE: DBMail 2.2.2 released
2007-02-05 02:28:01
*caution* Long rambling post ahead, best taken with an ice-cold ginger
beer (and possibly some salt).
The move to a truly threaded, scalable and HA architecture is a big
change. I don't think it's going to be a standard upgrade for most
people. If people want true HA then the level of funkyness is going to
go up pretty drastically. Heck, it's almost a fork(spoon). DBMail 2 for
nice small instillations, DBMail 3 for big things. Otherwise you might
be trying to cover too many bases. Something big and scalable probably
won't be easy peasy to install and configure for your average home
user.
What funkyness are you thinking there might be?
I challenge your statement that DBMail is currently for "nice small
inst[a]llations" because there are people using it for very large
systems already. I want to design for and target that level of use
rather than just happen to be able to handle it by accident.
I mean DBMail is (on Debian/Ubuntu at least) really easy to install and
set up. But unless you start packaging up the stack of applications you
depend on, configured the way you depend on them, you can run into the
realm of difficulty. Asterisk is a decent example of this, I think.
Many things have to be "just so" for it to really work well.
AsteriskNOW, however (a distro for Asterisk), packages it all up
nicely: drop the image onto the server, boot it and away you go. I am
having problems getting its Xen version running, but the VMware version
worked well.
Personally I see some "top level daemon" managing the whole thing and
talking via IP to the various front ends and databases. It also being
responsible for managing redundancy of data amongst a pool of
databases and the like.
Yes, no. Some kind of cluster manager may be an inevitable design
decision. Built-in redundancy not so much - I'd rather rely on the
database to do this for us. With respect to pooling databases (no
relation whatsoever to pooling database connections, btw), I don't
presume that I know enough to partition what data goes to which database
server better than the experts in database design.
I was referring to a list of things like "User A is on servers X and
Y". If for some reason you wish to remove server Y from the system, the
manager can move the data off that server and onto another. Relying on
the database for the clustering can lead to issues with scalability and
reliability. If your application is "cluster aware" in itself then you
can do crazy things like have half your servers on MySQL and the other
half on PostgreSQL.
That in itself would have HA weenies drooling, I'd think ;->
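To make the "User A is on servers X and Y" list concrete, here is a
minimal sketch of what the manager's bookkeeping might look like. The
class and method names (PlacementMap, assign, decommission) are made up
for illustration and are not part of DBMail. It only plans the moves;
actually copying mail between databases is left out.

```python
from collections import defaultdict


class PlacementMap:
    """Hypothetical manager-side record of which servers hold each user's mail."""

    def __init__(self, servers):
        self.servers = set(servers)
        self.placement = defaultdict(set)  # user -> set of server names

    def assign(self, user, servers):
        unknown = set(servers) - self.servers
        if unknown:
            raise ValueError(f"unknown servers: {sorted(unknown)}")
        self.placement[user] = set(servers)

    def load(self, server):
        # Number of users with a copy on this server.
        return sum(1 for held in self.placement.values() if server in held)

    def decommission(self, server):
        """Drain `server` and return the planned (user, source, target) copies."""
        self.servers.discard(server)
        moves = []
        for user, held in sorted(self.placement.items()):
            if server not in held:
                continue
            held.remove(server)
            # Prefer the least loaded remaining server that lacks a copy.
            candidates = sorted(self.servers - held,
                                key=lambda s: (self.load(s), s))
            if not candidates:
                raise RuntimeError(f"no spare server for {user}")
            target = candidates[0]
            # Copy from a surviving replica if one exists, otherwise from
            # the server being drained (before it is switched off).
            source = min(held) if held else server
            held.add(target)
            moves.append((user, source, target))
        return moves
```

With users A on {X, Y} and B on {Y, Z}, draining Y plans one copy per
user and leaves both with two copies again. Because the map only stores
server names, nothing here cares whether X is MySQL and Z is PostgreSQL.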
If someone hits the hard limits of how much data can go into one of the
database servers we use, then they're definitely doing something where
they can afford to spend the time and money needed to beef up the
software to work around the rather huge size limits in MySQL and
PostgreSQL.
You can scale those out a fair bit, but MySQL Cluster (at the moment)
is in-RAM tables only, which sucks donkey balls. (I run MySQL for my db
btw so.... yeah... sucks to be me in that regard.)
I feel (from the armchair, or in this case plastic outdoor dining
chair) that it's best if the app knows what's happening. You can still
use MySQL and Postgres and cluster things up the wazoo if you feel like
doing that. But it's simpler (I think) from the end user POV to treat
the databases as the "ideal RAID hard drives": want more storage? Add
another box. Want more performance? Add another box. Want more X? Add
another box. All without having to muck about with setting up cluster
stuff in a database.
Basically, scaling means copying over the Xen image, booting it and
telling the controller that it's allowed to use it.
Neat idea! I think it runs exactly contrary to your assertion that it
would make everything very complex. If the cluster manager directed
other cluster members configurations, the tough part would be setting up
cluster membership. Probably involving some public key encryption to
make sure rogue nodes don't join the cluster. That's on the hard side,
but would be fun to work on.
Aaron
The hard part is in making the complex stuff simple ;->
If the system can be made to install and set up easily with lots of
management goodness (i.e. I don't have to do anything) then super ;->
Thinking about it, C might not be the best thing for the management
app; perhaps Python or some such, as each individual message won't need
to hit it, or will only need to do so in a trivial way. And the logic
is likely to become scary ;->
I have been pondering ways of achieving true high availability, where a
server failure causes 0 disruption to service, even if you are halfway
through receiving an email. Though that requires some additional
funkyness (all traffic into the cluster must be broadcast/multicast and
a whole bunch of other dren).
In my "ideal" system the setup is something like this.
Email is received by a front end server. (Perhaps by broadcast traffic?
Each server can pick which "conversations" it's a part of and ignore
the others. That will scale well up to a point, and much farther than
any current system, before you need stuff like DNS round robin and
proxies, though a proxy would be my 2nd choice.) That server checks its
in-memory list of all users, and if we are accepting the email it can
be passed on to spam checks and the like.
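The post doesn't say how each server would decide which "conversations"
are its own. One possible way, sketched here purely as an assumption,
is for every front end to hash the client's address and port the same
way and claim the conversation only when the hash lands on its own
index. The function names and the choice of SHA-256 are illustrative.

```python
import hashlib


def owner_index(src_ip, src_port, n_servers):
    """Map a client endpoint to exactly one front end index in [0, n_servers)."""
    key = f"{src_ip}:{src_port}".encode()
    # hashlib gives the same answer on every machine, unlike Python's
    # built-in hash(), which is randomised per process for strings.
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_servers


def should_handle(my_index, src_ip, src_port, n_servers):
    """True if this front end should answer; all others stay silent."""
    return owner_index(src_ip, src_port, n_servers) == my_index
```

Every server sees the same packet and runs the same function, so
exactly one of them answers. The catch is that changing the number of
servers reshuffles ownership, so in-flight conversations would need
extra care when a box joins or dies.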
Email then hits our "stuff it into db" section, which looks at where
that user's information is stored. That app sticks the email into the
databases that need it (so your A grade customers have 3 copies, your
B customers 2 copies and your C customers just the 1). It is the exact
same entry each time, so all ID numbers right through the db are the
same.
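A rough sketch of that delivery step, assuming plain dicts as stand-ins
for the real databases. The 3/2/1 copy counts come from the paragraph
above; the Store class and its deliver method are hypothetical names.

```python
import itertools

# Copies per customer grade, as described above.
COPIES_BY_GRADE = {"A": 3, "B": 2, "C": 1}


class Store:
    """Hypothetical 'stuff it into db' step over dict-backed fake databases."""

    def __init__(self, backends):
        self.backends = backends          # name -> dict standing in for a db
        self._ids = itertools.count(1)    # single source of message IDs

    def deliver(self, user, grade, homes, message):
        """Write one message to the user's home databases under one shared ID."""
        wanted = COPIES_BY_GRADE[grade]
        if len(homes) < wanted:
            raise ValueError(f"{user} needs {wanted} copies, has {len(homes)} homes")
        msg_id = next(self._ids)          # allocated once, reused on every copy
        targets = list(homes[:wanted])
        for name in targets:
            self.backends[name][msg_id] = (user, message)
        return msg_id, targets
```

Allocating the ID once, outside the databases, is what lets any copy
answer for any other later on. A real version would also have to deal
with a write that succeeds on one database and fails on the next.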
The "stuff it in the db" app will then notify (directly) any IMAP
servers that have that user registered to them that a new email has
arrived. At this point the "stuff it in the db" app is finished with
the email.
The IMAP servers are pretty similar to what's around now, the
difference being that when a connection comes in, the server checks
with the manager which database it should connect to for that user as
part of their authentication (the manager picks servers based on load).
If the server tries a query that fails, it will try again on any other
servers that have that user's data (keeping in mind that all IDs are
unique). So if a db server dies or goes offline the user doesn't even
notice. What would be nice is if the management node could direct the
incoming IMAP connections to the least loaded server (again, I would
like to achieve this with all machines in the pool having the same IP
and just ignoring connections they don't need).
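The retry-on-another-copy behaviour might look roughly like this. It is
a sketch only: BackendDown and query_with_failover are invented names,
and run_query stands in for whatever actually talks to a database.

```python
class BackendDown(Exception):
    """Raised by a query function when its database cannot be reached."""


def query_with_failover(replicas, run_query):
    """Try run_query(name) on each replica in order; return the first success.

    `replicas` is the list of databases holding this user's data, in the
    order the manager would like them tried.
    """
    failed = []
    for name in replicas:
        try:
            return name, run_query(name)
        except BackendDown:
            failed.append(name)   # remember it and move on to the next copy
    raise BackendDown(f"all replicas failed: {failed}")
```

This only works because every copy holds the same IDs, so a query built
for one database is valid as-is on the next.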
The management node is responsible for load balancing the servers it
has in its pool. Attach a 386, and it'll get 4 users in its database.
The management node dynamically manages the users, so as users are
added and their usage patterns become established the load can be moved
around the servers. E.g.:
New user johnny.
The system is pretty busy so he gets put on the least loaded server.
Johnny turns out to be a super power user with assloads of searches and
the like.
Management node moves some of the less intensive users off that server
and onto others.
All this moving happens live, as there are (always) 2 copies of the
user's data in the databases.
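The johnny example could be sketched like this. It assumes the manager
has some per-user load number to work with (how that gets measured is
not covered here), and the rebalance function and its tolerance
parameter are made up for illustration. It only decides who moves; the
live data copy is out of scope.

```python
def rebalance(assignments, load, tolerance=1.25):
    """Move light users off the busiest server until it is near the mean.

    assignments: server name -> list of users (modified in place)
    load:        user -> measured cost (hypothetical metric)
    Returns the list of (user, source, destination) moves made.
    """
    def total(server):
        return sum(load[u] for u in assignments[server])

    moves = []
    while True:
        mean = sum(total(s) for s in assignments) / len(assignments)
        src = max(assignments, key=total)
        dst = min(assignments, key=total)
        # Stop when the hot server is close enough to average, or when
        # only one user (the power user) is left on it.
        if total(src) <= tolerance * mean or len(assignments[src]) <= 1:
            return moves
        user = min(assignments[src], key=lambda u: load[u])
        # Stop if the move would only shift the hot spot elsewhere.
        if total(dst) + load[user] >= total(src):
            return moves
        assignments[src].remove(user)
        assignments[dst].append(user)
        moves.append((user, src, dst))
```

Note that the power user stays put and the cheap users move, which is
the order described above: shifting light users costs less than
shifting the one with all the mail and searches.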
I can see a system like that scaling as far as you could want it to,
without the need for funkyness in terms of admin, while at the same
time not *requiring* loads of hardware. On an embedded tiny system,
though, it is going to run slower than DBMail does now. There's a bunch
of stuff there your average joe isn't going to use.
A "corporate" mail system, though, could be set up to be pretty HA and
high performance with just 2 boxes, the scaling being pretty linear and
all.
The hardest thing is for all that stuff to work out of the box. The
best way to get people to install it at work is to make it easy to
install at home. That's why I use Ubuntu.
apt-get install dbmail-hardcore
BTW wrt threads and the like, I prefer a threads = processors * 2 type
of approach. Event driven state machines seem to be the most efficient
way of doing things when you get really loaded. You don't have all that
switching between threads and the overhead of hanging on to them all.
To my mind it should make the coding simpler, because once you have a
state machine which will run the IMAP protocol, it should scale pretty
much linearly without needing to worry too much about IPC and the like.
(dbmail 7.9 perhaps?)
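As a toy illustration of the state machine idea, here is a sketch where
each connection is just a state value and one loop feeds command lines
to a pure step function. The state names follow IMAP's connection
states, but everything else is heavily simplified and assumed: there is
no real authentication, no mailbox access and only four commands.

```python
from enum import Enum, auto


class State(Enum):
    NOT_AUTHENTICATED = auto()
    AUTHENTICATED = auto()
    SELECTED = auto()
    LOGOUT = auto()


def step(state, line):
    """Advance one connection by one command line; return (new_state, replies)."""
    parts = line.strip().split()
    if len(parts) < 2:
        return state, ["* BAD missing tag or command"]
    tag, cmd, args = parts[0], parts[1].upper(), parts[2:]
    if cmd == "LOGOUT":
        return State.LOGOUT, ["* BYE logging out", f"{tag} OK LOGOUT completed"]
    if cmd == "NOOP":
        return state, [f"{tag} OK NOOP completed"]
    if cmd == "LOGIN" and state is State.NOT_AUTHENTICATED:
        if len(args) != 2:
            return state, [f"{tag} BAD LOGIN expects user and password"]
        # Toy only: any credentials are accepted.
        return State.AUTHENTICATED, [f"{tag} OK LOGIN completed"]
    if cmd == "SELECT" and state in (State.AUTHENTICATED, State.SELECTED):
        if len(args) != 1:
            return state, [f"{tag} BAD SELECT expects a mailbox"]
        return State.SELECTED, [f"{tag} OK [READ-WRITE] SELECT completed"]
    return state, [f"{tag} BAD command not valid in this state"]


def run(events):
    """Process interleaved (connection_id, line) events on a single thread."""
    states, out = {}, []
    for conn, line in events:
        current = states.get(conn, State.NOT_AUTHENTICATED)
        states[conn], replies = step(current, line)
        out.extend((conn, reply) for reply in replies)
    return states, out
```

Since step keeps nothing but the state it is handed, any worker can
pick up any connection's next event, and adding connections only grows
the states dict rather than the thread count.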