hi...
i'm playing around with an app that parses websites, extracts
information, and returns certain pieces of it to my system.
my primary issue is how i might architect the system to get that
information into my database. i'm using/testing with mysql. my question
is how to scale this kind of system. if i have a server that's spawning
hundreds of apps, with each app firing off a web/page connection to a
web server, i'm going to have more than enough connections coming back
to swamp the writes to a mysql server...
so how do other apps/crawlers handle this kind of situation? basically,
i'm trying to figure out how to implement some kind of scalable
funneling process/mechanism that lets me have 10-20 servers crawling the
specific sites and returning the information to a database...
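for concreteness, here's one shape such a funnel could take: many crawler
threads put their rows on a bounded queue, and a single writer drains the
queue in batches, so the database only ever sees one connection. this is
only a rough sketch under assumptions, not a tested design: it's python,
the standard-library sqlite3 module stands in for mysql, nothing is
actually fetched, and every name in it (crawler, writer, run, the results
table, the example.com urls) is made up for illustration.

```python
import os
import queue
import sqlite3
import tempfile
import threading

BATCH_SIZE = 50   # rows per INSERT batch (arbitrary, for illustration)
STOP = object()   # sentinel that tells the writer to finish


def crawler(worker_id, pages, out_q):
    # Stand-in for a real fetch-and-parse step: no network access here.
    for page in pages:
        # put() blocks when the queue is full, which throttles the crawlers.
        out_q.put((worker_id, page, f"parsed:{page}"))


def flush(conn, batch):
    # One multi-row insert and one commit per batch instead of per row.
    conn.executemany(
        "INSERT INTO results (worker, url, data) VALUES (?, ?, ?)", batch
    )
    conn.commit()


def writer(db_path, in_q):
    # The only thread that talks to the database.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (worker INTEGER, url TEXT, data TEXT)"
    )
    batch = []
    while True:
        item = in_q.get()
        if item is STOP:
            break
        batch.append(item)
        if len(batch) >= BATCH_SIZE:
            flush(conn, batch)
            batch = []
    if batch:  # write whatever is left over
        flush(conn, batch)
    conn.close()


def run(db_path, n_workers=100, pages_per_worker=5):
    q = queue.Queue(maxsize=1000)  # bounded, so a slow writer applies backpressure
    w = threading.Thread(target=writer, args=(db_path, q))
    w.start()
    crawlers = []
    for i in range(n_workers):
        pages = [f"http://example.com/{i}/{p}" for p in range(pages_per_worker)]
        crawlers.append(threading.Thread(target=crawler, args=(i, pages, q)))
    for t in crawlers:
        t.start()
    for t in crawlers:
        t.join()
    q.put(STOP)  # all crawlers are done, so let the writer drain and exit
    w.join()
    conn = sqlite3.connect(db_path)
    (count,) = conn.execute("SELECT COUNT(*) FROM results").fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "crawl.db")
    print(run(path), "rows written")
```

with 10-20 crawl servers the in-process queue would presumably have to
become something networked that all the servers can push to, but the
shape stays the same: lots of producers, a few batched writers.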
any thoughts/comments/pointers on how to deal with this would be
helpful!!
thanks
-bruce