Thank you for your reply.
> The simplest answer to your situation would be to split your
> application up into separate controllers. If your
> application looks like:
>
> Controller: PR1, PR2, PR3
>
> and PR3 needs the corpus-wide results from 1 and 2, then you
> create two controllers:
>
> ControllerA: PR1, PR2
> ControllerB: PR3
>
> and run these two controllers over the corpus one after the other.
Yes, but some basic calculations need corpus-wide accumulation.
For example, a TF/IDF PR must be split into PR2 (DF calculation)
and PR3 (TF/IDF calculation).
I think it would be better if such a parallel controller or mode
were integrated into the GATE classes.
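
To make the two phases concrete, here is a minimal sketch in plain Java. It does not use the GATE API; the class and method names are hypothetical, and documents are assumed to be already tokenised into lists of strings. The first pass must see the whole corpus before the second pass can weight any single document.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a GATE processing resource.
public class TwoPassTfIdf {

    // Pass 1 (the "PR2" role): for each term, count the documents containing it.
    public static Map<String, Integer> documentFrequencies(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            // A set ensures each term is counted at most once per document.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Pass 2 (the "PR3" role): weight one document using the corpus-wide DF table.
    // Assumes the document was part of the corpus used in pass 1, so every
    // term has a DF entry. Uses raw term counts and a natural-log IDF.
    public static Map<String, Double> tfIdf(List<String> doc,
                                            Map<String, Integer> df,
                                            int corpusSize) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1, Integer::sum);
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double idf = Math.log((double) corpusSize / df.get(e.getKey()));
            weights.put(e.getKey(), e.getValue() * idf);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("gate", "corpus", "gate"),
                Arrays.asList("corpus", "pipeline"));
        // The DF table is complete only after the whole corpus has been seen.
        Map<String, Integer> df = documentFrequencies(corpus);
        for (List<String> doc : corpus) {
            System.out.println(tfIdf(doc, df, corpus.size()));
        }
    }
}
```

The loop over the corpus appears twice, which is the point: no arrangement of a single document-by-document pass can produce the final weights.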
> I suppose a serial controller that runs PR1 on every
> document, then PR2 on every document, etc. could be useful in
> some scenarios, but when you're processing a corpus in a
> datastore this would generate huge overhead in having to load
> and save each document once per PR rather than just once per
> application.
Yes, document-by-document processing is better for high-speed
processing.
But in the case above we can't avoid the two-phase calculation in
any configuration.
> Truly parallel processing - running the same PR instance over
> two documents at the same time - isn't possible in the GATE
> architecture, as processing resources are (by design) not
> thread safe. You could get a limited amount of parallelism
> by creating two copies of ControllerA and passing half the
> corpus to each one, for example. "Save application state" is
> very handy for this purpose.
I didn't mean truly parallel processing.
Of course, I do hope for a multi-thread manager in GATE, because
we are about to have truly multi-core CPUs.
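
As a rough sketch of the limited parallelism Ian describes (one copy of the pipeline per share of the corpus), the following plain Java shows the pattern. It does not use the GATE API: `Pipeline`, `runPartitioned` and the factory argument are hypothetical stand-ins, and documents are represented as strings. The key assumption is that each pipeline instance is not thread safe, so every worker gets its own copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical illustration only; not a GATE controller.
public class PartitionedRun {

    // Stand-in for a non-thread-safe pipeline: one instance, one thread.
    public interface Pipeline {
        String process(String document);
    }

    // Splits the corpus into contiguous slices and processes each slice on its
    // own thread with its own pipeline copy. Results keep the corpus order.
    public static List<String> runPartitioned(List<String> corpus,
                                              int workers,
                                              Supplier<Pipeline> pipelineFactory) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            int chunk = (corpus.size() + workers - 1) / workers; // ceiling division
            for (int start = 0; start < corpus.size(); start += chunk) {
                List<String> slice =
                        corpus.subList(start, Math.min(start + chunk, corpus.size()));
                Pipeline own = pipelineFactory.get(); // a fresh copy per slice, never shared
                futures.add(pool.submit(() -> {
                    List<String> out = new ArrayList<>();
                    for (String doc : slice) {
                        out.add(own.process(doc));
                    }
                    return out;
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.addAll(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partitioned run failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("doc one", "doc two", "doc three");
        // The factory plays the role of loading a saved copy of the application.
        System.out.println(runPartitioned(corpus, 2, () -> doc -> doc.toUpperCase()));
    }
}
```

This only helps for the per-document phase. A corpus-wide accumulation such as DF would still need the partial results from each worker to be merged before the second phase starts.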
> Ian
>
> --
> Ian Roberts               | Department of Computer Science
> i.roberts dcs.shef.ac.uk  | University of Sheffield, UK
>