Dennis Kubes wrote:
> Ok, I have completed the testing and it is working good. One problem
> though. I noticed that we are using a distributed cache for the job
> files. If I am creating new job jar files on the fly, but still
> copying them to the job.jar location, how is this affected by
> distributed caching?
The cache will still be effective. Typically, in the course of a job,
multiple map tasks and multiple reduce tasks run on each host. The
cache retrieves just a single copy of the job's jar for all of these
tasks. However, with a new jar per job, the cache will not be effective
across jobs. But this is not nearly as critical as caching across
tasks in a job, since there are typically thousands of tasks per job.
One could attempt to optimize across jobs, but I think that would be
overkill, especially for the first version of this feature.
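To make the per-task vs. per-job distinction concrete, here is a toy simulation, not Hadoop's actual DistributedCache code. It assumes each host keeps one local copy per distinct (jar path, modification time) key, so rewriting job.jar in place produces a new key. The names HostCache, run_job and total_fetches are hypothetical, and tasks are spread round-robin purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HostCache:
    """Toy per-host cache (assumption): one local copy per (path, mtime) key."""
    fetches: int = 0
    _local: set = field(default_factory=set)

    def localize(self, path: str, mtime: int) -> None:
        key = (path, mtime)
        if key not in self._local:
            # Cache miss: copy the jar from the shared filesystem once.
            self.fetches += 1
            self._local.add(key)


def run_job(hosts, jar_path, jar_mtime, num_tasks):
    """Assign tasks round-robin; each task asks its host's cache for the jar."""
    for task in range(num_tasks):
        hosts[task % len(hosts)].localize(jar_path, jar_mtime)


def total_fetches(hosts):
    return sum(h.fetches for h in hosts)


if __name__ == "__main__":
    hosts = [HostCache() for _ in range(10)]
    run_job(hosts, "/jobs/job.jar", 1, 1000)   # 1000 tasks, one fetch per host
    print(total_fetches(hosts))                # 10
    run_job(hosts, "/jobs/job.jar", 1, 1000)   # same jar again: all hits
    print(total_fetches(hosts))                # 10
    run_job(hosts, "/jobs/job.jar", 2, 1000)   # new jar per job: one miss per host
    print(total_fetches(hosts))                # 20
```

Even with a fresh jar for every job, the cost is one fetch per host per job rather than one per task, which is why the within-job caching matters far more.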
Doug