Thread: Multiple Job Jar files?




Multiple Job Jar files?
2007-07-18 19:13:42
Is it possible to submit multiple job jar files to Hadoop at once?
If not, is this a feature that might be useful?

I can see this being useful for custom Nutch development: a Nutch
job.jar plus a custom.job.jar file.

Dennis Kubes

RE: Multiple Job Jar files?
2007-07-18 21:19:24
Yes, definitely. There is a JIRA issue open for precisely that:
https://issues.apache.org/jira/browse/HADOOP-1622

Runping Qi




Re: Multiple Job Jar files?
2007-07-18 23:20:39
Ok, I read the JIRA and have been hacking away at this for the past
couple of hours. I have a workable patch that I just need to test.
It follows the JIRA's proposal: create a master job.jar file from the
multiple job jar files passed in. I will test it and post the patch
tomorrow morning.

Dennis Kubes
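The approach described above (building one master job.jar from several job jars) can be sketched in Java as follows. This is a hypothetical helper, not the HADOOP-1622 patch: the class name JarMerger, the method merge, and the "first jar wins" rule for duplicate entries are all assumptions. Because JarInputStream reads each input's META-INF/MANIFEST.MF separately, manifests are typically not copied into the merged jar.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

/**
 * Hypothetical helper (not Hadoop code): merges several job jars into one
 * master jar. If two inputs contain an entry with the same name, the copy
 * from the earlier input in the list is kept.
 */
public final class JarMerger {

    private JarMerger() {}

    /** Writes the merged jar to output and returns the entry names written, in order. */
    public static Set<String> merge(List<Path> inputs, Path output) throws IOException {
        Set<String> written = new LinkedHashSet<>();
        try (OutputStream out = Files.newOutputStream(output);
             JarOutputStream jarOut = new JarOutputStream(out)) {
            for (Path input : inputs) {
                try (InputStream in = Files.newInputStream(input);
                     JarInputStream jarIn = new JarInputStream(in)) {
                    JarEntry entry;
                    while ((entry = jarIn.getNextJarEntry()) != null) {
                        String name = entry.getName();
                        if (!written.add(name)) {
                            continue; // duplicate: keep the earlier jar's copy
                        }
                        // Fresh entry so sizes/compression are recomputed on write.
                        jarOut.putNextEntry(new JarEntry(name));
                        jarIn.transferTo(jarOut); // copies only the current entry's bytes
                        jarOut.closeEntry();
                    }
                }
            }
        }
        return written;
    }
}
```

With inputs nutch.job.jar and custom.job.jar in that order, classes present in both come from nutch.job.jar; listing the custom jar first would let it override them instead.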


Re: Multiple Job Jar files?
2007-07-19 22:19:39
Ok, I have completed the testing and it is working well. One problem,
though: I noticed that we are using a distributed cache for the job
files. If I am creating new job jar files on the fly, but still copying
them to the job.jar location, how does that interact with distributed
caching?

Dennis Kubes


Re: Multiple Job Jar files?
2007-07-20 11:58:48
Dennis Kubes wrote:
> Ok, I have completed the testing and it is working good.  One problem
> though.  I noticed that we are using a distributed cache for the job
> files.  If I am creating new job jar files on the fly, but still copying
> them to the job.jar location, how is this affected by distributed caching?

The cache will still be effective. Typically, in the course of a job,
multiple map tasks and multiple reduce tasks run on each host, and the
cache retrieves just a single copy of the job's jar for all of these
tasks.

However, with a new jar per job, the cache will not be effective across
jobs. That is not nearly as critical as caching across the tasks within
a job, since there are typically thousands of tasks per job. One could
attempt to optimize across jobs, but I think that would be overkill,
especially for the first version of this feature.

Doug
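The distinction drawn above between caching across tasks and caching across jobs can be illustrated with a toy per-host cache. This is not Hadoop's DistributedCache API: HostJarCache, localize, the example hdfs:/jobs/... paths, and the assumption that the cache is keyed by the jar's location plus its modification time are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-host jar cache; a toy model, not Hadoop's DistributedCache. */
public final class HostJarCache {

    /** Assumed cache key: jar location plus its modification time. */
    record Key(String uri, long modTime) {}

    private final Map<Key, String> localCopies = new HashMap<>();
    private int fetches = 0;

    /** Returns a local path for the jar, "fetching" it only on a cache miss. */
    public String localize(String uri, long modTime) {
        return localCopies.computeIfAbsent(new Key(uri, modTime), k -> {
            fetches++; // a real cache would copy the jar to local disk here
            return "/local/cache/" + Integer.toHexString(k.uri().hashCode()) + "-" + k.modTime();
        });
    }

    public int fetches() {
        return fetches;
    }

    public static void main(String[] args) {
        HostJarCache host = new HostJarCache();
        // Job 1: 40 tasks on this host share one master job.jar -> one fetch.
        for (int t = 0; t < 40; t++) {
            host.localize("hdfs:/jobs/job_1/job.jar", 1000L);
        }
        // Job 2: a freshly merged jar is a new key -> one more fetch.
        for (int t = 0; t < 40; t++) {
            host.localize("hdfs:/jobs/job_2/job.jar", 2000L);
        }
        System.out.println(host.fetches()); // prints 2
    }
}
```

Running main prints 2: eighty task launches on the host cost two fetches, one per job, which is why losing cross-job reuse matters far less than keeping per-task reuse.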
