On Jan 15, 2008, at 13:57, Ted Dunning wrote:
> This is happening because you have many reducers running, only one of
> which gets any data.
>
> Since you have combiners, this probably isn't a problem. That reducer
> should only get as many records as you have maps. It would be a problem
> if your reducer were getting lots of input records.
>
> You can avoid this by setting the number of reducers to 1.
Thanks!
I also have another, perhaps stupid, question. I am trying to write a
task that produces a list of the records with the top N values. My idea
is to write a reducer class that iterates through the records, keeps the
N with the biggest values, and emits them. I can use it as both the
combiner and the reducer class. This way each map task will produce N
records, and I will set up a single reduce task that combines them into
the final N records (N is reasonably small, like 10). However, to do
this I need to postpone issuing output until I am done processing all
the records. I can try to do this in the close() method, but I do not
have an OutputCollector there. I guess I could write a special output
collector, but that seems a bit artificial.

Probably I am missing something obvious, and there is a common and easy
way to do this?
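
A minimal sketch of the "keep only the N biggest" part of this idea, in
plain Java with no Hadoop dependencies. The names TopN, Record, add and
drain are hypothetical, chosen only for illustration. It assumes each
record is a string key with a long value, and that ties between equal
values may be broken arbitrarily. In a reducer, add() would presumably be
called from reduce() and drain() from close(), with the output going to
an OutputCollector reference saved from an earlier reduce() call; that
wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: remembers the N records with the largest values. */
class TopN {
    /** Illustrative (key, value) record, ordered by value. */
    static final class Record implements Comparable<Record> {
        final String key;
        final long value;

        Record(String key, long value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public int compareTo(Record other) {
            return Long.compare(value, other.value);
        }
    }

    private final int n;
    // Min-heap: the smallest retained value sits at the head,
    // so it is the first to be evicted when the heap overflows.
    private final PriorityQueue<Record> heap = new PriorityQueue<>();

    TopN(int n) {
        this.n = n;
    }

    /** Offer one record; nothing is emitted yet. */
    void add(String key, long value) {
        heap.add(new Record(key, value));
        if (heap.size() > n) {
            heap.poll(); // drop the current smallest
        }
    }

    /** Return the retained records, largest value first, and reset. */
    List<Record> drain() {
        List<Record> out = new ArrayList<>(heap);
        heap.clear();
        out.sort(Collections.reverseOrder());
        return out;
    }
}
```

Because the helper never holds more than N + 1 records at a time, the
same class could in principle back both the combiner and the final
reducer.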
Thanks!
Sincerely,
Vadim