Thread: Hbase mapred classes question




Hbase mapred classes question
2008-01-09 20:48:32
Hi,

I have two questions for the mapred BuildTableIndex classes folks.

First, if I have 40 servers with about 32 regions per server, what would
I set the mapper and reducers to?

And secondly, is it allowed to add new column values during the process?
For example, if I read all rows and the column "contents:A" (for example
row123.contents:A), analyze the data and then write out the result in
"row123.contents:B", is that OK to do?

Thanks,
Lars

---
Lars George, CTO
WorldLingo


Re: Hbase mapred classes question
2008-01-10 00:42:24
Lars George wrote:
> Hi,
>
> I have two questions for the mapred BuildTableIndex classes folks.
>
> First, if I have 40 servers with about 32 regions per server, what
> would I set the mapper and reducers to?

Coarsely, make as many maps as you have total regions (assuming
TableInputFormat is in the mix; it splits on table regions) and make the
number of reducers equal to the number of index shards you want out the
other end.  For example, you could have just one reducer produce one
index for all table content if the table is small, etc.
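
As a rough illustration of the sizing rule above, the arithmetic for the
cluster in question can be sketched in plain Java. This is not code from
the HBase distribution: the class and method names are hypothetical, and
the shard count of 10 is an arbitrary example value.

```java
// Hypothetical helper; names are illustrative and not part of HBase.
class JobSizing {
    // One map task per table region, assuming TableInputFormat
    // produces one split per region.
    static int mapTasks(int servers, int regionsPerServer) {
        return servers * regionsPerServer;
    }

    // One reduce task per index shard wanted as output (at least one).
    static int reduceTasks(int indexShards) {
        return Math.max(1, indexShards);
    }

    public static void main(String[] args) {
        int maps = mapTasks(40, 32);   // 40 servers, about 32 regions each
        int reduces = reduceTasks(10); // assumed: ten index shards
        System.out.println("maps=" + maps + " reduces=" + reduces);
    }
}
```

For 40 servers with about 32 regions each this gives roughly 1280 map
tasks. With the Hadoop JobConf API such values are normally passed to
setNumMapTasks and setNumReduceTasks; the map count is only a hint to
the framework.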

>
> And secondly, is it allowed to add new column values during the
> process? For example, if I read all rows and the column "contents:A"
> (for example row123.contents:A), analyze the data and then write out
> the result in "row123.contents:B", is that OK to do?

You mean add new content while indexing?  Yes.  If you don't mind some
of the added content ending up in the index...

St.Ack


>
> Thanks,
> Lars
>
> ---
> Lars George, CTO
> WorldLingo
>


Re: Hbase mapred classes question
2008-01-10 08:27:53
Hi Stack,

>> First, if I have 40 servers with about 32 regions per server, what
>> would I set the mapper and reducers to?
>
> Coarsely, make as many maps as you have total regions (Assuming
> TableInputFormat is in the mix; it splits on table regions) and make
> the number of reducers equal to the amount of index shards you want
> out the other end.  For example, you could have just one reducer
> produce one index for all table content if table is small, etc.

But if we need to search it at the end without having produced a single
index, how would you handle this? Would you, for example, create ten
indexes and then use a MultiReader (?) to search across all ten? That
obviously also means I have to save those ten indexes locally first to
be able to search them, so I need the storage space for all of them
anyway. What advantage does that have? Is there a maximum size (apart
from what the OS imposes on the filesystem) for Lucene indexes that
would affect that?

>> And secondly, is it allowed to add new column values during the
>> process? For example, if I read all rows and the column "contents:A"
>> (for example row123.contents:A), analyze the data and then write out
>> the result in "row123.contents:B", is that OK to do?
>
> You mean add new content while indexing?  Yes.  If you don't mind some
> of the added content ending up in the index...

I would add to the same family but with a different label, and since the
job maps a different label, the new values would not be indexed, right?
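
The reasoning behind this question can be modelled in plain Java. This
is only a sketch under the assumption that the job is handed just the
columns it was configured to read; it does not use the HBase API, and
the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of column selection; not part of HBase.
class ColumnSelection {
    // Keep only the cells of a row whose full column name
    // ("family:label") is in the configured set.
    static Map<String, String> select(Map<String, String> row,
                                      Set<String> columns) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (columns.contains(cell.getKey())) {
                out.put(cell.getKey(), cell.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row123 = new TreeMap<>();
        row123.put("contents:A", "original text");
        row123.put("contents:B", "analysis result written during the job");

        // Assumed: the job is configured to read only "contents:A".
        Map<String, String> seen = select(row123, Set.of("contents:A"));
        System.out.println(seen.keySet());
    }
}
```

Under that assumption the job never sees "contents:B", so values written
there during the run would not reach the index.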

Thanks again for your help Stack!

Best regards,
Lars
