[ https://issues.apache.org/jira/browse/HADOOP-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540661 ]
stack commented on HADOOP-2161:
-------------------------------
Actually, I misspoke. getFull scanning memory and all on-disk files is not 'wrong', though it is slow. Here's why.
Columns can be added willy-nilly. Unlike in a traditional RDBMS, there is no need for an ALTER TABLE-like statement to add a column, as long as the column belongs to an existing column family (i.e. its name is prefixed by an extant column family).
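A minimal sketch of that rule, assuming column names of the form family:qualifier and a hypothetical set of declared family names written with their trailing colon (e.g. "info:"). The class and method names are illustrative only, not hbase's API: the point is that only the family prefix is checked, and the qualifier never has to be declared anywhere.

```java
import java.util.Set;

// Hypothetical sketch, not hbase code: a column is "family:qualifier", and a
// write is accepted as long as its family prefix is a declared column family.
class ColumnCheck {
    // Returns true if everything up to and including the first ':' names one
    // of the table's declared families; the qualifier is never validated.
    static boolean isWritable(String column, Set<String> declaredFamilies) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            return false; // no family prefix at all
        }
        String family = column.substring(0, colon + 1);
        return declaredFamilies.contains(family);
    }
}
```

So "info:anything" is writable whenever "info:" is declared, which is also why nothing ever records the full set of qualifiers in use.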
And nowhere does hbase keep an accounting of all the columns created in a particular family. Since there is no list of all columns to consult, the only way hbase can be sure it has found every mention of a column is to scan all the data.
This is the main difference between get and getFull. Because you pass get a list of the columns to fetch, it knows when it is done. Not so with getFull.
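To make the difference concrete, here is a toy model, not hbase code: a row's cells live in a list of hypothetical stores (the memcache followed by each on-disk file, newest first), each a map from column name to value. Deletes and timestamps are ignored. get can stop once every requested column has been found; getFull has no such stopping condition and must visit every store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical): each store maps column name -> value for one row,
// and the list is ordered newest first (memcache, then on-disk files).
class RowReads {
    // Result of a read: the cells found plus how many stores were consulted.
    static final class Result {
        final Map<String, String> cells = new HashMap<>();
        int storesVisited = 0;
    }

    // get-style read: the caller names the columns, so the scan can stop as
    // soon as every requested column has been found.
    static Result get(List<Map<String, String>> stores, Set<String> columns) {
        Result r = new Result();
        Set<String> missing = new HashSet<>(columns);
        for (Map<String, String> store : stores) {
            if (missing.isEmpty()) {
                break; // everything asked for is already found
            }
            r.storesVisited++;
            for (String col : new ArrayList<>(missing)) {
                String v = store.get(col);
                if (v != null) {
                    r.cells.put(col, v); // first hit is the newest value
                    missing.remove(col);
                }
            }
        }
        return r;
    }

    // getFull-style read: nothing records which columns exist, so every store
    // must be visited; each column keeps the value from the newest store.
    static Result getFull(List<Map<String, String>> stores) {
        Result r = new Result();
        for (Map<String, String> store : stores) {
            r.storesVisited++;
            for (Map.Entry<String, String> e : store.entrySet()) {
                r.cells.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return r;
    }
}
```

With stores [{info:a=new}, {info:a=old}, {info:b=x}], a get for info:a consults one store, while getFull consults all three, and the gap grows with the number of on-disk files.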
Is it important to you that this run faster, Clint? If so, there may be some things we can do, like keeping an integer count of unique column names. getFull would then know that once it had found that many column names, it could return. (Keeping a list of all column names would probably not be viable, since in some schemas it might grow without bound.)
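A rough sketch of the count idea, using a toy model in which a row's cells live in a list of stores (memcache first, then on-disk files, newest first), each a map from column name to value. knownColumnCount is a hypothetical counter of distinct column names, assumed here to be kept per table; only the number is stored, not the names. None of this is hbase's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposal: a full-row read that stops once it has
// collected as many distinct columns as the (assumed) column counter reports.
class CountedGetFull {
    static final class Result {
        final Map<String, String> cells = new HashMap<>();
        int storesVisited = 0;
    }

    // knownColumnCount: assumed count of distinct column names ever written.
    static Result getFull(List<Map<String, String>> stores, int knownColumnCount) {
        Result r = new Result();
        for (Map<String, String> store : stores) {
            if (r.cells.size() >= knownColumnCount) {
                break; // every column that could exist has already been seen
            }
            r.storesVisited++;
            for (Map.Entry<String, String> e : store.entrySet()) {
                // Stores are newest first, so keep the first value seen.
                r.cells.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return r;
    }
}
```

One caveat with this sketch: if the count covers the whole table, a row that lacks some of the table's columns never reaches it and still scans every store, so the early exit helps most when rows carry all of the columns (as with the single-column rows in the reported test).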
> getRow() is orders of magnitude slower than get(), even on rows with one column
>
> --------------------------------------------------------------------------------
>
> Key: HADOOP-2161
> URL: https://issues.apache.org/jira/browse/HADOOP-2161
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/hbase
> Affects Versions: 0.16.0
> Environment: latest from trunk
> Reporter: Clint Morgan
> Attachments: PerformanceEvaluation-patch.txt
>
>
> HTable.getRow(Text) is several orders of magnitude slower than HTable.get(Text, Text), even on rows with a single column.
> This problem can be observed with the attached patch to PerformanceEvaluation.java, which changes SequentialRead to use getRow and prints out the time for each read.
> The test can then be run with:
> bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialRead 1
> On my laptop, the original test (using get()) produces reads on the order of 5-20 milliseconds. Using getRow(), the reads take 50-2000 ms.
>
--
This message is automatically generated by JIRA.