B.1. General
When should I use HBase?
Anybody can download HBase and give it a spin, even on a laptop. This answer addresses when HBase is the right choice for a real deployment. First, make sure you have enough hardware. Even HDFS doesn't do well with fewer than 5 DataNodes (because of things such as HDFS block replication, which defaults to 3), plus a NameNode. Second, make sure you have enough data. HBase isn't suitable for every problem. If you have hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few thousand or a few million rows, a traditional RDBMS might be a better choice, because all of your data might wind up on a single node (or two) while the rest of the cluster sits idle.
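The sizing argument above can be made concrete with some rough arithmetic. The following is a minimal sketch, not an official sizing formula: the function name, the average row size, the 10 GB maximum region size, and the count of five region servers are all illustrative assumptions. Only the replication factor of 3 comes from the text.

```python
import math

def estimate_footprint(rows, avg_row_bytes, replication=3,
                       region_bytes=10 * 1024**3, region_servers=5):
    """Rough sizing sketch; every default except replication=3 is an assumption."""
    logical = rows * avg_row_bytes        # bytes of data before replication
    raw = logical * replication           # bytes HDFS stores across all replicas
    # Assume a table splits into roughly logical / region_bytes regions (at least 1).
    regions = max(1, math.ceil(logical / region_bytes))
    busy = min(regions, region_servers)   # servers that can host at least one region
    return {
        "logical_bytes": logical,
        "raw_bytes": raw,
        "regions": regions,
        "busy_servers": busy,
        "idle_servers": region_servers - busy,
    }

# Hypothetical workloads: 2 million rows versus 5 billion rows of about 1 KB each.
small = estimate_footprint(rows=2_000_000, avg_row_bytes=1_000)
large = estimate_footprint(rows=5_000_000_000, avg_row_bytes=1_000)
print(small)
print(large)
```

Under these assumptions the small table fits in a single region, so four of the five servers hold none of it, while the large table splits into hundreds of regions that can be spread across every server.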
Are there other HBase FAQs?
See the FAQ on the wiki: HBase Wiki FAQ.
Does HBase support SQL?
Not really. SQL-ish support for HBase via Hive is in development; however, Hive is based on MapReduce, which is not generally suitable for low-latency requests. See Chapter 5, Data Model for examples of the HBase client.
How does HBase work on top of HDFS?
HDFS is a distributed file system that is well suited to storing large files. Its documentation states, however, that it is not a general-purpose file system and does not provide fast individual record lookups in files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion. See Chapter 5, Data Model and Chapter 8, Architecture for more information on how HBase achieves its goals.
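One way to picture how fast lookups can sit on top of large files is a sorted file divided into blocks, with a small index of each block's first key. The sketch below is a toy model of that idea only. It is not HBase's actual file format or API, and the class name, block size, and sample keys are hypothetical.

```python
from bisect import bisect_right

class SortedBlockFile:
    """Toy model: sorted (key, value) records split into fixed-size blocks,
    plus a small in-memory index holding the first key of each block."""

    def __init__(self, records, block_size=4):
        ordered = sorted(records)
        self.blocks = [ordered[i:i + block_size]
                       for i in range(0, len(ordered), block_size)]
        self.first_keys = [block[0][0] for block in self.blocks]

    def get(self, key):
        # Binary-search the index for the last block whose first key <= key.
        i = bisect_right(self.first_keys, key) - 1
        if i < 0:
            return None  # key sorts before everything in the file
        # Only that one block is scanned, never the whole file.
        for k, v in self.blocks[i]:
            if k == key:
                return v
        return None

store = SortedBlockFile((f"row{n:04d}", n * n) for n in range(1000))
print(store.get("row0042"))  # 1764
print(store.get("nope"))     # None
```

The point of the design is that a lookup touches the small index and a single block, so its cost does not grow with the size of the file the way a full scan would.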
Can I change a table's rowkeys?
B.2. Amazon EC2
I am running HBase on Amazon EC2 and...
See the Troubleshooting section (Section 11.9, “Amazon EC2”) and the Performance section (Section 10.10, “Amazon EC2”).
B.3. Building HBase
When I build, why do I always get
Ignore it. It's not an error. It is officially ugly though.
B.4. Runtime
I'm having problems with my HBase cluster, how can I troubleshoot it?
How can I improve HBase cluster performance?
Why are logs flooded with '2011-01-10 12:40:48,407 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor' messages?
Because we are not using the native versions of the compression libraries. See HBASE-1900, “Put back native support when hadoop 0.21 is released”. Copy the native libraries from Hadoop into the HBase lib directory, or symlink them into place, and the message should go away.
B.5. How do I...?
Secondary Indexes in HBase?
See Section 6.8, “Secondary Indexes and Alternate Query Paths”.
Store (fill in the blank) in HBase?
Back up my HBase Cluster?
Get a column 'slice': i.e. I have a million columns in my row but I only want to look at columns bbbb-bbbd?
See
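Conceptually, a column slice is a range query over sorted column qualifiers. The sketch below illustrates that idea on plain Python data and is not the HBase client API. The function name and sample row are hypothetical, and treating the range bbbb-bbbd as inclusive at both ends is an assumption.

```python
from bisect import bisect_left, bisect_right

def column_slice(row, start, stop):
    """Return the (qualifier, value) pairs with start <= qualifier <= stop.

    `row` must be a list of (qualifier, value) pairs sorted by qualifier.
    """
    # The qualifier list is rebuilt here for brevity; a real store would keep
    # its keys in sorted order already, so only the two searches would be needed.
    qualifiers = [q for q, _ in row]
    lo = bisect_left(qualifiers, start)   # first position with qualifier >= start
    hi = bisect_right(qualifiers, stop)   # first position with qualifier > stop
    return row[lo:hi]

# Hypothetical row with six columns, sorted by qualifier.
row = [("aaaa", 1), ("bbbb", 2), ("bbbc", 3), ("bbbd", 4), ("bbbe", 5), ("cccc", 6)]
print(column_slice(row, "bbbb", "bbbd"))
```

Because the qualifiers are sorted, the slice is found with two binary searches rather than by inspecting every column in the row.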