Chapter 1. HBase Operational Management

Table of Contents

1.1. HBase Tools and Utilities
1.1.1. HBase hbck
1.1.2. HFile Tool
1.1.3. WAL Tools
1.1.4. Compression Tool
1.1.5. CopyTable
1.1.6. Export
1.1.7. Import
1.1.8. RowCounter
1.2. Node Management
1.2.1. Node Decommission
1.2.2. Rolling Restart
1.3. Metrics
1.3.1. Metric Setup
1.3.2. RegionServer Metrics
1.4. HBase Monitoring
1.5. Cluster Replication
1.6. HBase Backup
1.6.1. Full Shutdown Backup
1.6.2. Live Cluster Backup - Replication
1.6.3. Live Cluster Backup - CopyTable
1.6.4. Live Cluster Backup - Export
1.7. Capacity Planning
1.7.1. Storage
1.7.2. Regions
This chapter will cover operational tools and practices required of a running HBase cluster. The subject of operations is related to the topics of troubleshooting, performance tuning, and configuration covered elsewhere in this guide, but is a distinct topic in itself.

1.1. HBase Tools and Utilities

Here we list HBase tools for administration, analysis, fixup, and debugging.

1.1.1. HBase hbck

An fsck for your HBase install

To run hbck against your HBase cluster, run:

$ ./bin/hbase hbck

At the end of the command's output it prints OK or INCONSISTENCY. If your cluster reports inconsistencies, pass -details to see more detail emitted. If inconsistencies are reported, run hbck a few times, because an inconsistency may be transient (e.g. the cluster is starting up or a region is splitting). Passing -fix may correct the inconsistency (this is an experimental feature).
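For example, using the flags described above:

$ ./bin/hbase hbck -details   # print detail for each inconsistency found
$ ./bin/hbase hbck -fix       # attempt to repair inconsistencies (experimental)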

1.1.2. HFile Tool

See the “HFile Tool” section elsewhere in this guide.
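As a quick reference, a typical invocation is sketched below; the HDFS path is a placeholder, and the supported flags (-v to be verbose, -p to print key/values, -f to name the file) may vary with your HBase version:

 $ ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -v -p -f hdfs://example.org:8020/hbase/TestTable/1418428042/cf/1234567890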

1.1.3. WAL Tools

1.1.3.1. HLog tool

The main method on HLog offers manual split and dump facilities. Pass it WALs or the products of a split, i.e. the contents of the recovered.edits directory.

You can get a textual dump of a WAL file content by doing the following:

 $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012 

The return code will be non-zero if there are issues with the file, so you can test the health of a file by redirecting STDOUT to /dev/null and checking the program's return code.
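For example, reusing the WAL file from above (a zero exit status means the file parsed cleanly):

 $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012 > /dev/null
 $ echo $?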

Similarly, you can force a split of a log file directory by doing:

 $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --split hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/

1.1.4. Compression Tool

See the “CompressionTest Tool” section in the compression appendix of this guide.
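As a sketch (the file path is a placeholder), the tool can be invoked to verify that a codec such as gz works against your cluster:

 $ ./bin/hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://example.org:8020/tmp/testfile gz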

1.1.5. CopyTable

CopyTable is a utility that can copy part or all of a table, either to the same cluster or to another cluster. The usage is as follows:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--rs.class=CLASS] [--rs.impl=IMPL] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename

Options:

  • rs.class hbase.regionserver.class of the peer cluster. Specify if different from current cluster.
  • rs.impl hbase.regionserver.impl of the peer cluster.
  • starttime Beginning of the time range. Without endtime, the range runs from starttime to forever.
  • endtime End of the time range.
  • new.name New table's name.
  • peer.adr Address of the peer cluster given in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
  • families Comma-separated list of ColumnFamilies to copy.

Args:

  • tablename Name of table to copy.

Example of copying a one-hour window of 'TestTable' to a peer cluster that uses replication:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    --rs.class=org.apache.hadoop.hbase.ipc.ReplicationRegionInterface \
    --rs.impl=org.apache.hadoop.hbase.regionserver.replication.ReplicationRegionServer \
    --starttime=1265875194289 --endtime=1265878794289 \
    --peer.adr=server1,server2,server3:2181:/hbase TestTable

1.1.6. Export

Export is a utility that will dump the contents of a table to HDFS in a sequence file. Invoke via:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
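For example, to export only the latest version of each cell of 'TestTable' to a placeholder HDFS output directory:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export TestTable /export/TestTable 1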

1.1.7. Import

Import is a utility that will load data previously written by Export back into HBase. Invoke via:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
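For example, to load the sequence files written by the Export example above (the target table is expected to already exist with matching column families):

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import TestTable /export/TestTable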

1.1.8. RowCounter

RowCounter is a utility that will count all the rows of a table. It is a good sanity check to ensure that HBase can read all the blocks of a table if there are any concerns about metadata inconsistency.

$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2>...]
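For example:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter TestTable

RowCounter runs as a MapReduce job; the resulting count is reported in the job's counter output rather than on STDOUT.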