Chapter 1. Apache HBase Performance Tuning

Table of Contents

1.1. Operating System
1.1.1. Memory
1.1.2. 64-bit
1.1.3. Swapping
1.2. Network
1.2.1. Single Switch
1.2.2. Multiple Switches
1.2.3. Multiple Racks
1.2.4. Network Interfaces
1.3. Java
1.3.1. The Garbage Collector and Apache HBase
1.4. HBase Configurations
1.4.1. Managing Compactions
1.4.2. hbase.regionserver.handler.count
1.4.3. hfile.block.cache.size
1.4.4. hbase.regionserver.global.memstore.upperLimit
1.4.5. hbase.regionserver.global.memstore.lowerLimit
1.4.6. hbase.hstore.blockingStoreFiles
1.4.7. hbase.hregion.memstore.block.multiplier
1.4.8. hbase.regionserver.checksum.verify
1.5. ZooKeeper
1.6. Schema Design
1.6.1. Number of Column Families
1.6.2. Key and Attribute Lengths
1.6.3. Table RegionSize
1.6.4. Bloom Filters
1.6.5. ColumnFamily BlockSize
1.6.6. In-Memory ColumnFamilies
1.6.7. Compression
1.7. HBase General Patterns
1.7.1. Constants
1.8. Writing to HBase
1.8.1. Batch Loading
1.8.2. Table Creation: Pre-Creating Regions
1.8.3. Table Creation: Deferred Log Flush
1.8.4. HBase Client: AutoFlush
1.8.5. HBase Client: Turn off WAL on Puts
1.8.6. HBase Client: Group Puts by RegionServer
1.8.7. MapReduce: Skip The Reducer
1.8.8. Anti-Pattern: One Hot Region
1.9. Reading from HBase
1.9.1. Scan Caching
1.9.2. Scan Attribute Selection
1.9.3. MapReduce - Input Splits
1.9.4. Close ResultScanners
1.9.5. Block Cache
1.9.6. Optimal Loading of Row Keys
1.9.7. Concurrency: Monitor Data Spread
1.9.8. Bloom Filters
1.10. Deleting from HBase
1.10.1. Using HBase Tables as Queues
1.10.2. Delete RPC Behavior
1.11. HDFS
1.11.1. Current Issues With Low-Latency Reads
1.11.2. Leveraging local data
1.11.3. Performance Comparisons of HBase vs. HDFS
1.12. Amazon EC2
1.13. Collocating HBase and MapReduce
1.14. Case Studies

1.1. Operating System

1.1.1. Memory

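RAM, RAM, RAM. Don't starve HBase: RegionServers serve reads from the block cache and buffer writes in the MemStore, both of which are held in memory.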

1.1.2. 64-bit

Use a 64-bit platform (and 64-bit JVM).
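
One way to confirm the JVM is 64-bit is to inspect the banner printed by `java -version`. The sketch below assumes a HotSpot or OpenJDK JVM, whose banner contains the text "64-Bit" (for example "OpenJDK 64-Bit Server VM"); other JVMs may word this differently. The helper names `is_64bit_jvm` and `read_java_version` are hypothetical and not part of HBase.

```python
import subprocess


def is_64bit_jvm(version_output: str) -> bool:
    """Return True if a `java -version` banner advertises a 64-bit VM.

    Assumption: HotSpot/OpenJDK-style banners such as
    'OpenJDK 64-Bit Server VM ...'. Other JVMs may not use this wording.
    """
    return "64-bit" in version_output.lower()


def read_java_version(java_cmd: str = "java") -> str:
    """Run `java -version` and return its banner, or '' if java is missing.

    The banner is normally written to stderr, so both streams are collected.
    """
    try:
        result = subprocess.run(
            [java_cmd, "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return ""
    return result.stderr + result.stdout


if __name__ == "__main__":
    banner = read_java_version()
    if not banner:
        print("java not found on PATH")
    elif is_64bit_jvm(banner):
        print("64-bit JVM detected")
    else:
        print("JVM banner does not mention 64-Bit; check the installed JVM")
```

Run this with the same `java` binary that HBase is configured to use, since a machine may have several JVMs installed.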

1.1.3. Swapping

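Watch out for swapping. On Linux, set the vm.swappiness kernel parameter to 0 so that the kernel swaps process memory out as little as possible.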
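
A minimal sketch of checking the current setting on a Linux host, assuming the value is exposed at `/proc/sys/vm/swappiness`. The helper names `parse_swappiness` and `swappiness_warning` are hypothetical, and the script only reports; it does not change the setting.

```python
from pathlib import Path
from typing import Optional

# Linux-specific location of the current vm.swappiness value.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text: str) -> int:
    """Parse the contents of /proc/sys/vm/swappiness (e.g. '60\\n')."""
    return int(text.strip())


def swappiness_warning(value: int, target: int = 0) -> Optional[str]:
    """Return a warning string if swappiness exceeds the target, else None."""
    if value > target:
        return f"vm.swappiness is {value}; consider: sysctl -w vm.swappiness={target}"
    return None


if __name__ == "__main__":
    if SWAPPINESS_PATH.exists():
        current = parse_swappiness(SWAPPINESS_PATH.read_text())
        print(swappiness_warning(current) or f"vm.swappiness is {current}; OK")
    else:
        print("No /proc/sys/vm/swappiness here; this check is Linux-specific")
```

A value set with `sysctl -w` lasts only until the next reboot. To make it persistent, add `vm.swappiness = 0` to `/etc/sysctl.conf`.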
