Chapter 10. Performance Tuning

Table of Contents

10.1. Operating System
10.1.1. Memory
10.1.2. 64-bit
10.1.3. Swapping
10.2. Network
10.2.1. Single Switch
10.2.2. Multiple Switches
10.2.3. Multiple Racks
10.3. Java
10.3.1. The Garbage Collector and HBase
10.4. HBase Configurations
10.4.1. Number of Regions
10.4.2. Managing Compactions
10.4.3. hbase.regionserver.handler.count
10.4.4. hfile.block.cache.size
10.4.5. hbase.regionserver.global.memstore.upperLimit
10.4.6. hbase.regionserver.global.memstore.lowerLimit
10.4.7. hbase.hstore.blockingStoreFiles
10.4.8. hbase.hregion.memstore.block.multiplier
10.5. Schema Design
10.5.1. Number of Column Families
10.5.2. Key and Attribute Lengths
10.5.3. Table RegionSize
10.5.4. Bloom Filters
10.5.5. ColumnFamily BlockSize
10.5.6. In-Memory ColumnFamilies
10.5.7. Compression
10.6. Writing to HBase
10.6.1. Batch Loading
10.6.2. Table Creation: Pre-Creating Regions
10.6.3. Table Creation: Deferred Log Flush
10.6.4. HBase Client: AutoFlush
10.6.5. HBase Client: Turn off WAL on Puts
10.6.6. HBase Client: Group Puts by RegionServer
10.6.7. MapReduce: Skip The Reducer
10.6.8. Anti-Pattern: One Hot Region
10.7. Reading from HBase
10.7.1. Scan Caching
10.7.2. Scan Attribute Selection
10.7.3. Close ResultScanners
10.7.4. Block Cache
10.7.5. Optimal Loading of Row Keys
10.7.6. Concurrency: Monitor Data Spread
10.8. Deleting from HBase
10.8.1. Using HBase Tables as Queues
10.8.2. Delete RPC Behavior
10.9. HDFS
10.9.1. Current Issues With Low-Latency Reads
10.9.2. Performance Comparisons of HBase vs. HDFS
10.10. Amazon EC2

10.1. Operating System

10.1.1. Memory

RAM, RAM, RAM. Give HBase as much memory as you can; don't starve it.

10.1.2. 64-bit

Use a 64-bit platform (and 64-bit JVM).
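
One way to confirm what a node is actually running is to ask the JVM itself. The sketch below is illustrative only: the class name `JvmBitness` and the helper `looks64Bit` are hypothetical, the check on `os.arch` is a simple heuristic (it looks for "64" in the architecture name), and `sun.arch.data.model` is a HotSpot-specific property that other JVMs may not set.

```java
public class JvmBitness {
    /**
     * Heuristic: common 64-bit values of os.arch (amd64, x86_64, aarch64,
     * ppc64le) contain "64". This is an assumption, not an exhaustive list.
     */
    public static boolean looks64Bit(String osArch) {
        return osArch != null && osArch.contains("64");
    }

    public static void main(String[] args) {
        // Standard property: the architecture the JVM reports for this host.
        String arch = System.getProperty("os.arch");
        // HotSpot-specific property: "32" or "64"; may be null on other JVMs.
        String model = System.getProperty("sun.arch.data.model");

        System.out.println("os.arch             = " + arch);
        System.out.println("sun.arch.data.model = " + model);

        boolean is64 = "64".equals(model) || (model == null && looks64Bit(arch));
        System.out.println(is64
                ? "This looks like a 64-bit JVM."
                : "This does not look like a 64-bit JVM.");
    }
}
```

Run it with the same `java` binary that starts HBase, since a 32-bit JVM can be installed on a 64-bit operating system.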

10.1.3. Swapping

Watch out for swapping. Set swappiness to 0 (on Linux, this is the vm.swappiness kernel parameter).
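
A quick way to see what a Linux node is currently using is to read the value the kernel exposes under `/proc`. The following is a minimal sketch, assuming a Linux host; the class name `SwappinessCheck` and the helper `parseSwappiness` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SwappinessCheck {
    // Linux exposes the current setting here; other platforms lack this file.
    static final Path SWAPPINESS = Paths.get("/proc/sys/vm/swappiness");

    /** Parses the file content: a single integer followed by a newline. */
    public static int parseSwappiness(String raw) {
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) throws IOException {
        if (!Files.isReadable(SWAPPINESS)) {
            System.out.println(SWAPPINESS + " is not readable; is this a Linux host?");
            return;
        }
        String raw = new String(Files.readAllBytes(SWAPPINESS), StandardCharsets.UTF_8);
        int value = parseSwappiness(raw);
        System.out.println("vm.swappiness = " + value
                + (value == 0 ? " (matches the recommendation)"
                              : " (consider setting it to 0)"));
    }
}
```

To change the value on a running system, `sysctl -w vm.swappiness=0` (as root) is the usual route, and a `vm.swappiness = 0` line in `/etc/sysctl.conf` keeps it across reboots.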