Chapter 1. HBase and Schema Design

Table of Contents

1.1. Schema Creation
1.1.1. Schema Updates
1.2. On the number of column families
1.2.1. Cardinality of ColumnFamilies
1.3. Rowkey Design
1.3.1. Monotonically Increasing Row Keys/Timeseries Data
1.3.2. Try to minimize row and column sizes
1.3.3. Reverse Timestamps
1.3.4. Rowkeys and ColumnFamilies
1.3.5. Immutability of Rowkeys
1.3.6. Relationship Between RowKeys and Region Splits
1.4. Number of Versions
1.4.1. Maximum Number of Versions
1.4.2. Minimum Number of Versions
1.5. Supported Datatypes
1.5.1. Counters
1.6. Joins
1.7. Time To Live (TTL)
1.8. Keeping Deleted Cells
1.9. Secondary Indexes and Alternate Query Paths
1.9.1. Filter Query
1.9.2. Periodic-Update Secondary Index
1.9.3. Dual-Write Secondary Index
1.9.4. Summary Tables
1.9.5. Coprocessor Secondary Index
1.10. Constraints
1.11. Schema Design Case Studies
1.11.1. Case Study - Log Data and Timeseries Data
1.11.2. Case Study - Log Data and Timeseries Data on Steroids
1.11.3. Case Study - Customer/Order
1.11.4. Case Study - "Tall/Wide/Middle" Schema Design Smackdown
1.11.5. Case Study - List Data
1.12. Operational and Performance Configuration Options

A good general introduction to the strengths and weaknesses of data modelling on the various non-RDBMS datastores is Ian Varley's Master's thesis, No Relation: The Mixed Blessings of Non-Relational Databases. Recommended. Also, read ??? for how HBase stores data internally, and see Section 1.11, “Schema Design Case Studies”.

1.1. Schema Creation

HBase schemas can be created or updated with ??? or by using HBaseAdmin in the Java API.

Tables must be disabled when making ColumnFamily modifications, for example:

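import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration config = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(config);
String table = "myTable";

admin.disableTable(table);         // take the table offline before changing ColumnFamilies

HColumnDescriptor cf1 = ...;
admin.addColumn(table, cf1);       // adding new ColumnFamily
HColumnDescriptor cf2 = ...;
admin.modifyColumn(table, cf2);    // modifying existing ColumnFamily

admin.enableTable(table);          // bring the table back online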


See ??? for more information about configuring client connections.

Note: online schema changes are supported in the 0.92.x codebase, but the 0.90.x codebase requires the table to be disabled.
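Because a disabled table is unavailable to clients, it can be worth making sure the table is re-enabled even when the schema change itself fails. The following is a minimal sketch of that pattern, not HBase code: TableAdmin is a hypothetical stand-in interface that borrows the method names disableTable, addColumn and enableTable from HBaseAdmin so the example compiles without HBase on the classpath, and it identifies a ColumnFamily by a plain String rather than an HColumnDescriptor. RecordingAdmin is likewise an illustrative test double that only records the order of calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the HBaseAdmin calls used in this section.
// Not part of HBase; the real addColumn takes an HColumnDescriptor.
interface TableAdmin {
    void disableTable(String table) throws Exception;
    void addColumn(String table, String family) throws Exception;
    void enableTable(String table) throws Exception;
}

// Illustrative in-memory implementation that records calls in order
// and can simulate a failing schema change.
class RecordingAdmin implements TableAdmin {
    final List<String> calls = new ArrayList<>();
    boolean failOnAdd = false;

    public void disableTable(String table) {
        calls.add("disable " + table);
    }

    public void addColumn(String table, String family) throws Exception {
        if (failOnAdd) {
            throw new Exception("simulated schema change failure");
        }
        calls.add("addColumn " + table + ":" + family);
    }

    public void enableTable(String table) {
        calls.add("enable " + table);
    }
}

class SchemaChangeSketch {
    // Disable the table, apply the change, and re-enable in a finally
    // block so the table comes back online even if the change throws.
    static void addFamilyOffline(TableAdmin admin, String table, String family)
            throws Exception {
        admin.disableTable(table);
        try {
            admin.addColumn(table, family);
        } finally {
            admin.enableTable(table);
        }
    }

    public static void main(String[] args) throws Exception {
        RecordingAdmin admin = new RecordingAdmin();
        addFamilyOffline(admin, "myTable", "cf1");
        // prints [disable myTable, addColumn myTable:cf1, enable myTable]
        System.out.println(admin.calls);
    }
}
```

Whether re-enabling after a failed change is the right choice depends on the deployment; some operators may prefer to leave the table disabled and investigate first.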

1.1.1. Schema Updates

When changes are made to either Tables or ColumnFamilies (e.g., region size, block size), these changes take effect the next time there is a major compaction and the StoreFiles get re-written.

See ??? for more information on StoreFiles.
