10.5. Schema Design

10.5.1. Number of Column Families

See Section 6.2, “On the number of column families”.

10.5.2. Key and Attribute Lengths

See Section 6.3.2, “Try to minimize row and column sizes”.

10.5.3. Table RegionSize

The regionsize can be set on a per-table basis via setMaxFileSize on HTableDescriptor when certain tables require a different regionsize than the configured default.

See Section 10.4.1, “Number of Regions” for more information.

10.5.4. Bloom Filters

Bloom filters can be enabled per ColumnFamily. Use HColumnDescriptor.setBloomFilterType(NONE | ROW | ROWCOL) to select the bloom type for a ColumnFamily. The default is NONE (no bloom filter). With ROW, the hash of the row is added to the bloom on each insert. With ROWCOL, the hash of the row + column family + column qualifier is added to the bloom on each key insert.

See HColumnDescriptor and Section 8.6.5, “Bloom Filters” for more information.
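
The difference between the ROW and ROWCOL settings can be illustrated with a toy Bloom filter. This is a minimal sketch in plain Java and not HBase's implementation: the class BloomKeySketch, its hash scheme, the three probe positions, and the NUL separator between key parts are all assumptions made for illustration. It only shows which bytes are hashed under each setting.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Toy Bloom filter (hypothetical, not HBase code) showing ROW vs ROWCOL keys. */
final class BloomKeySketch {
    enum BloomType { NONE, ROW, ROWCOL }

    private final BloomType type;
    private final int size;
    private final BitSet bits;

    BloomKeySketch(BloomType type, int size) {
        this.type = type;
        this.size = size;
        this.bits = new BitSet(size);
    }

    /** Bytes hashed into the filter for one cell; null when no bloom is kept. */
    private byte[] bloomKey(String row, String family, String qualifier) {
        switch (type) {
            case ROW:
                return row.getBytes(StandardCharsets.UTF_8);
            case ROWCOL:
                // Assumed encoding: parts joined with a NUL separator.
                return (row + "\u0000" + family + "\u0000" + qualifier)
                        .getBytes(StandardCharsets.UTF_8);
            default:
                return null;
        }
    }

    /** Three probe positions derived from one hash (illustrative scheme only). */
    private int[] positions(byte[] key) {
        int h1 = Arrays.hashCode(key);
        int h2 = Integer.rotateLeft(h1, 15) * 0x9E3779B1 + 1;
        return new int[] {
            Math.floorMod(h1, size),
            Math.floorMod(h1 + h2, size),
            Math.floorMod(h1 + 2 * h2, size)
        };
    }

    /** Called on each insert: sets the bits for this cell's bloom key. */
    void add(String row, String family, String qualifier) {
        byte[] key = bloomKey(row, family, qualifier);
        if (key == null) return; // NONE: nothing is recorded
        for (int p : positions(key)) bits.set(p);
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(String row, String family, String qualifier) {
        byte[] key = bloomKey(row, family, qualifier);
        if (key == null) return true; // NONE: the file can never be ruled out
        for (int p : positions(key)) {
            if (!bits.get(p)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BloomKeySketch rowBloom = new BloomKeySketch(BloomType.ROW, 1024);
        rowBloom.add("row1", "cf", "a");
        // ROW hashes only the row, so the qualifier does not matter here.
        System.out.println(rowBloom.mightContain("row1", "cf", "b"));

        BloomKeySketch rowColBloom = new BloomKeySketch(BloomType.ROWCOL, 1024);
        rowColBloom.add("row1", "cf", "a");
        // ROWCOL hashes row + family + qualifier, so other columns can be ruled out
        // (subject to the usual false-positive rate of a Bloom filter).
        System.out.println(rowColBloom.mightContain("row1", "cf", "a"));
    }
}
```

In this sketch a ROW bloom can only answer "might this row be present?", while a ROWCOL bloom can also rule out individual columns of a row, at the cost of one bloom entry per distinct row and column combination.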

10.5.5. ColumnFamily BlockSize

The blocksize can be configured for each ColumnFamily in a table and defaults to 64 KB. Larger cell values require larger blocksizes. There is an inverse relationship between blocksize and the size of the resulting StoreFile indexes (i.e., if the blocksize is doubled, the resulting indexes should be roughly halved).

See HColumnDescriptor and Section 8.6.4, “Store” for more information.
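
The inverse relationship between blocksize and index size can be checked with simple arithmetic. The sketch below assumes one index entry per data block and ignores compression, key sizes, and the fact that real blocks are only approximately the configured size. The class name BlockIndexEstimate and the 1 GiB StoreFile are hypothetical values chosen for illustration.

```java
/** Rough estimate (illustrative only) of StoreFile index entries per blocksize. */
final class BlockIndexEstimate {

    /** Assumes one index entry per block: entries = ceil(storeFileBytes / blockSizeBytes). */
    static long indexEntries(long storeFileBytes, long blockSizeBytes) {
        if (storeFileBytes < 0 || blockSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (storeFileBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long storeFileBytes = 1L << 30; // hypothetical 1 GiB StoreFile

        // Default 64 KB blocks.
        System.out.println(indexEntries(storeFileBytes, 64L * 1024));  // prints 16384
        // Doubling the blocksize halves the number of index entries.
        System.out.println(indexEntries(storeFileBytes, 128L * 1024)); // prints 8192
    }
}
```

Under these assumptions, moving from 64 KB to 128 KB blocks cuts the entry count from 16384 to 8192 for the same StoreFile, which is the "roughly halved" behavior described above.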

10.5.6. In-Memory ColumnFamilies

ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily. In-memory blocks have the highest priority in the block cache (see Section 8.5.3, “Block Cache”), but there is no guarantee that the entire table will be in memory.

See HColumnDescriptor for more information.

10.5.7. Compression

Production systems should use compression with their ColumnFamily definitions. See Appendix A, Compression In HBase for more information.