See ???.
See ???. See also Section 1.6.7.1, “However...” for compression caveats.
The region size can be set on a per-table basis via setMaxFileSize on HTableDescriptor when certain tables require a region size different from the configured default.
See Section 1.4.1, “Number of Regions” for more information.
Bloom filters can be enabled per ColumnFamily. Use HColumnDescriptor.setBloomFilterType(NONE | ROW | ROWCOL) to choose the bloom type for a ColumnFamily. The default is NONE (no bloom filter). With ROW, the hash of the row is added to the bloom on each insert. With ROWCOL, the hash of the row + column family + column qualifier is added to the bloom on each key insert.
See HColumnDescriptor and Section 1.9.8, “Bloom Filters” for more information, or this answer on Quora: How are bloom filters used in HBase?
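The difference between ROW and ROWCOL comes down to which bytes are hashed into the filter. The following is a toy sketch of that idea, not HBase's implementation: the class name, filter size, probe count, hash mixing, and the zero-byte separator are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Toy Bloom filter; NOT HBase's implementation. All names are hypothetical.
class BloomKeySketch {
    private static final int BITS = 1 << 16; // assumed filter size in bits
    private static final int HASHES = 3;     // assumed number of hash probes
    private final BitSet bits = new BitSet(BITS);

    // ROW mode: only the row key is hashed.
    static byte[] rowKey(String row) {
        return row.getBytes(StandardCharsets.UTF_8);
    }

    // ROWCOL mode: row + column family + qualifier are hashed together.
    // A zero byte separates the parts so ("ab","c") and ("a","bc") differ.
    static byte[] rowColKey(String row, String family, String qualifier) {
        return (row + '\0' + family + '\0' + qualifier)
                .getBytes(StandardCharsets.UTF_8);
    }

    // Derive the bit position for one probe from the key and a seed.
    private static int probe(byte[] key, int seed) {
        int h = 31 * Arrays.hashCode(key) + seed * 0x9E3779B9;
        h ^= (h >>> 16);
        return Math.floorMod(h, BITS);
    }

    void add(byte[] key) {
        for (int i = 0; i < HASHES; i++) {
            bits.set(probe(key, i));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(byte[] key) {
        for (int i = 0; i < HASHES; i++) {
            if (!bits.get(probe(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomKeySketch rowBloom = new BloomKeySketch();
        BloomKeySketch rowColBloom = new BloomKeySketch();

        // One insert of row "r1", family "cf", qualifier "q1".
        rowBloom.add(rowKey("r1"));
        rowColBloom.add(rowColKey("r1", "cf", "q1"));

        // Inserted keys are always reported as possibly present.
        System.out.println(rowBloom.mightContain(rowKey("r1")));
        System.out.println(rowColBloom.mightContain(rowColKey("r1", "cf", "q1")));
        // A different qualifier is a different key in ROWCOL mode; the answer
        // is usually false, though false positives remain possible.
        System.out.println(rowColBloom.mightContain(rowColKey("r1", "cf", "q2")));
    }
}
```

In this sketch a ROW filter can only answer "might this row be present?", while a ROWCOL filter answers the narrower question about a specific row and column, at the cost of one filter entry per distinct row and column combination.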
The blocksize can be configured for each ColumnFamily in a table and defaults to 64 KB. Larger cell values require larger blocksizes. There is an inverse relationship between blocksize and the size of the resulting StoreFile indexes (i.e., if the blocksize is doubled, the resulting indexes should be roughly halved).
See HColumnDescriptor and ??? for more information.
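The inverse relationship between blocksize and index size can be checked with simple arithmetic. This sketch assumes roughly one index entry per data block and ignores per-entry key sizes and any multi-level index structure; the class name, method name, and the 1 GiB StoreFile are hypothetical.

```java
// Back-of-the-envelope estimate; assumes one index entry per data block.
class BlockIndexSketch {
    // Number of blocks (and so index entries) via ceiling division.
    static long indexEntries(long storeFileBytes, long blockSizeBytes) {
        return (storeFileBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long storeFile = 1L << 30;      // hypothetical 1 GiB StoreFile
        long defaultBlock = 64L * 1024; // the 64 KB default blocksize

        System.out.println(indexEntries(storeFile, defaultBlock));     // 16384
        System.out.println(indexEntries(storeFile, 2 * defaultBlock)); // 8192
    }
}
```

Doubling the blocksize from 64 KB to 128 KB halves the estimated entry count, which is the trade-off described above: a smaller index in exchange for larger units of I/O per read.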
ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily. In-memory blocks have the highest priority in the ???, but there is no guarantee that the entire table will be in memory.
See HColumnDescriptor for more information.
Production systems should use compression with their ColumnFamily definitions. See ??? for more information.
Compression deflates data on disk. When the data is in memory (e.g., in the MemStore) or on the wire (e.g., transferring between RegionServer and Client), it is inflated. So while using ColumnFamily compression is a best practice, it will not completely eliminate the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names.
See ??? for schema design tips, and ??? for more information on how HBase stores data internally.
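To make the caveat concrete, the sketch below compares raw and deflated sizes for key-like data with short and long names. It uses java.util.zip.Deflater purely as a stand-in compressor and does not reproduce HBase's storage format; the class, method, family, and qualifier names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustrative only: shows that compression shrinks repeated names on disk
// while the inflated (in-memory / on-the-wire) size still grows with them.
class KeyOverheadSketch {
    // Concatenate key-like records: row + family + qualifier for each cell.
    static byte[] buildKeys(String family, String qualifier, int cells) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < cells; i++) {
            byte[] k = ("row" + i + family + qualifier)
                    .getBytes(StandardCharsets.UTF_8);
            out.write(k, 0, k.length);
        }
        return out.toByteArray();
    }

    // Size in bytes after DEFLATE compression at the default level.
    static int deflatedSize(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!d.finished()) {
            total += d.deflate(buf);
        }
        d.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] shortNames = buildKeys("d", "t", 1000);
        byte[] longNames =
                buildKeys("customer_profile_data", "last_login_timestamp", 1000);

        System.out.println("short names: raw=" + shortNames.length
                + " deflated=" + deflatedSize(shortNames));
        System.out.println("long names:  raw=" + longNames.length
                + " deflated=" + deflatedSize(longNames));
    }
}
```

The deflated figures approximate what compression saves on disk, while the raw figures are closer to what the MemStore and the network carry, which is why long names remain costly even with compression enabled.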