A common question for HBase administrators is how to estimate the storage an HBase cluster will require. There are several aspects to consider, the most important of which is what data you will load into the cluster. Start with a solid understanding of how HBase handles data internally (KeyValue).
HBase storage will be dominated by KeyValues. See the “KeyValue” section and Section 6.3.2, “Try to minimize row and column sizes” for how HBase stores data internally.
It is critical to understand that there is a KeyValue instance for every attribute stored in a row, and that the rowkey length, ColumnFamily name length, and attribute lengths will drive the size of the database more than any other factor.
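To make the per-attribute cost concrete, the following is a minimal sketch of the arithmetic, not HBase code. It assumes the classic KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type, plus the row, family, qualifier, and value bytes) with no tags, compression, or data block encoding. The function names and the sample row are hypothetical.

```python
# Fixed bytes carried by every KeyValue under the assumed classic layout.
KEY_LENGTH_FIELD = 4      # int: length of the key portion
VALUE_LENGTH_FIELD = 4    # int: length of the value portion
ROW_LENGTH_FIELD = 2      # short: length of the rowkey
FAMILY_LENGTH_FIELD = 1   # byte: length of the ColumnFamily name
TIMESTAMP_FIELD = 8       # long: cell timestamp
KEY_TYPE_FIELD = 1        # byte: Put, Delete, ...

FIXED_OVERHEAD = (
    KEY_LENGTH_FIELD + VALUE_LENGTH_FIELD + ROW_LENGTH_FIELD
    + FAMILY_LENGTH_FIELD + TIMESTAMP_FIELD + KEY_TYPE_FIELD
)


def keyvalue_size(rowkey: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate serialized size in bytes of one KeyValue (one attribute)."""
    return FIXED_OVERHEAD + len(rowkey) + len(family) + len(qualifier) + len(value)


def estimate_raw_bytes(rows: int, attributes_per_row: int, kv_bytes: int) -> int:
    """Uncompressed size before HDFS replication: one KeyValue per attribute."""
    return rows * attributes_per_row * kv_bytes


if __name__ == "__main__":
    # Hypothetical row: 16-byte rowkey, 8-byte value, short vs. long names.
    short = keyvalue_size(b"0123456789abcdef", b"d", b"name", b"12345678")
    verbose = keyvalue_size(b"0123456789abcdef", b"customer_details",
                            b"first_name", b"12345678")
    print("bytes per KeyValue (short names):", short)
    print("bytes per KeyValue (long names): ", verbose)
    print("raw bytes, 1e9 rows x 10 attributes:",
          estimate_raw_bytes(1_000_000_000, 10, short))
```

Because the rowkey and ColumnFamily name are repeated in every KeyValue, shortening them reduces the size of every attribute in the table, which is why these lengths dominate the estimate.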
KeyValue instances are aggregated into blocks, and the blocksize is configurable on a per-ColumnFamily basis. Blocks are aggregated into StoreFiles. See Section 9.7, “Regions”.
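The relationship between KeyValue size and blocks can be sketched roughly as follows. This is an approximation under stated assumptions: it uses 64 KB as the blocksize (the HBase default), treats the blocksize as a hard limit, and ignores block headers, indexes, compression, and encoding. In practice the blocksize is a target rather than an exact cap. The function names are hypothetical.

```python
DEFAULT_BLOCKSIZE = 64 * 1024  # bytes; assumed default per-ColumnFamily blocksize


def keyvalues_per_block(kv_bytes: int, blocksize: int = DEFAULT_BLOCKSIZE) -> int:
    """Roughly how many KeyValues of a given size fit in one block."""
    if kv_bytes <= 0:
        raise ValueError("kv_bytes must be positive")
    # A KeyValue larger than the blocksize still occupies a block of its own.
    return max(1, blocksize // kv_bytes)


def blocks_needed(total_kvs: int, kv_bytes: int,
                  blocksize: int = DEFAULT_BLOCKSIZE) -> int:
    """Approximate number of blocks required to hold total_kvs KeyValues."""
    per_block = keyvalues_per_block(kv_bytes, blocksize)
    return -(-total_kvs // per_block)  # ceiling division


if __name__ == "__main__":
    kv_bytes = 49  # hypothetical size of one KeyValue in bytes
    print("KeyValues per block:", keyvalues_per_block(kv_bytes))
    print("blocks for 1,000,000 KeyValues:", blocks_needed(1_000_000, kv_bytes))
```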
Another common question for HBase administrators is determining the right number of regions per RegionServer. This affects both storage and hardware planning. See Section 12.4.1, “Number of Regions”.