E.2. HFile format version 1 overview

Before discussing the changes we are making to the HFile format, it is useful to give a short overview of the previous format, HFile version 1. An HFile in version 1 is structured as shown in the following figure:

Figure: HFile Version 1 [37]

E.2.1.  Block index format in version 1

The block index in version 1 is very straightforward. For each entry, it contains:

  1. Offset (long)

  2. Uncompressed size (int)

  3. Key (a serialized byte array written using Bytes.writeByteArray)

    1. Key length as a variable-length integer (VInt)

    2. Key bytes
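
As an illustration of the entry layout above, the following Java sketch decodes a run of index entries from a buffer. It is not the HBase reader: the class, method, and field names are hypothetical, the entry count is simply supplied by the caller, and the key-length decoding is a simplified re-implementation that assumes the VInt follows Hadoop's WritableUtils encoding and is never negative.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of decoding version 1 block index entries. */
final class BlockIndexV1Sketch {

    /** One index entry: offset (long), uncompressed size (int), key bytes. */
    static final class Entry {
        final long offset;
        final int uncompressedSize;
        final byte[] key;

        Entry(long offset, int uncompressedSize, byte[] key) {
            this.offset = offset;
            this.uncompressedSize = uncompressedSize;
            this.key = key;
        }
    }

    /**
     * Reads a non-negative VInt. Assumed encoding (Hadoop WritableUtils):
     * a first byte in 0..127 is the value itself; a first byte in
     * -120..-113 announces (-112 - firstByte) big-endian value bytes.
     */
    static int readNonNegativeVInt(ByteBuffer in) {
        byte first = in.get();
        if (first >= 0) {
            return first;
        }
        if (first < -120 || first > -113) {
            throw new IllegalArgumentException("negative VInt prefix: " + first);
        }
        int extraBytes = -112 - first;
        long value = 0;
        for (int i = 0; i < extraBytes; i++) {
            value = (value << 8) | (in.get() & 0xFF);
        }
        if (value < 0 || value > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("key length out of range");
        }
        return (int) value;
    }

    /** Reads entryCount entries; ByteBuffer is big-endian by default. */
    static List<Entry> readEntries(ByteBuffer in, int entryCount) {
        List<Entry> entries = new ArrayList<>(entryCount);
        for (int i = 0; i < entryCount; i++) {
            long offset = in.getLong();          // 1. offset
            int uncompressedSize = in.getInt();  // 2. uncompressed size
            int keyLength = readNonNegativeVInt(in); // 3.1 key length
            if (keyLength > in.remaining()) {
                throw new IllegalArgumentException("truncated key");
            }
            byte[] key = new byte[keyLength];
            in.get(key);                         // 3.2 key bytes
            entries.add(new Entry(offset, uncompressedSize, key));
        }
        return entries;
    }
}
```

Note that nothing in the entry stream marks where the index ends, so the reader depends entirely on the count passed in by the caller.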

The number of entries in the block index is stored in the fixed file trailer and must be passed to the method that reads the block index. One limitation of the version 1 block index is that it does not record the compressed size of a block, which turns out to be necessary for decompression. The HFile reader therefore has to infer the compressed size from the difference between the offsets of consecutive blocks. We fix this limitation in version 2, where we store the on-disk block size instead of the uncompressed size, and get the uncompressed size from the block header.
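
The offset-difference inference can be sketched in Java as follows. This is a hedged illustration rather than the actual reader code: the class and method names are hypothetical, blocks are assumed to be laid out contiguously in offset order, and the end of the last block (`endOfDataBlocks`) is an assumed input that the caller must obtain from elsewhere.

```java
/** Hypothetical sketch: infer compressed block sizes from index offsets. */
final class OnDiskSizeSketch {

    /**
     * Returns the inferred on-disk size of each block.
     *
     * @param blockOffsets    block offsets in ascending order, as read
     *                        from the version 1 block index
     * @param endOfDataBlocks assumed offset just past the last block
     */
    static long[] inferOnDiskSizes(long[] blockOffsets, long endOfDataBlocks) {
        long[] sizes = new long[blockOffsets.length];
        for (int i = 0; i < blockOffsets.length; i++) {
            // The next block's offset marks the end of this block; the
            // last block ends at the caller-supplied boundary.
            long next = (i + 1 < blockOffsets.length)
                    ? blockOffsets[i + 1]
                    : endOfDataBlocks;
            if (next < blockOffsets[i]) {
                throw new IllegalArgumentException(
                        "offsets are not in ascending order at index " + i);
            }
            sizes[i] = next - blockOffsets[i];
        }
        return sizes;
    }
}
```

Storing the on-disk size directly in the index, as version 2 does, removes the need for this neighbor lookup and for knowing where the last block ends.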

[37] Image courtesy of Lars George, hbase-architecture-101-storage.html.
