HBase includes several methods of loading data into tables.
        The most straightforward method is to either use the TableOutputFormat
        class from a MapReduce job, or use the normal client APIs; however,
        these are not always the most efficient methods.
      
The bulk load feature uses a MapReduce job to output table data in HBase's internal data format, and then directly loads the generated StoreFiles into a running cluster. A bulk load uses less CPU and network resources than writing the same data through the HBase API.
The HBase bulk load process consists of two main steps.
          The first step of a bulk load is to generate HBase data files (StoreFiles) from
          a MapReduce job using HFileOutputFormat. This output format writes
          out data in HBase's internal storage format so that it can be
          later loaded very efficiently into the cluster.
        
          In order to function efficiently, HFileOutputFormat must be
          configured such that each output HFile fits within a single region.
          In order to do this, jobs whose output will be bulk loaded into HBase
          use Hadoop's TotalOrderPartitioner class to partition the map output
          into disjoint ranges of the key space, corresponding to the key
          ranges of the regions in the table.
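
The partitioning idea described above can be sketched with the Java standard library alone. This is an illustration of the concept, not the TotalOrderPartitioner implementation: the class and method names (RegionPartitionSketch, partitionFor, compareKeys, key) and the sample split keys are hypothetical. It assumes row keys are ordered as unsigned bytes, lexicographically, and that a region owns the keys from its start key (inclusive) up to the next region's start key (exclusive).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the Hadoop TotalOrderPartitioner implementation. */
final class RegionPartitionSketch {

    /** Converts a string into a row key (helper for this example). */
    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Compares two row keys as unsigned bytes, lexicographically. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /**
     * Returns the index of the partition (region) that owns rowKey.
     * splitKeys holds the sorted start keys of every region except the first,
     * so n split keys describe n + 1 disjoint key ranges.
     */
    static int partitionFor(List<byte[]> splitKeys, byte[] rowKey) {
        int lo = 0;
        int hi = splitKeys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareKeys(splitKeys.get(mid), rowKey) <= 0) {
                lo = mid + 1; // rowKey sorts at or after this start key
            } else {
                hi = mid;     // rowKey sorts before this start key
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Hypothetical table with three regions: [ , g), [g, p), [p, )
        List<byte[]> splitKeys = Arrays.asList(key("g"), key("p"));
        for (String row : new String[] {"apple", "g", "melon", "zebra"}) {
            System.out.println(row + " -> partition " + partitionFor(splitKeys, key(row)));
        }
    }
}
```

Because every key in a given partition falls inside one region's key range, the reducer that handles that partition writes files that each fit within a single region.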
        
          HFileOutputFormat includes a convenience function,
          configureIncrementalLoad(), which automatically sets up
          a TotalOrderPartitioner based on the current region boundaries of a
          table.
        
          After the data has been prepared using
          HFileOutputFormat, it is loaded into the cluster using
          completebulkload. This command line tool iterates
          through the prepared data files, and for each one determines the
          region the file belongs to. It then contacts the appropriate Region
          Server which adopts the HFile, moving it into its storage directory
          and making the data available to clients.
        
          If the region boundaries have changed during the course of bulk load
          preparation, or between the preparation and completion steps, the
          completebulkload utility will automatically split the
          data files into pieces corresponding to the new boundaries. This
          process is not optimally efficient, so users should take care to
          minimize the delay between preparing a bulk load and importing it
          into the cluster, especially if other clients are simultaneously
          loading data through other means.
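
To make the cost of a boundary change concrete, the following sketch groups the row keys of one prepared file by the region that currently owns them. It is a simplified model, not the code used by the completebulkload tool: the names HFileSplitSketch and splitByRegion are hypothetical, row keys are modeled as ASCII strings (HBase itself compares raw bytes), and the set of region start keys omits the first region, whose start key is empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative sketch only; not the logic used by the completebulkload tool. */
final class HFileSplitSketch {

    /**
     * Groups the row keys of one prepared file by owning region.
     * regionStartKeys holds the start key of every region except the first.
     * A result with a single entry means the file can be adopted as it is;
     * more than one entry means the file has to be split into pieces.
     */
    static Map<Integer, List<String>> splitByRegion(List<String> sortedRowKeys,
                                                    NavigableSet<String> regionStartKeys) {
        Map<Integer, List<String>> pieces = new TreeMap<>();
        for (String rowKey : sortedRowKeys) {
            // Number of start keys <= rowKey is the index of the owning region.
            int region = regionStartKeys.headSet(rowKey, true).size();
            pieces.computeIfAbsent(region, r -> new ArrayList<>()).add(rowKey);
        }
        return pieces;
    }

    public static void main(String[] args) {
        List<String> fileKeys = Arrays.asList("a", "c", "h", "k");

        // Boundaries at preparation time: the file fits in one region.
        NavigableSet<String> before = new TreeSet<>(Arrays.asList("m"));
        System.out.println(splitByRegion(fileKeys, before)); // {0=[a, c, h, k]}

        // A region split at "g" happened before the import: two pieces are needed.
        NavigableSet<String> after = new TreeSet<>(Arrays.asList("g", "m"));
        System.out.println(splitByRegion(fileKeys, after));  // {0=[a, c], 1=[h, k]}
    }
}
```

In the second case every key in the file has to be read and rewritten into a new piece, which is the extra work that a short delay between preparation and import helps avoid.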
        
        After a data import has been prepared, either by using the
        importtsv tool with the
        "importtsv.bulk.output" option or by some other MapReduce
        job using the HFileOutputFormat, the
        completebulkload tool is used to import the data into the
        running cluster.
      
        The completebulkload tool simply takes the output path
        where importtsv or your MapReduce job put its results, and
        the table name to import into. For example:
      
$ hadoop jar hbase-VERSION.jar completebulkload [-c /path/to/hbase/config/hbase-site.xml] /user/todd/myoutput mytable
        The -c config-file option can be used to specify a file
        containing the appropriate HBase parameters (e.g., hbase-site.xml) if
        they are not already supplied on the CLASSPATH. In addition, if
        ZooKeeper is not managed by HBase, the CLASSPATH must contain the
        directory that holds the ZooKeeper configuration file.
      
Note: If the target table does not already exist in HBase, this tool will create the table automatically.
The tool runs quickly, and once it finishes the new data is visible in the cluster.
For more information about the referenced utilities, see Section 15.1.9, “ImportTsv” and Section 15.1.10, “CompleteBulkLoad”.
        Although the importtsv tool is useful in many cases, advanced users may
        want to generate data programmatically, or import data from other formats. To get
        started doing so, dig into ImportTsv.java and check the JavaDoc for
        HFileOutputFormat.
      
        The import step of the bulk load can also be done programmatically. See the
        LoadIncrementalHFiles class for more information.