Chapter 1. Building and Developing Apache HBase

Table of Contents

1.1. Apache HBase Repositories
1.1.1. SVN
1.1.2. Git
1.2. IDEs
1.2.1. Eclipse
1.3. Building Apache HBase
1.3.1. Basic Compile
1.3.2. Build Protobuf
1.3.3. Build Gotchas
1.3.4. Building in snappy compression support
1.4. Releasing Apache HBase
1.4.1. Making a Release Candidate
1.4.2. Publishing a SNAPSHOT to maven
1.5. Generating the HBase Reference Guide
1.6. Updating hbase.apache.org
1.6.1. Contributing to hbase.apache.org
1.6.2. Publishing hbase.apache.org
1.7. Tests
1.7.1. Apache HBase Modules
1.7.2. Unit Tests
1.7.3. Running tests
1.7.4. Writing Tests
1.7.5. Integration Tests
1.8. Maven Build Commands
1.8.1. Compile
1.8.2. Running all or individual Unit Tests
1.8.3. Building against various hadoop versions.
1.9. Getting Involved
1.9.1. Mailing Lists
1.9.2. Jira
1.10. Developing
1.10.1. Codelines
1.10.2. Unit Tests
1.10.3. Code Standards
1.10.4. Invariants
1.10.5. Running In-Situ
1.10.6. Adding Metrics
1.11. Submitting Patches
1.11.1. Create Patch
1.11.2. Patch File Naming
1.11.3. Unit Tests
1.11.4. Attach Patch to Jira
1.11.5. Common Patch Feedback
1.11.6. Submitting a patch again
1.11.7. Submitting incremental patches
1.11.8. ReviewBoard
1.11.9. Committing Patches

This chapter will be of interest only to those building and developing Apache HBase (i.e., as opposed to just downloading the latest distribution).

1.1. Apache HBase Repositories

There are two different repositories for Apache HBase: Subversion (SVN) and Git. The former is the system of record for committers, but the latter is easier to work with to build and contribute. SVN updates get automatically propagated to the Git repo.

1.1.1. SVN

svn co http://svn.apache.org/repos/asf/hbase/trunk hbase-core-trunk
        

1.1.2. Git

git clone git://git.apache.org/hbase.git
        

There is also a GitHub repository that mirrors the Apache git repository. If you'd like to develop within the GitHub environment (collaborating, pull requests), you can get the source code with:

git clone git://github.com/apache/hbase.git
              

1.2. IDEs

1.2.1. Eclipse

1.2.1.1. Code Formatting

Under the dev-support folder, you will find hbase_eclipse_formatter.xml. We encourage you to have this formatter in place in eclipse when editing HBase code. To load it into eclipse:

  1. Go to Eclipse->Preferences...

  2. In Preferences, Go to Java->Code Style->Formatter

  3. Import... hbase_eclipse_formatter.xml

  4. Click Apply

  5. Still in Preferences, Go to Java->Editor->Save Actions

  6. Check the following:

    1. Perform the selected actions on save

    2. Format source code

    3. Format edited lines

  7. Click Apply

In addition to the automatic formatting, make sure you follow the style guidelines explained in Section 1.11.5, “Common Patch Feedback”.

Also, no @author tags - that's a rule. Quality Javadoc comments are appreciated. And include the Apache license.

1.2.1.2. Subversive Plugin

Download and install the Subversive plugin.

Set up an SVN Repository target from Section 1.1.1, “SVN”, then check out the code.

1.2.1.3. Git Plugin

If you cloned the project via git, download and install the Git plugin (EGit). Attach to your local git repo (via the Git Repositories window) and you'll be able to see file revision history, generate patches, etc.

1.2.1.4. HBase Project Setup in Eclipse

The easiest way is to use the m2eclipse plugin for Eclipse. Eclipse Indigo or newer has m2eclipse built-in, or it can be found here: http://www.eclipse.org/m2e/. M2Eclipse provides Maven integration for Eclipse - it even lets you use the direct Maven commands from within Eclipse to compile and test your project.

To import the project, you merely need to go to File->Import...Maven->Existing Maven Projects and then point Eclipse at the HBase root directory; m2eclipse will automatically find all the hbase modules for you.

If you install m2eclipse and import HBase in your workspace, you will have to fix your Eclipse Build Path. Remove the target folder, and add the target/generated-jamon and target/generated-sources/java folders. You may also remove the exclusions on src/main/resources and src/test/resources from your Build Path to avoid this error message in the console: 'Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (default) on project hbase: An Ant BuildException has occured: Replace: source file .../target/classes/hbase-default.xml doesn't exist'. This will also reduce the Eclipse build cycles and make your life easier when developing.

1.2.1.5. Import into eclipse with the command line

For those not inclined to use m2eclipse, you can generate the Eclipse files from the command line. First, run (you should only have to do this once):

mvn clean install -DskipTests

and then close Eclipse and execute...

mvn eclipse:eclipse

... from your local HBase project directory in your workspace to generate some new .project and .classpath files. Then reopen Eclipse, or refresh your Eclipse project (F5), and import the .project file in the HBase directory to a workspace.

1.2.1.6. Maven Classpath Variable

The M2_REPO classpath variable needs to be set up for the project. This needs to be set to your local Maven repository, which is usually ~/.m2/repository.

If this classpath variable is not configured, you will see compile errors in Eclipse like this...
Description	Resource	Path	Location	Type
The project cannot be built until build path errors are resolved	hbase		Unknown	Java Problem
Unbound classpath variable: 'M2_REPO/asm/asm/3.1/asm-3.1.jar' in project 'hbase'	hbase		Build path	Build Path Problem
Unbound classpath variable: 'M2_REPO/com/github/stephenc/high-scale-lib/high-scale-lib/1.1.1/high-scale-lib-1.1.1.jar' in project 'hbase'	hbase		Build path	Build Path Problem
Unbound classpath variable: 'M2_REPO/com/google/guava/guava/r09/guava-r09.jar' in project 'hbase'	hbase		Build path	Build Path Problem
Unbound classpath variable: 'M2_REPO/com/google/protobuf/protobuf-java/2.3.0/protobuf-java-2.3.0.jar' in project 'hbase'	hbase		Build path	Build Path Problem Unbound classpath variable:
            

1.2.1.7. Eclipse Known Issues

Eclipse will currently complain about Bytes.java. It is not possible to turn these errors off.

Description	Resource	Path	Location	Type
Access restriction: The method arrayBaseOffset(Class) from the type Unsafe is not accessible due to restriction on required library /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Classes/classes.jar	Bytes.java	/hbase/src/main/java/org/apache/hadoop/hbase/util	line 1061	Java Problem
Access restriction: The method arrayIndexScale(Class) from the type Unsafe is not accessible due to restriction on required library /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Classes/classes.jar	Bytes.java	/hbase/src/main/java/org/apache/hadoop/hbase/util	line 1064	Java Problem
Access restriction: The method getLong(Object, long) from the type Unsafe is not accessible due to restriction on required library /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Classes/classes.jar	Bytes.java	/hbase/src/main/java/org/apache/hadoop/hbase/util	line 1111	Java Problem
             

1.2.1.8. Eclipse - More Information

For additional information on setting up Eclipse for HBase development on Windows, see Michael Morello's blog on the topic.

1.3. Building Apache HBase

1.3.1. Basic Compile

Thanks to maven, building HBase is pretty easy. You can read about the various maven commands in Section 1.8, “Maven Build Commands”, but the simplest command to compile HBase from its java source code is:

mvn package -DskipTests
       

Or, to clean up before compiling:

mvn clean package -DskipTests
       

With Eclipse set up as explained above in Section 1.2.1, “Eclipse”, you can also simply use the build command in Eclipse. To create the full installable HBase package takes a little bit more work, so read on.

1.3.2. Build Protobuf

You may need to change the protobuf definitions that reside in the hbase-protocol module.

The protobuf files are located in hbase-protocol/src/main/protobuf. For the change to be effective, you will need to regenerate the classes (read the hbase-protocol/README.txt for more details).

1.3.3. Build Gotchas

If you see Unable to find resource 'VM_global_library.vm', ignore it. It's not an error. It is officially ugly though.

1.3.4. Building in snappy compression support

Pass -Dsnappy to trigger the snappy maven profile for building snappy native libs into hbase. See also ???

1.4. Releasing Apache HBase

HBase 0.96.x will run on hadoop 1.x or hadoop 2.x, but when building you must choose which to build against; we cannot make a single HBase binary that runs against both hadoop1 and hadoop2. Since we include the Hadoop we were built against (so we can do standalone mode), the set of modules included in the tarball changes depending on whether the hadoop1 or hadoop2 target is chosen. You can tell which HBase you have, whether it is for hadoop1 or hadoop2, by looking at the version; the HBase for hadoop1 will include 'hadoop1' in its version. Ditto for hadoop2.

Maven, our build system, natively will not let you have a single product built against different dependencies. That is understandable. But neither could we convince maven to change the set of included modules and write out the correct poms with appropriate dependencies, even though we have two build targets: one for hadoop1 and another for hadoop2. So, there is a prestep required. This prestep takes as input the current pom.xmls and generates hadoop1 or hadoop2 versions. You then reference these generated poms when you build. Read on for examples.

Publishing to maven requires you sign the artifacts you want to upload. To have the build do this for you, you need to make sure you have a properly configured settings.xml in your local repository under .m2. Here is my ~/.m2/settings.xml.

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                      http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <servers>
    <!-- To publish a snapshot of some part of Maven -->
    <server>
      <id>apache.snapshots.https</id>
      <username>YOUR_APACHE_ID</username>
      <password>YOUR_APACHE_PASSWORD</password>
    </server>
    <!-- To publish a website using Maven -->
    <!-- To stage a release of some part of Maven -->
    <server>
      <id>apache.releases.https</id>
      <username>YOUR_APACHE_ID</username>
      <password>YOUR_APACHE_PASSWORD</password>
    </server>
  </servers>
  <profiles>
    <profile>
      <id>apache-release</id>
      <properties>
        <gpg.keyname>YOUR_KEYNAME</gpg.keyname>
        <!-- Keyname is something like 00A5F21E. Run gpg with its list-keys option to find it. -->
        <gpg.passphrase>YOUR_KEY_PASSWORD</gpg.passphrase>
      </properties>
    </profile>
  </profiles>
</settings>
        

You must use maven 3.0.x (Check by running mvn -version).

1.4.1. Making a Release Candidate

I'll explain by running through the process. See later in this section for more detail on particular steps. The script dev-support/make_rc.sh automates most of this.

The Hadoop How To Release wiki page informs much of the below and may have more detail on particular sections so it is worth review.

Update CHANGES.txt with the changes since the last release (query JIRA, export to excel, then hack with vim to format to suit CHANGES.txt. TODO: Needs detail). Adjust the version in all the poms appropriately; e.g. you may need to remove -SNAPSHOT from all versions. The Versions Maven Plugin can be of use here. To set a version in all poms, do something like this:

$ mvn clean org.codehaus.mojo:versions-maven-plugin:1.3.1:set -DnewVersion=0.96.0

Check in the CHANGES.txt and version changes.

Now, build the src tarball. This tarball is hadoop version independent. It is just the pure src code and documentation without a hadoop1 or hadoop2 taint. Add the -Prelease profile when building; it checks files for licenses and will fail the build if unlicensed files are present.

$ MAVEN_OPTS="-Xmx2g" mvn clean install -DskipTests assembly:single -Dassembly.file=hbase-assembly/src/main/assembly/src.xml -Prelease

Unpack the tarball and make sure it looks good (a good test is seeing whether you can build from the unpacked tarball). Save it off to a version directory, i.e., a directory somewhere where you are collecting all of the tarballs you will publish as part of the release candidate. For example, if we were building an hbase-0.96.0 release candidate, we might call the directory hbase-0.96.0RC0. Later we will publish this directory as our release candidate up on people.apache.org/~you.

Now we are into the making of the hadoop1 and hadoop2 specific builds. Let's do hadoop1 first. First generate the hadoop1 poms. See the generate-hadoopX-poms.sh script usage for what it expects by way of arguments. You will find it in the dev-support subdirectory. Below, we generate hadoop1 poms with a version of 0.96.0-hadoop1 (the script will look for a version of 0.96.0 in the current pom.xml).

$ ./dev-support/generate-hadoopX-poms.sh 0.96.0 0.96.0-hadoop1

The script will work silently if all goes well. It will drop a pom.xml.hadoop1 beside all pom.xmls in all modules.

Now build the hadoop1 tarball. Note how we reference the new pom.xml.hadoop1 explicitly. We also add the -Prelease profile when building; it checks files for licenses and will fail the build if unlicensed files are present. Do it in two steps: first install into the local repository, and then generate documentation and assemble the tarball. Otherwise the build complains that hbase modules are not in the maven repo when we try to do it all in one go, especially on a fresh repo. It seems that you need the install goal in both steps.

$ MAVEN_OPTS="-Xmx3g" mvn -f pom.xml.hadoop1 clean install -DskipTests -Prelease
$ MAVEN_OPTS="-Xmx3g" mvn -f pom.xml.hadoop1 install -DskipTests site assembly:single -Prelease

Unpack the generated tarball and check it out. Look at the documentation, see if it runs, etc. Is the set of modules appropriate? For example, is there an hbase-hadoop2-compat in the hadoop1 tarball? If good, copy the tarball to your version directory.

I'll tag the release at this point since it's looking good. If we find an issue later, we can delete the tag and start over. The release needs to be tagged when we do the next step.

Now deploy hadoop1 hbase to mvn. Do the mvn deploy and tgz for a particular version all together in one go; if you flip between hadoop1 and hadoop2 builds, you might mal-publish poms and hbase-default.xml's (the version interpolations won't match). This time we use the apache-release profile instead of just the release profile when doing mvn deploy; it will invoke the apache pom referenced by our poms. It will also sign your artifacts published to mvn as long as your settings.xml in your local .m2 repository is configured correctly (your settings.xml adds your gpg password property to the apache profile).

$ MAVEN_OPTS="-Xmx3g" mvn -f pom.xml.hadoop1 deploy -DskipTests -Papache-release

The last command above copies all artifacts for hadoop1 up to the mvn repo. If there is no -SNAPSHOT in the version, it puts the artifacts into a staging directory. This is what you want.

hbase-downstreamer

See the hbase-downstreamer test for a simple example of a project that is downstream of hbase and depends on it. Check it out and run its simple test to make sure maven hbase-hadoop1 and hbase-hadoop2 are properly deployed to the maven repository.

Let's do the hadoop2 artifacts (read the hadoop1 section above closely before coming here because we don't repeat the explanation below).

# Generate the hadoop2 poms.
$ ./dev-support/generate-hadoopX-poms.sh 0.96.0 0.96.0-hadoop2
# Install the hbase hadoop2 jars into local repo then build the doc and tarball
$ MAVEN_OPTS="-Xmx3g" mvn -f pom.xml.hadoop2 clean install -DskipTests -Prelease
$ MAVEN_OPTS="-Xmx3g" mvn -f pom.xml.hadoop2 install -DskipTests site assembly:single -Prelease
# Undo the tgz and check it out.  If good, copy the tarball to your 'version directory'. Now deploy to mvn.
$ MAVEN_OPTS="-Xmx3g" mvn -f pom.xml.hadoop2 deploy -DskipTests -Papache-release
            

At this stage we have three tarballs in our 'version directory' and two sets of artifacts up in maven in the staging area. First let's put the version directory up on people.apache.org. You will need to sign and fingerprint the tarballs before you push them up. In the version directory do this:

$ for i in *.tar.gz; do echo $i; gpg --print-mds $i > $i.mds ; done
$ for i in *.tar.gz; do echo $i; gpg --armor --output $i.asc --detach-sig $i  ; done
$ cd ..
# Presuming our 'version directory' is named 0.96.0RC0, now copy it up to people.apache.org.
$ rsync -av 0.96.0RC0 people.apache.org:public_html
        

For the maven artifacts, log in at repository.apache.org. Find your artifacts in the staging directory. Close the artifacts. This will give you a URL for the temporary mvn staging repository. Do the closing for both the hadoop1 and hadoop2 repos. See Publishing Maven Artifacts for some pointers.

Note

We no longer publish using the maven release plugin. Instead we do mvn deploy. It seems to give us a backdoor to maven release publishing. If there is no -SNAPSHOT on the version string, then we are 'deployed' to the apache maven repository staging directory, from which we can publish URLs for candidates and later, if they pass, publish as a release (if there is a -SNAPSHOT on the version string, deploy will put the artifacts up into the apache snapshot repos).
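The version-string conventions described in this section (a 'hadoop1' or 'hadoop2' marker, and a -SNAPSHOT suffix that decides between the snapshot repository and the staging directory) can be sketched as follows. This is only an illustration of the rules as stated here; the class and method names are hypothetical and are not part of HBase or Maven.

```java
// Hypothetical helper illustrating the version-string conventions; not HBase code.
final class ReleaseVersion {

  // A version ending in -SNAPSHOT is deployed to the snapshot repository.
  static boolean isSnapshot(String version) {
    return version.endsWith("-SNAPSHOT");
  }

  // Where "mvn deploy" is described as putting the artifacts.
  static String deployTarget(String version) {
    return isSnapshot(version) ? "snapshots" : "staging";
  }

  // The hadoop flavor is carried in the version, e.g. 0.96.0-hadoop1.
  static String hadoopFlavor(String version) {
    if (version.contains("hadoop1")) {
      return "hadoop1";
    }
    if (version.contains("hadoop2")) {
      return "hadoop2";
    }
    return "unspecified";
  }

  public static void main(String[] args) {
    String[] versions = {"0.96.0-hadoop1", "0.96.0-hadoop2-SNAPSHOT"};
    for (String v : versions) {
      System.out.println(v + " -> " + hadoopFlavor(v) + ", " + deployTarget(v));
    }
  }
}
```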

Make sure the people.apache.org directory is showing -- it can take a while to show -- and that the mvn repo urls are good. Announce the release candidate on the mailing list and call a vote.

A strange issue I ran into was the one where the upload into the apache repository was being sprayed across multiple apache machines making it so I could not release. See INFRA-4482 Why is my upload to mvn spread across multiple repositories?.

1.4.2. Publishing a SNAPSHOT to maven

Make sure your settings.xml is set up properly (see above for how). Make sure the hbase version includes -SNAPSHOT as a suffix. Here is how I published SNAPSHOTS of a checkout that had an hbase version of 0.96.0 in its poms. First we generated the hadoop1 poms with a version that has a -SNAPSHOT suffix. We then installed the build into the local repository. Then we deployed this build to apache. See the output for the location up in apache to which the snapshot is copied. Notice how we add the release profile when installing locally (to find files that are without a proper license) and then the apache-release profile to deploy to the apache maven repository.

$ ./dev-support/generate-hadoopX-poms.sh 0.96.0 0.96.0-hadoop1-SNAPSHOT
$ MAVEN_OPTS="-Xmx3g" mvn -f pom.xml.hadoop1 clean install -DskipTests  javadoc:aggregate site assembly:single -Prelease
$ MAVEN_OPTS="-Xmx3g" mvn -f pom.xml.hadoop1 -DskipTests  deploy -Papache-release

Next, do the same to publish the hadoop2 artifacts.

$ ./dev-support/generate-hadoopX-poms.sh 0.96.0 0.96.0-hadoop2-SNAPSHOT
$ MAVEN_OPTS="-Xmx3g" mvn -f pom.xml.hadoop2 clean install -DskipTests  javadoc:aggregate site assembly:single -Prelease
$ MAVEN_OPTS="-Xmx3g" mvn -f pom.xml.hadoop2 deploy -DskipTests -Papache-release

1.5. Generating the HBase Reference Guide

The manual is marked up using docbook. We then use the docbkx maven plugin to transform the markup to html. This plugin is run when you specify the site goal as in when you run mvn site or you can call the plugin explicitly to just generate the manual by doing mvn docbkx:generate-html (TODO: It looks like you have to run mvn site first because docbkx wants to include a transformed hbase-default.xml. Fix). When you run mvn site, we do the document generation twice, once to generate the multipage manual and then again for the single page manual (the single page version is easier to search).

1.6. Updating hbase.apache.org

1.6.1. Contributing to hbase.apache.org

The Apache HBase web site (including this reference guide) is maintained as part of the main Apache HBase source tree, under /src/main/docbkx and /src/main/site [1]. The former, docbkx, is this reference guide as a bunch of xml marked up using docbook; the latter is the hbase site (the navbars, the header, the layout, etc.) and some of the documentation, mostly legacy pages that are in the process of being merged into the docbkx tree, which is converted to html by a maven plugin during the site build.

To contribute to the reference guide, edit these files under site or docbkx and submit them as a patch (see Section 1.11, “Submitting Patches”). Your Jira should contain a summary of the changes in each section (see HBASE-6081 for an example).

To generate the site locally while you're working on it, run:

mvn site

Then you can load up the generated HTML files in your browser (files are under /target/site).

1.6.2. Publishing hbase.apache.org

As of INFRA-5680 Migrate apache hbase website, to publish the website, build it, and then deploy it over a checkout of https://svn.apache.org/repos/asf/hbase/hbase.apache.org/trunk. Finally, check it in. For example, if trunk is checked out at /Users/stack/checkouts/trunk and the hbase website, hbase.apache.org, is checked out at /Users/stack/checkouts/hbase.apache.org/trunk, to update the site, do the following:

              # Build the site and deploy it to the checked out directory
              # Getting the javadoc into site is a little tricky.  You have to build it before you invoke 'site'.
              $ MAVEN_OPTS=" -Xmx3g" mvn clean install -DskipTests javadoc:aggregate site  site:stage -DstagingDirectory=/Users/stack/checkouts/hbase.apache.org/trunk
          

Now check the deployed site by viewing it in a browser: browse to file:////Users/stack/checkouts/hbase.apache.org/trunk/index.html and check all is good. If all checks out, commit it and your new build will show up immediately at http://hbase.apache.org

              $ cd /Users/stack/checkouts/hbase.apache.org/trunk
              $ svn status
              # Do an svn add of any new content...
              $ svn add ....
              $ svn commit -m 'Committing latest version of website...'
          

1.7. Tests

Developers, at a minimum, should familiarize themselves with the unit test detail; unit tests in HBase have a character not usually seen in other projects.

1.7.1. Apache HBase Modules

As of 0.96, Apache HBase is split into multiple modules, which creates "interesting" rules for how and where tests are written. If you are writing code for hbase-server, see Section 1.7.2, “Unit Tests” for how to write your tests; these tests can spin up a minicluster and will need to be categorized. For any other module, for example hbase-common, the tests must be strict unit tests and just test the class under test - no use of the HBaseTestingUtility or minicluster is allowed (or even possible given the dependency tree).

1.7.1.1. Running Tests in other Modules

If the module you are developing in has no dependencies on other HBase modules, then you can cd into that module and just run:

mvn test

which will just run the tests IN THAT MODULE. If there are dependencies on other modules, then you will have to run the command from the ROOT HBASE DIRECTORY. This will run the tests in the other modules, unless you specify to skip the tests in that module. For instance, to skip the tests in the hbase-server module, you would run:

mvn clean test -PskipServerTests

from the top level directory to run all the tests in modules other than hbase-server. Note that you can specify to skip tests in multiple modules as well as just for a single module. For example, to skip the tests in hbase-server and hbase-common, you would run:

mvn clean test -PskipServerTests -PskipCommonTests

Also, keep in mind that if you are running tests in the hbase-server module you will need to apply the maven profiles discussed in Section 1.7.3, “Running tests” to get the tests to run properly.

1.7.2. Unit Tests

Apache HBase unit tests are subdivided into four categories: small, medium, large, and integration with corresponding JUnit categories: SmallTests, MediumTests, LargeTests, IntegrationTests. JUnit categories are denoted using java annotations and look like this in your unit test code.

...
@Category(SmallTests.class)
public class TestHRegionInfo {
  @Test
  public void testCreateHRegionInfoName() throws Exception {
    // ...
  }
}

The above example shows how to mark a unit test as belonging to the small category. All unit tests in HBase have a categorization.

The first three categories, small, medium, and large, are for tests run when you type $ mvn test; i.e. these three categorizations are for HBase unit tests. The integration category is not for unit tests but for integration tests. These are run when you invoke $ mvn verify. Integration tests are described in Section 1.7.5, “Integration Tests” and will not be discussed further in this section on HBase unit tests.

Apache HBase uses a patched maven surefire plugin and maven profiles to implement its unit test characterizations.

Read the below to figure out which annotation of the set small, medium, and large to put on your new HBase unit test.

1.7.2.1. Small Tests

Small tests are executed in a shared JVM. We put in this category all the tests that can be executed quickly in a shared JVM. The maximum execution time for a small test is 15 seconds, and small tests should not use a (mini)cluster.

1.7.2.2. Medium Tests

Medium tests represent tests that must be executed before proposing a patch. They are designed to run in less than 30 minutes altogether, and are quite stable in their results. They are designed to last less than 50 seconds individually. They can use a cluster, and each of them is executed in a separate JVM.

1.7.2.3. Large Tests

Large tests are everything else. They are typically large-scale tests, regression tests for specific bugs, timeout tests, performance tests. They are executed before a commit on the pre-integration machines. They can be run on the developer machine as well.

1.7.2.4. Integration Tests

Integration tests are system level tests. See Section 1.7.5, “Integration Tests” for more info.

1.7.3. Running tests

Below we describe how to run the Apache HBase junit categories.

1.7.3.1. Default: small and medium category tests

Running

mvn test

will execute all small tests in a single JVM (no fork) and then medium tests in a separate JVM for each test instance. Medium tests are NOT executed if there is an error in a small test. Large tests are NOT executed. There is one report for small tests, and one report for medium tests if they are executed.

1.7.3.2. Running all tests

Running

mvn test -P runAllTests

will execute small tests in a single JVM then medium and large tests in a separate JVM for each test. Medium and large tests are NOT executed if there is an error in a small test. Large tests are NOT executed if there is an error in a small or medium test. There is one report for small tests, and one report for medium and large tests if they are executed.

1.7.3.3. Running a single test or all tests in a package

To run an individual test, e.g. MyTest, do

mvn test -Dtest=MyTest

You can also pass multiple, individual tests as a comma-delimited list:

mvn test -Dtest=MyTest1,MyTest2,MyTest3

You can also pass a package, which will run all tests under the package:

mvn test -Dtest=org.apache.hadoop.hbase.client.*

When -Dtest is specified, the localTests profile will be used. It will use the official release of maven surefire, rather than our custom surefire plugin, and the old connector (the HBase build uses a patched version of the maven surefire plugin). Each junit test is executed in a separate JVM (a fork per test class). There is no parallelization when tests are running in this mode. You will see a new message at the end of the report: "[INFO] Tests are skipped". It's harmless. However, you need to make sure the sum of Tests run: in the Results : section of the test reports matches the number of tests you specified, because no error will be reported when a non-existent test case is specified.

1.7.3.4. Other test invocation permutations

Running

mvn test -P runSmallTests

will execute "small" tests only, using a single JVM.

Running

mvn test -P runMediumTests

will execute "medium" tests only, launching a new JVM for each test-class.

Running

mvn test -P runLargeTests

will execute "large" tests only, launching a new JVM for each test-class.

For convenience, you can run

mvn test -P runDevTests

to execute both small and medium tests, using a single JVM.

1.7.3.5. Running tests faster

By default, $ mvn test -P runAllTests runs 5 tests in parallel. This can be increased on a developer's machine. Allowing that you can have 2 tests in parallel per core, and that you need about 2 GB of memory per test (at the extreme), an 8 core, 24 GB box could run 16 tests in parallel by core count, but the available memory limits it to 12 (24/2). To run all tests with 12 tests in parallel, do this: mvn test -P runAllTests -Dsurefire.secondPartThreadCount=12. To increase the speed further, you can also use a ramdisk. You will need 2 GB of memory to run all tests. You will also need to delete the files between two test runs. The typical way to configure a ramdisk on Linux is:

$ sudo mkdir /ram2G
$ sudo mount -t tmpfs -o size=2048M tmpfs /ram2G

You can then use it to run all HBase tests with the command: mvn test -P runAllTests -Dsurefire.secondPartThreadCount=12 -Dtest.build.data.basedirectory=/ram2G
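The sizing rule of thumb in this section (2 tests per core, about 2 GB of memory per test) can be written down as a small calculation. The sketch below is only an illustration of that arithmetic; the class and method names are hypothetical, and the constants are the estimates quoted in this section rather than values read from the HBase build.

```java
// Hypothetical helper, not part of HBase: estimates a value for
// -Dsurefire.secondPartThreadCount from the rule of thumb in this section.
final class ParallelTestEstimate {
  static final int TESTS_PER_CORE = 2; // assumption taken from the text
  static final int GB_PER_TEST = 2;    // assumption taken from the text ("at the extreme")

  static int maxParallelTests(int cores, int memoryGb) {
    int byCpu = cores * TESTS_PER_CORE;
    int byMemory = memoryGb / GB_PER_TEST;
    // The tighter of the two limits wins; never go below one test.
    return Math.max(1, Math.min(byCpu, byMemory));
  }

  public static void main(String[] args) {
    // The 8 core, 24 GB box from the text: min(16, 12) = 12.
    int threads = maxParallelTests(8, 24);
    System.out.println("-Dsurefire.secondPartThreadCount=" + threads);
  }
}
```

For the 8 core, 24 GB example, the memory limit (12) is lower than the core limit (16), which is why the commands in this section use 12.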

1.7.3.6. hbasetests.sh

It's also possible to use the script hbasetests.sh. This script runs the medium and large tests in parallel with two maven instances, and provides a single report. This script does not use the hbase version of surefire so no parallelization is being done other than the two maven instances the script sets up. It must be executed from the directory which contains the pom.xml.

For example running

./dev-support/hbasetests.sh

will execute small and medium tests. Running

./dev-support/hbasetests.sh runAllTests

will execute all tests. Running

./dev-support/hbasetests.sh replayFailed

will rerun the failed tests a second time, in a separate jvm and without parallelisation.

1.7.3.7. Test Resource Checker

A custom Maven SureFire plugin listener checks a number of resources before and after each HBase unit test runs and logs its findings at the end of the test output files, which can be found in target/surefire-reports per Maven module (tests write test reports named for the test class into this directory; check the *-out.txt files). The resources counted are the number of threads, the number of file descriptors, etc. If the number has increased, it adds a LEAK? comment in the logs. As you can have an HBase instance running in the background, some threads can be deleted/created without any specific action in the test. However, if the test does not work as expected, or if the test should not impact these resources, it's worth checking these log lines: ...hbase.ResourceChecker(157): before... and ...hbase.ResourceChecker(157): after.... For example:

2012-09-26 09:22:15,315 INFO [pool-1-thread-1] hbase.ResourceChecker(157): after: regionserver.TestColumnSeeking#testReseeking Thread=65 (was 65), OpenFileDescriptor=107 (was 107), MaxFileDescriptor=10240 (was 10240), ConnectionCount=1 (was 1)
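When scanning many *-out.txt files by hand gets tedious, a small parser can pick out the counters that grew. The sketch below assumes the "Name=N (was M)" layout seen in the sample line above; that layout is inferred from this one example, and the class is a hypothetical helper, not the HBase ResourceChecker itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: lists counters that grew in an "after:" line.
final class ResourceLineCheck {
  // Matches entries such as "Thread=65 (was 65)"; format inferred from the sample line.
  private static final Pattern ENTRY = Pattern.compile("(\\w+)=(\\d+) \\(was (\\d+)\\)");

  static List<String> increased(String logLine) {
    List<String> grown = new ArrayList<>();
    Matcher m = ENTRY.matcher(logLine);
    while (m.find()) {
      long now = Long.parseLong(m.group(2));
      long before = Long.parseLong(m.group(3));
      if (now > before) {
        grown.add(m.group(1)); // counter name, e.g. "Thread"
      }
    }
    return grown;
  }

  public static void main(String[] args) {
    String line = "hbase.ResourceChecker(157): after: regionserver.TestColumnSeeking#testReseeking "
        + "Thread=65 (was 65), OpenFileDescriptor=107 (was 107), "
        + "MaxFileDescriptor=10240 (was 10240), ConnectionCount=1 (was 1)";
    // No counter grew in the sample line, so the list is empty.
    System.out.println("increased: " + increased(line));
  }
}
```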

1.7.4. Writing Tests

1.7.4.1. General rules

  • As much as possible, tests should be written as category small tests.
  • All tests must be written to support parallel execution on the same machine, hence they should not use shared resources such as fixed ports or fixed file names.
  • Tests should not overlog. More than 100 lines/second makes the logs hard to read and uses i/o that is then not available to the other tests.
  • Tests can be written with HBaseTestingUtility. This class offers helper functions to create a temp directory and do the cleanup, or to start a cluster.
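As a minimal sketch of the "no fixed ports or fixed file names" rule, the following uses only the JDK, not HBaseTestingUtility. The class and method names (UniqueTestResources, findFreePort, newScratchDir) are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; HBaseTestingUtility offers its own equivalents.
class UniqueTestResources {

    /** Asks the OS for a free ephemeral port instead of hard-coding one. */
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a uniquely named scratch directory; the prefix is only a label. */
    static Path newScratchDir(String prefix) {
        try {
            return Files.createTempDirectory(prefix + "-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = findFreePort();
        Path dir = newScratchDir("TestExample");
        System.out.println("port=" + port + " dir=" + dir);
        dir.toFile().delete(); // clean up what the test created
    }
}
```

Note that the port is released before the caller binds it, so another process could in principle take it in between. Binding with port 0 directly in the server under test avoids that window.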

1.7.4.2. Categories and execution time

  • All tests must be categorized; uncategorized tests may be skipped.
  • All tests should be written to be as fast as possible.
  • Small category tests should last less than 15 seconds, and must not have any side effect.
  • Medium category tests should last less than 50 seconds.
  • Large category tests should last less than 3 minutes. This should ensure a good parallelization for people using it, and ease the analysis when the test fails.

1.7.4.3. Sleeps in tests

Whenever possible, tests should not use Thread.sleep, but should instead wait for the real event they need. This is faster and clearer for the reader. Tests should not call Thread.sleep without testing an ending condition; checking the condition makes clear what the test is waiting for, and the test then works whatever the machine's performance. Sleeps should be kept minimal so that tests run as fast as possible. Waiting for a variable should be done in a 40 ms sleep loop. Waiting for a socket operation should be done in a 200 ms sleep loop.
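A sleep loop with an ending condition might look like the sketch below. The helper name waitFor and the timeout value are assumptions chosen for illustration; only the 40 ms polling interval comes from the text above.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper, written with the JDK only.
class WaitForCondition {

    /**
     * Polls the condition every intervalMs until it holds or timeoutMs elapses.
     * Returns true if the condition became true, false on timeout or interrupt.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomicBoolean done = new AtomicBoolean(false);
        new Thread(() -> done.set(true)).start();
        // Wait for a variable: 40 ms sleep loop, bounded by a 5 s timeout.
        boolean met = waitFor(5_000, 40, done::get);
        System.out.println("condition met: " + met);
    }
}
```

Because the loop returns as soon as the condition holds, a fast machine pays almost nothing, while a slow one still gets the full timeout.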

1.7.4.4. Tests using a cluster

Tests using an HRegion do not have to start a cluster: a region can use the local file system. Starting and stopping a cluster costs around 10 seconds, so clusters should be started per test class, not per test method. A started cluster must be shut down using HBaseTestingUtility#shutdownMiniCluster, which cleans the directories. As much as possible, tests should use the default settings for the cluster. When they don't, they should document it. This will allow the cluster to be shared later.

1.7.5. Integration Tests

HBase integration/system tests are tests that go beyond HBase unit tests. They are generally long-lasting, sizeable (the test can be asked to work on 1M rows or 1B rows), and targetable (they can take configuration that will point them at the ready-made cluster they are to run against; integration tests do not include cluster start/stop code). To verify success, integration tests rely on public APIs only; they do not attempt to examine server internals to assert success or failure. Integration tests are what you would run when you need more elaborate proofing of a release candidate than unit tests can provide. They are not generally run on the Apache Continuous Integration build server; however, some sites opt to run integration tests as part of their continuous testing on an actual cluster.

Integration tests currently live under the src/test directory in the hbase-it submodule and will match the regex: **/IntegrationTest*.java. All integration tests are also annotated with @Category(IntegrationTests.class).

Integration tests can be run in two modes: using a mini cluster, or against an actual distributed cluster. Maven failsafe is used to run the tests using the mini cluster. The IntegrationTestsDriver class is used for executing the tests against a distributed cluster. Integration tests SHOULD NOT assume that they are running against a mini cluster, and SHOULD NOT use private APIs to access cluster state. To interact with the distributed or mini cluster uniformly, the IntegrationTestingUtility and HBaseCluster classes and the public client APIs can be used.

On a distributed cluster, integration tests that use ChaosMonkey or otherwise manipulate services through the cluster manager (e.g. restart regionservers) use SSH to do it. To run these, the test process should be able to run commands on the remote end, so ssh should be configured accordingly (for example, if HBase runs under the hbase user in your cluster, you can set up passwordless ssh for that user and run the test under it as well). To facilitate that, the hbase.it.clustermanager.ssh.user, hbase.it.clustermanager.ssh.opts and hbase.it.clustermanager.ssh.cmd configuration settings can be used. "User" is the remote user that the cluster manager should use to perform ssh commands. "Opts" contains additional options that are passed to SSH (for example, "-i /tmp/my-key"). Finally, if you have some custom environment setup, "cmd" is the override format for the entire tunnel (ssh) command. The default string is {/usr/bin/ssh %1$s %2$s%3$s%4$s "%5$s"} and is a good starting point. This is a standard Java format string with 5 arguments that is used to execute the remote command. Argument 1 (%1$s) is the SSH options set via the opts setting or via an environment variable, 2 is the SSH user name, 3 is "@" if a user name is set or "" otherwise, 4 is the target host name, and 5 is the logical command to execute (which may include single quotes, so don't use them). For example, if you run the tests under a non-hbase user and want to ssh as that user and change to hbase on the remote machine, you can use {/usr/bin/ssh %1$s %2$s%3$s%4$s "su hbase - -c \"%5$s\""}. That way, to kill an RS (for example), integration tests may run {/usr/bin/ssh some-hostname "su hbase - -c \"ps aux | ... | kill ...\""}. The command is logged in the test logs, so you can verify it is correct for your environment.
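To see how the five positional arguments land in the default format string, here is a small sketch using java.lang.String.format. The class name SshCommandFormat, the render helper, and the host, key path and command values are made up for illustration; this is not the cluster manager's own code.

```java
// Hypothetical demo of the positional format string described above.
class SshCommandFormat {

    // Default format string quoted in the text.
    static final String DEFAULT_FORMAT = "/usr/bin/ssh %1$s %2$s%3$s%4$s \"%5$s\"";

    /** Fills the five arguments in the order the text describes. */
    static String render(String format, String opts, String user, String host, String command) {
        String at = user.isEmpty() ? "" : "@"; // argument 3: "@" only when a user is set
        return String.format(format, opts, user, at, host, command);
    }

    public static void main(String[] args) {
        // Prints: /usr/bin/ssh -i /tmp/my-key hbase@rs1.example.com "ps aux"
        System.out.println(render(DEFAULT_FORMAT, "-i /tmp/my-key", "hbase",
                "rs1.example.com", "ps aux"));
    }
}
```

Rendering a custom format this way before putting it into hbase.it.clustermanager.ssh.cmd is a cheap check that the quoting comes out as intended.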

1.7.5.1. Running integration tests against mini cluster

HBase 0.92 added a verify maven target. Invoking it, for example by doing mvn verify, will run all the phases up to and including the verify phase via the maven failsafe plugin, running all the above mentioned HBase unit tests as well as tests that are in the HBase integration test group. After you have completed

mvn install -DskipTests

You can run just the integration tests by invoking:

cd hbase-it
mvn verify

If you just want to run the integration tests from the top level, you need to run two commands. First:

mvn failsafe:integration-test

This actually runs ALL the integration tests.

Note

This command will always output BUILD SUCCESS even if there are test failures.

At this point, you could grep the output by hand looking for failed tests. However, maven will do this for us; just use:

mvn failsafe:verify

The above command basically looks at all the test results (so don't remove the 'target' directory) for test failures and reports the results.

1.7.5.1.1. Running a subset of Integration tests

This is very similar to how you specify running a subset of unit tests (see above), but use the property it.test instead of test. To just run IntegrationTestClassXYZ.java, use:

mvn failsafe:integration-test -Dit.test=IntegrationTestClassXYZ

The next thing you might want to do is run groups of integration tests, say all integration tests that are named IntegrationTestClassX*.java:

mvn failsafe:integration-test -Dit.test=*ClassX*

This runs everything that is an integration test that matches *ClassX*. This means anything matching: "**/IntegrationTest*ClassX*". You can also run multiple groups of integration tests using comma-delimited lists (similar to unit tests). Using a list of matches still supports full regex matching for each of the groups. This would look something like:

mvn failsafe:integration-test -Dit.test=*ClassX*,*ClassY

1.7.5.2. Running integration tests against distributed cluster

If you have an already-setup HBase cluster, you can launch the integration tests by invoking the class IntegrationTestsDriver. You may have to run test-compile first. The configuration will be picked up by the bin/hbase script.

mvn test-compile

Then launch the tests with:

bin/hbase [--config config_dir] org.apache.hadoop.hbase.IntegrationTestsDriver

Pass -h to get usage information for this tool. Running IntegrationTestsDriver without any argument will launch the tests found under hbase-it/src/test that have the @Category(IntegrationTests.class) annotation and a name starting with IntegrationTest. Use the -h output to see how to filter test classes. You can pass a regex which is checked against the full class name, so part of a class name can be used. IntegrationTestsDriver uses JUnit to run the tests. Currently there is no support for running integration tests against a distributed cluster using maven (see HBASE-6201).

The tests interact with the distributed cluster by using the methods in the DistributedHBaseCluster (implementing HBaseCluster) class, which in turn uses a pluggable ClusterManager. Concrete implementations provide actual functionality for carrying out deployment-specific and environment-dependent tasks (SSH, etc). The default ClusterManager is HBaseClusterManager, which uses SSH to remotely execute start/stop/kill/signal commands and assumes some POSIX commands (ps, etc). It also assumes that the user running the test has enough "power" to start/stop servers on the remote machines. By default, it picks up HBASE_SSH_OPTS, HBASE_HOME, HBASE_CONF_DIR from the env, and uses bin/hbase-daemon.sh to carry out the actions. Currently, tarball deployments, deployments which use hbase-daemons.sh, and Apache Ambari deployments are supported. /etc/init.d/ scripts are not supported for now, but support can be easily added. For other deployment options, a ClusterManager can be implemented and plugged in.

1.7.5.3. Destructive integration / system tests

In 0.96, a tool named ChaosMonkey was introduced. It is modeled after the same-named tool by Netflix. Some of the tests use ChaosMonkey to simulate faults in the running cluster by killing random servers, disconnecting servers, etc. ChaosMonkey can also be used as a stand-alone tool to run a (misbehaving) policy while you are running other tests.

ChaosMonkey defines Actions and Policies. Actions are sequences of events. We have at least the following actions:

  • Restart active master (sleep 5 sec)
  • Restart random regionserver (sleep 5 sec)
  • Restart random regionserver (sleep 60 sec)
  • Restart META regionserver (sleep 5 sec)
  • Restart ROOT regionserver (sleep 5 sec)
  • Batch restart of 50% of regionservers (sleep 5 sec)
  • Rolling restart of 100% of regionservers (sleep 5 sec)

Policies on the other hand are responsible for executing the actions based on a strategy. The default policy is to execute a random action every minute based on predefined action weights. ChaosMonkey executes predefined named policies until it is stopped. More than one policy can be active at any time.
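The idea of "a random action based on predefined action weights" can be sketched as a weighted random pick. This is not ChaosMonkey's own code: the class name WeightedActionPicker, the pick helper, and the weights are assumptions, and only the action names are taken from the list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of weighted random selection; not the ChaosMonkey implementation.
class WeightedActionPicker {

    /** Picks one key with probability proportional to its weight. */
    static String pick(Map<String, Integer> weights, Random random) {
        int total = 0;
        for (int weight : weights.values()) {
            total += weight;
        }
        int roll = random.nextInt(total); // uniform in [0, total)
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            roll -= entry.getValue();
            if (roll < 0) {
                return entry.getKey();
            }
        }
        throw new IllegalStateException("weights must be non-negative with a positive sum");
    }

    public static void main(String[] args) {
        // Action names come from the list above; the weights are made up.
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("Restart active master", 2);
        weights.put("Restart random regionserver", 5);
        weights.put("Rolling restart of 100% of regionservers", 1);
        Random random = new Random();
        for (int i = 0; i < 3; i++) {
            System.out.println("Performing action: " + pick(weights, random));
        }
    }
}
```

A periodic policy would call such a picker once per period (every minute by default, per the text) and then run the chosen action.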

To run ChaosMonkey as a standalone tool, deploy your HBase cluster as usual. ChaosMonkey uses the configuration from the bin/hbase script, thus no extra configuration needs to be done. You can invoke ChaosMonkey by running:

bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey

This will output something like:

12/11/19 23:21:57 INFO util.ChaosMonkey: Using ChaosMonkey Policy: class org.apache.hadoop.hbase.util.ChaosMonkey$PeriodicRandomActionPolicy, period:60000
12/11/19 23:21:57 INFO util.ChaosMonkey: Sleeping for 26953 to add jitter
12/11/19 23:22:24 INFO util.ChaosMonkey: Performing action: Restart active master
12/11/19 23:22:24 INFO util.ChaosMonkey: Killing master:master.example.com,60000,1353367210440
12/11/19 23:22:24 INFO hbase.HBaseCluster: Aborting Master: master.example.com,60000,1353367210440
12/11/19 23:22:24 INFO hbase.ClusterManager: Executing remote command: ps aux | grep master | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:master.example.com
12/11/19 23:22:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/19 23:22:25 INFO hbase.HBaseCluster: Waiting service:master to stop: master.example.com,60000,1353367210440
12/11/19 23:22:25 INFO hbase.ClusterManager: Executing remote command: ps aux | grep master | grep -v grep | tr -s ' ' | cut -d ' ' -f2 , hostname:master.example.com
12/11/19 23:22:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/19 23:22:25 INFO util.ChaosMonkey: Killed master server:master.example.com,60000,1353367210440
12/11/19 23:22:25 INFO util.ChaosMonkey: Sleeping for:5000
12/11/19 23:22:30 INFO util.ChaosMonkey: Starting master:master.example.com
12/11/19 23:22:30 INFO hbase.HBaseCluster: Starting Master on: master.example.com
12/11/19 23:22:30 INFO hbase.ClusterManager: Executing remote command: /homes/enis/code/hbase-0.94/bin/../bin/hbase-daemon.sh --config /homes/enis/code/hbase-0.94/bin/../conf start master , hostname:master.example.com
12/11/19 23:22:31 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting master, logging to /homes/enis/code/hbase-0.94/bin/../logs/hbase-enis-master-master.example.com.out
....
12/11/19 23:22:33 INFO util.ChaosMonkey: Started master: master.example.com,60000,1353367210440
12/11/19 23:22:33 INFO util.ChaosMonkey: Sleeping for:51321
12/11/19 23:23:24 INFO util.ChaosMonkey: Performing action: Restart random region server
12/11/19 23:23:24 INFO util.ChaosMonkey: Killing region server:rs3.example.com,60020,1353367027826
12/11/19 23:23:24 INFO hbase.HBaseCluster: Aborting RS: rs3.example.com,60020,1353367027826
12/11/19 23:23:24 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:rs3.example.com
12/11/19 23:23:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/19 23:23:25 INFO hbase.HBaseCluster: Waiting service:regionserver to stop: rs3.example.com,60020,1353367027826
12/11/19 23:23:25 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 , hostname:rs3.example.com
12/11/19 23:23:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/19 23:23:25 INFO util.ChaosMonkey: Killed region server:rs3.example.com,60020,1353367027826. Reported num of rs:6
12/11/19 23:23:25 INFO util.ChaosMonkey: Sleeping for:60000
12/11/19 23:24:25 INFO util.ChaosMonkey: Starting region server:rs3.example.com
12/11/19 23:24:25 INFO hbase.HBaseCluster: Starting RS on: rs3.example.com
12/11/19 23:24:25 INFO hbase.ClusterManager: Executing remote command: /homes/enis/code/hbase-0.94/bin/../bin/hbase-daemon.sh --config /homes/enis/code/hbase-0.94/bin/../conf start regionserver , hostname:rs3.example.com
12/11/19 23:24:26 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting regionserver, logging to /homes/enis/code/hbase-0.94/bin/../logs/hbase-enis-regionserver-rs3.example.com.out

12/11/19 23:24:27 INFO util.ChaosMonkey: Started region server:rs3.example.com,60020,1353367027826. Reported num of rs:6

As you can see from the log, ChaosMonkey started the default PeriodicRandomActionPolicy, which is configured with all the available actions, and ran the RestartActiveMaster and RestartRandomRs actions. The ChaosMonkey tool, if run from the command line, will keep running until the process is killed.

1.8. Maven Build Commands

All commands are executed from the local HBase project directory.

Note: use Maven 3 (Maven 2 may work but we suggest you use Maven 3).

1.8.1. Compile

mvn compile
          

1.8.2. Running all or individual Unit Tests

See Section 1.7.3, “Running tests”, and Section 1.7.2, “Unit Tests”, above.

1.8.3. Building against various hadoop versions.

As of 0.96, Apache HBase supports building against Apache Hadoop versions: 1.0.3, 2.0.0-alpha and 3.0.0-SNAPSHOT. By default, we will build with Hadoop-1.0.3. To change the version to run with Hadoop-2.0.0-alpha, you would run:

mvn -Dhadoop.profile=2.0 ...

That is, pass 2.0 for hadoop.profile to build against Hadoop 2.0. Tests may not all pass as of this writing, so you may need to pass -DskipTests unless you are inclined to fix the failing tests.

Similarly, for 3.0, you would just replace the profile value. Note that Hadoop-3.0.0-SNAPSHOT does not currently have a deployed maven artifact; you will need to build and install your own in your local maven repository if you want to run against this profile.

In earlier versions of Apache HBase, you can build against older versions of Apache Hadoop, notably Hadoop 0.22.x and 0.23.x. If you are running, for example, HBase 0.94 and want to build against Hadoop 0.22.x, you would run:

mvn -Dhadoop.profile=22 ...

1.9. Getting Involved

Apache HBase gets better only when people contribute!

As Apache HBase is an Apache Software Foundation project, see ??? for more information about how the ASF functions.

1.9.1. Mailing Lists

Sign up for the dev-list and the user-list. See the mailing lists page. Posing questions - and helping to answer other people's questions - is encouraged! There are varying levels of experience on both lists so patience and politeness are encouraged (and please stay on topic.)

1.9.2. Jira

Check for existing issues in Jira. If yours is a new feature request, an enhancement, or a bug, file a ticket.

1.9.2.1. Jira Priorities

The following is a guideline on setting Jira issue priorities:

  • Blocker: Should only be used if the issue WILL cause data loss or cluster instability reliably.
  • Critical: The issue described can cause data loss or cluster instability in some cases.
  • Major: Important but not tragic issues, like updates to the client API that will add a lot of much-needed functionality or significant bugs that need to be fixed but that don't cause data loss.
  • Minor: Useful enhancements and annoying but not damaging bugs.
  • Trivial: Useful enhancements but generally cosmetic.

1.9.2.2. Code Blocks in Jira Comments

A commonly used macro in Jira is {code}. If you do this in a Jira comment...

{code}
   code snippet
{code}

... Jira will format the code snippet like code, instead of a regular comment. It improves readability.

1.10. Developing

1.10.1. Codelines

Most development is done on TRUNK. However, there are branches for minor releases (e.g., 0.90.1, 0.90.2, and 0.90.3 are on the 0.90 branch).

If you have any questions on this just send an email to the dev dist-list.

1.10.2. Unit Tests

In HBase we use JUnit 4. If you need to run miniclusters of HDFS, ZooKeeper, HBase, or MapReduce for testing, be sure to check out the HBaseTestingUtility. Alex Baranau of Sematext describes how it can be used in HBase Case-Study: Using HBaseTestingUtility for Local Testing and Development (2010).

1.10.2.1. Mockito

Sometimes you don't need a fully running server for unit testing. For example, some methods can make do with an org.apache.hadoop.hbase.Server instance or an org.apache.hadoop.hbase.master.MasterServices interface reference rather than a full-blown org.apache.hadoop.hbase.master.HMaster. In these cases, you may be able to get away with a mocked Server instance. For example:

              TODO...
              

1.10.3. Code Standards

See Section 1.2.1.1, “Code Formatting” and Section 1.11.5, “Common Patch Feedback”.

Also, please pay attention to the interface stability/audience classifications that you will see all over our code base. They look like this at the head of the class:

@InterfaceAudience.Public
@InterfaceStability.Stable

If the InterfaceAudience is Private, we can change the class (and we do not need to include an InterfaceStability mark). If a class is marked Public but its InterfaceStability is marked Unstable, we can change it. If it's marked Public/Evolving, we're allowed to change it but should try not to. If it's Public and Stable, we can't change it without a deprecation path or a really GREAT reason.
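The rules in the paragraph above amount to a small decision table, sketched here in Java. The enums and the verdict helper are hypothetical stand-ins named after the annotations; they are not the annotation types themselves, and they cover only the combinations the text mentions.

```java
// Hypothetical decision table for the audience/stability rules described above.
class InterfaceChangePolicy {

    enum Audience { PUBLIC, PRIVATE }

    enum Stability { STABLE, EVOLVING, UNSTABLE }

    enum Verdict { FREE_TO_CHANGE, CHANGE_WITH_CARE, NEEDS_DEPRECATION_PATH }

    /** Maps a classification to how freely the class may be changed. */
    static Verdict verdict(Audience audience, Stability stability) {
        if (audience == Audience.PRIVATE) {
            return Verdict.FREE_TO_CHANGE; // no stability mark is needed for Private
        }
        switch (stability) {
            case UNSTABLE:
                return Verdict.FREE_TO_CHANGE;
            case EVOLVING:
                return Verdict.CHANGE_WITH_CARE; // allowed, but try not to
            default:
                return Verdict.NEEDS_DEPRECATION_PATH; // Public and Stable
        }
    }

    public static void main(String[] args) {
        System.out.println(verdict(Audience.PUBLIC, Stability.STABLE));
        System.out.println(verdict(Audience.PRIVATE, null));
    }
}
```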

When you add new classes, mark them with the annotations above if they are publicly accessible. If you are not clear on how to mark your additions, ask on the dev list.

This convention comes from our parent project Hadoop.

1.10.4. Invariants

We don't have many but what we have we list below. All are subject to challenge of course but until then, please hold to the rules of the road.

1.10.4.1. No permanent state in ZooKeeper

ZooKeeper state should be transient (treat it like memory). If deleted, HBase should be able to recover and essentially be in the same state[2].

1.10.5. Running In-Situ

If you are developing Apache HBase, frequently it is useful to test your changes against a more-real cluster than what you find in unit tests. In this case, HBase can be run directly from the source in local-mode. All you need to do is run:

${HBASE_HOME}/bin/start-hbase.sh

This will spin up a full local-cluster, just as if you had packaged up HBase and installed it on your machine.

Keep in mind that you will need to have installed HBase into your local maven repository for the in-situ cluster to work properly. That is, you will need to run:

mvn clean install -DskipTests

to ensure that maven can find the correct classpath and dependencies. Generally, the above command is just a good thing to try running first, if maven is acting oddly.

1.10.6. Adding Metrics

After adding a new feature a developer might want to add metrics. HBase exposes metrics using the Hadoop Metrics 2 system, so adding a new metric involves exposing that metric to the hadoop system. Unfortunately the API of metrics2 changed from hadoop 1 to hadoop 2. In order to get around this a set of interfaces and implementations have to be loaded at runtime. To get an in-depth look at the reasoning and structure of these classes you can read the blog post located here. To add a metric to an existing MBean follow the short guide below:

1.10.6.1. Add Metric name and Function to Hadoop Compat Interface.

Inside the source interface that corresponds to where the metrics are generated (e.g. MetricsMasterSource for things coming from HMaster), create new static strings for the metric name and description. Then add a new method that will be called to add a new reading.

1.10.6.2. Add the Implementation to Both Hadoop 1 and Hadoop 2 Compat modules.

Inside of the implementation of the source (eg. MetricsMasterSourceImpl in the above example) create a new histogram, counter, gauge, or stat in the init method. Then in the method that was added to the interface wire up the parameter passed in to the histogram.

Now add tests that make sure the data is correctly exported to the metrics 2 system. For this the MetricsAssertHelper is provided.

1.11. Submitting Patches

If you are new to submitting patches to open source or new to submitting patches to Apache, I'd suggest you start by reading the On Contributing Patches page from the Apache Commons Project. It's a nice overview that applies equally to the Apache HBase Project.

1.11.1. Create Patch

See the aforementioned Apache Commons link for how to make patches against a checked out subversion repository. Patch files can also be easily generated from Eclipse, for example by selecting "Team -> Create Patch". Patches can also be created by git diff and svn diff.

Please submit one patch-file per Jira. For example, if multiple files are changed make sure the selected resource when generating the patch is a directory. Patch files can reflect changes in multiple files.

Generating patches using git:

$ git diff --no-prefix  > HBASE_XXXX.patch
              

Don't forget the 'no-prefix' option, and generate the diff from the root directory of the project.

Make sure you review Section 1.2.1.1, “Code Formatting” for code style.

1.11.2. Patch File Naming

The patch file should have the Apache HBase Jira ticket in the name. For example, if a patch was submitted for Foo.java, then a patch file called Foo_HBASE_XXXX.patch would be acceptable where XXXX is the Apache HBase Jira number.

If you are generating from a branch, then including the target branch in the filename is advised, e.g., HBASE_XXXX-0.90.patch.

1.11.3. Unit Tests

Yes, please. Please try to include unit tests with every code patch (and especially new classes and large changes). Make sure unit tests pass locally before submitting the patch.

Also, see Section 1.10.2.1, “Mockito”.

If you are creating a new unit test class, notice how other unit test classes have classification/sizing annotations at the top and a static method on the end. Be sure to include these in any new unit test files you generate. See Section 1.7, “Tests” for more on how the annotations work.

1.11.4. Attach Patch to Jira

The patch should be attached to the associated Jira ticket using "More Actions -> Attach Files". Make sure you click the ASF license inclusion; otherwise the patch can't be considered for inclusion.

Once attached to the ticket, click "Submit Patch" and the status of the ticket will change. Committers will review submitted patches for inclusion into the codebase. Please understand that not every patch may get committed, and that feedback will likely be provided on the patch. Fear not, though, because the Apache HBase community is helpful!

1.11.5. Common Patch Feedback

The following items are representative of common patch feedback. Your patch process will go faster if these are taken into account before submission.

See the Java coding standards for more information on coding conventions in Java.

1.11.5.1. Space Invaders

Rather than do this...

if ( foo.equals( bar ) ) {     // don't do this

... do this instead...

if (foo.equals(bar)) {

Also, rather than do this...

foo = barArray[ i ];     // don't do this

... do this instead...

foo = barArray[i];

1.11.5.2. Auto Generated Code

Auto-generated code in Eclipse often looks like this...

 public void readFields(DataInput arg0) throws IOException {    // don't do this
   foo = arg0.readUTF();                                       // don't do this

... do this instead ...

 public void readFields(DataInput di) throws IOException {
   foo = di.readUTF();

See the difference? 'arg0' is what Eclipse uses for arguments by default.

1.11.5.3. Long Lines

Keep lines less than 100 characters.

Bar bar = foo.veryLongMethodWithManyArguments(argument1, argument2, argument3, argument4, argument5, argument6, argument7, argument8, argument9);  // don't do this

... do something like this instead ...

Bar bar = foo.veryLongMethodWithManyArguments(
 argument1, argument2, argument3, argument4, argument5,
 argument6, argument7, argument8, argument9);

1.11.5.4. Trailing Spaces

This happens more than people would imagine.

Bar bar = foo.getBar();     <--- imagine there's an extra space(s) after the semicolon instead of a line break.

Make sure there's a line-break after the end of your code, and also avoid lines that have nothing but whitespace.

1.11.5.5. Implementing Writable

Applies pre-0.96 only

In 0.96, HBase moved to protobufs. The below section on Writables applies to 0.94.x and previous, not to 0.96 and beyond.

Every class returned by RegionServers must implement Writable. If you are creating a new class that needs to implement this interface, don't forget the default constructor.
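The sketch below illustrates why the default constructor matters: a generic reader typically creates an empty instance reflectively and then calls readFields on it. To stay self-contained it declares a local stand-in interface with the same two methods as org.apache.hadoop.io.Writable; the Greeting class and the roundTrip helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class WritableSketch {

    /** Local stand-in mirroring the two methods of org.apache.hadoop.io.Writable. */
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical example class; not part of HBase. */
    static class Greeting implements Writable {
        private String text;

        /** Default constructor: lets an empty instance be created reflectively. */
        public Greeting() { }

        public Greeting(String text) { this.text = text; }

        public String getText() { return text; }

        @Override
        public void write(DataOutput out) throws IOException { out.writeUTF(text); }

        @Override
        public void readFields(DataInput in) throws IOException { text = in.readUTF(); }
    }

    /** Serializes a Greeting, then rebuilds it the way a generic reader would. */
    static Greeting roundTrip(Greeting original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            // Create an empty instance via the no-argument constructor, then fill it.
            Greeting copy = Greeting.class.getDeclaredConstructor().newInstance();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Greeting("hello")).getText()); // prints "hello"
    }
}
```

Removing the no-argument constructor makes getDeclaredConstructor() fail with NoSuchMethodException, which is the kind of failure the reminder above is meant to prevent.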

1.11.5.6. Javadoc

This is also a very common feedback item. Don't forget Javadoc!

Javadoc warnings are checked during precommit. If the precommit tool gives you a '-1', please fix the javadoc issue. Your patch won't be committed if it adds such warnings.

1.11.5.7. Findbugs

Findbugs is used to detect common bug patterns. Like Javadoc, it is checked during the precommit build on Apache's Jenkins, and, as with Javadoc, please fix any warnings it reports. You can run findbugs locally with 'mvn findbugs:findbugs', which will generate the findbugs files locally. Sometimes you may have to write code smarter than Findbugs. You can tell Findbugs that you know what you're doing by annotating your class with:

@edu.umd.cs.findbugs.annotations.SuppressWarnings(
                    value="HE_EQUALS_USE_HASHCODE",
                    justification="I know what I'm doing")

Note that we're using the apache licensed version of the annotations.

1.11.5.8. Javadoc - Useless Defaults

Don't just leave the @param arguments the way your IDE generated them. Don't do this...

  /**
   *
   * @param bar             <---- don't do this!!!!
   * @return                <---- or this!!!!
   */
  public Foo getFoo(Bar bar);

... either add something descriptive to the @param and @return lines, or just remove them. But the preference is to add something descriptive and useful.

1.11.5.9. One Thing At A Time, Folks

If you submit a patch for one thing, don't do auto-reformatting or unrelated reformatting of code on a completely different area of code.

Likewise, don't add unrelated cleanup or refactorings outside the scope of your Jira.

1.11.5.10. Ambiguous Unit Tests

Make sure that you're clear about what you are testing in your unit tests and why.

1.11.6. Submitting a patch again

Sometimes committers ask for changes to a patch. After incorporating the suggested/requested changes, use the following process to submit the patch again.

  • Do not delete the old patch file
  • version your new patch file using a simple scheme like this:
    HBASE-{jira number}-{version}.patch
    e.g: HBASE_XXXX-v2.patch
  • Click 'Cancel Patch' on JIRA. The bug status will change back to Open.
  • Attach new patch file (e.g. HBASE_XXXX-v2.patch) using 'Files --> Attach'
  • Click on 'Submit Patch'. Now the bug status will say 'Patch Available'.
Committers will review the patch. Rinse and repeat as many times as needed :-)

1.11.7. Submitting incremental patches

At times you may want to break a big change into multiple patches. Here is a sample work-flow using git:

  • patch 1:
    • $ git diff --no-prefix > HBASE_XXXX-1.patch
  • patch 2:
    • create a new git branch
      $ git checkout -b my_branch
    • save your work
      $ git add file1 file2
      $ git commit -am 'saved after HBASE_XXXX-1.patch'
      now you have your own branch, which is different from the remote master branch
    • make more changes...
    • create second patch
      $ git diff --no-prefix > HBASE_XXXX-2.patch

1.11.8. ReviewBoard

Larger patches should go through ReviewBoard.

For more information on how to use ReviewBoard, see the ReviewBoard documentation.

1.11.9. Committing Patches

Committers do this. See How To Commit in the Apache HBase wiki.

Committers will also resolve the Jira, typically after the patch passes a build.

1.11.9.1. Committers are responsible for making sure commits do not break the build or tests

If a committer commits a patch it is their responsibility to make sure it passes the test suite. It is helpful if contributors keep an eye out that their patch does not break the hbase build and/or tests but ultimately, a contributor cannot be expected to be up on the particular vagaries and interconnections that occur in a project like hbase. A committer should.



[1] Before 0.95.0, the site and reference guide were at src/site and src/docbkx respectively

[2] There are currently a few exceptions that we need to fix around whether a table is enabled or disabled
