The Apache HBase project welcomes contributions to all aspects of the project, including the documentation. In HBase, documentation includes the following areas, and probably some others:
The HBase Reference Guide (this book)
The HBase website
The HBase Wiki
Command-line utility output and help text
Web UI strings, explicit help text, context-sensitive strings, and others
Comments in source files, configuration files, and others
Localization of any of the above into target languages other than English
No matter which area you want to help out with, the first step is almost always to download (typically by cloning the Git repository) and familiarize yourself with the HBase source code. The only exception in the list above is the HBase Wiki, which is edited online. For information on downloading and building the source, see ???.
The HBase Wiki is not well-maintained and much of its content has been moved into the HBase Reference Guide (this guide). However, some pages on the Wiki are well maintained, and it would be great to have some volunteers willing to help out with the Wiki. To request access to the Wiki, register a new account at https://wiki.apache.org/hadoop/Hbase?action=newaccount. Contact one of the HBase committers, who can either give you access or refer you to someone who can.
If you spot an error in a string in a UI, utility, script, log message, or elsewhere,
or you think something could be made clearer, or you think text needs to be added
where it doesn't currently exist, the first step is to file a JIRA. Be sure to set the
component to Documentation in addition to any other involved components.
Most components have one or more default owners, who monitor new issues which come into
those queues. Regardless of whether you feel able to fix the bug, you should still file
bugs where you see them.
If you want to try your hand at fixing your newly-filed bug, assign it to yourself. You will need to clone the HBase Git repository to your local system and work on the issue there. When you have developed a potential fix, submit it for review. If it addresses the issue and is seen as an improvement, one of the HBase committers will commit it to one or more branches, as appropriate.
Procedure A.1. Suggested Work flow for Submitting Patches
This procedure goes into more detail than Git pros will need, but is included in this appendix so that people unfamiliar with Git can feel confident contributing to HBase while they learn.
If you have not already done so, clone the Git repository locally. You only need to do this once.
Fairly often, pull remote changes into your local repository by using the
git pull command, while your master branch is checked out.
For each issue you work on, create a new branch. One convention that works well for naming the branches is to name a given branch the same as the JIRA it relates to:
$ git checkout -b HBASE-123456
Make your suggested changes on your branch, committing your changes to your local repository often. If you need to switch to working on a different issue, remember to check out the appropriate branch.
When you are ready to submit your patch, first be sure that HBase builds cleanly and behaves as expected in your modified branch. If you have made documentation changes, be sure the documentation and website build.
Before you use the
site target the very first time, be
sure you have built HBase at least once, in order to fetch all the Maven
dependencies you need.
$ mvn clean install -DskipTests # Builds HBase
$ mvn clean site -DskipTests # Builds the website and documentation
If any errors occur, address them.
If it takes you several days or weeks to implement your fix, or you know that the area of the code you are working in has had a lot of changes lately, make sure you rebase your branch against the remote master and take care of any conflicts before submitting your patch.
$ git checkout HBASE-123456
$ git rebase origin/master
Generate your patch against the remote master. Run the following command from
the top level of your git repository (usually called
$ git diff --no-prefix origin/master > HBASE-123456.patch
The name of the patch should contain the JIRA ID. Look over the patch file to be sure that you did not change any additional files by accident and that there are no other surprises. When you are satisfied, attach the patch to the JIRA and click the button. A reviewer will review your patch. If you need to submit a new version of the patch, leave the old one on the JIRA and add a version number to the name of the new patch.
After a change has been committed, there is no need to keep your local branch around. Instead you should run git pull to get the new change into your master branch.
The source for the HBase website is in the HBase source, in the
src/main/site/ directory. Within this directory, source for the
individual pages is in the
xdocs/ directory, and images referenced
in those pages are in the
images/ directory. This directory also
stores images used in the HBase Reference Guide.
The website's pages are written in an HTML-like XML dialect called xdoc, which has a reference guide at http://maven.apache.org/archives/maven-1.x/plugins/xdoc/reference/xdocs.html. You can edit these files in a plain-text editor, an IDE, or an XML editor such as XML Mind XML Editor (XXE) or Oxygen XML Author.
To preview your changes, build the website using the mvn clean site
-DskipTests command. The HTML output resides in the
target/site/ directory. When you are satisfied with your
changes, follow the procedure in Procedure A.1, “Suggested Work flow for Submitting Patches” to submit your patch.
The source for the HBase Reference Guide is in the HBase source, in the
src/main/docbkx/ directory. It is written in Docbook XML. Docbook can be
intimidating, but you can typically follow the formatting of the surrounding file to get
an idea of the mark-up. You can edit Docbook XML files using a plain-text editor, an
XML-aware IDE, or a specialized XML editor.
Docbook's syntax can be picky. Before submitting a patch, be sure to build the output
locally using the mvn site command. If you do not get any build
errors, that means that the XML is well-formed, which means that each opening tag is
balanced by a closing tag. Well-formedness is not exactly the same as validity. Check
the output in
target/docbkx/ for any surprises before submitting a patch.
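As a rough illustration of the difference between well-formedness and validity, both fragments below have balanced tags, so a check for well-formedness alone accepts both. Only the second follows the Docbook rule, described later in this appendix, that the text of a <listitem> is wrapped in a <para>. The list text is made up for the example.

```xml
<!-- Well-formed, but not valid Docbook: a <listitem> cannot hold bare text. -->
<itemizedlist>
  <listitem>This text is missing its para wrapper.</listitem>
</itemizedlist>

<!-- Well-formed and valid. -->
<itemizedlist>
  <listitem>
    <para>This text is wrapped correctly.</para>
  </listitem>
</itemizedlist>
```

This is one reason to look over the rendered output as well as the build log.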
Some parts of the HBase Reference Guide, most notably ???,
are generated automatically, so that this area of the documentation stays in sync with
the code. This is done by means of an XSLT transform, which you can examine in the
file into a Docbook output which can be included in the Reference Guide. Sometimes, it
is necessary to add configuration parameters or modify their descriptions. Make the
modifications to the source file, and they will be included in the Reference Guide when
it is rebuilt.
It is possible that other types of content can and will be automatically generated from HBase source files in the future.
You can examine the
site target in the Maven
pom.xml file included at the top level of the HBase source for
details on the process of building the website and documentation. The Reference Guide is
built twice, once as a single-page output and once with one HTML file per chapter. The
single-page output is located in
target/docbkx/book.html, while the
multi-page output's index page is at
Each of these outputs has its own
css/ directories, which are created at build time.
You can include images in the HBase Reference Guide. For accessibility reasons, it is recommended that you use a <figure> Docbook element for an image. This allows screen readers to navigate to the image and also provides alternative text for the image. The following is an example of a <figure> element.
<figure>
  <title>HFile Version 1</title>
  <mediaobject>
    <imageobject>
      <imagedata fileref="timeline_consistency.png" />
    </imageobject>
    <textobject>
      <phrase>HFile Version 1</phrase>
    </textobject>
  </mediaobject>
</figure>
The <textobject> can contain a few sentences describing the image, rather than simply reiterating the title. You can optionally specify alignment and size options in the <imagedata> element.
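The following sketch shows what that might look like: a <textobject> with a fuller description, and alignment and width set on <imagedata>. The file name, title, description, and the particular align and width values are hypothetical, chosen only for illustration.

```xml
<figure>
  <title>Example Timeline</title>
  <mediaobject>
    <imageobject>
      <!-- align and width are optional; these values are only examples -->
      <imagedata fileref="example_timeline.png" align="center" width="6in" />
    </imageobject>
    <textobject>
      <!-- Describe what the image shows instead of repeating the title -->
      <phrase>A horizontal timeline with three labeled stages. Arrows
        connect each stage to the next, from left to right.</phrase>
    </textobject>
  </mediaobject>
</figure>
```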
When doing a local build, save the image to the
src/main/site/resources/images/ directory. In the
<imagedata> element, refer to the image as above, with no directory component. The
image will be copied to the appropriate target location during the build of the
When you submit a patch which includes adding an image to the HBase Reference Guide, attach the image to the JIRA. If the committer asks where the image should be committed, it should go into the above directory.
If you want to add a new chapter to the HBase Reference Guide, the easiest way is to
copy an existing chapter file, rename it, and change the ID and title elements near the
top of the file. Delete the existing content and create the new content. Then open the
book.xml file, which is the main file for the HBase Reference
Guide, and use an <xi:include> element to include your new chapter in the
appropriate location. Be sure to add your new file to your Git repository before
creating your patch. Note that the
book.xml file currently contains
many chapters. You can only include a chapter at the same nesting level as the other
chapters in the file. When in doubt, check to see how other files have been included.
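A minimal sketch of both pieces follows, assuming a hypothetical chapter file named my_new_chapter.xml with a made-up ID and title. The namespace declarations shown are the standard Docbook 5 and XInclude ones. Check an existing chapter file and book.xml to confirm what the HBase sources actually use.

```xml
<!-- my_new_chapter.xml (hypothetical file name, ID, and title) -->
<chapter xml:id="my_new_chapter" version="5.0"
         xmlns="http://docbook.org/ns/docbook">
  <title>My New Chapter</title>
  <para>New content goes here.</para>
</chapter>

<!-- In book.xml, at the same nesting level as the other chapter includes.
     The xi prefix must be bound to the XInclude namespace; it is declared
     inline here only to keep the fragment self-contained. -->
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
            href="my_new_chapter.xml" />
```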
The following Docbook issues come up often. Some of these are preferences, but others can create mysterious build errors or other problems.
What can go where?
There is often confusion about which child elements are valid in a given context. When in doubt, Docbook: The Definitive Guide is the best resource. It has an appendix which is indexed by element and contains all valid child and parent elements of any given element. If you edit Docbook often, a schema-aware XML editor makes things easier.
Paragraphs and Admonitions
It is a common pattern, and it is technically valid, to put an admonition such as a <note> inside a <para> element. Because admonitions render as block-level elements (they take the whole width of the page), it is better to mark them up as siblings to the paragraphs around them, like this:
<para>This is the paragraph.</para>
<note>
  <para>This is an admonition which occurs after the paragraph.</para>
</note>
Wrap textual <listitem> and <entry> contents in <para> elements.
Because the contents of a <listitem> (an element in an itemized, ordered, or variable list) or an <entry> (a cell in a table) can consist of things other than plain text, they need to be wrapped in some element. If they are plain text, they need to be enclosed in <para> tags. This is tedious but necessary for validity.
<itemizedlist>
  <listitem>
    <para>This is a paragraph.</para>
  </listitem>
  <listitem>
    <screen>This is screen output.</screen>
  </listitem>
</itemizedlist>
When to use <command>, <code>, <programlisting>, <screen>
The first two are in-line tags, which can occur within the flow of paragraphs or titles. The second two are block elements.
Use <command> to mention a command such as hbase shell in the flow of a sentence. Use <code> for other inline text referring to code. Incidentally, use <literal> to specify literal strings that should be typed or entered exactly as shown. Within a <screen> listing, it can be helpful to use the <userinput> and <computeroutput> elements to mark up the text further.
Use <screen> to display input and output as the user would see it on the screen, in a log file, etc. Use <programlisting> only for blocks of code that occur within a file, such as Java or XML code, or a Bash shell script.
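The fragment below is one way these elements might be combined. Apart from hbase shell, the content is hypothetical: the variable name, the literal string, the echo command, and the script are made up to show where each tag fits.

```xml
<!-- In-line elements, used within the flow of a paragraph -->
<para>Use <command>hbase shell</command> to start the shell. Set the
  <code>timeout</code> variable, and type <literal>yes</literal> when
  prompted.</para>

<!-- Block element for a terminal session, with input and output marked up -->
<screen>$ <userinput>echo hello</userinput>
<computeroutput>hello</computeroutput></screen>

<!-- Block element for the contents of a file -->
<programlisting>#!/bin/bash
echo hello</programlisting>
```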
How to escape XML elements so that they show up as XML
For one-off instances or short in-line mentions, use the &lt; and &gt; encoded characters. For longer mentions, or blocks of code, enclose it with <![CDATA[]]>, which is much easier to maintain and parse in the source files.
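A short sketch of both approaches follows. The property name and value inside the CDATA section are hypothetical.

```xml
<!-- Short in-line mention: escape the angle brackets -->
<para>Wrap plain text in a <code>&lt;para&gt;</code> element.</para>

<!-- Longer block: a CDATA section keeps the markup readable in the source -->
<programlisting><![CDATA[<property>
  <name>example.property.name</name>
  <value>true</value>
</property>]]></programlisting>
```

Everything between the opening <![CDATA[ and the closing ]]> is treated as literal text, so the one sequence the section cannot contain is ]]> itself.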
Tips and tricks for making screen output look good
Text within <screen> and <programlisting> elements is shown exactly as it appears in the source, including indentation, tabs, and line wrap.
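For example, in the sketch below the first <screen> carries the line breaks and leading spaces inside the element into the output, because that whitespace is part of the element's content. The second starts the text immediately after the opening tag, so the output has no stray indentation. The surrounding paragraph is made up for the example.

```xml
<para>Build the site:</para>

<!-- The whitespace around the command is preserved in the output -->
<screen>
        $ mvn clean site -DskipTests
</screen>

<!-- No extra whitespace: the command starts right after the opening tag -->
<screen>$ mvn clean site -DskipTests</screen>
```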
Isolate Changes for Easy Diff Review.
Be careful with pretty-printing or re-formatting an entire XML file, even if the formatting has degraded over time. If you need to reformat a file, do that in a separate JIRA where you do not change any content. Be careful because some XML editors do a bulk-reformat when you open a new file, especially if you use GUI mode in the editor.
The HBase Reference Guide uses the XSLT Syntax Highlighting Maven module for syntax highlighting.
To enable syntax highlighting for a given <programlisting> or
<screen> (or possibly other elements), add the language attribute
to the element, as in the following example:
<programlisting language="xml">
<foo>bar</foo>
<bar>foo</bar>
</programlisting>
Several syntax types are supported. The most interesting ones for the
HBase Reference Guide are