Appendix A. Contributing to Documentation

Table of Contents

A.1. Getting Access to the Wiki
A.2. Contributing to Documentation or Other Strings
A.3. Editing the HBase Website
A.4. Editing the HBase Reference Guide
A.5. Auto-Generated Content
A.6. Multi-Page and Single-Page Output
A.7. Images in the HBase Reference Guide
A.8. Adding a New Chapter to the HBase Reference Guide
A.9. Docbook Common Issues

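The Apache HBase project welcomes contributions to all aspects of the project, including the documentation. In HBase, documentation includes the following areas, and probably some others:

  • The HBase website
  • The HBase Reference Guide
  • The HBase Wiki
  • Strings in UIs, utilities, scripts, and log messages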

No matter which area you want to help out with, the first step is almost always to download (typically by cloning the Git repository) and familiarize yourself with the HBase source code. The only exception in the list above is the HBase Wiki, which is edited online. For information on downloading and building the source, see ???.

A.1. Getting Access to the Wiki

The HBase Wiki is not well maintained, and much of its content has been moved into the HBase Reference Guide (this guide). However, some pages on the Wiki are kept up to date, and volunteers willing to help out with the Wiki are welcome. To request access to the Wiki, register a new account at https://wiki.apache.org/hadoop/Hbase?action=newaccount, then contact one of the HBase committers, who can either give you access or refer you to someone who can.

A.2. Contributing to Documentation or Other Strings

If you spot an error in a string in a UI, utility, script, log message, or elsewhere, or you think something could be made clearer, or you think text needs to be added where it doesn't currently exist, the first step is to file a JIRA. Be sure to set the component to Documentation in addition to any other involved components. Most components have one or more default owners, who monitor new issues which come into those queues. Regardless of whether you feel able to fix the bug, you should still file bugs where you see them.

If you want to try your hand at fixing your newly-filed bug, assign it to yourself. You will need to clone the HBase Git repository to your local system and work on the issue there. When you have developed a potential fix, submit it for review. If it addresses the issue and is seen as an improvement, one of the HBase committers will commit it to one or more branches, as appropriate.

Procedure A.1. Suggested Work flow for Submitting Patches

This procedure goes into more detail than Git pros will need, but is included in this appendix so that people unfamiliar with Git can feel confident contributing to HBase while they learn.

  1. If you have not already done so, clone the Git repository locally. You only need to do this once.

  2. Fairly often, pull remote changes into your local repository by using the git pull command, while your master branch is checked out.

  3. For each issue you work on, create a new branch. One convention that works well for naming the branches is to name a given branch the same as the JIRA it relates to:

    $ git checkout -b HBASE-123456

  4. Make your suggested changes on your branch, committing your changes to your local repository often. If you need to switch to working on a different issue, remember to check out the appropriate branch.

  5. When you are ready to submit your patch, first be sure that HBase builds cleanly and behaves as expected in your modified branch. If you have made documentation changes, be sure the documentation and website build.

    Note

    Before you use the site target the very first time, be sure you have built HBase at least once, in order to fetch all the Maven dependencies you need.

    $ mvn clean install -DskipTests               # Builds HBase
    $ mvn clean site -DskipTests                  # Builds the website and documentation

    If any errors occur, address them.

  6. If it takes you several days or weeks to implement your fix, or you know that the area of the code you are working in has had a lot of changes lately, make sure you rebase your branch against the remote master and take care of any conflicts before submitting your patch.

    $ git checkout HBASE-123456
    $ git rebase origin/master

  7. Generate your patch against the remote master. Run the following command from the top level of your git repository (usually called hbase):

    $ git diff --no-prefix origin/master > HBASE-123456.patch

    The name of the patch should contain the JIRA ID. Look over the patch file to be sure that you did not change any additional files by accident and that there are no other surprises. When you are satisfied, attach the patch to the JIRA and click the Patch Available button. A reviewer will review your patch. If you need to submit a new version of the patch, leave the old one on the JIRA and add a version number to the name of the new patch.

  8. After a change has been committed, there is no need to keep your local branch around. Instead you should run git pull to get the new change into your master branch.

A.3. Editing the HBase Website

The source for the HBase website is in the HBase source, in the src/main/site/ directory. Within this directory, source for the individual pages is in the xdocs/ directory, and images referenced in those pages are in the resources/images/ directory. The resources/images/ directory also stores images used in the HBase Reference Guide.

The website's pages are written in an HTML-like XML dialect called xdoc, which has a reference guide at http://maven.apache.org/archives/maven-1.x/plugins/xdoc/reference/xdocs.html. You can edit these files in a plain-text editor, an IDE, or an XML editor such as XML Mind XML Editor (XXE) or Oxygen XML Author.
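
As a rough orientation, a minimal xdoc page might look like the following sketch. The element names follow the xdoc reference linked above. The page title, section names, and text are placeholders, not content taken from the HBase website.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical page; titles and text are placeholders. -->
<document>
  <properties>
    <title>Example Page</title>
  </properties>
  <body>
    <section name="Example Section">
      <p>Body text uses HTML-like elements such as p, ul, and a.</p>
      <subsection name="Example Subsection">
        <p>Subsections nest inside sections.</p>
      </subsection>
    </section>
  </body>
</document>
```

When in doubt, copy the structure of an existing page in the xdocs/ directory, not this sketch.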

To preview your changes, build the website using the mvn clean site -DskipTests command. The HTML output resides in the target/site/ directory. When you are satisfied with your changes, follow the procedure in Procedure A.1, “Suggested Work flow for Submitting Patches” to submit your patch.

A.4. Editing the HBase Reference Guide

The source for the HBase Reference Guide is in the HBase source, in the src/main/docbkx/ directory. It is written in Docbook XML. Docbook can be intimidating, but you can typically follow the formatting of the surrounding file to get an idea of the mark-up. You can edit Docbook XML files using a plain-text editor, an XML-aware IDE, or a specialized XML editor.

Docbook's syntax can be picky. Before submitting a patch, be sure to build the output locally using the mvn site command. A build with no errors means that the XML is well-formed, that is, each opening tag is balanced by a closing tag. Well-formedness is not the same as validity, so check the output in target/docbkx/ for any surprises before submitting a patch.
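
To illustrate the difference, the following hypothetical fragment is well-formed, because every tag is closed and properly nested, but it is not valid Docbook, because a <para> may not contain a <section>. The build may not report this kind of problem, which is one more reason to inspect the rendered output.

```xml
<!-- Well-formed: every opening tag has a matching closing tag. -->
<!-- Not valid Docbook: a para element cannot contain a section. -->
<para>
  <section>
    <title>Misplaced Section</title>
    <para>This text is nested in a place the schema does not allow.</para>
  </section>
</para>
```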

A.5. Auto-Generated Content

Some parts of the HBase Reference Guide, most notably ???, are generated automatically, so that this area of the documentation stays in sync with the code. This is done by means of an XSLT transform, which you can examine in the source at src/main/xslt/configuration_to_docbook_section.xsl. This transforms the hbase-common/src/main/resources/hbase-default.xml file into a Docbook output which can be included in the Reference Guide. Sometimes, it is necessary to add configuration parameters or modify their descriptions. Make the modifications to the source file, and they will be included in the Reference Guide when it is rebuilt.
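
For illustration, an entry in hbase-default.xml has roughly the following shape. The property name, value, and description below are invented for this sketch and do not correspond to a real HBase setting.

```xml
<configuration>
  <!-- Hypothetical property, shown only to illustrate the structure. -->
  <property>
    <name>hbase.example.hypothetical.setting</name>
    <value>42</value>
    <description>Per the paragraph above, a description edited here should
    appear in the Reference Guide the next time it is rebuilt.</description>
  </property>
</configuration>
```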

It is possible that other types of content can and will be automatically generated from HBase source files in the future.

A.6. Multi-Page and Single-Page Output

You can examine the site target in the Maven pom.xml file included at the top level of the HBase source for details on the process of building the website and documentation. The Reference Guide is built twice, once as a single-page output and once with one HTML file per chapter. The single-page output is located in target/docbkx/book.html, while the multi-page output's index page is at target/docbkx/book/book.html. Each of these outputs has its own images/ and css/ directories, which are created at build time.

A.7. Images in the HBase Reference Guide

You can include images in the HBase Reference Guide. For accessibility reasons, it is recommended that you use a <figure> Docbook element for an image. This allows screen readers to navigate to the image and also provides alternative text for the image. The following is an example of a <figure> element.

<figure>
  <title>HFile Version 1</title>
  <mediaobject>
    <imageobject>
      <imagedata fileref="timeline_consistency.png" />
    </imageobject>
    <textobject>
      <phrase>HFile Version 1</phrase>
    </textobject>
  </mediaobject>
</figure>
        

The <textobject> can contain a few sentences describing the image, rather than simply reiterating the title. You can optionally specify alignment and size options in the <imagedata> element.
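
As a sketch of both suggestions, the following figure uses a longer description in the <textobject> and sets optional align and width attributes on <imagedata>. The file name, title, and attribute values are placeholders chosen for this example.

```xml
<figure>
  <title>Example Figure</title>
  <mediaobject>
    <imageobject>
      <!-- align and width are optional; example_diagram.png is a placeholder. -->
      <imagedata fileref="example_diagram.png" align="center" width="75%" />
    </imageobject>
    <textobject>
      <phrase>A sentence or two describing what the diagram shows, written for
      readers who cannot see the image.</phrase>
    </textobject>
  </mediaobject>
</figure>
```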

When doing a local build, save the image to the src/main/site/resources/images/ directory. In the <imagedata> element, refer to the image as above, with no directory component. The image will be copied to the appropriate target location during the build of the output.

When you submit a patch which includes adding an image to the HBase Reference Guide, attach the image to the JIRA. If the committer asks where the image should be committed, it should go into the above directory.

A.8. Adding a New Chapter to the HBase Reference Guide

If you want to add a new chapter to the HBase Reference Guide, the easiest way is to copy an existing chapter file, rename it, and change the ID and title elements near the top of the file. Delete the existing content and create the new content. Then open the book.xml file, which is the main file for the HBase Reference Guide, and use an <xi:include> element to include your new chapter in the appropriate location. Be sure to add your new file to your Git repository before creating your patch. Note that the book.xml file currently contains many chapters. You can only include a chapter at the same nesting level as the other chapters in the file. When in doubt, check to see how other files have been included.
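
The include step might look like the following sketch. It assumes that the guide uses the Docbook 5 and XInclude namespaces, which you should confirm against the real book.xml. The book title and file names are placeholders.

```xml
<book version="5.0"
      xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Example Book</title>
  <!-- File names are placeholders. Each include sits at the same nesting level. -->
  <xi:include href="existing_chapter.xml" />
  <xi:include href="my_new_chapter.xml" />
</book>
```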

A.9. Docbook Common Issues

The following Docbook issues come up often. Some of these are preferences, but others can create mysterious build errors or other problems.

A.9.1. What can go where?
A.9.2. Paragraphs and Admonitions
A.9.3. Wrap textual <listitem> and <entry> contents in <para> elements.
A.9.4. When to use <command>, <code>, <programlisting>, <screen>
A.9.5. How to escape XML elements so that they show up as XML
A.9.6. Tips and tricks for making screen output look good
A.9.7. Isolate Changes for Easy Diff Review.
A.9.8. Syntax Highlighting

A.9.1.

What can go where?

There is often confusion about which child elements are valid in a given context. When in doubt, Docbook: The Definitive Guide is the best resource. It has an appendix which is indexed by element and contains all valid child and parent elements of any given element. If you edit Docbook often, a schema-aware XML editor makes things easier.

A.9.2.

Paragraphs and Admonitions

It is a common pattern, and it is technically valid, to put an admonition such as a <note> inside a <para> element. Because admonitions render as block-level elements (they take the whole width of the page), it is better to mark them up as siblings to the paragraphs around them, like this:

<para>This is the paragraph.</para>
<note>
    <para>This is an admonition which occurs after the paragraph.</para>
</note>

A.9.3.

Wrap textual <listitem> and <entry> contents in <para> elements.

Because the contents of a <listitem> (an element in an itemized, ordered, or variable list) or an <entry> (a cell in a table) can consist of things other than plain text, they need to be wrapped in some element. If they are plain text, they need to be enclosed in <para> tags. This is tedious but necessary for validity.

<itemizedlist>
    <listitem>
        <para>This is a paragraph.</para>
    </listitem>
    <listitem>
        <screen>This is screen output.</screen>
    </listitem>
</itemizedlist>
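
The same wrapping applied to table cells might look like the following sketch. The table contents are placeholders chosen for this example.

```xml
<informaltable>
  <tgroup cols="2">
    <thead>
      <row>
        <entry><para>Element</para></entry>
        <entry><para>Typical contents</para></entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry><para>listitem</para></entry>
        <entry><para>Paragraphs, screens, or nested lists.</para></entry>
      </row>
    </tbody>
  </tgroup>
</informaltable>
```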

A.9.4.

When to use <command>, <code>, <programlisting>, <screen>

The first two (<command> and <code>) are in-line tags, which can occur within the flow of paragraphs or titles. The last two (<programlisting> and <screen>) are block elements.

Use <command> to mention a command such as hbase shell in the flow of a sentence. Use <code> for other inline text referring to code. Incidentally, use <literal> to specify literal strings that should be typed or entered exactly as shown. Within a <screen> listing, it can be helpful to use the <userinput> and <computeroutput> elements to mark up the text further.

Use <screen> to display input and output as the user would see it on the screen, in a log file, etc. Use <programlisting> only for blocks of code that occur within a file, such as Java or XML code, or a Bash shell script.
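
The following sketch shows the elements described above used together. The section title, the echo command, and the one-line Java listing are illustrative choices, not text from the Reference Guide.

```xml
<section>
  <title>Markup Example</title>
  <!-- In-line elements inside a paragraph. -->
  <para>Run the <command>hbase shell</command> command to open the shell. The
    <code>HBaseConfiguration.create()</code> method returns a configuration
    object. Type <literal>exit</literal> to leave the shell.</para>
  <!-- Block element for what the user sees on the screen. -->
  <screen>
$ <userinput>echo "hello"</userinput>
<computeroutput>hello</computeroutput>
  </screen>
  <!-- Block element for code that lives in a file. -->
  <programlisting language="java">
Configuration conf = HBaseConfiguration.create();
  </programlisting>
</section>
```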

A.9.5.

How to escape XML elements so that they show up as XML

For one-off instances or short in-line mentions, use the &lt; and &gt; encoded characters. For longer mentions, or blocks of code, enclose the text in a <![CDATA[]]> section, which is much easier to maintain and parse in the source files.
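
A minimal sketch of both approaches follows. The section title and the property shown inside the listing are hypothetical.

```xml
<section>
  <title>Escaping Example</title>
  <!-- Short in-line mention: entity escapes. -->
  <para>Wrap plain text in a &lt;para&gt; element.</para>
  <!-- Longer block: CDATA, so the markup inside needs no escaping. -->
  <programlisting language="xml"><![CDATA[<property>
  <name>hbase.example.hypothetical.setting</name>
  <value>true</value>
</property>]]></programlisting>
</section>
```

Inside the CDATA section the angle brackets are written literally, so the listing can be pasted in and later edited without re-escaping each tag.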

A.9.6.

Tips and tricks for making screen output look good

Text within <screen> and <programlisting> elements is shown exactly as it appears in the source, including indentation, tabs, and line wrap.

  • Indent the starting and closing XML elements, but do not indent the content. Also, to avoid having an extra blank line at the beginning of the programlisting output, do not put the CDATA element on its own line. For example:

            <programlisting>
    case $1 in
      --cleanZk|--cleanHdfs|--cleanAll)
        matches="yes" ;;
      *) ;;
    esac
            </programlisting>

  • After pasting code into a programlisting, fix the indentation manually, using two spaces per level of indentation. For screen output, be sure to include line breaks so that no line is longer than 100 characters.

A.9.7.

Isolate Changes for Easy Diff Review.

Be careful with pretty-printing or re-formatting an entire XML file, even if the formatting has degraded over time. If you need to reformat a file, do that in a separate JIRA where you do not change any content. Be careful because some XML editors do a bulk-reformat when you open a new file, especially if you use GUI mode in the editor.

A.9.8.

Syntax Highlighting

The HBase Reference Guide uses the XSLT Syntax Highlighting Maven module for syntax highlighting. To enable syntax highlighting for a given <programlisting> or <screen> (or possibly other elements), add the attribute language=LANGUAGE_OF_CHOICE to the element, as in the following example:

<programlisting language="xml">
    <foo>bar</foo>
    <bar>foo</bar>
</programlisting>

Several syntax types are supported. The most interesting ones for the HBase Reference Guide are java, xml, sql, and bourne (for Bash shell output or Linux command-line examples).
