Chapter 1. Secure Apache HBase

Table of Contents

1.1. Secure Client Access to Apache HBase
1.1.1. Prerequisites
1.1.2. Server-side Configuration for Secure Operation
1.1.3. Client-side Configuration for Secure Operation
1.1.4. Client-side Configuration for Secure Operation - Thrift Gateway
1.1.5. Configure the Thrift Gateway to Authenticate on Behalf of the Client
1.1.6. Client-side Configuration for Secure Operation - REST Gateway
1.1.7. REST Gateway Impersonation Configuration
1.2. Simple User Access to Apache HBase
1.2.1. Simple Versus Secure Access
1.3. Tags
1.4. Access Control
1.4.1. Prerequisites
1.4.2. Overview
1.4.3. Access Control Matrix
1.4.4. Server-side Configuration for Access Control
1.4.5. Cell level Access Control using Tags
1.4.6. Shell Enhancements for Access Control
1.5. Secure Bulk Load
1.6. Visibility Labels
1.6.1. Visibility Label Administration
1.6.2. Server Side Configuration
1.7. Transparent Server Side Encryption
1.7.1. Configuration
1.7.2. Setting Encryption on a CF
1.7.3. Data Key Rotation
1.7.4. Master Key Rotation

1.1. Secure Client Access to Apache HBase

Newer releases of Apache HBase (>= 0.92) support optional SASL authentication of clients[1].

This describes how to set up Apache HBase and clients for connection to secure HBase resources.

1.1.1. Prerequisites

Hadoop Authentication Configuration

To run HBase RPC with strong authentication, you must set hbase.security.authentication to kerberos. In this case, you must also set hadoop.security.authentication to kerberos in your Hadoop configuration. Otherwise, you would be using strong authentication for HBase but not for the underlying HDFS, which would cancel out any benefit.

Kerberos KDC

You need to have a working Kerberos KDC.

An HBase cluster configured for secure client access is expected to run on top of a secured HDFS cluster. HBase must be able to authenticate to HDFS services, so HBase needs Kerberos credentials to interact with the Kerberos-enabled HDFS daemons. A service should authenticate using a keytab file. The procedure for creating keytabs for the HBase service is the same as for creating keytabs for Hadoop; those steps are omitted here. Copy the resulting keytab files to wherever the HBase Master and RegionServer processes are deployed, and make them readable only to the user account under which the HBase daemons will run.

A Kerberos principal has three parts, with the form username/fully.qualified.domain.name@YOUR-REALM.COM. We recommend using hbase as the username portion.
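As an illustration, the service principal and keytab for one host might be created in kadmin.local as follows (the realm, hostname, and keytab path are assumptions for this example; adapt them to your site):

```
kadmin.local: addprinc -randkey hbase/host1.example.com@YOUR-REALM.COM
kadmin.local: xst -k /etc/hbase/conf/keytab.krb5 hbase/host1.example.com@YOUR-REALM.COM
```

Repeat for each host running a Master or RegionServer, substituting that host's fully qualified domain name.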

The following is an example of the configuration properties for Kerberos operation that must be added to the hbase-site.xml file on every server machine in the cluster. These properties are required for even the most basic interactions with a secure Hadoop configuration, independent of HBase security.

<property>
  <name>hbase.regionserver.kerberos.principal</name>
  <value>hbase/_HOST@YOUR-REALM.COM</value>
</property>
<property>
  <name>hbase.regionserver.keytab.file</name>
  <value>/etc/hbase/conf/keytab.krb5</value>
</property>
<property>
  <name>hbase.master.kerberos.principal</name>
  <value>hbase/_HOST@YOUR-REALM.COM</value>
</property>
<property>
  <name>hbase.master.keytab.file</name>
  <value>/etc/hbase/conf/keytab.krb5</value>
</property>
    

Each HBase client user should also be given a Kerberos principal. This principal should have a password assigned to it (as opposed to a keytab file). The client principal's maxrenewlife should be set so that it can be renewed enough times for the HBase client process to complete. For example, if a user runs a long-running HBase client process that takes at most 3 days, we might create this user's principal within kadmin with: addprinc -maxrenewlife 3days, followed by the principal name.

Long running daemons with indefinite lifetimes that require client access to HBase can instead be configured to log in from a keytab. For each host running such daemons, create a keytab with kadmin or kadmin.local. The procedure for creating keytabs for HBase service is the same as for creating keytabs for Hadoop. Those steps are omitted here. Copy the resulting keytab files to where the client daemon will execute and make them readable only to the user account under which the daemon will run.

1.1.2. Server-side Configuration for Secure Operation

First, refer to Section 1.1.1, “Prerequisites” and ensure that your underlying HDFS configuration is secure.

Add the following to the hbase-site.xml file on every server machine in the cluster:

<property>
  <name>hbase.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hbase.security.authorization</name>
  <value>true</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.token.TokenProvider</value>
</property>
    

A full shutdown and restart of HBase service is required when deploying these configuration changes.

1.1.3. Client-side Configuration for Secure Operation

First, refer to Section 1.1.1, “Prerequisites” and ensure that your underlying HDFS configuration is secure.

Add the following to the hbase-site.xml file on every client:

<property>
  <name>hbase.security.authentication</name>
  <value>kerberos</value>
</property>
    

The client environment must be logged in to Kerberos from the KDC or a keytab via the kinit command before communication with the HBase cluster is possible.

Be advised that if the hbase.security.authentication in the client- and server-side site files do not match, the client will not be able to communicate with the cluster.

Once HBase is configured for secure RPC it is possible to optionally configure encrypted communication. To do so, add the following to the hbase-site.xml file on every client:

<property>
  <name>hbase.rpc.protection</name>
  <value>privacy</value>
</property>
    

This configuration property can also be set on a per connection basis. Set it in the Configuration supplied to HTable:

Configuration conf = HBaseConfiguration.create();
conf.set("hbase.rpc.protection", "privacy");
HTable table = new HTable(conf, tablename);
    

Expect a ~10% performance penalty for encrypted communication.

1.1.4. Client-side Configuration for Secure Operation - Thrift Gateway

Add the following to the hbase-site.xml file for every Thrift gateway:

<property>
  <name>hbase.thrift.keytab.file</name>
  <value>/etc/hbase/conf/hbase.keytab</value>
</property>
<property>
  <name>hbase.thrift.kerberos.principal</name>
  <value>$USER/_HOST@HADOOP.LOCALDOMAIN</value>
  <!-- TODO: This may need to be  HTTP/_HOST@<REALM> and _HOST may not work.
   You may have  to put the concrete full hostname.
   -->
</property>
    

Substitute the appropriate credential for $USER. The keytab path shown is an example; point it at the keytab you deployed for the Thrift gateway.

In order to use the Thrift API principal to interact with HBase, it is also necessary to add the hbase.thrift.kerberos.principal to the _acl_ table. For example, to give the Thrift API principal, thrift_server, administrative access, a command such as this one will suffice:

grant 'thrift_server', 'RWCA'
    

For more information about ACLs, please see the Access Control section.

The Thrift gateway will authenticate with HBase using the supplied credential. No authentication will be performed by the Thrift gateway itself. All client access via the Thrift gateway will use the Thrift gateway's credential and have its privilege.

1.1.5. Configure the Thrift Gateway to Authenticate on Behalf of the Client

Section 1.1.4, “Client-side Configuration for Secure Operation - Thrift Gateway” describes how to authenticate a Thrift client to HBase using a fixed user. As an alternative, you can configure the Thrift gateway to authenticate to HBase on the client's behalf, and to access HBase using a proxy user. This was implemented in HBASE-11349 for Thrift 1, and HBASE-11474 for Thrift 2.

Limitations with Thrift Framed Transport

If you use framed transport, you cannot yet take advantage of this feature, because SASL does not work with Thrift framed transport at this time.

To enable it, do the following.

  1. Be sure Thrift is running in secure mode, by following the procedure described in Section 1.1.4, “Client-side Configuration for Secure Operation - Thrift Gateway”.

  2. Be sure that HBase is configured to allow proxy users, as described in Section 1.1.7, “REST Gateway Impersonation Configuration”.

  3. In hbase-site.xml for each cluster node running a Thrift gateway, set the property hbase.thrift.security.qop to one of the following three values:

    • auth-conf - authentication, integrity, and confidentiality checking

    • auth-int - authentication and integrity checking

    • auth - authentication checking only

  4. Restart the Thrift gateway processes for the changes to take effect. If a node is running Thrift, the output of the jps command will list a ThriftServer process. To stop Thrift on a node, run the command bin/hbase-daemon.sh stop thrift. To start Thrift on a node, run the command bin/hbase-daemon.sh start thrift.
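The QoP setting from step 3 is an ordinary hbase-site.xml property. For example (the choice of auth-conf here is illustrative; pick the value matching your confidentiality requirements):

```xml
<property>
  <name>hbase.thrift.security.qop</name>
  <value>auth-conf</value>
</property>
```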

1.1.6. Client-side Configuration for Secure Operation - REST Gateway

Add the following to the hbase-site.xml file for every REST gateway:

<property>
  <name>hbase.rest.keytab.file</name>
  <value>$KEYTAB</value>
</property>
<property>
  <name>hbase.rest.kerberos.principal</name>
  <value>$USER/_HOST@HADOOP.LOCALDOMAIN</value>
</property>
    

Substitute the appropriate credential and keytab for $USER and $KEYTAB respectively.

The REST gateway will authenticate with HBase using the supplied credential. No authentication will be performed by the REST gateway itself. All client access via the REST gateway will use the REST gateway's credential and have its privilege.

In order to use the REST API principal to interact with HBase, it is also necessary to add the hbase.rest.kerberos.principal to the _acl_ table. For example, to give the REST API principal, rest_server, administrative access, a command such as this one will suffice:

grant 'rest_server', 'RWCA'
    

For more information about ACLs, please see the Access Control section.

It should be possible for clients to authenticate with the HBase cluster through the REST gateway in a pass-through manner via SPNEGO HTTP authentication. This is future work.

1.1.7. REST Gateway Impersonation Configuration

By default, the REST gateway does not support impersonation; it accesses HBase on behalf of clients as the user configured in the previous section. To the HBase server, all requests therefore appear to come from the REST gateway user, and the actual end users are unknown. With impersonation enabled, the REST gateway user becomes a proxy user: the HBase server knows the actual user behind each request and can apply the proper authorizations.

To turn on REST gateway impersonation, configure the HBase servers (Masters and RegionServers) to allow proxy users, then configure the REST gateway to enable impersonation.

To allow proxy users, add the following to the hbase-site.xml file for every HBase server:

<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.proxyuser.$USER.groups</name>
  <value>$GROUPS</value>
</property>
<property>
  <name>hadoop.proxyuser.$USER.hosts</name>
  <value>$HOSTS</value>
</property>
    

Substitute the REST gateway proxy user for $USER, the allowed group list for $GROUPS, and the allowed host list for $HOSTS.

To enable REST gateway impersonation, add the following to the hbase-site.xml file for every REST gateway.

<property>
  <name>hbase.rest.authentication.type</name>
  <value>kerberos</value>
</property>
<property>
  <name>hbase.rest.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@HADOOP.LOCALDOMAIN</value>
</property>
<property>
  <name>hbase.rest.authentication.kerberos.keytab</name>
  <value>$KEYTAB</value>
</property>
    

Substitute the keytab for HTTP for $KEYTAB.
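With impersonation enabled, a SPNEGO-capable client can authenticate to the gateway and ask it to act on another user's behalf. As a sketch (the host, port, and user are hypothetical, and the doAs query parameter follows the Hadoop proxy-user convention; verify support in your version):

```
curl --negotiate -u : "http://gateway.example.com:8080/version/cluster?doAs=user1"
```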

1.2. Simple User Access to Apache HBase

Newer releases of Apache HBase (>= 0.92) support optional SASL authentication of clients[2].

This describes how to set up Apache HBase and clients for simple user access to HBase resources.

1.2.1. Simple Versus Secure Access

The following section shows how to set up simple user access. Simple user access is not a secure method of operating HBase; it is only meant to prevent users from making mistakes. It can be used to mimic the access control features on a development system without having to set up Kerberos.

This method does not prevent malicious or hacking attempts. To make HBase secure against these types of attacks, you must configure HBase for secure operation. Refer to the section Secure Client Access to Apache HBase and complete all of the steps described there.

1.2.1.1. Prerequisites

None

1.2.1.1.1. Server-side Configuration for Simple User Access Operation

Add the following to the hbase-site.xml file on every server machine in the cluster:

<property>
  <name>hbase.security.authentication</name>
  <value>simple</value>
</property>
<property>
  <name>hbase.security.authorization</name>
  <value>true</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
    

For 0.94, add the following to the hbase-site.xml file on every server machine in the cluster:

<property>
  <name>hbase.rpc.engine</name>
  <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property> 
    

A full shutdown and restart of HBase service is required when deploying these configuration changes.

1.2.1.1.2. Client-side Configuration for Simple User Access Operation

Add the following to the hbase-site.xml file on every client:

<property>
  <name>hbase.security.authentication</name>
  <value>simple</value>
</property>
    

For 0.94, add the following to the hbase-site.xml file on every server machine in the cluster:

<property>
  <name>hbase.rpc.engine</name>
  <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
    

Be advised that if the hbase.security.authentication in the client- and server-side site files do not match, the client will not be able to communicate with the cluster.

1.2.1.1.3. Client-side Configuration for Simple User Access Operation - Thrift Gateway

The Thrift gateway user will need access. For example, to give the Thrift API user, thrift_server, administrative access, a command such as this one will suffice:

grant 'thrift_server', 'RWCA'
    

For more information about ACLs, please see the Access Control section.

The Thrift gateway will authenticate with HBase using the supplied credential. No authentication will be performed by the Thrift gateway itself. All client access via the Thrift gateway will use the Thrift gateway's credential and have its privilege.

1.2.1.1.4. Client-side Configuration for Simple User Access Operation - REST Gateway

The REST gateway will authenticate with HBase using the supplied credential. No authentication will be performed by the REST gateway itself. All client access via the REST gateway will use the REST gateway's credential and have its privilege.

The REST gateway user will need access. For example, to give the REST API user, rest_server, administrative access, a command such as this one will suffice:

grant 'rest_server', 'RWCA'
    

For more information about ACLs, please see the Access Control section.

It should be possible for clients to authenticate with the HBase cluster through the REST gateway in a pass-through manner via SPNEGO HTTP authentication. This is future work.

1.3. Tags

Every cell can have metadata associated with it. Storing that metadata inside the data (value) portion of every cell would be awkward and inefficient.

HBase 0.98 solves this problem by carrying Tags alongside the cell format. Use cases built on tags include visibility labels and cell-level ACLs.

HFile version 3, available from 0.98 onwards, supports tags. This feature can be turned on using the following configuration:

<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>
    

Every cell can have zero or more tags. Every tag has a type and the actual tag byte array. Types 0-31 are reserved for system tags: for example, type 1 is reserved for ACL tags and type 2 for visibility tags.

Just as rowkeys, column families, qualifiers, and values can be encoded with different encoding algorithms, tags can also be encoded. Tag encoding can be turned on per CF, and is on by default. To control tag encoding on the HFiles, use

HColumnDescriptor#setCompressTags(boolean compressTags)
    

Note that encoding of tags takes place only if a DataBlockEncoder is enabled for the CF.

Because WAL entries are compressed using a dictionary, the tags present in the WAL can be dictionary-compressed as well; every tag is compressed individually using the WAL dictionary. To turn on tag compression in the WAL, enable the following property:

<property>
  <name>hbase.regionserver.wal.tags.enablecompression</name>
  <value>true</value>
</property>
    

To add tags to every cell during Puts, the following APIs are provided:

Put#add(byte[] family, byte [] qualifier, byte [] value, Tag[] tag)
Put#add(byte[] family, byte[] qualifier, long ts, byte[] value, Tag[] tag)
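A Put that attaches a tag might look like the following sketch. The tag type byte, column names, and the surrounding table handle are illustrative assumptions, and the HBase 0.98 client must be on the classpath:

```java
Put put = new Put(Bytes.toBytes("row1"));
// A hypothetical application-defined tag type; types 0-31 are reserved for system tags
Tag tag = new Tag((byte) 64, Bytes.toBytes("app-metadata"));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"),
        new Tag[] { tag });
table.put(put);
```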
    

Cell-level ACLs and visibility labels are features built on the tags framework; they allow users to gain finer-grained security at the cell level.

For details, see:

  • Access Control

  • Visibility Labels

1.4. Access Control

Newer releases of Apache HBase (>= 0.92) support optional access control list (ACL) based protection of resources on a column family and/or table basis.

This describes how to set up Secure HBase for access control, with an example of granting and revoking user permissions on table resources.

1.4.1. Prerequisites

You must configure HBase for secure or simple user access operation. Refer to the Secure Client Access to HBase or Simple User Access to HBase sections and complete all of the steps described there.

For secure access, you must also configure ZooKeeper for secure operation. Changes to ACLs are synchronized throughout the cluster using ZooKeeper. Secure authentication to ZooKeeper must be enabled or otherwise it will be possible to subvert HBase access control via direct client access to ZooKeeper. Refer to the section on secure ZooKeeper configuration and complete all of the steps described there.

1.4.2. Overview

With Secure RPC and Access Control enabled, client access to HBase is authenticated and user data is private unless access has been explicitly granted. Access to data can be granted on a per-table or per-column-family basis.

However, the following items have been left out of the initial implementation for simplicity:

  1. Row-level or per-value (cell-level) protection: now possible using tags in HFile v3

  2. Push down of file ownership to HDFS: HBase is not designed for the case where files may have different permissions than the HBase system principal. Pushing file ownership down into HDFS would necessitate changes to core code. Also, while HDFS file ownership would make applying quotas easy, and possibly make bulk imports more straightforward, it is not clear that it would offer a more secure setup.

  3. HBase managed "roles" as collections of permissions: We will not model "roles" internally in HBase to begin with. We instead allow group names to be granted permissions, which allows external modeling of roles via group membership. Groups are created and manipulated externally to HBase, via the Hadoop group mapping service.

Access control mechanisms are mature and fairly standardized in the relational database world. The HBase implementation approximates current convention, but HBase has a simpler feature set than relational databases, especially in terms of client operations. We don't distinguish between an insert (new record) and update (of existing record), for example, as both collapse down into a Put. Accordingly, the important operations condense to four permissions: READ, WRITE, CREATE, and ADMIN.

Table 1.1. Operation To Permission Mapping

Permission   Operations
Read         Get, Exists, Scan
Write        Put, Delete, Lock/UnlockRow, IncrementColumnValue, CheckAndDelete/Put
Create       Create, Alter, Drop, Bulk Load
Admin        Enable/Disable, Snapshot/Restore/Clone, Split, Flush, Compact, Major Compact, Grant, Revoke, Shutdown

Permissions can be granted in any of the following scopes, though CREATE and ADMIN permissions are effective only at table scope.

  • Table

    • Read: User can read from any column family in table

    • Write: User can write to any column family in table

    • Create: User can alter table attributes; add, alter, or drop column families; and drop the table.

    • Admin: User can alter table attributes; add, alter, or drop column families; and enable, disable, or drop the table. User can also trigger region (re)assignments or relocation.

  • Column Family

    • Read: User can read from the column family

    • Write: User can write to the column family

There is also an implicit global scope for the superuser.

The superuser is a principal, specified in the HBase site configuration file, that has equivalent access to HBase as the 'root' user would on a UNIX derived system. Normally this is the principal that the HBase processes themselves authenticate as. Although future versions of HBase Access Control may support multiple superusers, the superuser privilege will always include the principal used to run the HMaster process. Only the superuser is allowed to create tables, switch the balancer on or off, or take other actions with global consequence. Furthermore, the superuser has an implicit grant of all permissions to all resources.

Tables have a new metadata attribute: OWNER, the user principal who owns the table. By default this will be set to the user principal who creates the table, though it may be changed at table creation time or during an alter operation by setting or changing the OWNER table attribute. Only a single user principal can own a table at a given time. A table owner will have all permissions over a given table.

1.4.3. Access Control Matrix

The following matrix shows the minimum permission set required to perform operations in HBase. Before using the table, read through the information about how to interpret it.

Interpreting the ACL Matrix Table

The following conventions are used in the ACL Matrix table:

Scopes

Permissions are evaluated starting at the widest scope and working to the narrowest scope. A scope corresponds to a level of the data model. From broadest to narrowest, the scopes are as follows:

  • Global

  • Namespace (NS)

  • Table

  • Column Family (CF)

  • Column Qualifier (CQ)

  • Cell

For instance, a permission granted at table scope dominates any grants made at the column family, column qualifier, or cell level: the user can do what that grant implies at any location in the table. A permission granted at global scope dominates all: the user is always allowed to take that action everywhere.
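This dominance rule can be sketched as a toy model in plain Java (this is an illustration of the rule, not HBase's implementation):

```java
// Toy model of ACL scope dominance: a grant at a broader scope
// implies the same grant at every narrower scope.
public class ScopeDemo {
    // Broadest to narrowest, mirroring the list above.
    public enum Scope { GLOBAL, NAMESPACE, TABLE, COLUMN_FAMILY, COLUMN_QUALIFIER, CELL }

    // A permission granted at 'granted' covers a check at 'checked'
    // when the granted scope is at least as broad.
    public static boolean covers(Scope granted, Scope checked) {
        return granted.ordinal() <= checked.ordinal();
    }

    public static void main(String[] args) {
        // A table-level grant covers cell-level access, but not the reverse.
        System.out.println(covers(Scope.TABLE, Scope.CELL));   // true
        System.out.println(covers(Scope.CELL, Scope.TABLE));   // false
    }
}
```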

Permissions

Possible permissions include the following:

  • Superuser - a special user that belongs to group "supergroup" and has unlimited access

  • Admin (A)

  • Create (C)

  • Write (W)

  • Read (R)

  • Execute (X)

For the most part, permissions work in an expected way, with the following caveats:

  • Having Write permission does not imply Read permission. It is possible and sometimes desirable for a user to be able to write data that same user cannot read. One such example is a log-writing process.

  • Admin is a superset of Create, so a user with Admin permissions does not also need Create permissions to perform an action such as creating a table.

  • The hbase:meta table is readable by every user, regardless of the user's other grants or restrictions. This is a requirement for HBase to function correctly.

  • Users with Create or Admin permissions are granted Write permission on meta regions, so the table operations they are allowed to perform can complete, even if technically the bits can be granted separately in any possible combination.

  • CheckAndPut and CheckAndDelete operations will fail if the user does not have both Write and Read permission.

  • Increment and Append operations do not require Read access.

The following table is sorted by the interface that provides each operation. In case the table goes out of date, the unit tests which check for accuracy of permissions can be found in hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java, and the access controls themselves can be examined in hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java.

Table 1.2. ACL Matrix

Interface         Operation               Minimum Scope     Minimum Permission
Master            createTable             Global            A
                  modifyTable             Table             A|CW
                  deleteTable             Table             A|CW
                  truncateTable           Table             A|CW
                  addColumn               Table             A|CW
                  modifyColumn            Table             A|CW
                  deleteColumn            Table             A|CW
                  disableTable            Table             A|CW
                  disableAclTable         None              Not allowed
                  enableTable             Table             A|CW
                  move                    Global            A
                  assign                  Global            A
                  unassign                Global            A
                  regionOffline           Global            A
                  balance                 Global            A
                  balanceSwitch           Global            A
                  shutdown                Global            A
                  stopMaster              Global            A
                  snapshot                Global            A
                  clone                   Global            A
                  restore                 Global            A
                  deleteSnapshot          Global            A
                  createNamespace         Global            A
                  deleteNamespace         Namespace         A
                  modifyNamespace         Namespace         A
                  flushTable              Table             A|CW
                  getTableDescriptors     Global|Table      A
                  mergeRegions            Global            A
Region            preOpen                 Global            A
                  openRegion              Global            A
                  preClose                Global            A
                  closeRegion             Global            A
                  preStopRegionServer     Global            A
                  stopRegionServer        Global            A
                  mergeRegions            Global            A
                  append                  Table             W
                  delete                  Table|CF|CQ       W
                  exists                  Table|CF|CQ       R
                  get                     Table|CF|CQ       R
                  getClosestRowBefore     Table|CF|CQ       R
                  increment               Table|CF|CQ       W
                  put                     Table|CF|CQ       W
                  flush                   Global            A|CW
                  split                   Global            A
                  compact                 Global            A|CW
                  bulkLoadHFile           Table             W
                  prepareBulkLoad         Table             CW
                  cleanupBulkLoad         Table             W
                  checkAndDelete          Table|CF|CQ       RW
                  checkAndPut             Table|CF|CQ       RW
                  incrementColumnValue    Table|CF|CQ       RW
                  ScannerClose            Table             R
                  ScannerNext             Table             R
                  ScannerOpen             Table|CQ|CF       R
Endpoint          invoke                  Endpoint          X
AccessController  grant                   Global|Table|NS   A
                  revoke                  Global|Table|NS   A
                  userPermissions         Global|Table|NS   A
                  checkPermissions        Global|Table|NS   A


1.4.4. Server-side Configuration for Access Control

Enable the AccessController coprocessor in the cluster configuration and restart HBase. The restart can be a rolling one. Complete the restart of all Master and RegionServer processes before setting up ACLs.

To enable the AccessController, modify the hbase-site.xml file on every server machine in the cluster to look like:

<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.token.TokenProvider,
  org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
    

1.4.5. Cell level Access Control using Tags

Prior to HBase 0.98, access control was restricted to the table and column family level. The tags feature in 0.98 makes access control at the cell level possible. The existing AccessController coprocessor also enforces cell-level access control; for details on configuring it, refer to the Access Control section.

The ACLs can be specified for every mutation using the following APIs:

Mutation.setACL(String user, Permission perms)
Mutation.setACL(Map<String, Permission> perms)
    

For example, to grant read permission on a cell to the user 'user1':

put.setACL("user1", new Permission(Permission.Action.READ))
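Putting it together, a mutation carrying a cell-level ACL might look like this sketch (the table handle, row, and column names are assumptions for illustration; requires the HBase 0.98 client on the classpath):

```java
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
// Only user1 may read this cell (subject to table/CF ACL precedence)
put.setACL("user1", new Permission(Permission.Action.READ));
table.put(put);
```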
    

Generally, the ACL applied on the table and CF takes precedence over the cell-level ACL. To make the cell-level ACL take precedence instead, use the following API:

Mutation.setACLStrategy(boolean cellFirstStrategy)
    

Please note that in order to use this feature, HFile v3 must be enabled:

<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>
     

Note that deletes with ACLs do not have any effect. To keep things simple, the ACL applied on the current Put does not change the ACL of any previous Put; that is, the ACL on the current Put does not affect older versions of the cell in the same row.

1.4.6. Shell Enhancements for Access Control

The HBase shell has been extended to provide simple commands for editing and updating user permissions. The following commands have been added for access control list management:

Example 1.1. Grant

grant <user|@group> <permissions> [ <table> [ <column family> [ <column qualifier> ] ] ]
        

<user|@group> is a user or, when it starts with the character '@', a group. Groups are created and manipulated via the Hadoop group mapping service.

<permissions> is zero or more letters from the set "RWCA": READ('R'), WRITE('W'), CREATE('C'), ADMIN('A').

Note: Grants and revocations of individual permissions on a resource are both accomplished using the grant command. A separate revoke command is also provided by the shell, but this is for fast revocation of all of a user's access rights to a given resource only.
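For example, to grant a single column to a single user (the user, table, and column names here are hypothetical):

```
grant 'bobsmith', 'RW', 't1', 'f1', 'col1'
```

This gives the user bobsmith read and write access to column 'f1:col1' of table 't1' only.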

Example 1.2. Revoke

revoke <user|@group> [ <table> [ <column family> [ <column qualifier> ] ] ]
    

Example 1.3. Alter

The alter command has been extended to allow ownership assignment:

alter 'tablename', {OWNER => 'username|@group'}

Example 1.4. User Permission

The user_permission command shows all access permissions for the current user for a given table:

user_permission <table>
    

1.5. Secure Bulk Load

Bulk loading in secure mode is a bit more involved than a normal setup, since the client has to transfer the ownership of the files generated by the MapReduce job to HBase. Secure bulk loading is implemented by a coprocessor named SecureBulkLoadEndpoint, which uses a staging directory configured by "hbase.bulkload.staging.dir" (defaulting to /tmp/hbase-staging/). The algorithm is as follows.

  • HBase creates a staging directory, owned by the hbase user, that is world traversable (-rwx--x--x, 711): /tmp/hbase-staging.

  • A user writes out data to their secure output directory: /user/foo/data

  • A call is made to HBase to create a secret staging directory which is globally readable/writable (-rwxrwxrwx, 777): /tmp/hbase-staging/averylongandrandomdirectoryname

  • The user makes the data world readable and writable, moves it into the random staging directory, then calls bulkLoadHFiles()

As with delegation tokens, the strength of the security lies in the length and randomness of the secret directory name.
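The final bulkLoadHFiles() step is typically driven from the command line by the bulk load tool. As a sketch (the data path and table name are hypothetical):

```
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/foo/data mytable
```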

You have to enable secure bulk load for this to work properly. Modify the hbase-site.xml file on every server machine in the cluster to add the SecureBulkLoadEndpoint class to the list of RegionServer coprocessors:

<property>
  <name>hbase.bulkload.staging.dir</name>
  <value>/tmp/hbase-staging</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.token.TokenProvider,
  org.apache.hadoop.hbase.security.access.AccessController,org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint</value>
</property>
    

1.6. Visibility Labels

This feature provides cell-level security, with labeled visibility for cells. A cell can be associated with a visibility expression, which can contain labels joined with the logical operators '&', '|' and '!'; parentheses '(' and ')' specify precedence. For example, consider the label set { confidential, secret, topsecret, probationary }, where the first three are sensitivity classifications and the last describes whether an employee is probationary. Suppose a cell is stored with this visibility expression: ( secret | topsecret ) & !probationary

Then any user associated with the secret or topsecret label will be able to view the cell, as long as the user is not also associated with the probationary label. Furthermore, any user only associated with the confidential label, whether probationary or not, will not see the cell or even know of its existence.

Visibility expressions like the above can be added when storing or mutating a cell using the API

Mutation#setCellVisibility(new CellVisibility(String labelExpression));

where labelExpression could be '( secret | topsecret ) & !probationary'

We build the user's label set in the RPC context when a request is first received by the HBase RegionServer. How users are associated with labels is pluggable. The default plugin passes through labels specified in the Authorizations added to the Get or Scan and checks those against the calling user's authenticated labels list. When a client passes labels for which the user is not authenticated, this default algorithm drops them. One can pass a subset of user-authenticated labels via the Scan/Get authorizations.

Get#setAuthorizations(new Authorizations(String,...));

Scan#setAuthorizations(new Authorizations(String,...));
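Putting these calls together, a write with a visibility expression and a read with a subset of the user's authorizations might look like the following sketch (the table, family, and qualifier names are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.security.visibility.Authorizations;
import org.apache.hadoop.hbase.security.visibility.CellVisibility;
import org.apache.hadoop.hbase.util.Bytes;

public class VisibilityExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    try {
      // Store a cell visible only to non-probationary users holding
      // the secret or topsecret label
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("info"), Bytes.toBytes("col"), Bytes.toBytes("value"));
      put.setCellVisibility(new CellVisibility("( secret | topsecret ) & !probationary"));
      table.put(put);

      // Read it back, passing a subset of the caller's authenticated labels
      Get get = new Get(Bytes.toBytes("row1"));
      get.setAuthorizations(new Authorizations("secret"));
      Result result = table.get(get);
    } finally {
      table.close();
    }
  }
}
```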

1.6.1. Visibility Label Administration

There are new client side Java APIs and shell commands for performing visibility labels administrative actions. Only the HBase super user is authorized to perform these operations.

1.6.1.1. Adding Labels

A set of labels can be added to the system either by using the Java API

VisibilityClient#addLabels(Configuration conf, final String[] labels)

Or by using the shell command

add_labels [label1, label2]

Valid labels can include alphanumeric characters and the characters '-', '_', ':', '.' and '/'

1.6.1.2. User Label Association

A set of labels can be associated with a user by using the API

VisibilityClient#setAuths(Configuration conf, final String[] auths, final String user)

Or by using the shell command

set_auths user,[label1, label2]

Labels can be disassociated from a user using the API

VisibilityClient#clearAuths(Configuration conf, final String[] auths, final String user)

Or by using the shell command

clear_auths user,[label1, label2]

One can use the API VisibilityClient#getAuths(Configuration conf, final String user) or get_auths shell command to get the list of labels associated for a given user. The labels and user auths information will be stored in the system table "labels".
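The administrative calls above can be combined into a short sketch, run as the HBase super user (the label and user names are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.protobuf.generated.VisibilityLabelsProtos.GetAuthsResponse;
import org.apache.hadoop.hbase.security.visibility.VisibilityClient;

public class LabelAdminExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Define the labels known to the system
    VisibilityClient.addLabels(conf,
        new String[] { "confidential", "secret", "topsecret", "probationary" });
    // Associate labels with a (hypothetical) user
    VisibilityClient.setAuths(conf, new String[] { "secret", "probationary" }, "alice");
    // List the labels currently associated with that user
    GetAuthsResponse auths = VisibilityClient.getAuths(conf, "alice");
    // Remove a label association
    VisibilityClient.clearAuths(conf, new String[] { "probationary" }, "alice");
  }
}
```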

1.6.2. Server Side Configuration

HBase stores cell level labels as cell tags. HFile version 3 adds the cell tags support. Be sure to use HFile version 3 by setting this property in every server site configuration file:

<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>

You will also need to make sure the VisibilityController coprocessor is active on every table you want to protect, by adding it to the list of system coprocessors in the server site configuration files:

<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.security.visibility.VisibilityController</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.visibility.VisibilityController</value>
</property>
		

As noted above, finding the labels authenticated for a given get/scan request is a pluggable algorithm. A custom implementation can be plugged in using the property hbase.regionserver.scan.visibility.label.generator.class. The default implementation class is org.apache.hadoop.hbase.security.visibility.DefaultScanLabelGenerator. One can configure a set of ScanLabelGenerators to be used by the system by supplying a comma-separated list of implementation class names.

Visibility Labels and Replication

By default, visibility labels are lost on replication. To change this behavior, see ???.

1.7. Transparent Server Side Encryption

This feature provides transparent encryption for protecting HFile and WAL data at rest, using a two-tier key architecture for flexible and non-intrusive key rotation.

First, the administrator provisions a cluster master key, stored in a key provider accessible to every trusted HBase process: the Master, the RegionServers, and clients (e.g. the shell) on administrative workstations. The default key provider integrates with the Java KeyStore API and any key management system that supports it. How HBase retrieves key material is configurable via the site file. The master key may be stored on the cluster servers, protected by a secure KeyStore file, on an external keyserver, or in a hardware security module. This master key is resolved as needed by HBase processes through the configured key provider.

Then, encryption keys can be specified in the schema on a per column family basis, by creating or modifying a column descriptor to include two additional attributes: the name of the encryption algorithm to use (currently only "AES" is supported) and, optionally, a data key wrapped (encrypted) with the cluster master key. Per-CF keys facilitate low impact incremental key rotation and reduce the scope of any external leak of key material. The wrapped data key is stored in the CF schema metadata, and in each HFile for the CF, encrypted with the cluster master key. Once the CF is configured for encryption, any new HFiles will be written encrypted. To ensure encryption of all HFiles, trigger a major compaction after first enabling this feature. The key for decryption, encrypted with the cluster master key, is stored in the HFiles in a new meta block. At file open time the data key is extracted from the HFile, decrypted with the cluster master key, and used for decryption of the remainder of the HFile. The HFile will be unreadable if the master key is not available. Should remote users somehow acquire access to the HFile data because of some lapse in HDFS permissions or from inappropriately discarded media, there will be no means to decrypt either the data key or the file data.

Specifying a data key in the CF schema is optional. If one is not present, a random data key will be created for each HFile.

A new configuration option for encrypting the WAL is also introduced. Even though WALs are transient, it is necessary to encrypt the WALEdits to avoid circumventing HFile protections for encrypted column families.

1.7.1. Configuration

Create a secret key of appropriate length for AES.

$ keytool -keystore /path/to/hbase/conf/hbase.jks \
  -storetype jceks -storepass <password> \
  -genseckey -keyalg AES -keysize 128 \
  -alias <alias>
	

where <password> is the password for the KeyStore file and <alias> is the user name of the HBase service account, typically "hbase". Simply press RETURN to store the key with the same password as the store. The resulting file should be distributed to all nodes running HBase daemons, with file ownership and permissions set so it is readable only by the HBase service account.

Configure HBase daemons to use a key provider backed by the KeyStore files for retrieving the cluster master key as needed.

<property>
    <name>hbase.crypto.keyprovider</name>
    <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
</property>
<property>
    <name>hbase.crypto.keyprovider.parameters</name>
    <value>jceks:///path/to/hbase/conf/hbase.jks?password=<password></value>
</property>
        

By default the HBase service account name will be used to resolve the cluster master key, but you can store it with any arbitrary alias and configure HBase appropriately:

<property>
    <name>hbase.crypto.master.key.name</name>
    <value>hbase</value>
</property>
        

Because the password to the key store is sensitive information, the HBase site XML file should also have its permissions set to be readable only by the HBase service account.

Transparent encryption is a feature of HFile version 3. Be sure to use HFile version 3 by setting this property in every server site configuration file:

<property>
    <name>hfile.format.version</name>
    <value>3</value>
</property>
        

Finally, configure the secure WAL in every server site configuration file:

<property>
    <name>hbase.regionserver.hlog.reader.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
</property>
<property>
    <name>hbase.regionserver.hlog.writer.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
</property>
<property>
    <name>hbase.regionserver.wal.encryption</name>
    <value>true</value>
</property>
        

1.7.2. Setting Encryption on a CF

To enable encryption on a CF, use HBaseAdmin#modifyColumn or the HBase shell to modify the column descriptor. The attribute 'ENCRYPTION' specifies the encryption algorithm to use. Currently only "AES" is supported. If creating a new table, simply set this attribute; no subsequent table modification will be necessary.

If setting a specific data key, the attribute 'ENCRYPTION_KEY' should contain the data key wrapped by the cluster master key. The static methods wrapKey and unwrapKey in org.apache.hadoop.hbase.security.EncryptionUtil can be used in conjunction with HColumnDescriptor#setEncryptionKey for this purpose. Because this must be done programmatically, setting a data key with the shell is not supported.
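For example, creating a random data key, wrapping it with the cluster master key, and attaching it to a column family might look like the following sketch (the table name, family name, and the master key alias "hbase" are assumptions):

```java
import java.security.SecureRandom;
import javax.crypto.spec.SecretKeySpec;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.security.EncryptionUtil;

public class EncryptCFExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Generate a random 128-bit AES data key
    byte[] keyBytes = new byte[16];
    new SecureRandom().nextBytes(keyBytes);
    HColumnDescriptor cf = new HColumnDescriptor("cf"); // hypothetical CF name
    cf.setEncryptionType("AES");
    // Wrap the data key with the cluster master key (alias "hbase")
    // and set it on the column descriptor
    cf.setEncryptionKey(EncryptionUtil.wrapKey(conf, "hbase",
        new SecretKeySpec(keyBytes, "AES")));
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      admin.modifyColumn(TableName.valueOf("mytable"), cf);
    } finally {
      admin.close();
    }
  }
}
```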

To disable encryption on a CF, simply remove the 'ENCRYPTION' (and 'ENCRYPTION_KEY', if it was set) attributes from the column schema, using HBaseAdmin#modifyColumn or the HBase shell. All new HFiles for the CF will be written without encryption. Trigger a major compaction to rewrite all files.

1.7.3. Data Key Rotation

Data key rotation is made simple by this design. First, change the CF key in the column descriptor. Then, trigger major compaction. Once compaction has completed, all files will be (re)encrypted with the new key material. While this process is ongoing, HFiles encrypted with old key material will still be readable.
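The rotation sequence described above, changing the wrapped key on the column descriptor and then requesting a major compaction, can be sketched as follows (the table name, family name, and master key alias are illustrative):

```java
import java.security.SecureRandom;
import javax.crypto.spec.SecretKeySpec;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.security.EncryptionUtil;
import org.apache.hadoop.hbase.util.Bytes;

public class RotateDataKeyExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Fetch the current descriptor for the (hypothetical) table and CF
      HColumnDescriptor cf = admin.getTableDescriptor(TableName.valueOf("mytable"))
          .getFamily(Bytes.toBytes("cf"));
      // Generate a new data key and set it, wrapped with the master key
      byte[] keyBytes = new byte[16];
      new SecureRandom().nextBytes(keyBytes);
      cf.setEncryptionKey(EncryptionUtil.wrapKey(conf, "hbase",
          new SecretKeySpec(keyBytes, "AES")));
      admin.modifyColumn(TableName.valueOf("mytable"), cf);
      // Rewrite all HFiles with the new key material
      admin.majorCompact("mytable");
    } finally {
      admin.close();
    }
  }
}
```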

1.7.4. Master Key Rotation

Master key rotation can be achieved by updating the KeyStore to contain a new master key, as described above, and adding the old master key to the KeyStore under a different alias. Then, configure fallback to the old master key in the HBase site file:

<property>
    <name>hbase.crypto.master.alternate.key.name</name>
    <value>hbase.old</value>
</property>
        

This will require a rolling restart of the HBase daemons to take effect. As with data key rotation, trigger a major compaction and wait for it to complete. Once compaction has completed, all files will be (re)encrypted with data keys wrapped by the new cluster master key. The old master key, and its associated site file configuration, can then be removed, and all trace of the old master key will be gone after the next rolling restart. A second rolling restart is not immediately necessary.


