org.apache.hadoop.hbase.client
Class Scan

java.lang.Object
  extended by org.apache.hadoop.hbase.client.Operation
      extended by org.apache.hadoop.hbase.client.OperationWithAttributes
          extended by org.apache.hadoop.hbase.client.Scan
All Implemented Interfaces:
Attributes, org.apache.hadoop.io.Writable

public class Scan
extends OperationWithAttributes
implements org.apache.hadoop.io.Writable

Used to perform Scan operations.

All operations are identical to Get with the exception of instantiation. Rather than specifying a single row, an optional startRow and stopRow may be defined. If rows are not specified, the Scanner will iterate over all rows.

To scan everything for each row, instantiate a Scan object.

To modify scanner caching for just this scan, use setCaching. If caching is NOT set, we will use the caching value of the hosting HTable. See HTable.setScannerCaching(int).

To further define the scope of what to get when scanning, perform additional methods as outlined below.

To get all columns from specific families, execute addFamily for each family to retrieve.

To get specific columns, execute addColumn for each column to retrieve.

To only retrieve columns within a specific range of version timestamps, execute setTimeRange.

To only retrieve columns with a specific timestamp, execute setTimestamp.

To limit the number of versions of each column to be returned, execute setMaxVersions.

To limit the maximum number of values returned for each call to next(), execute setBatch.

To add a filter, execute setFilter.

Expert: To explicitly disable server-side block caching for this scan, execute setCacheBlocks(boolean).


Constructor Summary
Scan()
          Create a Scan operation across all rows.
Scan(byte[] startRow)
          Create a Scan operation starting at the specified row.
Scan(byte[] startRow, byte[] stopRow)
          Create a Scan operation for the range of rows specified.
Scan(byte[] startRow, Filter filter)
           
Scan(Get get)
          Builds a scan object with the same specs as get.
Scan(Scan scan)
          Creates a new instance of this class while copying all values.
 
Method Summary
 Scan addColumn(byte[] family, byte[] qualifier)
          Get the column from the specified family with the specified qualifier.
 Scan addFamily(byte[] family)
          Get all columns from the specified family.
 int getBatch()
           
 boolean getCacheBlocks()
          Get whether blocks should be cached for this Scan.
 int getCaching()
           
 byte[][] getFamilies()
           
 Map<byte[],NavigableSet<byte[]>> getFamilyMap()
          Get the familyMap
 Filter getFilter()
           
 Map<String,Object> getFingerprint()
          Compile the table and column family (i.e. schema) information into a Map.
 int getMaxVersions()
           
 byte[] getStartRow()
           
 byte[] getStopRow()
           
 TimeRange getTimeRange()
           
 boolean hasFamilies()
           
 boolean hasFilter()
           
 boolean isGetScan()
           
 int numFamilies()
           
 void readFields(DataInput in)
           
 void setBatch(int batch)
          Set the maximum number of values to return for each call to next()
 void setCacheBlocks(boolean cacheBlocks)
          Set whether blocks should be cached for this Scan.
 void setCaching(int caching)
          Set the number of rows for caching that will be passed to scanners.
 Scan setFamilyMap(Map<byte[],NavigableSet<byte[]>> familyMap)
          Set the familyMap
 Scan setFilter(Filter filter)
          Apply the specified server-side filter when performing the Scan.
 Scan setMaxVersions()
          Get all available versions.
 Scan setMaxVersions(int maxVersions)
          Get up to the specified number of versions of each column.
 Scan setStartRow(byte[] startRow)
          Set the start row of the scan.
 Scan setStopRow(byte[] stopRow)
          Set the stop row.
 Scan setTimeRange(long minStamp, long maxStamp)
          Get versions of columns only within the specified timestamp range, [minStamp, maxStamp).
 Scan setTimeStamp(long timestamp)
          Get versions of columns with the specified timestamp.
 Map<String,Object> toMap(int maxCols)
          Compile the details beyond the scope of getFingerprint (row, columns, timestamps, etc.) into a Map along with the fingerprinted information.
 void write(DataOutput out)
           
 
Methods inherited from class org.apache.hadoop.hbase.client.OperationWithAttributes
getAttribute, getAttributeSize, getAttributesMap, readAttributes, setAttribute, writeAttributes
 
Methods inherited from class org.apache.hadoop.hbase.client.Operation
toJSON, toJSON, toMap, toString, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Scan

public Scan()
Create a Scan operation across all rows.


Scan

public Scan(byte[] startRow,
            Filter filter)

Scan

public Scan(byte[] startRow)
Create a Scan operation starting at the specified row.

If the specified row does not exist, the Scanner will start from the next closest row after the specified row.

Parameters:
startRow - row to start scanner at or after
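The "start at or after" rule can be pictured with a sorted map. The following is a simplified, stdlib-only model rather than HBase code: it assumes row keys sort in unsigned lexicographic byte order and uses a `TreeMap` as a stand-in for a table. `StartRowModel`, `ROW_ORDER`, `rowKey` and `firstRow` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical model of "start at or after startRow"; not part of the HBase API.
class StartRowModel {
    // Unsigned lexicographic byte order, assumed here to be the row-key order.
    static final Comparator<byte[]> ROW_ORDER = (x, y) -> {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int a = x[i] & 0xff;
            int b = y[i] & 0xff;
            if (a != b) {
                return Integer.compare(a, b);
            }
        }
        return Integer.compare(x.length, y.length);
    };

    static byte[] rowKey(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // First row the modeled scanner would return, or null if no row is >= startRow.
    static byte[] firstRow(TreeMap<byte[], String> table, byte[] startRow) {
        // ceilingKey returns startRow itself if present, else the next larger row.
        return table.ceilingKey(startRow);
    }
}
```

With rows `row1` and `row3` stored, a start row of `row2` makes `firstRow` return `row3`, matching the behavior described above.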

Scan

public Scan(byte[] startRow,
            byte[] stopRow)
Create a Scan operation for the range of rows specified.

Parameters:
startRow - row to start scanner at or after (inclusive)
stopRow - row to stop scanner before (exclusive)
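The inclusive start and exclusive stop bounds amount to a half-open interval check. This sketch models that check in plain Java; it is not the HBase implementation. It assumes that an empty byte array means "no bound", which we take to correspond to the unset start and stop rows of `Scan()`, and that rows compare as unsigned bytes. `RowRangeModel` and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of the [startRow, stopRow) row-range check; not HBase code.
class RowRangeModel {
    // Unsigned lexicographic comparison of two row keys.
    static int compareRows(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int a = x[i] & 0xff;
            int b = y[i] & 0xff;
            if (a != b) {
                return Integer.compare(a, b);
            }
        }
        return Integer.compare(x.length, y.length);
    }

    // True if row lies in [startRow, stopRow); an empty bound is treated as unbounded.
    static boolean inRange(byte[] row, byte[] startRow, byte[] stopRow) {
        boolean atOrAfterStart = startRow.length == 0 || compareRows(row, startRow) >= 0;
        boolean beforeStop = stopRow.length == 0 || compareRows(row, stopRow) < 0;
        return atOrAfterStart && beforeStop;
    }

    static byte[] rowKey(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```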

Scan

public Scan(Scan scan)
     throws IOException
Creates a new instance of this class while copying all values.

Parameters:
scan - The scan instance to copy from.
Throws:
IOException - When copying the values fails.

Scan

public Scan(Get get)
Builds a scan object with the same specs as get.

Parameters:
get - get to model scan after

Method Detail

isGetScan

public boolean isGetScan()

addFamily

public Scan addFamily(byte[] family)
Get all columns from the specified family.

Overrides previous calls to addColumn for this family.

Parameters:
family - family name
Returns:
this

addColumn

public Scan addColumn(byte[] family,
                      byte[] qualifier)
Get the column from the specified family with the specified qualifier.

Overrides previous calls to addFamily for this family.

Parameters:
family - family name
qualifier - column qualifier
Returns:
this
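`addFamily` and `addColumn` override each other for the same family. A simplified model of that rule could look like the following. It assumes, as a modeling choice rather than a statement about HBase internals, that the family map stores a `null` qualifier set to mean "every column of the family". `FamilyMapModel` is a hypothetical class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of how addFamily and addColumn override each other; not HBase code.
// A null qualifier set stands for "all columns of this family".
class FamilyMapModel {
    static final Comparator<byte[]> BYTES = (x, y) -> {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int a = x[i] & 0xff;
            int b = y[i] & 0xff;
            if (a != b) {
                return Integer.compare(a, b);
            }
        }
        return Integer.compare(x.length, y.length);
    };

    final TreeMap<byte[], NavigableSet<byte[]>> familyMap = new TreeMap<>(BYTES);

    FamilyMapModel addFamily(byte[] family) {
        // Replaces any qualifier set added earlier with "whole family".
        familyMap.put(family, null);
        return this;
    }

    FamilyMapModel addColumn(byte[] family, byte[] qualifier) {
        NavigableSet<byte[]> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            // No entry yet, or a whole-family entry: start a fresh qualifier set.
            qualifiers = new TreeSet<>(BYTES);
        }
        qualifiers.add(qualifier);
        familyMap.put(family, qualifiers);
        return this;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Calling `addColumn(cf, q1)`, then `addFamily(cf)`, then `addColumn(cf, q2)` leaves only `q2` selected for `cf` in this model.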

setTimeRange

public Scan setTimeRange(long minStamp,
                         long maxStamp)
                  throws IOException
Get versions of columns only within the specified timestamp range, [minStamp, maxStamp). Note that the default maximum number of versions to return is 1. If your time range spans more than one version and you want all versions returned, increase the maximum number of versions beyond the default.

Parameters:
minStamp - minimum timestamp value, inclusive
maxStamp - maximum timestamp value, exclusive
Returns:
this
Throws:
IOException - if invalid time range
See Also:
setMaxVersions(), setMaxVersions(int)
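The interaction between `setTimeRange` and the version limit can be shown on a single column. The sketch below is a stdlib-only model, not HBase code: it assumes the column's versions are given newest first, keeps those with `minStamp <= ts < maxStamp`, and then returns at most `maxVersions` of them. It throws `IllegalArgumentException` where `setTimeRange` is documented to throw `IOException`. `VersionSelectionModel` and `select` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of time-range plus max-versions selection for one column; not HBase code.
class VersionSelectionModel {
    // timestampsNewestFirst: version timestamps of one column, sorted descending.
    static List<Long> select(long[] timestampsNewestFirst, long minStamp, long maxStamp, int maxVersions) {
        if (maxStamp < minStamp) {
            throw new IllegalArgumentException("maxStamp is smaller than minStamp");
        }
        List<Long> selected = new ArrayList<>();
        for (long ts : timestampsNewestFirst) {
            if (selected.size() >= maxVersions) {
                break; // version limit reached
            }
            if (ts >= minStamp && ts < maxStamp) { // half-open range [minStamp, maxStamp)
                selected.add(ts);
            }
        }
        return selected;
    }
}
```

With versions at timestamps 40, 30, 20 and 10 and the range [15, 40), the default limit of 1 yields only 30; a limit of 3, or `Integer.MAX_VALUE` used here as a stand-in for `setMaxVersions()`, yields 30 and 20.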

setTimeStamp

public Scan setTimeStamp(long timestamp)
Get versions of columns with the specified timestamp. Note that the default maximum number of versions to return is 1. If your time range spans more than one version and you want all versions returned, increase the maximum number of versions beyond the default.

Parameters:
timestamp - version timestamp
Returns:
this
See Also:
setMaxVersions(), setMaxVersions(int)
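A single timestamp can be read as the one-element half-open range [timestamp, timestamp + 1). The sketch assumes that equivalence to show which versions `setTimeStamp` selects. It is a model rather than HBase code, `SingleTimestampModel` is hypothetical, and it ignores overflow at `Long.MAX_VALUE`.

```java
// Hypothetical model: setTimeStamp(t) viewed as the half-open range [t, t + 1); not HBase code.
class SingleTimestampModel {
    // Returns {minStamp, maxStamp}; overflow at Long.MAX_VALUE is not handled.
    static long[] toRange(long timestamp) {
        return new long[] {timestamp, timestamp + 1};
    }

    // True if a version with timestamp versionTs would be selected.
    static boolean matches(long versionTs, long timestamp) {
        long[] range = toRange(timestamp);
        return versionTs >= range[0] && versionTs < range[1];
    }
}
```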

setStartRow

public Scan setStartRow(byte[] startRow)
Set the start row of the scan.

Parameters:
startRow - row to start scan on, inclusive
Returns:
this

setStopRow

public Scan setStopRow(byte[] stopRow)
Set the stop row.

Parameters:
stopRow - row to end at (exclusive)
Returns:
this

setMaxVersions

public Scan setMaxVersions()
Get all available versions.

Returns:
this

setMaxVersions

public Scan setMaxVersions(int maxVersions)
Get up to the specified number of versions of each column.

Parameters:
maxVersions - maximum versions for each column
Returns:
this

setBatch

public void setBatch(int batch)
Set the maximum number of values to return for each call to next()

Parameters:
batch - the maximum number of values
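`setBatch` limits the number of values per `next()` call, so one wide row can come back in several pieces. This model only computes the piece sizes; it is not HBase code. It assumes a non-positive batch means "no limit", which we take to match the unset case. `BatchModel` and `chunkSizes` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how a batch limit splits one row across next() calls; not HBase code.
class BatchModel {
    // Number of values each successive next() call would return for a row with cellCount values.
    // A non-positive batch is treated as "no limit" (an assumption of this model).
    static List<Integer> chunkSizes(int cellCount, int batch) {
        List<Integer> sizes = new ArrayList<>();
        if (batch <= 0) {
            if (cellCount > 0) {
                sizes.add(cellCount); // whole row in a single call
            }
            return sizes;
        }
        for (int remaining = cellCount; remaining > 0; remaining -= batch) {
            sizes.add(Math.min(batch, remaining));
        }
        return sizes;
    }
}
```

In this model, a row with 5 values and a batch of 2 is returned as pieces of 2, 2 and 1 values.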

setCaching

public void setCaching(int caching)
Set the number of rows for caching that will be passed to scanners. If not set, the default setting from HTable.getScannerCaching() will apply. Higher caching values will enable faster scanners but will use more memory.

Parameters:
caching - the number of rows for caching
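The trade-off described for `setCaching` is mostly arithmetic: if each fetch from the server returns up to `caching` rows, fewer round trips are needed as the value grows, at the cost of buffering more rows on the client. The estimate below is a simplified model under that assumption; it ignores region boundaries and any final empty fetch, and `CachingModel` is hypothetical.

```java
// Hypothetical back-of-the-envelope model of scanner caching; not HBase code.
class CachingModel {
    // Approximate number of fetch round trips needed to read `rows` rows
    // when each fetch returns up to `caching` rows.
    static long roundTrips(long rows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }
}
```

For 1000 rows, a caching value of 100 needs about 10 round trips, while a value of 1 needs about 1000.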

setFilter

public Scan setFilter(Filter filter)
Apply the specified server-side filter when performing the Scan.

Parameters:
filter - filter to run on the server
Returns:
this

setFamilyMap

public Scan setFamilyMap(Map<byte[],NavigableSet<byte[]>> familyMap)
Set the familyMap.

Parameters:
familyMap - map of family to qualifier
Returns:
this

getFamilyMap

public Map<byte[],NavigableSet<byte[]>> getFamilyMap()
Get the familyMap.

Returns:
familyMap

numFamilies

public int numFamilies()
Returns:
the number of families in familyMap

hasFamilies

public boolean hasFamilies()
Returns:
true if familyMap is non empty, false otherwise

getFamilies

public byte[][] getFamilies()
Returns:
the keys of the familyMap

getStartRow

public byte[] getStartRow()
Returns:
the start row

getStopRow

public byte[] getStopRow()
Returns:
the stop row

getMaxVersions

public int getMaxVersions()
Returns:
the max number of versions to fetch

getBatch

public int getBatch()
Returns:
maximum number of values to return for a single call to next()

getCaching

public int getCaching()
Returns:
the number of rows fetched when calling next on a scanner

getTimeRange

public TimeRange getTimeRange()
Returns:
the TimeRange of this scan

getFilter

public Filter getFilter()
Returns:
the filter set for this scan, or null if no filter has been set

hasFilter

public boolean hasFilter()
Returns:
true if a filter has been specified, false if not

setCacheBlocks

public void setCacheBlocks(boolean cacheBlocks)
Set whether blocks should be cached for this Scan.

This is true by default. When true, the default settings of the table and family are used; setting it to true never enables block caching when the block cache is disabled for that family or disabled entirely.

Parameters:
cacheBlocks - if false, default settings are overridden and blocks will not be cached
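The rule above reduces to a logical AND: `true` defers to the table and family settings, and `false` turns block caching off for this scan. The sketch encodes that reading of the text; it is a model, not HBase code, and `CacheBlocksModel` is hypothetical. A common use of `setCacheBlocks(false)` is a large one-off scan, such as a MapReduce job, that should not evict frequently read blocks from the cache.

```java
// Hypothetical model of the setCacheBlocks rule described above; not HBase code.
class CacheBlocksModel {
    // scanCacheBlocks: the value passed to setCacheBlocks (true by default).
    // familyBlockCacheEnabled: whether the table/family settings allow block caching.
    static boolean blocksCached(boolean scanCacheBlocks, boolean familyBlockCacheEnabled) {
        // true defers to the family setting; false always disables caching for this scan.
        return scanCacheBlocks && familyBlockCacheEnabled;
    }
}
```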

getCacheBlocks

public boolean getCacheBlocks()
Get whether blocks should be cached for this Scan.

Returns:
true if default caching should be used, false if blocks should not be cached

getFingerprint

public Map<String,Object> getFingerprint()
Compile the table and column family (i.e. schema) information into a Map. Useful for parsing and aggregation by debugging, logging, and administration tools.

Specified by:
getFingerprint in class Operation
Returns:
Map

toMap

public Map<String,Object> toMap(int maxCols)
Compile the details beyond the scope of getFingerprint (row, columns, timestamps, etc.) into a Map along with the fingerprinted information. Useful for debugging, logging, and administration tools.

Specified by:
toMap in class Operation
Parameters:
maxCols - a limit on the number of columns output prior to truncation
Returns:
Map

readFields

public void readFields(DataInput in)
                throws IOException
Specified by:
readFields in interface org.apache.hadoop.io.Writable
Throws:
IOException

write

public void write(DataOutput out)
           throws IOException
Specified by:
write in interface org.apache.hadoop.io.Writable
Throws:
IOException


Copyright © 2012 Cloudera. All Rights Reserved.