java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
    org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase

@InterfaceAudience.Public
@InterfaceStability.Evolving
public abstract class MultiTableInputFormatBase
extends org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
A base for MultiTableInputFormats. Receives a list of Scan instances that define the
input tables, filters, etc. Subclasses may use other TableRecordReader implementations.
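For example, a subclass might build its Scan list up front and hand it to setScans(List). The sketch below assumes, as MultiTableInputFormat does, that each Scan names its target table via the Scan.SCAN_ATTRIBUTES_TABLE_NAME attribute; the table names and row keys are illustrative only.

```java
import java.util.Arrays;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase;
import org.apache.hadoop.hbase.util.Bytes;

public class TwoTableInputFormat extends MultiTableInputFormatBase {

  public TwoTableInputFormat() {
    // Each Scan carries the name of the table it targets, so that
    // getSplits(JobContext) can locate that table's regions.
    Scan scanA = new Scan();
    scanA.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes("tableA"));

    // A Scan may also restrict the row range and add filters per table.
    Scan scanB = new Scan();
    scanB.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes("tableB"));
    scanB.setStartRow(Bytes.toBytes("row-0100"));
    scanB.setStopRow(Bytes.toBytes("row-0200"));

    setScans(Arrays.asList(scanA, scanB));
  }
}
```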
| Constructor Summary | |
|---|---|
| MultiTableInputFormatBase() | |
| Method Summary | |
|---|---|
| org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) Builds a TableRecordReader. |
| protected List<Scan> | getScans() Allows subclasses to get the list of Scan objects. |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext context) Calculates the splits that will serve as input for the map tasks. |
| protected boolean | includeRegionInSplit(byte[] startKey, byte[] endKey) Test if the given region is to be included in the InputSplit while splitting the regions of a table. |
| protected void | setScans(List<Scan> scans) Allows subclasses to set the list of Scan objects. |
| protected void | setTableRecordReader(TableRecordReader tableRecordReader) Allows subclasses to set the TableRecordReader. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public MultiTableInputFormatBase()
| Method Detail |
|---|
public org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
org.apache.hadoop.mapreduce.TaskAttemptContext context)
throws IOException,
InterruptedException
Builds a TableRecordReader.

Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
split - The split to work with.
context - The current context.
Throws:
IOException - When creating the reader fails.
InterruptedException - when record reader initialization fails
See Also:
InputFormat.createRecordReader(org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext)
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
throws IOException
Calculates the splits that will serve as input for the map tasks.

Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
context - The current job context.
Throws:
IOException - When creating the list of splits fails.
See Also:
InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)
protected boolean includeRegionInSplit(byte[] startKey,
byte[] endKey)
Test if the given region is to be included in the InputSplit while splitting the regions of a table.

This optimization is effective when there is a specific reason to exclude an entire region from the M-R job (and hence not have it contribute an InputSplit), given the start and end keys of that region. It is useful when we need to remember the last-processed top record and continuously revisit the [last, current) interval for M-R processing. In addition to reducing the number of InputSplits, it also reduces the load on the region server, due to the ordering of the keys.

Note: It is possible that endKey.length() == 0, for the last (most recent) region.

Override this method if you want to bulk-exclude regions from M-R. By default, no region is excluded (i.e. all regions are included).

Parameters:
startKey - Start key of the region
endKey - End key of the region
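A minimal sketch of such an override, assuming a hypothetical lastProcessedRow marker: regions whose key range lies entirely at or below that row are skipped, and the last region (empty end key) is always kept.

```java
import org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementalInputFormat extends MultiTableInputFormatBase {

  // Hypothetical marker for the last row already processed by a previous job.
  private final byte[] lastProcessedRow = Bytes.toBytes("row-0500");

  @Override
  protected boolean includeRegionInSplit(byte[] startKey, byte[] endKey) {
    // endKey.length == 0 marks the last (most recent) region; always include it.
    if (endKey.length == 0) {
      return true;
    }
    // Skip the region if its entire key range ends at or before the marker,
    // i.e. include it only when endKey > lastProcessedRow.
    return Bytes.compareTo(endKey, lastProcessedRow) > 0;
  }
}
```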
protected List<Scan> getScans()
Allows subclasses to get the list of Scan objects.
protected void setScans(List<Scan> scans)
Allows subclasses to set the list of Scan objects.

Parameters:
scans - The list of Scan used to define the input

protected void setTableRecordReader(TableRecordReader tableRecordReader)
Allows subclasses to set the TableRecordReader.

Parameters:
tableRecordReader - A different TableRecordReader implementation.
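As a sketch of the hook above, a subclass could install its own reader; LoggingTableRecordReader below is a hypothetical TableRecordReader subclass, assuming nextKeyValue() may be overridden.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase;
import org.apache.hadoop.hbase.mapreduce.TableRecordReader;

public class CustomReaderInputFormat extends MultiTableInputFormatBase {

  /** Hypothetical reader that adds a hook around each row read. */
  public static class LoggingTableRecordReader extends TableRecordReader {
    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
      boolean hasNext = super.nextKeyValue();
      // A real subclass could inspect getCurrentKey()/getCurrentValue() here.
      return hasNext;
    }
  }

  public CustomReaderInputFormat() {
    // createRecordReader(...) is expected to use this reader in place of the default.
    setTableRecordReader(new LoggingTableRecordReader());
  }
}
```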