org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators
Class POSplit
java.lang.Object
org.apache.pig.impl.plan.Operator<PhyPlanVisitor>
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
- All Implemented Interfaces:
- Serializable, Cloneable, Comparable<Operator>
public class POSplit
- extends PhysicalOperator
The MapReduce Split operator.
The assumption here is that the logical-to-physical
translation creates this dummy operator with just the
name of the file in which the input branch is stored
and from which it is later loaded.
The translation should also make sure that the
appropriate filter operators are configured as outputs
of this operator, using the conditions specified in
the LOSplit. So LOSplit will be converted into:
     |        |            |
  Filter1  Filter2  ...  Filter3
     |        |     ...    |
     |        |     ...    |
     ----- POSplit -... ----
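The plan shape above can be illustrated with a minimal, self-contained sketch. It does not use the Pig API: the class SplitAsFilters, its split method, and the branch names are hypothetical stand-ins, and plain Java predicates play the role of the filter operators. It only shows the idea that one shared input feeds every filter and that a record may land in several branches, or in none.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only; not part of Pig.
class SplitAsFilters {

    /**
     * Passes every record of the shared input to each branch condition
     * and collects it in every branch whose condition accepts it.
     */
    static <T> Map<String, List<T>> split(List<T> input,
                                          Map<String, Predicate<T>> conditions) {
        Map<String, List<T>> branches = new LinkedHashMap<>();
        for (String name : conditions.keySet()) {
            branches.put(name, new ArrayList<>());
        }
        for (T record : input) {
            // Conditions are not assumed to be exclusive, so all are tested.
            for (Map.Entry<String, Predicate<T>> e : conditions.entrySet()) {
                if (e.getValue().test(record)) {
                    branches.get(e.getKey()).add(record);
                }
            }
        }
        return branches;
    }

    public static void main(String[] args) {
        Map<String, Predicate<Integer>> conditions = new LinkedHashMap<>();
        conditions.put("Filter1", x -> x < 5);
        conditions.put("Filter2", x -> x % 2 == 0);
        // 2 satisfies both conditions; 7 satisfies neither.
        System.out.println(split(Arrays.asList(1, 2, 6, 7), conditions));
    }
}
```

Because every record is tested against every condition, exclusive conditions gain nothing from this layout, which matches the trade-off the class description mentions.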
This differs from the existing implementation, in which
POSplit writes to side files after filtering and then
loads the appropriate file.
The approach followed here is as good as the old
approach, if not better in many cases, because
of the availability of attachInput. An optimization
that can follow is that multiple loads of the same
file can be merged into one, and the operators that
take input from that load can be recorded. When the
mapPlan executes, the file is then read only once and
each resulting tuple is attached as input to all the
operators that take input from this load.
In some cases, where the conditions are exclusive and
some outputs are ignored, this approach can be worse.
However, it makes the Split easier to manage and
allows the data stored by the split job to be reused
whenever necessary.
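The load-merging optimization described above might look roughly like the following sketch. It is an assumption-laden illustration rather than Pig's implementation: SharedLoadSketch, Load, mergeLoads, and run are hypothetical names, tuples are plain strings, and an in-memory map stands in for the stored split file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical illustration only; not part of Pig.
class SharedLoadSketch {

    /** A load request: the file to read and the operator consuming its tuples. */
    static final class Load {
        final String file;
        final Consumer<String> consumer;

        Load(String file, Consumer<String> consumer) {
            this.file = file;
            this.consumer = consumer;
        }
    }

    /** Merges loads of the same file, remembering every consumer of that file. */
    static Map<String, List<Consumer<String>>> mergeLoads(List<Load> loads) {
        Map<String, List<Consumer<String>>> merged = new LinkedHashMap<>();
        for (Load load : loads) {
            merged.computeIfAbsent(load.file, f -> new ArrayList<>()).add(load.consumer);
        }
        return merged;
    }

    /**
     * Reads each distinct file once and attaches every tuple to all of its
     * consumers. Returns the number of file reads performed.
     */
    static int run(Map<String, List<Consumer<String>>> merged,
                   Map<String, List<String>> files) {
        int reads = 0;
        for (Map.Entry<String, List<Consumer<String>>> e : merged.entrySet()) {
            reads++;
            List<String> tuples = files.getOrDefault(e.getKey(), Collections.emptyList());
            for (String tuple : tuples) {
                for (Consumer<String> consumer : e.getValue()) {
                    consumer.accept(tuple);
                }
            }
        }
        return reads;
    }

    public static void main(String[] args) {
        List<String> branchA = new ArrayList<>();
        List<String> branchB = new ArrayList<>();
        // Two loads of the same (made-up) file name are merged into one read.
        List<Load> loads = Arrays.asList(
                new Load("split.tmp", t -> branchA.add(t)),
                new Load("split.tmp", t -> branchB.add(t)));
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("split.tmp", Arrays.asList("t1", "t2"));
        int reads = run(mergeLoads(loads), files);
        System.out.println(reads + " " + branchA + " " + branchB);
    }
}
```

In this sketch the cost of the shared read is paid once per distinct file regardless of how many operators consume it, which is the benefit the description attributes to merging loads.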
- See Also:
- Serialized Form
Fields inherited from class org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator |
dummyBag, dummyBool, dummyDBA, dummyDouble, dummyFloat, dummyInt, dummyLong, dummyMap, dummyString, dummyTuple, input, inputAttached, inputs, lineageTracer, outputs, parentPlan, pigLogger, reporter, requestedParallelism, res, resultType |
Fields inherited from class org.apache.pig.impl.plan.Operator |
mKey |
Methods inherited from class org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator |
attachInput, clone, cloneHelper, detachInput, getInputs, getLogger, getNext, getNext, getNext, getNext, getNext, getNext, getNext, getNext, getNext, getNext, getPigLogger, getRequestedParallelism, getResultType, isBlocking, isInputAttached, processInput, setInputs, setLineageTracer, setParentPlan, setPigLogger, setReporter, setRequestedParallelism, setResultType |
POSplit
public POSplit(OperatorKey k)
POSplit
public POSplit(OperatorKey k,
int rp)
POSplit
public POSplit(OperatorKey k,
List<PhysicalOperator> inp)
POSplit
public POSplit(OperatorKey k,
int rp,
List<PhysicalOperator> inp)
visit
public void visit(PhyPlanVisitor v)
throws VisitorException
- Description copied from class:
Operator
- Visit this node with the provided visitor. This should only be called by
the visitor class itself, never directly.
- Specified by:
visit
in class PhysicalOperator
- Parameters:
v
- Visitor to visit with.
- Throws:
VisitorException
- if the visitor has a problem.
name
public String name()
- Specified by:
name
in class Operator<PhyPlanVisitor>
supportsMultipleInputs
public boolean supportsMultipleInputs()
- Description copied from class:
Operator
- Indicates whether this operator supports multiple inputs.
- Specified by:
supportsMultipleInputs
in class Operator<PhyPlanVisitor>
- Returns:
- true if it does, otherwise false.
supportsMultipleOutputs
public boolean supportsMultipleOutputs()
- Description copied from class:
Operator
- Indicates whether this operator supports multiple outputs.
- Specified by:
supportsMultipleOutputs
in class Operator<PhyPlanVisitor>
- Returns:
- true if it does, otherwise false.
getSplitStore
public FileSpec getSplitStore()
setSplitStore
public void setSplitStore(FileSpec splitStore)
Copyright © ${year} The Apache Software Foundation