org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators
Class POSplit

java.lang.Object
  extended by org.apache.pig.impl.plan.Operator<PhyPlanVisitor>
      extended by org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
          extended by org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
All Implemented Interfaces:
Serializable, Cloneable, Comparable<Operator>

public class POSplit
extends PhysicalOperator

The MapReduce Split operator. The assumption here is that the logical-to-physical translation creates this dummy operator with just the name of the file in which the input branch is stored and from which it is later loaded. The translation should also make sure that the appropriate filter operators are configured as outputs of this operator, using the conditions specified in the LOSplit. So LOSplit will be converted into:

    |        |           |
 Filter1  Filter2 ... Filter3
    |        |    ...    |
    |        |    ...    |
    ---- POSplit -... ----

This differs from the existing implementation, where the POSplit writes to side files after filtering and then loads the appropriate file. The approach followed here is as good as the old approach, if not better, in many cases because attachInput is available.

One optimization that can follow: if multiple loads read the same file, they can be merged into one, and the operators that take input from the load can be stored. When the map plan executes, it can then read the file only once and attach each resulting tuple as input to all the operators that take input from this load.

In some cases, where the conditions are exclusive and some outputs are ignored, this approach can be worse. However, it makes the Split easier to manage and allows the data stored by the split job to be reused whenever necessary.
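
The single-read idea described above can be illustrated with a minimal, self-contained sketch. It does not use Pig's classes: SplitSketch, split, and the sample conditions are hypothetical names chosen for illustration, and plain integers stand in for tuples. It only shows the shape of the approach, in which the stored input is read once and every tuple is offered to each filter branch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the POSplit-plus-filters arrangement; not Pig code.
final class SplitSketch {

    // Reads the input once and attaches each tuple to every filter branch.
    // Returns one output list per condition, in the order the conditions were given.
    static <T> List<List<T>> split(List<T> input, List<Predicate<T>> conditions) {
        List<List<T>> branches = new ArrayList<>();
        for (int i = 0; i < conditions.size(); i++) {
            branches.add(new ArrayList<>());
        }
        for (T tuple : input) {                          // single pass over the stored input
            for (int i = 0; i < conditions.size(); i++) {
                if (conditions.get(i).test(tuple)) {     // each branch filters independently
                    branches.get(i).add(tuple);
                }
            }
        }
        return branches;
    }

    public static void main(String[] args) {
        // Two overlapping conditions: "less than 5" and "even".
        List<Predicate<Integer>> conditions = List.of(x -> x < 5, x -> x % 2 == 0);
        System.out.println(split(List.of(1, 4, 6, 9), conditions)); // prints [[1, 4], [4, 6]]
    }
}
```

Because the conditions need not be exclusive, a tuple such as 4 can appear in more than one branch. The sketch also makes the stated drawback visible: every tuple is tested against every condition, even when the conditions are exclusive or a branch's output is never used.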

See Also:
Serialized Form

Field Summary
 
Fields inherited from class org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
dummyBag, dummyBool, dummyDBA, dummyDouble, dummyFloat, dummyInt, dummyLong, dummyMap, dummyString, dummyTuple, input, inputAttached, inputs, lineageTracer, outputs, parentPlan, pigLogger, reporter, requestedParallelism, res, resultType
 
Fields inherited from class org.apache.pig.impl.plan.Operator
mKey
 
Constructor Summary
POSplit(OperatorKey k)
           
POSplit(OperatorKey k, int rp)
           
POSplit(OperatorKey k, int rp, List<PhysicalOperator> inp)
           
POSplit(OperatorKey k, List<PhysicalOperator> inp)
           
 
Method Summary
 FileSpec getSplitStore()
           
 String name()
           
 void setSplitStore(FileSpec splitStore)
           
 boolean supportsMultipleInputs()
          Indicates whether this operator supports multiple inputs.
 boolean supportsMultipleOutputs()
          Indicates whether this operator supports multiple outputs.
 void visit(PhyPlanVisitor v)
          Visit this node with the provided visitor.
 
Methods inherited from class org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
attachInput, clone, cloneHelper, detachInput, getInputs, getLogger, getNext, getNext, getNext, getNext, getNext, getNext, getNext, getNext, getNext, getNext, getPigLogger, getRequestedParallelism, getResultType, isBlocking, isInputAttached, processInput, setInputs, setLineageTracer, setParentPlan, setPigLogger, setReporter, setRequestedParallelism, setResultType
 
Methods inherited from class org.apache.pig.impl.plan.Operator
compareTo, equals, getOperatorKey, hashCode, toString
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

POSplit

public POSplit(OperatorKey k)

POSplit

public POSplit(OperatorKey k,
               int rp)

POSplit

public POSplit(OperatorKey k,
               List<PhysicalOperator> inp)

POSplit

public POSplit(OperatorKey k,
               int rp,
               List<PhysicalOperator> inp)

Method Detail

visit

public void visit(PhyPlanVisitor v)
           throws VisitorException
Description copied from class: Operator
Visit this node with the provided visitor. This should only be called by the visitor class itself, never directly.

Specified by:
visit in class PhysicalOperator
Parameters:
v - Visitor to visit with.
Throws:
VisitorException - if the visitor has a problem.
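
The rule that visit should be called only by the visitor can be sketched as a generic double-dispatch example. The types below (VisitorSketch, Operator, Split, Visitor, walk, visitSplit) are hypothetical miniatures, not Pig's PhyPlanVisitor or POSplit; the sketch assumes only the general visitor pattern, in which the visitor drives the traversal and each node hands control back to a type-specific callback.

```java
import java.util.List;

// Hypothetical miniature of the visitor contract; these are not Pig's classes.
final class VisitorSketch {

    interface Operator {
        // Called by the visitor while it walks the plan, not by client code.
        void visit(Visitor v);
    }

    static final class Split implements Operator {
        @Override
        public void visit(Visitor v) {
            v.visitSplit(this); // double dispatch back to the type-specific callback
        }
    }

    static class Visitor {
        final StringBuilder seen = new StringBuilder();

        // Entry point for callers: the visitor controls the traversal order.
        void walk(List<? extends Operator> plan) {
            for (Operator op : plan) {
                op.visit(this);
            }
        }

        // Type-specific callback reached through Split.visit.
        void visitSplit(Split split) {
            seen.append("split;");
        }
    }

    public static void main(String[] args) {
        Visitor v = new Visitor();
        v.walk(List.of(new Split(), new Split()));
        System.out.println(v.seen); // prints split;split;
    }
}
```

Client code calls walk on the visitor rather than visit on a node, so traversal order and any bookkeeping stay in one place.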

name

public String name()
Specified by:
name in class Operator<PhyPlanVisitor>

supportsMultipleInputs

public boolean supportsMultipleInputs()
Description copied from class: Operator
Indicates whether this operator supports multiple inputs.

Specified by:
supportsMultipleInputs in class Operator<PhyPlanVisitor>
Returns:
true if it does, otherwise false.

supportsMultipleOutputs

public boolean supportsMultipleOutputs()
Description copied from class: Operator
Indicates whether this operator supports multiple outputs.

Specified by:
supportsMultipleOutputs in class Operator<PhyPlanVisitor>
Returns:
true if it does, otherwise false.

getSplitStore

public FileSpec getSplitStore()

setSplitStore

public void setSplitStore(FileSpec splitStore)


Copyright © ${year} The Apache Software Foundation