org.apache.hadoop.hive.ql.exec
Class CommonJoinOperator<T extends JoinDesc>

java.lang.Object
  extended by org.apache.hadoop.hive.ql.exec.Operator<T>
      extended by org.apache.hadoop.hive.ql.exec.CommonJoinOperator<T>
All Implemented Interfaces:
Serializable, Cloneable, Node
Direct Known Subclasses:
AbstractMapJoinOperator, JoinOperator

public abstract class CommonJoinOperator<T extends JoinDesc>
extends Operator<T>
implements Serializable

Join operator implementation.

See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.hive.ql.exec.Operator
Operator.OperatorFunc, Operator.ProgressCounter, Operator.State
 
Field Summary
protected  Byte alias
           
protected  short[] aliasFilterTags
          ANDed value of all filter tags in the current join group, used to decide whether an outer join alias has a matching pair.
protected  JoinCondDesc[] condn
           
protected  int countAfterReport
           
protected  ArrayList<Object>[] dummyObj
           
protected  RowContainer<List<Object>>[] dummyObjVectors
           
protected  int[][] filterMaps
           
protected  short[] filterTags
           
protected  Object[] forwardCache
           
protected  int heartbeatInterval
           
protected  List[] intermediate
           
protected  List<ObjectInspector>[] joinFilterObjectInspectors
          The ObjectInspectors for join filters.
protected  List<ExprNodeEvaluator>[] joinFilters
          The filters for the join.
protected  List<ExprNodeEvaluator>[] joinValues
          The expressions for join inputs.
protected  List<ObjectInspector>[] joinValuesObjectInspectors
          The ObjectInspectors for the join inputs.
protected  List<ObjectInspector>[] joinValuesStandardObjectInspectors
          The standard ObjectInspectors for the join inputs.
protected static org.apache.commons.logging.Log LOG
           
 boolean noOuterJoin
           
protected static int NOTSKIPBIGTABLE
           
protected  boolean[] nullsafes
           
protected  int numAliases
           
protected  int[] offsets
           
protected  Byte[] order
           
protected  List<ObjectInspector>[] rowContainerStandardObjectInspectors
          The standard ObjectInspectors for the row container.
protected  boolean[][] skipVectors
           
protected  TableDesc[] spillTableDesc
           
protected  int totalSz
           
 
Fields inherited from class org.apache.hadoop.hive.ql.exec.Operator
beginTime, childOperators, childOperatorsArray, childOperatorsTag, colExprMap, conf, counterNames, counterNameToEnum, counters, done, fatalErrorCntr, groupKeyObject, id, inputObjInspectors, inputRows, isLogInfoEnabled, numInputRowsCntr, numOutputRowsCntr, operatorId, out, outputObjInspector, outputRows, parentOperators, reporter, state, statsMap, timeTakenCntr, totalTime
 
Constructor Summary
CommonJoinOperator()
           
CommonJoinOperator(CommonJoinOperator<T> clone)
           
 
Method Summary
protected  void checkAndGenObject()
           
 void closeOp(boolean abort)
          All done.
 void endGroup()
          Forward a record of join results.
protected  ArrayList<Object> getFilteredValue(byte alias, Object row)
           
protected  short getFilterTag(List<Object> row)
           
protected static
<T extends JoinDesc>
ObjectInspector
getJoinOutputObjectInspector(Byte[] order, List<ObjectInspector>[] aliasToObjectInspectors, T conf)
           
 String getName()
          Implements the getName function for the Node Interface.
protected  long getNextSize(long sz)
           
static String getOperatorName()
           
 Map<Integer,Set<String>> getPosToAliasMap()
           
protected  boolean hasFilter(int alias)
           
protected  void initializeOp(Configuration hconf)
          Operator specific initialization.
 boolean opAllowedAfterMapJoin()
           
 boolean opAllowedBeforeMapJoin()
           
protected  void reportProgress()
           
 void setPosToAliasMap(Map<Integer,Set<String>> posToAliasMap)
           
 void startGroup()
           
 
Methods inherited from class org.apache.hadoop.hive.ql.exec.Operator
acceptLimitPushdown, allInitializedParentsAreClosed, areAllParentsInitialized, assignCounterNameToEnum, augmentPlan, checkFatalErrors, cleanUpInputFileChanged, cleanUpInputFileChangedOp, clone, close, columnNamesRowResolvedCanBeObtained, dump, dump, fatalErrorMessage, flush, forward, getAdditionalCounters, getChildOperators, getChildren, getColumnExprMap, getConf, getConfiguration, getCounterNames, getCounterNameToEnum, getCounters, getDone, getExecContext, getGroupKeyObject, getIdentifier, getInputObjInspectors, getNextCntr, getNumChild, getNumParent, getOperatorId, getParentOperators, getSchema, getStats, getType, getWrappedCounterName, incrCounter, initEvaluators, initEvaluators, initEvaluatorsAndReturnStruct, initialize, initialize, initializeChildren, initializeCounters, initializeLocalWork, initOperatorId, isUseBucketizedHiveInputFormat, jobClose, jobCloseOp, logStats, opAllowedBeforeSortMergeJoin, opAllowedConvertMapJoin, passExecContext, preorderMap, process, processGroup, processOp, removeChild, removeChildAndAdoptItsChildren, removeChildren, removeParent, replaceChild, replaceParent, reset, resetId, resetLastEnumUsed, resetStats, setAlias, setChildOperators, setColumnExprMap, setConf, setCounterNames, setCounterNameToEnum, setDone, setExecContext, setGroupKeyObject, setId, setInputObjInspectors, setOperatorId, setOutputCollector, setParentOperators, setReporter, setSchema, setUseBucketizedHiveInputFormat, supportAutomaticSortMergeJoin, supportSkewJoinOptimization, supportUnionRemoveOptimization, toString, toString, updateCounters
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

LOG

protected static final org.apache.commons.logging.Log LOG

numAliases

protected transient int numAliases

joinValues

protected transient List<ExprNodeEvaluator>[] joinValues
The expressions for join inputs.


joinFilters

protected transient List<ExprNodeEvaluator>[] joinFilters
The filters for the join.


filterMaps

protected transient int[][] filterMaps

joinValuesObjectInspectors

protected transient List<ObjectInspector>[] joinValuesObjectInspectors
The ObjectInspectors for the join inputs.


joinFilterObjectInspectors

protected transient List<ObjectInspector>[] joinFilterObjectInspectors
The ObjectInspectors for join filters.


joinValuesStandardObjectInspectors

protected transient List<ObjectInspector>[] joinValuesStandardObjectInspectors
The standard ObjectInspectors for the join inputs.


rowContainerStandardObjectInspectors

protected transient List<ObjectInspector>[] rowContainerStandardObjectInspectors
The standard ObjectInspectors for the row container.


order

protected transient Byte[] order

condn

protected transient JoinCondDesc[] condn

nullsafes

protected transient boolean[] nullsafes

noOuterJoin

public transient boolean noOuterJoin

dummyObj

protected transient ArrayList<Object>[] dummyObj

dummyObjVectors

protected transient RowContainer<List<Object>>[] dummyObjVectors

totalSz

protected transient int totalSz

spillTableDesc

protected transient TableDesc[] spillTableDesc

countAfterReport

protected transient int countAfterReport

heartbeatInterval

protected transient int heartbeatInterval

NOTSKIPBIGTABLE

protected static final int NOTSKIPBIGTABLE
See Also:
Constant Field Values

alias

protected transient Byte alias

forwardCache

protected transient Object[] forwardCache

offsets

protected transient int[] offsets

skipVectors

protected transient boolean[][] skipVectors

intermediate

protected transient List[] intermediate

filterTags

protected transient short[] filterTags

aliasFilterTags

protected transient short[] aliasFilterTags
On filterTags:

ANDed value of all filter tags in the current join group. If any of the values passes on the outer join alias (which makes zero for the tag alias), it means there exists a pair for it, and it can safely be regarded as an inner join.

For example, with tables a and b something like

  a = 100, 10 | 100, 20 | 100, 30
  b = 100, 10 | 100, 20 | 100, 30

the query "a FO b ON a.k=b.k AND a.v>10 AND b.v>30" makes the filter map

  0(a) = [1(b),1] : a.v>10
  1(b) = [0(a),1] : b.v>30

For filtered rows in a, (100,10), create a-NULL.
For filtered rows in b, (100,10) (100,20) (100,30), create NULL-b.

With 0(a) = [1(b),1] : a.v>10

  100, 10 = 00000010 (filtered)
  100, 20 = 00000000 (valid)
  100, 30 = 00000000 (valid)
  -------------------------
      sum = 00000000 : for valid rows in b, there is at least one pair in a

With 1(b) = [0(a),1] : b.v>30

  100, 10 = 00000001 (filtered)
  100, 20 = 00000001 (filtered)
  100, 30 = 00000001 (filtered)
  -------------------------
      sum = 00000001 : for valid rows in a (100,20) (100,30), there is no pair in b

Result:

  100, 10 :   N,  N
    N,  N : 100, 10
    N,  N : 100, 20
    N,  N : 100, 30
  100, 20 :   N,  N
  100, 30 :   N,  N
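
The AND over filter tags in the worked example above can be reproduced with a small standalone sketch. This is an illustration only, not Hive's implementation: the class FilterTagSketch and its methods tagFor and andTags are hypothetical names, and the bit layout (one bit per alias, set when a row fails the filter declared against that alias) is an assumption inferred from the example.

```java
// Illustrative sketch only; these names are hypothetical and not part of Hive.
final class FilterTagSketch {

    // Tag for one row: sets the bit of otherAlias when the row fails the
    // filter declared against that alias ("filtered" in the example),
    // and leaves the tag at zero when the row passes ("valid").
    static short tagFor(int otherAlias, boolean passesFilter) {
        return passesFilter ? (short) 0 : (short) (1 << otherAlias);
    }

    // AND of all row tags of one alias within the current join group.
    // A bit stays set only if every row was filtered for that alias.
    // For an empty group this returns all bits set.
    static short andTags(short[] rowTags) {
        short acc = (short) 0xFFFF;
        for (short t : rowTags) {
            acc &= t;
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] values = {10, 20, 30};   // the v column of both a and b
        short[] aTags = new short[values.length];
        short[] bTags = new short[values.length];
        for (int i = 0; i < values.length; i++) {
            aTags[i] = tagFor(1, values[i] > 10);   // 0(a) = [1(b),1] : a.v>10
            bTags[i] = tagFor(0, values[i] > 30);   // 1(b) = [0(a),1] : b.v>30
        }
        System.out.println("a: " + andTags(aTags));   // prints "a: 0"
        System.out.println("b: " + andTags(bTags));   // prints "b: 1"
    }
}
```

A result of 0 for a matches the "sum = 00000000" row (at least one row of a passed its filter), and a result of 1 for b matches "sum = 00000001" (every row of b was filtered).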

Constructor Detail

CommonJoinOperator

public CommonJoinOperator()

CommonJoinOperator

public CommonJoinOperator(CommonJoinOperator<T> clone)

Method Detail

getJoinOutputObjectInspector

protected static <T extends JoinDesc> ObjectInspector getJoinOutputObjectInspector(Byte[] order,
                                                                                   List<ObjectInspector>[] aliasToObjectInspectors,
                                                                                   T conf)

initializeOp

protected void initializeOp(Configuration hconf)
                     throws HiveException
Description copied from class: Operator
Operator specific initialization.

Overrides:
initializeOp in class Operator<T extends JoinDesc>
Throws:
HiveException

startGroup

public void startGroup()
                throws HiveException
Overrides:
startGroup in class Operator<T extends JoinDesc>
Throws:
HiveException

getNextSize

protected long getNextSize(long sz)

getFilteredValue

protected ArrayList<Object> getFilteredValue(byte alias,
                                             Object row)
                                      throws HiveException
Throws:
HiveException

hasFilter

protected final boolean hasFilter(int alias)

getFilterTag

protected final short getFilterTag(List<Object> row)

endGroup

public void endGroup()
              throws HiveException
Forward a record of join results.

Overrides:
endGroup in class Operator<T extends JoinDesc>
Throws:
HiveException

checkAndGenObject

protected void checkAndGenObject()
                          throws HiveException
Throws:
HiveException

reportProgress

protected void reportProgress()

closeOp

public void closeOp(boolean abort)
             throws HiveException
All done.

Overrides:
closeOp in class Operator<T extends JoinDesc>
Throws:
HiveException

getName

public String getName()
Description copied from class: Operator
Implements the getName function for the Node Interface.

Specified by:
getName in interface Node
Overrides:
getName in class Operator<T extends JoinDesc>
Returns:
the name of the operator

getOperatorName

public static String getOperatorName()

getPosToAliasMap

public Map<Integer,Set<String>> getPosToAliasMap()
Returns:
the posToAliasMap

setPosToAliasMap

public void setPosToAliasMap(Map<Integer,Set<String>> posToAliasMap)
Parameters:
posToAliasMap - the posToAliasMap to set
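
The shape of the Map<Integer,Set<String>> argument can be sketched without Hive on the classpath. The page does not document what the keys and values mean, so this sketch assumes each key is a join input position and each value is the set of table aliases feeding that position. PosToAliasHolder is a hypothetical stand-in that only mirrors the getter and setter signatures above.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in; it only mirrors the two accessor signatures.
final class PosToAliasHolder {
    private Map<Integer, Set<String>> posToAliasMap;

    Map<Integer, Set<String>> getPosToAliasMap() {
        return posToAliasMap;
    }

    void setPosToAliasMap(Map<Integer, Set<String>> posToAliasMap) {
        this.posToAliasMap = posToAliasMap;
    }

    // Assumed meaning: position 0 is fed by alias "a", position 1 by "b".
    static Map<Integer, Set<String>> sample() {
        Map<Integer, Set<String>> m = new HashMap<>();
        m.computeIfAbsent(0, k -> new LinkedHashSet<>()).add("a");
        m.computeIfAbsent(1, k -> new LinkedHashSet<>()).add("b");
        return m;
    }

    public static void main(String[] args) {
        PosToAliasHolder holder = new PosToAliasHolder();
        holder.setPosToAliasMap(sample());
        System.out.println(holder.getPosToAliasMap().get(0).contains("a"));   // prints "true"
    }
}
```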

opAllowedBeforeMapJoin

public boolean opAllowedBeforeMapJoin()
Overrides:
opAllowedBeforeMapJoin in class Operator<T extends JoinDesc>

opAllowedAfterMapJoin

public boolean opAllowedAfterMapJoin()
Overrides:
opAllowedAfterMapJoin in class Operator<T extends JoinDesc>


Copyright © 2012 The Apache Software Foundation