org.apache.pig.backend.hadoop.executionengine.mapReduceLayer
Class CombinerOptimizer
java.lang.Object
org.apache.pig.impl.plan.PlanVisitor<MapReduceOper,MROperPlan>
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.MROpPlanVisitor
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer
public class CombinerOptimizer
- extends MROpPlanVisitor
Optimize map reduce plans to use the combiner where possible.
Currently, a foreach is copied to the combiner phase if it does not contain a
nested plan and all UDFs in its generate statement are algebraic.
The version of the foreach in the combiner
stage will use the initial function, and the version in the reduce stage
will be changed to use the final function.
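The two-stage split described above can be sketched in plain Java. This is a minimal illustration only, not Pig's implementation: the class and method names (AlgebraicCountSketch, initial, finalStep, countInTwoStages) are hypothetical, and a COUNT-like aggregate is assumed as the algebraic UDF. It shows why the reduce-stage version must differ from the combiner-stage version: the reducer receives partial counts, so it has to sum them rather than count them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an algebraic COUNT; none of these names come from Pig.
class AlgebraicCountSketch {

    // Combiner-stage step: count the tuples in one chunk of a group.
    static long initial(List<String> chunk) {
        return chunk.size();
    }

    // Reduce-stage step: the inputs are partial counts, so they are summed.
    static long finalStep(List<Long> partialCounts) {
        long total = 0;
        for (long c : partialCounts) {
            total += c;
        }
        return total;
    }

    // Runs both stages over the chunks of a single group.
    static long countInTwoStages(List<List<String>> chunks) {
        List<Long> partials = new ArrayList<>();
        for (List<String> chunk : chunks) {
            partials.add(initial(chunk));
        }
        return finalStep(partials);
    }

    public static void main(String[] args) {
        List<List<String>> chunks = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d", "e"));
        System.out.println(countInTwoStages(chunks)); // prints 5
    }
}
```

Because each chunk is reduced to a single number before it leaves the map side, far less data has to be shuffled to the reducer, which is the reason for moving the foreach into the combiner.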
Major areas for enhancement:
1) Currently, scripts such as:
B = group A by $0;
C = foreach B {
C1 = distinct A;
generate group, COUNT(C1);
}
do not use the combiner. The issue is being able to properly decompose
the expression in the UDF's plan. The current code just takes whatever is
the argument to the algebraic UDF and replaces it with a project. This
works for things like generate group, SUM(A.$1 + 1). But it fails for
things like the above. Certain types of inner plans will never be
movable (like filters). But distinct or order by in the inner plan
should be movable. And, things like:
C = cogroup A by $0, B by $0;
D = foreach C {
D1 = distinct A;
D2 = distinct B;
generate UDF(D1 + D2);
}
make it even harder. The first step is probably just to handle queries
like the first above, as they will probably be the most common.
2) Scripts such as:
B = group A by $0;
C = foreach B generate algebraic(A), nonalgebraic(A);
currently aren't moved into the combiner, even though they could be.
Again, the trick here is properly decomposing the plan since A may be more
than a simple projection.
#2 should probably be the next area of focus.
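To illustrate why a distinct in the inner plan could in principle be moved (enhancement 1 above), here is a hedged plain-Java sketch. It is not how Pig decomposes the plan, and all names in it (DistinctCountSketch, distinctInChunk, countDistinct, countDistinctInTwoStages) are hypothetical. The assumption is that the combiner stage can remove duplicates within its own chunk and pass the surviving values on, while the reduce stage merges those partial sets before counting. Summing per-chunk distinct counts would be wrong, since a value can appear in more than one chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of COUNT(distinct A) evaluated in two stages.
class DistinctCountSketch {

    // Combiner-stage step: drop duplicates within one chunk of the group.
    static Set<String> distinctInChunk(List<String> chunk) {
        return new HashSet<>(chunk);
    }

    // Reduce-stage step: union the partial sets, then count the result.
    static long countDistinct(List<Set<String>> partials) {
        Set<String> all = new HashSet<>();
        for (Set<String> partial : partials) {
            all.addAll(partial);
        }
        return all.size();
    }

    // Runs both stages over the chunks of a single group.
    static long countDistinctInTwoStages(List<List<String>> chunks) {
        List<Set<String>> partials = new ArrayList<>();
        for (List<String> chunk : chunks) {
            partials.add(distinctInChunk(chunk));
        }
        return countDistinct(partials);
    }

    public static void main(String[] args) {
        List<List<String>> chunks = Arrays.asList(
                Arrays.asList("a", "a", "b"),
                Arrays.asList("b", "c"));
        System.out.println(countDistinctInTwoStages(chunks)); // prints 3
    }
}
```

In this example the value "b" occurs in both chunks, so adding the per-chunk distinct counts (2 + 2) would give 4 instead of 3. The partial result has to carry the values themselves, not just a number, which is part of what makes decomposing such inner plans harder than the simple algebraic case.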
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CombinerOptimizer
public CombinerOptimizer(MROperPlan plan,
String chunkSize)
CombinerOptimizer
public CombinerOptimizer(MROperPlan plan,
String chunkSize,
CompilationMessageCollector messageCollector)
getMessageCollector
public CompilationMessageCollector getMessageCollector()
visitMROp
public void visitMROp(MapReduceOper mr)
throws VisitorException
- Overrides:
visitMROp
in class MROpPlanVisitor
- Throws:
VisitorException
Copyright © The Apache Software Foundation