datafu.pig.util
Class SimpleEvalFunc<T>
java.lang.Object
org.apache.pig.EvalFunc<T>
datafu.pig.util.SimpleEvalFunc<T>
- Direct Known Subclasses:
- AliasBagFields, AppendToBag, BoolToInt, Enumerate, HaversineDistInMiles, IntToBool, MD5, MD5Base64, PrependToBag, Quantile, RandInt, StreamingQuantile, TimeCount, UserAgentClassify, WilsonBinConf
public abstract class SimpleEvalFunc<T>
- extends org.apache.pig.EvalFunc<T>
Uses reflection to make writing simple wrapper Pig UDFs easier.
For example, writing a simple string trimming UDF might look like
this:
package datafu.pig.util;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;
public class TRIM extends EvalFunc<String>
{
  public String exec(Tuple input) throws IOException
  {
    // Check the argument count by hand.
    if (input.size() != 1)
      throw new IllegalArgumentException("requires a parameter");
    try {
      // Check the argument type by hand; a null field passes through.
      Object o = input.get(0);
      if (o != null && !(o instanceof String))
        throw new IllegalArgumentException("expected a string");
      String str = (String)o;
      return (str == null) ? null : str.trim();
    }
    catch (Exception e) {
      throw WrappedIOException.wrap("error...", e);
    }
  }
}
There is a lot of boilerplate to check the number of arguments and
the parameter types in the tuple.
Instead, with this class, you can derive from SimpleEvalFunc and
create a call() method (not exec!), declaring its arguments as you
would for a regular function. The class handles all the argument
checking and exception wrapping for you. So your code would be:
package datafu.pig.util;
public class TRIM2 extends SimpleEvalFunc<String>
{
  public String call(String s)
  {
    return (s != null) ? s.trim() : null;
  }
}
An example of this UDF in action with Pig:
grunt> a = load 'test' as (x:chararray, y:chararray); dump a;
(1 , 2)
grunt> b = foreach a generate TRIM2(x); dump b;
(1)
grunt> c = foreach a generate TRIM2((int)x); dump c;
datafu.pig.util.TRIM2(java.lang.String): argument type mismatch [#1]; expected java.lang.String, got java.lang.Integer
grunt> d = foreach a generate TRIM2(x, y); dump d;
datafu.pig.util.TRIM2(java.lang.String): got 2 arguments, expected 1.
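The two checks visible in that session (argument count, then argument
type) can be illustrated with plain java.lang.reflect. What follows is
a minimal sketch under stated assumptions, not the DataFu source:
SimpleFunc, Trim2 and ReflectiveDispatchSketch are hypothetical names,
a java.util.List stands in for Pig's Tuple so that the example has no
Pig dependency, and failures are raised as unchecked exceptions rather
than wrapped in an IOException as exec is declared to throw.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical, not DataFu's source.
class ReflectiveDispatchSketch {

  // Stand-in for SimpleEvalFunc: subclasses declare one public call(...) method.
  static abstract class SimpleFunc<T> {
    private final Method callMethod;

    protected SimpleFunc() {
      Method found = null;
      // Look up the subclass's public method named "call" once, at construction.
      for (Method m : getClass().getMethods()) {
        if (m.getName().equals("call")) {
          found = m;
          break;
        }
      }
      if (found == null) {
        throw new IllegalStateException(getClass().getName() + ": no call() method found");
      }
      callMethod = found;
    }

    // Builds a readable "Class(paramType, ...)" prefix for error messages.
    private String signature() {
      StringBuilder sb = new StringBuilder(getClass().getName()).append('(');
      Class<?>[] types = callMethod.getParameterTypes();
      for (int i = 0; i < types.length; i++) {
        if (i > 0) sb.append(", ");
        sb.append(types[i].getName());
      }
      return sb.append(')').toString();
    }

    // Assumption: a List stands in for Pig's Tuple.
    @SuppressWarnings("unchecked")
    public T exec(List<?> input) {
      Class<?>[] expected = callMethod.getParameterTypes();
      // Check 1: the number of fields must match the number of call() parameters.
      if (input.size() != expected.length) {
        throw new IllegalArgumentException(signature() + ": got " + input.size()
            + " arguments, expected " + expected.length + ".");
      }
      // Check 2: each non-null field must be an instance of the declared parameter type.
      for (int i = 0; i < expected.length; i++) {
        Object arg = input.get(i);
        if (arg != null && !expected[i].isInstance(arg)) {
          throw new IllegalArgumentException(signature() + ": argument type mismatch [#"
              + (i + 1) + "]; expected " + expected[i].getName()
              + ", got " + arg.getClass().getName());
        }
      }
      try {
        // Unpack the fields into positional arguments for call().
        return (T) callMethod.invoke(this, input.toArray());
      } catch (IllegalAccessException | InvocationTargetException e) {
        throw new RuntimeException(signature() + ": call failed", e);
      }
    }
  }

  // Mirrors TRIM2 from the text: only the business logic remains.
  static class Trim2 extends SimpleFunc<String> {
    public String call(String s) {
      return (s != null) ? s.trim() : null;
    }
  }

  public static void main(String[] args) {
    Trim2 trim = new Trim2();
    System.out.println("[" + trim.exec(Arrays.asList(" 1 ")) + "]");
    try {
      trim.exec(Arrays.asList(1));            // wrong type
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
    try {
      trim.exec(Arrays.asList(" 1 ", " 2"));  // wrong count
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The sketch resolves call() once in the constructor, so each row pays
only for the checks and the reflective invoke. It does not handle
primitive parameter types or overloaded call() methods; how
SimpleEvalFunc itself treats those cases is not stated on this page.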
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
Method Summary
T exec(org.apache.pig.data.Tuple input)
java.lang.reflect.Type getReturnType()
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getLogger, getPigLogger, getReporter, getSchemaName, isAsynchronous, outputSchema, progress, setPigLogger, setReporter, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SimpleEvalFunc
public SimpleEvalFunc()
getReturnType
public java.lang.reflect.Type getReturnType()
- Overrides:
- getReturnType in class org.apache.pig.EvalFunc<T>
exec
public T exec(org.apache.pig.data.Tuple input)
       throws java.io.IOException
- Specified by:
- exec in class org.apache.pig.EvalFunc<T>
- Throws:
- java.io.IOException
Matthew Hayes, Sam Shah