datafu.pig.date
Class TimeCount
java.lang.Object
org.apache.pig.EvalFunc<T>
datafu.pig.util.SimpleEvalFunc<java.lang.Long>
datafu.pig.date.TimeCount
public class TimeCount
extends SimpleEvalFunc<java.lang.Long>
Performs a count of events, ignoring events which occur within the
same time window. For events to occur within separate time windows they
must be separated by at least the specified time span.
This is useful for tasks such as counting the number of page views per user, since it:
a) prevents reloads and go-backs from overcounting actual views
b) captures the notion that views across multiple sessions are more meaningful
Input must be sorted in ascending order by time for this UDF to work.
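The counting rule described above can be sketched in plain Java, independent of Pig. This is a minimal illustration under stated assumptions, not the DataFu source: the class name TimeCountSketch, the method countEvents, and the use of epoch-millisecond timestamps are hypothetical. It also assumes that each event is compared with the event immediately before it, and that a gap exactly equal to the time span starts a new window.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the counting rule; not the actual TimeCount code.
class TimeCountSketch {

    // Counts events in a time-sorted list, skipping any event that falls
    // within windowMillis of the event immediately before it (assumption).
    static long countEvents(List<Long> sortedTimesMillis, long windowMillis) {
        long count = 0;
        Long previous = null;
        for (long time : sortedTimesMillis) {
            if (previous == null) {
                // The first event always opens a window.
                count++;
            } else if (time < previous) {
                // The documentation requires ascending input; this sketch rejects anything else.
                throw new IllegalArgumentException("input is not sorted by time");
            } else if (time - previous >= windowMillis) {
                // Separated by at least the time span: counts as a new window.
                count++;
            }
            previous = time;
        }
        return count;
    }
}
```

With a 10-minute span (600000 ms), events at 0, 1, 15, 16, and roughly 33 minutes would count as three views under these assumptions: the events at 1 and 16 minutes fall inside the window of the event before them.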
Example:
%declare TIME_WINDOW 10m
define TimeCount datafu.pig.date.TimeCount('$TIME_WINDOW');
views = LOAD 'views' as (user_id:int, page_id:int, time:chararray);
views_grouped = GROUP views by (user_id, page_id);
view_counts = FOREACH views_grouped {
  views = order views by time;
  generate group.user_id as user_id,
           group.page_id as page_id,
           TimeCount(views.(time)) as count;
}
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
Constructor Summary
TimeCount(java.lang.String timeSpec)
Method Summary
java.lang.Long call(org.apache.pig.data.DataBag bag)
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getLogger, getPigLogger, getReporter, getSchemaName, isAsynchronous, outputSchema, progress, setPigLogger, setReporter, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TimeCount
public TimeCount(java.lang.String timeSpec)
call
public java.lang.Long call(org.apache.pig.data.DataBag bag)
throws java.io.IOException
Throws:
java.io.IOException
Matthew Hayes, Sam Shah