datafu.pig.date
Class TimeCount

java.lang.Object
  extended by org.apache.pig.EvalFunc<T>
      extended by datafu.pig.util.SimpleEvalFunc<java.lang.Long>
          extended by datafu.pig.date.TimeCount

public class TimeCount
extends SimpleEvalFunc<java.lang.Long>

Counts events, treating events that occur within the same time window as a single event. Two events fall into separate time windows only if they are separated by at least the specified time span.

This is useful for tasks such as counting the number of page views per user, because it: a) prevents reloads and go-backs from overcounting actual views; b) captures the notion that views across multiple sessions are more meaningful.

Input must be sorted in ascending order by time for this UDF to work.
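
The windowing rule described above can be sketched in plain Java. This is only an illustration, not the DataFu implementation: the class TimeCountSketch and the method countWindows are hypothetical names, timestamps are taken as epoch milliseconds rather than read from a DataBag, and the sketch assumes that each window is anchored at the most recently counted event and that a gap exactly equal to the time span starts a new window.

```java
import java.time.Duration;
import java.util.List;

public class TimeCountSketch {

    // Hypothetical helper, not part of DataFu.
    // Counts events in a list of epoch-millisecond timestamps that is already
    // sorted in ascending order. An event is counted only if it occurs at
    // least `span` after the previously counted event (assumed anchoring).
    public static long countWindows(List<Long> sortedMillis, Duration span) {
        long count = 0;
        Long lastCounted = null; // timestamp of the most recently counted event
        for (long t : sortedMillis) {
            if (lastCounted == null || t - lastCounted >= span.toMillis()) {
                count++;
                lastCounted = t;
            }
            // Otherwise the event falls inside the current window and is ignored.
        }
        return count;
    }

    public static void main(String[] args) {
        Duration span = Duration.parse("PT10M"); // a 10 minute window
        // Views at 0:00, 0:01, 0:11 and 0:11:40, expressed in milliseconds.
        List<Long> views = List.of(0L, 60_000L, 660_000L, 700_000L);
        System.out.println(countWindows(views, span)); // prints 2
    }
}
```

In this trace the view at 0:01 is treated as a reload of the first view, and the view at 0:11:40 as a reload of the one at 0:11, so four raw views yield a count of 2.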

Example:

 %declare TIME_WINDOW  10m
 
 DEFINE TimeCount datafu.pig.date.TimeCount('$TIME_WINDOW');
 
 views = LOAD 'views' AS (user_id:int, page_id:int, time:chararray);
 views_grouped = GROUP views BY (user_id, page_id);
 view_counts = FOREACH views_grouped {
   -- sort each group's views by time, as TimeCount requires
   views = ORDER views BY time;
   GENERATE group.user_id AS user_id,
            group.page_id AS page_id,
            TimeCount(views.(time)) AS count;
 }
 
 


Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
TimeCount(java.lang.String timeSpec)
           
 
Method Summary
 java.lang.Long call(org.apache.pig.data.DataBag bag)
           
 
Methods inherited from class datafu.pig.util.SimpleEvalFunc
exec, getReturnType
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getLogger, getPigLogger, getReporter, getSchemaName, isAsynchronous, outputSchema, progress, setPigLogger, setReporter, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TimeCount

public TimeCount(java.lang.String timeSpec)

Method Detail

call

public java.lang.Long call(org.apache.pig.data.DataBag bag)
                    throws java.io.IOException
Throws:
java.io.IOException


Matthew Hayes, Sam Shah