datafu.pig.linkanalysis
Class PageRank
java.lang.Object
org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
datafu.pig.linkanalysis.PageRank
- All Implemented Interfaces: org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
public class PageRank
extends org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
implements org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
A UDF which implements PageRank.
Each graph is stored in memory while the algorithm runs, with edges optionally
spilled to disk to conserve memory. This can be used to distribute the execution of PageRank across a large number of
reasonably sized graphs. It does not distribute the execution of PageRank on a single graph. Each graph is identified
by an integer-valued topic ID.
Example:
topic_edges = LOAD 'input_edges' as (topic:INT,source:INT,dest:INT,weight:DOUBLE);
topic_edges_grouped = GROUP topic_edges by (topic, source);
topic_edges_grouped = FOREACH topic_edges_grouped GENERATE
    group.topic as topic,
    group.source as source,
    topic_edges.(dest,weight) as edges;
topic_edges_grouped_by_topic = GROUP topic_edges_grouped BY topic;
topic_ranks = FOREACH topic_edges_grouped_by_topic GENERATE
    group as topic,
    FLATTEN(PageRank(topic_edges_grouped.(source,edges))) as (source,rank);
topic_ranks = FOREACH topic_ranks GENERATE
    topic, source, rank;
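To make the per-graph computation concrete, the following is a minimal Java sketch of weighted PageRank by power iteration over one in-memory graph. It is an illustration only and is not taken from the DataFu source: the class name PageRankSketch, the rank method, the damping factor, the fixed iteration count, and the uniform redistribution of rank from nodes without outgoing edges are all assumptions. The input mirrors the (source, edges) shape used in the example above, with each source mapped to its (dest, weight) pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PageRankSketch {

    /**
     * Hypothetical helper: power-iteration PageRank on a weighted directed graph.
     * edges maps source -> (dest -> weight). damping and iterations are
     * illustrative parameters, not DataFu settings.
     */
    public static Map<Integer, Double> rank(Map<Integer, Map<Integer, Double>> edges,
                                            double damping, int iterations) {
        // Collect every node that appears as a source or a destination.
        Map<Integer, Double> ranks = new LinkedHashMap<>();
        for (Map.Entry<Integer, Map<Integer, Double>> e : edges.entrySet()) {
            ranks.put(e.getKey(), 0.0);
            for (Integer dest : e.getValue().keySet()) {
                ranks.put(dest, 0.0);
            }
        }
        final int n = ranks.size();
        if (n == 0) {
            return ranks;
        }
        // Start from a uniform distribution.
        ranks.replaceAll((node, r) -> 1.0 / n);

        for (int i = 0; i < iterations; i++) {
            Map<Integer, Double> next = new LinkedHashMap<>();
            for (Integer node : ranks.keySet()) {
                next.put(node, 0.0);
            }
            double dangling = 0.0; // rank held by nodes with no outgoing weight
            for (Map.Entry<Integer, Double> nodeRank : ranks.entrySet()) {
                Map<Integer, Double> out = edges.get(nodeRank.getKey());
                double total = 0.0;
                if (out != null) {
                    for (double w : out.values()) {
                        total += w;
                    }
                }
                if (total <= 0.0) {
                    dangling += nodeRank.getValue();
                    continue;
                }
                // Split this node's rank among its destinations in proportion to edge weight.
                for (Map.Entry<Integer, Double> o : out.entrySet()) {
                    next.merge(o.getKey(), nodeRank.getValue() * o.getValue() / total, Double::sum);
                }
            }
            // Teleport term plus an equal share of the dangling rank (an assumption of this sketch).
            final double base = (1.0 - damping) / n + damping * dangling / n;
            next.replaceAll((node, r) -> base + damping * r);
            ranks = next;
        }
        return ranks;
    }
}
```

Calling PageRankSketch.rank(graph, 0.85, 50) once per topic corresponds to what the Pig script does for each group: every topic's graph is ranked independently, so many graphs can be processed in parallel while each single graph is still handled in one place.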
Fields inherited from class org.apache.pig.EvalFunc:
log, pigLogger, reporter, returnType
Method Summary
void accumulate(org.apache.pig.data.Tuple t)
void cleanup()
org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
org.apache.pig.data.DataBag getValue()
org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
Methods inherited from class org.apache.pig.EvalFunc:
finish, getArgToFuncMapping, getCacheFiles, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setPigLogger, setReporter, warn
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
PageRank
public PageRank()
PageRank
public PageRank(java.lang.String... parameters)
accumulate
public void accumulate(org.apache.pig.data.Tuple t)
throws java.io.IOException
- Specified by: accumulate in interface org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
- Throws: java.io.IOException
getValue
public org.apache.pig.data.DataBag getValue()
- Specified by: getValue in interface org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
cleanup
public void cleanup()
- Specified by: cleanup in interface org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
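The three Accumulator methods above form a lifecycle: Pig passes the tuples of one group to accumulate in batches, calls getValue once the group is complete, and then calls cleanup so the same instance can be reused for the next group. The sketch below illustrates that call order in plain Java without depending on Pig. SimpleAccumulator, EdgeCounter, and runGroup are hypothetical names invented for this illustration; they are not Pig or DataFu classes.

```java
import java.util.ArrayList;
import java.util.List;

public class AccumulatorLifecycleSketch {

    /** Hypothetical stand-in for the three-method contract; not the Pig interface. */
    public interface SimpleAccumulator<I, O> {
        void accumulate(List<I> batch); // called zero or more times per group
        O getValue();                   // called once all batches have been seen
        void cleanup();                 // resets state before the next group
    }

    /** Buffers edges in memory across batches and reports how many were seen. */
    public static class EdgeCounter implements SimpleAccumulator<int[], Integer> {
        private final List<int[]> edges = new ArrayList<>();

        @Override
        public void accumulate(List<int[]> batch) {
            edges.addAll(batch);
        }

        @Override
        public Integer getValue() {
            return edges.size();
        }

        @Override
        public void cleanup() {
            edges.clear();
        }
    }

    /** Drives one group through accumulate, getValue, and cleanup, in that order. */
    public static <I, O> O runGroup(SimpleAccumulator<I, O> acc, List<List<I>> batches) {
        try {
            for (List<I> batch : batches) {
                acc.accumulate(batch);
            }
            return acc.getValue();
        } finally {
            acc.cleanup();
        }
    }
}
```

The point of the pattern is that state builds up across accumulate calls rather than arriving in a single exec call, which suits a UDF that has to hold a whole graph before it can produce any output.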
exec
public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple input)
throws java.io.IOException
- Specified by: exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
- Throws: java.io.IOException
outputSchema
public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
- Overrides: outputSchema in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
Matthew Hayes, Sam Shah