java.lang.Object
org.apache.pig.builtin.Utf8StorageConverter
org.apache.pig.builtin.PigStorage
public class PigStorage
A load function that parses a line of input into fields, using a delimiter to separate the fields. The delimiter is given as a regular expression. See String.split(delimiter) and http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html for more information.
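The following is a minimal sketch of the String.split behavior that the description refers to. It is not PigStorage's own parsing code; the class name DelimiterSplitSketch and the method splitLine are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: this is not PigStorage's actual parsing code.
class DelimiterSplitSketch {
    // Splits one input line into fields using a regex delimiter.
    // The limit of -1 keeps trailing empty fields, which
    // String.split(regex) on its own would drop.
    static List<String> splitLine(String line, String delimiterRegex) {
        return Arrays.asList(line.split(delimiterRegex, -1));
    }

    public static void main(String[] args) {
        // A tab-delimited line with three fields.
        System.out.println(splitLine("alice\t30\tNYC", "\t"));
        // A comma-delimited line whose last two fields are empty.
        System.out.println(splitLine("a,b,,", ",").size());
    }
}
```

Because the delimiter is treated as a regular expression here, characters such as "|" or "." would need to be escaped (for example with Pattern.quote) to be matched literally.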
Field Summary | |
---|---|
protected BufferedPositionedInputStream |
in
|
protected org.apache.commons.logging.Log |
mLog
|
Fields inherited from class org.apache.pig.builtin.Utf8StorageConverter |
---|
mBagFactory, mTupleFactory |
Constructor Summary | |
---|---|
PigStorage()
|
|
PigStorage(String delimiter)
Constructs a Pig loader that uses specified regex as a field delimiter. |
Method Summary | |
---|---|
void |
bindTo(OutputStream os)
Specifies the OutputStream to write to. |
void |
bindTo(String fileName,
BufferedPositionedInputStream in,
long offset,
long end)
Specifies a portion of an InputStream to read tuples. |
Schema |
determineSchema(String fileName,
ExecType execType,
DataStorage storage)
Find the schema from the loader. |
boolean |
equals(Object obj)
|
boolean |
equals(PigStorage other)
|
void |
fieldsToRead(Schema schema)
Indicate to the loader fields that will be needed. |
void |
finish()
Do any kind of post processing because the last tuple has been stored. |
Tuple |
getNext()
Retrieves the next tuple to be processed. |
void |
putNext(Tuple f)
Write a tuple to the output stream to which this instance was previously bound. |
Methods inherited from class org.apache.pig.builtin.Utf8StorageConverter |
---|
bytesToBag, bytesToCharArray, bytesToDouble, bytesToFloat, bytesToInteger, bytesToLong, bytesToMap, bytesToTuple, toBytes, toBytes, toBytes, toBytes, toBytes, toBytes, toBytes, toBytes |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface org.apache.pig.LoadFunc |
---|
bytesToBag, bytesToCharArray, bytesToDouble, bytesToFloat, bytesToInteger, bytesToLong, bytesToMap, bytesToTuple |
Field Detail |
---|
protected BufferedPositionedInputStream in
protected final org.apache.commons.logging.Log mLog
Constructor Detail |
---|
public PigStorage()
public PigStorage(String delimiter)
delimiter
- the single byte character that is used to separate fields.
("\t" is the default.)
Method Detail |
---|
public Tuple getNext() throws IOException
LoadFunc
getNext
in interface LoadFunc
IOException
public void bindTo(String fileName, BufferedPositionedInputStream in, long offset, long end) throws IOException
LoadFunc
A common way of handling slices in the middle of records is to start at the given offset and, if the offset is not zero, skip to the end of the first record (which may be a partial record) before reading tuples. Reading continues until a tuple has been read that ends at an offset past the ending offset.
The load function should not do any buffering on the input stream. Buffering will cause the offsets returned by is.getPos() to be unreliable.
bindTo
in interface LoadFunc
fileName
- the name of the file to be read
in
- the stream representing the file to be processed, and which can also provide its position.
offset
- the offset to start reading tuples.
end
- the ending offset for reading.
IOException
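The slice-handling convention described for bindTo above can be sketched as follows. This is an illustration under stated assumptions, not the PigStorage implementation: a byte array stands in for the positioned input stream, records are assumed to be newline-terminated, and the names SliceReaderSketch and readSlice are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a byte array stands in for the positioned input stream.
class SliceReaderSketch {
    // Returns the newline-terminated records owned by the slice [offset, end].
    static List<String> readSlice(byte[] data, long offset, long end) {
        int pos = (int) offset;
        if (offset != 0) {
            // Skip to the end of the first (possibly partial) record.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step over the newline itself
        }
        List<String> records = new ArrayList<>();
        // Keep reading until a record has been read that ends past 'end'.
        while (pos < data.length && pos <= end) {
            int start = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            records.add(new String(data, start, pos - start, StandardCharsets.UTF_8));
            pos++; // step over the newline itself
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "a\nbb\nccc\ndddd\n".getBytes(StandardCharsets.UTF_8);
        // Two adjacent slices that split the record "ccc" in the middle.
        System.out.println(readSlice(data, 0, 6));
        System.out.println(readSlice(data, 6, 14));
    }
}
```

With this convention, a record that straddles a slice boundary is read by the earlier slice and skipped by the later one, so adjacent slices together return every record exactly once.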
public void bindTo(OutputStream os) throws IOException
StoreFunc
bindTo
in interface StoreFunc
os
- The stream to write tuples to.
IOException
public void putNext(Tuple f) throws IOException
StoreFunc
putNext
in interface StoreFunc
f
- the tuple to store.
IOException
public void finish() throws IOException
StoreFunc
finish
in interface StoreFunc
IOException
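The store-side sequence above (bind to an OutputStream, write tuples, then finish) can be sketched as below. This is not PigStorage's code. It assumes, for illustration, that each tuple is written as one line with fields joined by the delimiter and terminated by a newline; a List stands in for a Pig Tuple, and the names DelimitedWriterSketch and render are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Illustrative only: a List stands in for a Pig Tuple.
class DelimitedWriterSketch {
    private final OutputStream os;
    private final String delimiter;

    // Plays the role of bindTo(OutputStream): remembers where to write.
    DelimitedWriterSketch(OutputStream os, String delimiter) {
        this.os = os;
        this.delimiter = delimiter;
    }

    // Assumed line format: fields joined by the delimiter, then a newline.
    static String render(List<?> fields, String delimiter) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append(delimiter);
            }
            line.append(fields.get(i));
        }
        return line.append('\n').toString();
    }

    // Writes one tuple to the bound stream.
    void putNext(List<?> fields) throws IOException {
        os.write(render(fields, delimiter).getBytes(StandardCharsets.UTF_8));
    }

    // Post-processing after the last tuple: here, just flush the stream.
    void finish() throws IOException {
        os.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DelimitedWriterSketch writer = new DelimitedWriterSketch(out, "\t");
        writer.putNext(List.of("alice", 30, "NYC"));
        writer.putNext(List.of("bob", 25, "SF"));
        writer.finish();
        System.out.print(out.toString("UTF-8"));
    }
}
```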
public Schema determineSchema(String fileName, ExecType execType, DataStorage storage) throws IOException
LoadFunc
determineSchema
in interface LoadFunc
fileName
- Name of the file to be read (this will be the same as the filename in the load statement of the script).
execType
- execution mode of the Pig script: one of ExecType.LOCAL or ExecType.MAPREDUCE.
storage
- the DataStorage object corresponding to the execType.
IOException
public void fieldsToRead(Schema schema)
LoadFunc
fieldsToRead
in interface LoadFunc
schema
- Schema indicating which columns will be needed.
public boolean equals(Object obj)
equals
in class Object
public boolean equals(PigStorage other)