org.apache.hadoop.hive.ql.io.orc
Interface Reader


public interface Reader

The interface for reading ORC files. One Reader can support multiple concurrent RecordReaders.


Method Summary
 CompressionKind getCompression()
          Get the compression kind.
 int getCompressionSize()
          Get the buffer size for the compression.
 long getContentLength()
          Get the length of the file.
 Iterable<String> getMetadataKeys()
          Get the user metadata keys.
 ByteBuffer getMetadataValue(String key)
          Get a user metadata value.
 long getNumberOfRows()
          Get the number of rows in the file.
 ObjectInspector getObjectInspector()
          Get the object inspector for looking at the objects.
 int getRowIndexStride()
          Get the number of rows per entry in the row index.
 ColumnStatistics[] getStatistics()
          Get the statistics about the columns in the file.
 Iterable<StripeInformation> getStripes()
          Get the list of stripes.
 List<org.apache.hadoop.hive.ql.io.orc.OrcProto.Type> getTypes()
          Get the list of types contained in the file.
 RecordReader rows(boolean[] include)
          Create a RecordReader that will scan the entire file.
 RecordReader rows(long offset, long length, boolean[] include)
          Deprecated.  
 RecordReader rows(long offset, long length, boolean[] include, SearchArgument sarg, String[] neededColumns)
          Create a RecordReader that will read a section of a file.
 

Method Detail

getNumberOfRows

long getNumberOfRows()
Get the number of rows in the file.

Returns:
the number of rows

getMetadataKeys

Iterable<String> getMetadataKeys()
Get the user metadata keys.

Returns:
the set of metadata keys

getMetadataValue

ByteBuffer getMetadataValue(String key)
Get a user metadata value.

Parameters:
key - a key given by the user
Returns:
the bytes associated with the given key
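
Because the value is returned as raw bytes, the caller has to decide how to interpret it. A minimal sketch, assuming the writer stored UTF-8 text under the key (the file format does not guarantee this); the MetadataValues class and its asUtf8 helper are hypothetical and not part of the ORC API:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the ORC API.
final class MetadataValues {
    private MetadataValues() {}

    // Decodes the remaining bytes of the buffer as UTF-8 text.
    // Assumption: the writer stored a UTF-8 string under this key.
    static String asUtf8(ByteBuffer value) {
        // duplicate() shares the bytes but has its own position and limit,
        // so the caller's buffer is left untouched by the decode.
        return StandardCharsets.UTF_8.decode(value.duplicate()).toString();
    }

    public static void main(String[] args) {
        // Stand-in for the result of getMetadataValue("some.key").
        ByteBuffer value = ByteBuffer.wrap("created-by-etl".getBytes(StandardCharsets.UTF_8));
        System.out.println(asUtf8(value));
    }
}
```

Decoding a duplicate keeps the original buffer readable if the same value is inspected again later.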

getCompression

CompressionKind getCompression()
Get the compression kind.

Returns:
the kind of compression in the file

getCompressionSize

int getCompressionSize()
Get the buffer size for the compression.

Returns:
number of bytes to buffer for the compression codec.

getRowIndexStride

int getRowIndexStride()
Get the number of rows per entry in the row index.

Returns:
the number of rows per entry in the row index, or 0 if there is no row index.
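
The stride can be used to estimate how many index entries cover a stripe. A minimal sketch, assuming each entry covers one run of stride consecutive rows within a stripe and the last entry may cover fewer; the RowIndexMath class is hypothetical:

```java
// Hypothetical helper for illustration; not part of the ORC API.
final class RowIndexMath {
    private RowIndexMath() {}

    // Number of row index entries needed to cover rowsInStripe rows.
    // A stride of 0 means the file has no row index.
    static long indexEntries(long rowsInStripe, int rowIndexStride) {
        if (rowIndexStride <= 0 || rowsInStripe <= 0) {
            return 0;
        }
        // Ceiling division: the final entry may cover a partial run of rows.
        return (rowsInStripe + rowIndexStride - 1) / rowIndexStride;
    }

    public static void main(String[] args) {
        // Stand-ins for a stripe's row count and getRowIndexStride().
        System.out.println(indexEntries(25000, 10000));
    }
}
```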

getStripes

Iterable<StripeInformation> getStripes()
Get the list of stripes.

Returns:
the information about the stripes in order

getObjectInspector

ObjectInspector getObjectInspector()
Get the object inspector for looking at the objects.

Returns:
an object inspector for each row returned

getContentLength

long getContentLength()
Get the length of the file.

Returns:
the number of bytes in the file

getStatistics

ColumnStatistics[] getStatistics()
Get the statistics about the columns in the file.

Returns:
the statistics for each column in the file

getTypes

List<org.apache.hadoop.hive.ql.io.orc.OrcProto.Type> getTypes()
Get the list of types contained in the file. The root type is the first type in the list.

Returns:
the list of flattened types
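
To illustrate what a flattened type list with the root first looks like, the sketch below walks a small type tree in pre-order. It assumes the flattening is a pre-order traversal; TypeNode is a hypothetical stand-in, not OrcProto.Type, and the schema is invented for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration; the real list holds OrcProto.Type entries.
final class TypeFlattening {
    private TypeFlattening() {}

    static final class TypeNode {
        final String kind;
        final List<TypeNode> children;

        TypeNode(String kind, TypeNode... children) {
            this.kind = kind;
            this.children = Arrays.asList(children);
        }
    }

    // Pre-order walk: a parent is emitted before its children, so the root is at index 0.
    static List<String> flatten(TypeNode root) {
        List<String> out = new ArrayList<>();
        visit(root, out);
        return out;
    }

    private static void visit(TypeNode node, List<String> out) {
        out.add(node.kind);
        for (TypeNode child : node.children) {
            visit(child, out);
        }
    }

    public static void main(String[] args) {
        // struct<name:string, address:struct<city:string, zip:int>>
        TypeNode root = new TypeNode("struct",
            new TypeNode("string"),
            new TypeNode("struct", new TypeNode("string"), new TypeNode("int")));
        // Prints [struct, string, struct, string, int]
        System.out.println(flatten(root));
    }
}
```

In this model the position of a type in the list acts as its id, which is why nested fields receive ids directly after their parent.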

rows

RecordReader rows(boolean[] include)
                  throws IOException
Create a RecordReader that will scan the entire file.

Parameters:
include - true for each column that should be included
Returns:
A new RecordReader
Throws:
IOException
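
A minimal sketch of building the include argument, assuming the array has one slot per entry of getTypes() and that index 0 is the root type; the IncludeArrays helper is hypothetical and not part of the ORC API:

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the ORC API.
final class IncludeArrays {
    private IncludeArrays() {}

    // Builds an include array with one slot per flattened type.
    // Assumption: slot 0 is the root type and is always selected here.
    static boolean[] forColumns(int typeCount, int... wantedIds) {
        if (typeCount < 1) {
            throw new IllegalArgumentException("typeCount must be at least 1");
        }
        boolean[] include = new boolean[typeCount];
        include[0] = true;
        for (int id : wantedIds) {
            if (id < 0 || id >= typeCount) {
                throw new IllegalArgumentException("no such column id: " + id);
            }
            include[id] = true;
        }
        return include;
    }

    public static void main(String[] args) {
        // Five flattened types; select ids 1 and 4 in addition to the root.
        // Prints [true, true, false, false, true]
        System.out.println(Arrays.toString(forColumns(5, 1, 4)));
    }
}
```

The resulting array would then be passed as the include parameter of rows.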

rows

RecordReader rows(long offset,
                  long length,
                  boolean[] include)
                  throws IOException
Deprecated. 

Create a RecordReader that starts reading at the first stripe after offset and continues up to the stripe that starts at offset + length. This is intended to work with MapReduce's FileInputFormat, where split boundaries are chosen without regard to stripe boundaries but the splits must together cover all of the rows.

Parameters:
offset - a byte offset in the file
length - a number of bytes in the file
include - true for each column that should be included
Returns:
a new RecordReader that will read the specified rows.
Throws:
IOException

rows

RecordReader rows(long offset,
                  long length,
                  boolean[] include,
                  SearchArgument sarg,
                  String[] neededColumns)
                  throws IOException
Create a RecordReader that will read a section of a file. It starts reading at the first stripe after the offset and continues to the stripe that starts at offset + length. It also accepts a list of columns to read and a search argument.

Parameters:
offset - the minimum offset of the first stripe to read
length - the distance in bytes from offset to the first address at which reading stops
include - true for each column that should be included
sarg - a search argument that limits the rows that should be read.
neededColumns - the names of the included columns
Returns:
the record reader for the rows
Throws:
IOException
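
The offset and length semantics can be sketched as follows. This is one reading of the description above, not the library's code: a stripe is read when its starting offset lies in the half-open range from offset to offset + length. The StripeRanges class and the stripe offsets are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration; not part of the ORC API.
final class StripeRanges {
    private StripeRanges() {}

    // Returns the indexes of stripes whose starting byte offset lies in
    // [offset, offset + length). Assumes length is non-negative.
    static List<Integer> select(long[] stripeStarts, long offset, long length) {
        List<Integer> picked = new ArrayList<>();
        for (int i = 0; i < stripeStarts.length; i++) {
            long start = stripeStarts[i];
            // Written as a subtraction so that offset + length cannot overflow.
            if (start >= offset && start - offset < length) {
                picked.add(i);
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        long[] stripeStarts = {3, 1000, 2500, 4000};
        // Three adjacent splits chosen without looking at stripe boundaries.
        System.out.println(select(stripeStarts, 0, 2000));    // [0, 1]
        System.out.println(select(stripeStarts, 2000, 2000)); // [2]
        System.out.println(select(stripeStarts, 4000, 2000)); // [3]
    }
}
```

Under this reading each stripe belongs to exactly one split, the one containing its first byte, so adjacent splits that cover the whole file read every row once.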


Copyright © 2012 The Apache Software Foundation