org.apache.hadoop.hive.ql.io
Class HiveFileFormatUtils

java.lang.Object
  extended by org.apache.hadoop.hive.ql.io.HiveFileFormatUtils

public final class HiveFileFormatUtils
extends Object

A utility class for various Hive file format tasks. registerOutputFormatSubstitute(Class, Class) and getOutputFormatSubstitute(Class) are provided for backward compatibility: they map an older OutputFormat to the newer HiveOutputFormat that replaces it.


Method Summary
static boolean checkInputFormat(org.apache.hadoop.fs.FileSystem fs, HiveConf conf, Class<? extends org.apache.hadoop.mapred.InputFormat> inputFormatCls, ArrayList<org.apache.hadoop.fs.FileStatus> files)
          Checks whether the files are in the same format as the given input format.
static List<String> doGetAliasesFromPath(Map<String,ArrayList<String>> pathToAliases, org.apache.hadoop.fs.Path dir)
          Gets the list of aliases from the operator tree that are needed for the path.
static List<Operator<? extends Serializable>> doGetWorksFromPath(Map<String,ArrayList<String>> pathToAliases, Map<String,Operator<? extends Serializable>> aliasToWork, org.apache.hadoop.fs.Path dir)
          Gets the list of operators from the operator tree that are needed for the path.
static FileSinkOperator.RecordWriter getHiveRecordWriter(org.apache.hadoop.mapred.JobConf jc, TableDesc tableInfo, Class<? extends org.apache.hadoop.io.Writable> outputClass, FileSinkDesc conf, org.apache.hadoop.fs.Path outPath)
           
static Class<? extends InputFormatChecker> getInputFormatChecker(Class<?> inputFormat)
          Gets the InputFormatChecker for a file format.
static org.apache.hadoop.fs.Path getOutputFormatFinalPath(org.apache.hadoop.fs.Path parent, String taskId, org.apache.hadoop.mapred.JobConf jc, HiveOutputFormat<?,?> hiveOutputFormat, boolean isCompressed, org.apache.hadoop.fs.Path defaultFinalPath)
          Deprecated.  
static Class<? extends HiveOutputFormat> getOutputFormatSubstitute(Class<?> origin)
          Gets an OutputFormat's substitute HiveOutputFormat.
static PartitionDesc getPartitionDescFromPathRecursively(Map<String,PartitionDesc> pathToPartitionInfo, org.apache.hadoop.fs.Path dir, Map<Map<String,PartitionDesc>,Map<String,PartitionDesc>> cacheMap)
           
static PartitionDesc getPartitionDescFromPathRecursively(Map<String,PartitionDesc> pathToPartitionInfo, org.apache.hadoop.fs.Path dir, Map<Map<String,PartitionDesc>,Map<String,PartitionDesc>> cacheMap, boolean ignoreSchema)
           
static FileSinkOperator.RecordWriter getRecordWriter(org.apache.hadoop.mapred.JobConf jc, HiveOutputFormat<?,?> hiveOutputFormat, Class<? extends org.apache.hadoop.io.Writable> valueClass, boolean isCompressed, Properties tableProp, org.apache.hadoop.fs.Path outPath)
           
static void registerInputFormatChecker(Class<? extends org.apache.hadoop.mapred.InputFormat> format, Class<? extends InputFormatChecker> checker)
          Registers an InputFormatChecker for a given InputFormat.
static void registerOutputFormatSubstitute(Class<? extends org.apache.hadoop.mapred.OutputFormat> origin, Class<? extends HiveOutputFormat> substitute)
          Registers a substitute HiveOutputFormat for an OutputFormat.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

registerOutputFormatSubstitute

public static void registerOutputFormatSubstitute(Class<? extends org.apache.hadoop.mapred.OutputFormat> origin,
                                                  Class<? extends HiveOutputFormat> substitute)
Registers a substitute HiveOutputFormat for an OutputFormat.

Parameters:
origin - the OutputFormat class that needs to be substituted
substitute - the HiveOutputFormat class to use in its place

getOutputFormatSubstitute

public static Class<? extends HiveOutputFormat> getOutputFormatSubstitute(Class<?> origin)
Gets an OutputFormat's substitute HiveOutputFormat.
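
The register/get pair above amounts to a class-to-class lookup table. The following is a minimal sketch of that pattern only, not Hive's implementation: the placeholder types (LegacyFormat, HiveStyleFormat and their subclasses), the method names, and the choice to return null for an unregistered class are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a class-to-class substitute registry; not Hive's code. */
public class SubstituteRegistrySketch {
    // Placeholder types standing in for an older output format and its newer replacement.
    interface LegacyFormat {}
    interface HiveStyleFormat {}
    static class LegacyTextFormat implements LegacyFormat {}
    static class HiveTextFormat implements HiveStyleFormat {}

    // Maps each original format class to the class that should be used instead.
    private static final Map<Class<? extends LegacyFormat>, Class<? extends HiveStyleFormat>>
        SUBSTITUTES = new ConcurrentHashMap<>();

    static void registerSubstitute(Class<? extends LegacyFormat> origin,
                                   Class<? extends HiveStyleFormat> substitute) {
        SUBSTITUTES.put(origin, substitute);
    }

    /** Assumption of this sketch: returns null when no substitute is registered. */
    static Class<? extends HiveStyleFormat> getSubstitute(Class<?> origin) {
        return SUBSTITUTES.get(origin);
    }

    public static void main(String[] args) {
        registerSubstitute(LegacyTextFormat.class, HiveTextFormat.class);
        System.out.println(getSubstitute(LegacyTextFormat.class).getSimpleName());
        System.out.println(getSubstitute(String.class));
    }
}
```

Keying the table by Class lets callers that only know the older format resolve its replacement at run time without changing their own configuration.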


getOutputFormatFinalPath

@Deprecated
public static org.apache.hadoop.fs.Path getOutputFormatFinalPath(org.apache.hadoop.fs.Path parent,
                                                                            String taskId,
                                                                            org.apache.hadoop.mapred.JobConf jc,
                                                                            HiveOutputFormat<?,?> hiveOutputFormat,
                                                                            boolean isCompressed,
                                                                            org.apache.hadoop.fs.Path defaultFinalPath)
                                                          throws IOException
Deprecated. 

Gets the final output path for a given FileOutputFormat.

Parameters:
parent - parent dir of the expected final output path
jc - job configuration
Throws:
IOException

registerInputFormatChecker

public static void registerInputFormatChecker(Class<? extends org.apache.hadoop.mapred.InputFormat> format,
                                              Class<? extends InputFormatChecker> checker)
Registers an InputFormatChecker for a given InputFormat.

Parameters:
format - the InputFormat class the checker applies to
checker - the InputFormatChecker class to use for that format

getInputFormatChecker

public static Class<? extends InputFormatChecker> getInputFormatChecker(Class<?> inputFormat)
Gets the InputFormatChecker for a file format.


checkInputFormat

public static boolean checkInputFormat(org.apache.hadoop.fs.FileSystem fs,
                                       HiveConf conf,
                                       Class<? extends org.apache.hadoop.mapred.InputFormat> inputFormatCls,
                                       ArrayList<org.apache.hadoop.fs.FileStatus> files)
                                throws HiveException
Checks whether the files are in the same format as the given input format.

Throws:
HiveException
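
The checker methods above suggest a two-step flow: register a checker for a format, then ask whether a set of files matches it. The sketch below illustrates that flow with in-memory byte arrays instead of Hadoop files. Everything in it is hypothetical: the Checker interface, the magic-byte test, registering a checker instance rather than a class, and reporting false when no checker is registered are assumptions, not the behavior of InputFormatChecker or checkInputFormat.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a per-format checker registry; not Hive's code. */
public class FormatCheckSketch {
    /** Hypothetical contract: true only if every file looks like the format. */
    interface Checker {
        boolean validate(List<byte[]> files);
    }

    /** Illustrative checker: accepts files that start with the given magic string. */
    static class MagicChecker implements Checker {
        private final byte[] magic;

        MagicChecker(String magic) {
            this.magic = magic.getBytes(StandardCharsets.US_ASCII);
        }

        @Override
        public boolean validate(List<byte[]> files) {
            for (byte[] file : files) {
                if (file.length < magic.length) {
                    return false;
                }
                for (int i = 0; i < magic.length; i++) {
                    if (file[i] != magic[i]) {
                        return false;
                    }
                }
            }
            return true;
        }
    }

    // Placeholder class used as a registry key in place of a real input format.
    static class SequenceLikeFormat {}

    private static final Map<Class<?>, Checker> CHECKERS = new HashMap<>();

    static void registerChecker(Class<?> format, Checker checker) {
        CHECKERS.put(format, checker);
    }

    /** Assumption of this sketch: with no registered checker, report a mismatch. */
    static boolean checkFormat(Class<?> format, List<byte[]> files) {
        Checker checker = CHECKERS.get(format);
        return checker != null && checker.validate(files);
    }

    public static void main(String[] args) {
        registerChecker(SequenceLikeFormat.class, new MagicChecker("SEQ"));
        byte[] good = "SEQ payload".getBytes(StandardCharsets.US_ASCII);
        byte[] bad = "plain text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(checkFormat(SequenceLikeFormat.class, Arrays.asList(good)));
        System.out.println(checkFormat(SequenceLikeFormat.class, Arrays.asList(good, bad)));
    }
}
```

In this sketch a single non-matching file makes the whole check fail, which mirrors the "files are in the same format" wording above.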

getHiveRecordWriter

public static FileSinkOperator.RecordWriter getHiveRecordWriter(org.apache.hadoop.mapred.JobConf jc,
                                                                TableDesc tableInfo,
                                                                Class<? extends org.apache.hadoop.io.Writable> outputClass,
                                                                FileSinkDesc conf,
                                                                org.apache.hadoop.fs.Path outPath)
                                                         throws HiveException
Throws:
HiveException

getRecordWriter

public static FileSinkOperator.RecordWriter getRecordWriter(org.apache.hadoop.mapred.JobConf jc,
                                                            HiveOutputFormat<?,?> hiveOutputFormat,
                                                            Class<? extends org.apache.hadoop.io.Writable> valueClass,
                                                            boolean isCompressed,
                                                            Properties tableProp,
                                                            org.apache.hadoop.fs.Path outPath)
                                                     throws IOException,
                                                            HiveException
Throws:
IOException
HiveException

getPartitionDescFromPathRecursively

public static PartitionDesc getPartitionDescFromPathRecursively(Map<String,PartitionDesc> pathToPartitionInfo,
                                                                org.apache.hadoop.fs.Path dir,
                                                                Map<Map<String,PartitionDesc>,Map<String,PartitionDesc>> cacheMap)
                                                         throws IOException
Throws:
IOException

getPartitionDescFromPathRecursively

public static PartitionDesc getPartitionDescFromPathRecursively(Map<String,PartitionDesc> pathToPartitionInfo,
                                                                org.apache.hadoop.fs.Path dir,
                                                                Map<Map<String,PartitionDesc>,Map<String,PartitionDesc>> cacheMap,
                                                                boolean ignoreSchema)
                                                         throws IOException
Throws:
IOException

doGetWorksFromPath

public static List<Operator<? extends Serializable>> doGetWorksFromPath(Map<String,ArrayList<String>> pathToAliases,
                                                                        Map<String,Operator<? extends Serializable>> aliasToWork,
                                                                        org.apache.hadoop.fs.Path dir)
Gets the list of operators from the operator tree that are needed for the path.

Parameters:
pathToAliases - mapping from path to aliases
aliasToWork - The operator tree to be invoked for a given alias
dir - The path to look for

doGetAliasesFromPath

public static List<String> doGetAliasesFromPath(Map<String,ArrayList<String>> pathToAliases,
                                                org.apache.hadoop.fs.Path dir)
Gets the list of aliases from the operator tree that are needed for the path.

Parameters:
pathToAliases - mapping from path to aliases
dir - The path to look for
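
The matching rule used for the path is not spelled out here. As an illustration only, the sketch below assumes a simple rule: try the path itself as a key of pathToAliases, then each parent directory in turn, and return an empty list when nothing matches. Paths are plain strings rather than org.apache.hadoop.fs.Path, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical path-to-aliases lookup; the parent-directory fallback is an assumption. */
public class AliasLookupSketch {
    /**
     * Tries the path itself, then each parent directory, and returns the
     * aliases of the first key that matches, or an empty list if none does.
     */
    static List<String> aliasesFor(Map<String, ArrayList<String>> pathToAliases, String dir) {
        String current = dir;
        while (current != null && !current.isEmpty()) {
            ArrayList<String> aliases = pathToAliases.get(current);
            if (aliases != null) {
                return aliases;
            }
            // Strip the last path component; stop once only the root would remain.
            int slash = current.lastIndexOf('/');
            current = slash > 0 ? current.substring(0, slash) : null;
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, ArrayList<String>> pathToAliases = new HashMap<>();
        pathToAliases.put("/warehouse/sales/ds=2011-01-01",
            new ArrayList<>(Arrays.asList("s", "t")));
        System.out.println(aliasesFor(pathToAliases, "/warehouse/sales/ds=2011-01-01/part-00000"));
        System.out.println(aliasesFor(pathToAliases, "/tmp/other"));
    }
}
```

Under the same assumption, a doGetWorksFromPath-style lookup could take each returned alias and fetch its operator from aliasToWork.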


Copyright © 2011 The Apache Software Foundation