
3.7. Controlling the Input Format

Record classes generated by Sqoop include both a toString() method, which formats output records, and a parse() method, which interprets text using a set of input delimiters. The input delimiters default to the ones chosen for output, but you can override them to convert data from one set of delimiters to another.
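
As a rough illustration of this split between input and output delimiters, the sketch below uses a hand-written stand-in for a generated record class. The class name EmployeeRecord, its two fields, and the choice of tab for input and comma for output are assumptions made for the example; real Sqoop-generated classes are larger, and their exact method signatures may differ.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a Sqoop-generated record class (not real generated code).
class EmployeeRecord {
    // Assumed delimiters: tab-separated input, comma-separated output.
    private static final char INPUT_FIELD_DELIM = '\t';
    private static final char OUTPUT_FIELD_DELIM = ',';

    private int id;
    private String name;

    // Interprets one line of text using the input field delimiter.
    void parse(String line) {
        String[] fields = line.split(Pattern.quote(String.valueOf(INPUT_FIELD_DELIM)), -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException("expected 2 fields, got " + fields.length);
        }
        id = Integer.parseInt(fields[0]);
        name = fields[1];
    }

    // Formats the record using the output field delimiter.
    @Override
    public String toString() {
        return id + String.valueOf(OUTPUT_FIELD_DELIM) + name;
    }
}

class DelimiterDemo {
    public static void main(String[] args) {
        EmployeeRecord record = new EmployeeRecord();
        record.parse("7\tAda");     // read with the input delimiter (tab)
        System.out.println(record); // prints 7,Ada
    }
}
```

Because parse() and toString() use separate delimiter settings, reading a record and writing it back is enough to change its format.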

The following arguments allow you to control the input format of records:

--input-fields-terminated-by (char)
Sets the input field separator
--input-lines-terminated-by (char)
Sets the input end-of-line char
--input-optionally-enclosed-by (char)
Sets an input field-enclosing character
--input-enclosed-by (char)
Sets a required input field encloser
--input-escaped-by (char)
Sets the input escape character

If you have already imported data into HDFS in a text-based representation and want to change the delimiters being used, regenerate the class via sqoop --generate-only, specifying the new delimiters with --fields-terminated-by, etc., and the old delimiters with --input-fields-terminated-by, etc. Then run a MapReduce job in which your mapper creates an instance of your record class, uses its parse() method to read the fields using the old delimiters, and emits a new Text output value via the record's toString() method, which uses the new delimiters. Finally, regenerate the class once more without --input-fields-terminated-by or the other input delimiter arguments, so that the new delimiters are used for both input and output.
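
The conversion step described above can be approximated without a cluster. The sketch below is plain Java and does not use the Hadoop API; convertLine() only mirrors what the body of the mapper's map() method would do for each input line. OrderRecord is a hypothetical stand-in for the class regenerated with the old delimiters for input (assumed here to be '|') and the new delimiters for output (assumed here to be tab). In a real job, the same three steps would sit inside a Mapper and the result would be emitted as a Text value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the regenerated record class:
// parse() reads the OLD delimiter, toString() writes the NEW one.
class OrderRecord {
    private static final String OLD_FIELD_DELIM = "|";  // assumed old delimiter
    private static final String NEW_FIELD_DELIM = "\t"; // assumed new delimiter

    private String[] fields = new String[0];

    // Splits the line on the old delimiter, keeping trailing empty fields.
    void parse(String line) {
        fields = line.split(Pattern.quote(OLD_FIELD_DELIM), -1);
    }

    // Joins the fields with the new delimiter.
    @Override
    public String toString() {
        return String.join(NEW_FIELD_DELIM, fields);
    }
}

class DelimiterConversion {
    // Mirrors the body of a map() call: one input line in, one re-delimited line out.
    static String convertLine(String line) {
        OrderRecord record = new OrderRecord(); // 1. create an instance of the record class
        record.parse(line);                     // 2. read the fields using the old delimiters
        return record.toString();               // 3. write them back using the new delimiters
    }

    // Applies convertLine() to every line, as the job would across its input.
    static List<String> convertAll(List<String> lines) {
        List<String> converted = new ArrayList<>();
        for (String line : lines) {
            converted.add(convertLine(line));
        }
        return converted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1|widget|9.99", "2|gadget|");
        for (String out : convertAll(input)) {
            System.out.println(out);
        }
    }
}
```

Once the data has been rewritten this way, the old input delimiters are no longer needed, which is why the class is regenerated one last time with only the new delimiters.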