Table of Contents
sqoop-importsqoop-import-all-tablessqoop-exportsqoop-codegensqoop-create-hive-tablesqoop-evalsqoop-list-databasessqoop-list-tablessqoop-helpsqoop-versionSqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.
This document describes how to get started using Sqoop to move data between databases and Hadoop and provides reference information for the operation of the Sqoop command-line tool suite. This document is intended for:
Sqoop is an open source software product of Cloudera, Inc.
Software development for Sqoop occurs at http://github.com/cloudera/sqoop. At that site you can obtain:
Sqoop is compatible with Apache Hadoop 0.21 and Cloudera’s Distribution of Hadoop version 3.
The following prerequisite knowledge is required for this product:
bash
Before you can use Sqoop, a release of Hadoop must be installed and configured. We recommend that you download Cloudera’s Distribution for Hadoop (CDH3) from the Cloudera Software Archive at http://archive.cloudera.com for straightforward installation of Hadoop on Linux systems.
This document assumes you are using a Linux or Linux-like environment. If you are using Windows, you may be able to use cygwin to accomplish most of the following tasks. If you are using Mac OS X, you should see few (if any) compatibility errors. Sqoop is predominantly operated and tested on Linux.
With Sqoop, you can import data from a relational database system into HDFS. The input to the import process is a database table. Sqoop will read the table row-by-row into HDFS. The output of this import process is a set of files containing a copy of the imported table. The import process is performed in parallel. For this reason, the output will be in multiple files. These files may be delimited text files (for example, with commas or tabs separating each field), or binary SequenceFiles containing serialized record data.
A by-product of the import process is a generated Java class which can encapsulate one row of the imported table. This class is used during the import process by Sqoop itself. The Java source code for this class is also provided to you, for use in subsequent MapReduce processing of the data. This class can serialize and deserialize data to and from the SequenceFile format. It can also parse the delimited-text form of a record. These abilities allow you to quickly develop MapReduce applications that use the HDFS-stored records in your processing pipeline. You are also free to parse the delimiteds record data yourself, using any other tools you prefer.
After manipulating the imported records (for example, with MapReduce or Hive) you may have a result data set which you can then export back to the relational database. Sqoop’s export process will read a set of delimited text files from HDFS in parallel, parse them into records, and insert them as new rows in a target database table, for consumption by external applications or users.
Sqoop includes some other commands which allow you to inspect the
database you are working with. For example, you can list the available
database schemas (with the sqoop-list-databases tool) and tables
within a schema (with the sqoop-list-tables tool). Sqoop also
includes a primitive SQL execution shell (the sqoop-eval tool).
Most aspects of the import, code generation, and export processes can be customized. You can control the specific row range or columns imported. You can specify particular delimiters and escape characters for the file-based representation of the data, as well as the file format used. You can also control the class or package names used in generated code. Subsequent sections of this document explain how to specify these and other arguments to Sqoop.
Sqoop is a collection of related tools. To use Sqoop, you specify the tool you want to use and the arguments that control the tool.
If Sqoop is compiled from its own source, you can run Sqoop without a formal
installation process by running the bin/sqoop program. Users
of a packaged deployment of Sqoop (such as an RPM shipped with Cloudera’s
Distribution for Hadoop) will see this program installed as /usr/bin/sqoop.
The remainder of this documentation will refer to this program as
sqoop. For example:
$ sqoop tool-name [tool-arguments]
![]() | Note |
|---|---|
The following examples that begin with a |
Sqoop ships with a help tool. To display a list of all available tools, type the following command:
$ sqoop help usage: sqoop COMMAND [ARGS] Available commands: codegen Generate code to interact with database records create-hive-table Import a table definition into Hive eval Evaluate a SQL statement and display the results export Export an HDFS directory to a database table help List available commands import Import a table from a database to HDFS import-all-tables Import tables from a database to HDFS list-databases List available databases on a server list-tables List available tables in a database version Display version information See 'sqoop help COMMAND' for information on a specific command.
You can display help for a specific tool by entering: sqoop help
(tool-name); for example, sqoop help import.
You can also add the --help argument to any command: sqoop import
--help.
In addition to typing the sqoop (toolname) syntax, you can use alias
scripts that specify the sqoop-(toolname) syntax. For example, the
scripts sqoop-import, sqoop-export, etc. each select a specific
tool.
You invoke Sqoop through the program launch capability provided by
Hadoop. The sqoop command-line program is a wrapper which runs the
bin/hadoop script shipped with Hadoop. If you have multiple
installations of Hadoop present on your machine, you can select the
Hadoop installation by setting the $HADOOP_HOME environment
variable.
For example:
$ HADOOP_HOME=/path/to/some/hadoop sqoop import --arguments...
or:
$ export HADOOP_HOME=/some/path/to/hadoop $ sqoop import --arguments...
If $HADOOP_HOME is not set, Sqoop will use the default installation
location for Cloudera’s Distribution for Hadoop, /usr/lib/hadoop.
The active Hadoop configuration is loaded from $HADOOP_HOME/conf/,
unless the $HADOOP_CONF_DIR environment variable is set.
To control the operation of each Sqoop tool, you use generic and specific arguments.
For example:
$ sqoop help import usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS] Common arguments: --connect <jdbc-uri> Specify JDBC connect string --driver <class-name> Manually specify JDBC driver class to use --hadoop-home <dir> Override $HADOOP_HOME --help Print usage instructions -P Read password from console --password <password> Set authentication password --username <username> Set authentication username --verbose Print more information while working [...] Generic Hadoop command-line arguments: (must preceed any tool-specific arguments) Generic options supported are -conf <configuration file> specify an application configuration file -D <property=value> use value for given property -fs <local|namenode:port> specify a namenode -jt <local|jobtracker:port> specify a job tracker -files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster -libjars <comma separated list of jars> specify comma separated jar files to include in the classpath. -archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions]
You must supply the generic arguments -conf, -D, and so on after the
tool name but before any tool-specific arguments (such as
--connect). Note that generic Hadoop arguments are preceeded by a
single dash character (-), whereas tool-specific arguments start
with two dashes (--), unless they are single character arguments such as -P.
The -conf, -D, -fs and -jt arguments control the configuration
and Hadoop server settings. The -files, -libjars, and -archives
arguments are not typically used with Sqoop, but they are included as
part of Hadoop’s internal argument-parsing system.
The import tool imports an individual table from an RDBMS to HDFS.
Each row from a table is represented as a separate record in HDFS.
Records can be stored as text files (one record per line), or in
binary representation in SequenceFiles.
$ sqoop import (generic-args) (import-args) $ sqoop-import (generic-args) (import-args)
While the Hadoop generic arguments must preceed any import arguments, you can type the import arguments in any order with respect to one another.
![]() | Note |
|---|---|
In this document, arguments are grouped into collections organized by function. Some collections are present in several tools (for example, the "common" arguments). An extended description of their functionality is given only on the first presentation in this document. |
Table 1. Common arguments
| Argument | Description |
|---|---|
--connect <jdbc-uri>
| Specify JDBC connect string |
--driver <class-name>
| Manually specify JDBC driver class to use |
--hadoop-home <dir>
| Override $HADOOP_HOME |
--help
| Print usage instructions |
-P
| Read password from console |
--password <password>
| Set authentication password |
--username <username>
| Set authentication username |
--verbose
| Print more information while working |
Sqoop is designed to import tables from a database into HDFS. To do
so, you must specify a connect string that describes how to connect to the
database. The connect string is similar to a URL, and is communicated to
Sqoop with the --connect argument. This describes the server and
database to connect to; it may also specify the port. For example:
$ sqoop import --connect jdbc:mysql://database.example.com/employees
This string will connect to a MySQL database named employees on the
host database.example.com. It’s important that you do not use the URL
localhost if you intend to use Sqoop with a distributed Hadoop
cluster. The connect string you supply will be used on TaskTracker nodes
throughout your MapReduce cluster; if you specify the
literal name localhost, each node will connect to a different
database (or more likely, no database at all). Instead, you should use
the full hostname or IP address of the database host that can be seen
by all your remote nodes.
You might need to authenticate against the database before you can
access it. You can use the --username and --password or -P parameters
to supply a username and a password to the database. For example:
$ sqoop import --connect jdbc:mysql://database.example.com/employees \
--username aaron --password 12345![]() | Warning |
|---|---|
The |
Sqoop automatically supports several databases, including MySQL.
Connect strings beginning with jdbc:mysql:// are handled
automatically in Sqoop, though you may need to install the driver
yourself. (A full list of databases with built-in support is provided
in the "Supported Databases" section.)
You can use Sqoop with any other
JDBC-compliant database. First, download the appropriate JDBC
driver for the type of database you want to import, and install the .jar
file in the /usr/hadoop/lib directory on all machines in your Hadoop
cluster, or some other directory which is in the classpath
on all nodes. Each driver .jar file also has a specific driver class which defines
the entry-point to the driver. For example, MySQL’s Connector/J library has
a driver class of com.mysql.jdbc.Driver. Refer to your database
vendor-specific documentation to determine the main driver class.
This class must be provided as an argument to Sqoop with --driver.
For example, to connect to a SQLServer database, first download the driver from microsoft.com and install it in your Hadoop lib path.
Then run Sqoop. For example:
$ sqoop import --driver com.microsoft.jdbc.sqlserver.SQLServerDriver \
--connect <connect-string> ...Table 2. Import control arguments:
| Argument | Description |
|---|---|
--append
| Append data to an existing dataset in HDFS |
--as-sequencefile
| Imports data to SequenceFiles |
--as-textfile
| Imports data as plain text (default) |
--columns <col,col,col…>
| Columns to import from table |
--direct
| Use direct import fast path |
--direct-split-size <n>
| Split the input stream every n bytes when importing in direct mode |
--inline-lob-limit <n>
| Set the maximum size for an inline LOB |
-m,--num-mappers <n>
| Use n map tasks to import in parallel |
--split-by <column-name>
| Column of the table used to split work units |
--table <table-name>
| Table to read |
--target-dir <dir>
| HDFS destination dir |
--warehouse-dir <dir>
| HDFS parent for table destination |
--where <where clause>
| WHERE clause to use during import |
-z,--compress
| Enable compression |
Sqoop currently imports data in a table-centric fashion. Use the
--table argument to select the table to import. For example, --table
employees. This argument can also identify a VIEW or other table-like
entity in a database.
By default, all columns within a table are selected for import. Imported data is written to HDFS in its "natural order;" that is, a table containing columns A, B, and C result in an import of data such as:
A1,B1,C1 A2,B2,C2 ...
You can select a subset of columns and control their ordering by using
the --columns argument. This should include a comma-delimited list
of columns to import. For example: --columns "name,employee_id,jobtitle".
You can control which rows are imported by adding a SQL WHERE clause
to the import statement. By default, Sqoop generates statements of the
form SELECT <column list> FROM <table name>. You can append a
WHERE clause to this with the --where argument. For example: --where
"id > 400". Only rows where the id column has a value greater than
400 will be imported.
Sqoop imports data in parallel from most database sources. You can
specify the number
of map tasks (parallel processes) to use to perform the import by
using the -m or --num-mappers argument. Each of these arguments
takes an integer value which corresponds to the degree of parallelism
to employ. By default, four tasks are used. Some databases may see
improved performance by increasing this value to 8 or 16. Do not
increase the degree of parallelism greater than that available within
your MapReduce cluster; tasks will run serially and will likely
increase the amount of time required to perform the import. Likewise,
do not increase the degree of parallism higher than that which your
database can reasonably support. Connecting 100 concurrent clients to
your database may increase the load on the database server to a point
where performance suffers as a result.
When performing parallel imports, Sqoop needs a criterion by which it
can split the workload. Sqoop uses a splitting column to split the
workload. By default, Sqoop will identify the primary key column (if
present) in a table and use it as the splitting column. The low and
high values for the splitting column are retrieved from the database,
and the map tasks operate on evenly-sized components of the total
range. For example, if you had a table with a primary key column of
id whose minimum value was 0 and maximum value was 1000, and Sqoop
was directed to use 4 tasks, Sqoop would run four processes which each
execute SQL statements of the form SELECT * FROM sometable WHERE id
>= lo AND id < hi, with (lo, hi) set to (0, 250), (250, 500),
(500, 750), and (750, 1001) in the different tasks.
If the actual values for the primary key are not uniformly distributed
across its range, then this can result in unbalanced tasks. You should
explicitly choose a different column with the --split-by argument.
For example, --split-by employee_id. Sqoop cannot currently split on
multi-column indices. If your table has no index column, or has a
multi-column key, then you must also manually choose a splitting
column.
By default, the import process will use JDBC which provides a
reasonable cross-vendor import channel. Some databases can perform
imports in a more high-performance fashion by using database-specific
data movement tools. For example, MySQL provides the mysqldump tool
which can export data from MySQL to other systems very quickly. By
supplying the --direct argument, you are specifying that Sqoop
should attempt the direct import channel. This channel may be
higher performance than using JDBC. Currently, direct mode does not
support imports of large object columns.
When importing from PostgreSQL in conjunction with direct mode, you
can split the import into separate files after
individual files reach a certain size. This size limit is controlled
with the --direct-split-size argument.
By default, Sqoop will import a table named foo to a directory named
foo inside your home directory in HDFS. For example, if your
username is someuser, then the import tool will write to
/user/someuser/foo/(files). You can adjust the parent directory of
the import with the --warehouse-dir argument. For example:
$ sqoop import --connnect <connect-str> --table foo --warehouse-dir /shared \
...This command would write to a set of files in the /shared/foo/ directory.
You can also explicitly choose the target directory, like so:
$ sqoop import --connnect <connect-str> --table foo --target-dir /dest \
...This will import the files into the /dest directory. --target-dir is
incompatible with --warehouse-dir.
When using direct mode, you can specify additional arguments which
should be passed to the underlying tool. If the argument
-- is given on the command-line, then subsequent arguments are sent
directly to the underlying tool. For example, the following adjusts
the character set used by mysqldump:
$ sqoop import --connect jdbc:mysql://server.foo.com/db --table bar \
--direct -- --default-character-set=latin1By default, imports go to a new target location. If the destination directory
already exists in HDFS, Sqoop will refuse to import and overwrite that
directory’s contents. If you use the --append argument, Sqoop will import
data to a temporary directory and then rename the files into the normal
target directory in a manner that does not conflict with existing filenames
in that directory.
You can import data in one of two file formats: delimited text or SequenceFiles.
Delimited text is the default import format. You can also specify it
explicitly by using the --as-textfile argument. This argument will write
string-based representations of each record to the output files, with
delimiter characters between individual columns and rows. These
delimiters may be commas, tabs, or other characters. (The delimiters
can be selected; see "Output line formatting arguments.") The
following is the results of an example text-based import:
1,here is a message,2010-05-01 2,happy new year!,2010-01-01 3,another message,2009-11-12
Delimited text is appropriate for most non-binary data types. It also readily supports further manipulation by other tools, such as Hive.
SequenceFiles are a binary format that store individual records in
custom record-specific data types. These data types are manifested as
Java classes. Sqoop will automatically generate these data types for
you. This format supports exact storage of all data in binary
representations, and is appropriate for storing binary data
(for example, VARBINARY columns), or data that will be principly
manipulated by custom MapReduce programs (reading from SequenceFiles
is higher-performance than reading from text files, as records do not
need to be parsed).
By default, data is not compressed. You can compress
your data by using the deflate (gzip) algorithm with the -z or
--compress argument. This applies to both SequenceFiles or text
files.
Sqoop handles large objects (BLOB and CLOB columns) in particular
ways. If this data is truly large, then these columns should not be
fully materialized in memory for manipulation, as most columns are.
Instead, their data is handled in a streaming fashion. Large objects
can be stored inline with the rest of the data, in which case they are
fully materialized in memory on every access, or they can be stored in
a secondary storage file linked to the primary data storage. By
default, large objects less than 16 MB in size are stored inline with
the rest of the data. At a larger size, they are stored in files in
the _lobs subdirectory of the import target directory. These files
are stored in a separate format optimized for large record storage,
which can accomodate records of up to 2^63 bytes each. The size at
which lobs spill into separate files is controlled by the
--inline-lob-limit argument, which takes a parameter specifying the
largest lob size to keep inline, in bytes. If you set the inline LOB
limit to 0, all large objects will be placed in external
storage.
Table 3. Output line formatting arguments:
| Argument | Description |
|---|---|
--enclosed-by <char>
| Sets a required field enclosing character |
--escaped-by <char>
| Sets the escape character |
--fields-terminated-by <char>
| Sets the field separator character |
--lines-terminated-by <char>
| Sets the end-of-line character |
--mysql-delimiters
|
Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
|
--optionally-enclosed-by <char>
| Sets a field enclosing character |
When importing to delimited files, the choice of delimiter is
important. Delimiters which appear inside string-based fields may
cause ambiguous parsing of the imported data by subsequent analysis
passes. For example, the string "Hello, pleased to meet you" should
not be imported with the end-of-field delimiter set to a comma.
Delimiters may be specified as:
--fields-terminated-by X)
an escape character (--fields-terminated-by \t). Supported escape
characters are:
\b (backspace)
\n (newline)
\r (carriage return)
\t (tab)
\" (double-quote)
\\' (single-quote)
\\ (backslash)
\0 (NUL) - This will insert NUL characters between fields or lines,
or will disable enclosing/escaping if used for one of the --enclosed-by,
--optionally-enclosed-by, or --escaped-by arguments.
\0ooo, where ooo is the octal value.
For example, --fields-terminated-by \001 would yield the ^A character.
\0xhhh, where hhh is the hex value.
For example, --fields-terminated-by \0x10 would yield the carriage
return character.
The default delimiters are a comma (,) for fields, a newline (\n) for records, no quote
character, and no escape character. Note that this can lead to
ambiguous/unparsible records if you import database records containing
commas or newlines in the field data. For unambiguous parsing, both must
be enabled. For example, via --mysql-delimiters.
If unambiguous delimiters cannot be presented, then use enclosing and escaping characters. The combination of (optional) enclosing and escaping characters will allow unambiguous parsing of lines. For example, suppose one column of a dataset contained the following values:
Some string, with a comma. Another "string with quotes"
The following arguments would provide delimiters which can be unambiguously parsed:
$ sqoop import --fields-terminated-by , --escaped-by \\ --enclosed-by '\"' ...
(Note that to prevent the shell from mangling the enclosing character, we have enclosed that argument itself in single-quotes.)
The result of the above arguments applied to the above dataset would be:
"Some string, with a comma.","1","2","3"... "Another \"string with quotes\"","4","5","6"...
Here the imported strings are shown in the context of additional
columns ("1","2","3", etc.) to demonstrate the full effect of enclosing
and escaping. The enclosing character is only strictly necessary when
delimiter characters appear in the imported text. The enclosing
character can therefore be specified as optional:
$ sqoop import --optionally-enclosed-by '\"' (the rest as above)...
Which would result in the following import:
"Some string, with a comma.",1,2,3... "Another \"string with quotes\"",4,5,6...
![]() | Note |
|---|---|
Hive does not support enclosing and escaping characters. You must choose unambiguous field and record-terminating delimiters without the help of escaping and enclosing characters when working with Hive; this is a limitation of Hive’s input parsing abilities. |
The --mysql-delimiters argument is a shorthand argument which uses
the default delimiters for the mysqldump program.
If you use the mysqldump delimiters in conjunction with a
direct-mode import (with --direct), very fast imports can be
achieved.
While the choice of delimiters is most important for a text-mode
import, it is still relevant if you import to SequenceFiles with
--as-sequencefile. The generated class' toString() method
will use the delimiters you specify, so subsequent formatting of
the output data will rely on the delimiters you choose.
Table 4. Input parsing arguments:
| Argument | Description |
|---|---|
--input-enclosed-by <char>
| Sets a required field encloser |
--input-escaped-by <char>
| Sets the input escape character |
--input-fields-terminated-by <char>
| Sets the input field separator |
--input-lines-terminated-by <char>
| Sets the input end-of-line character |
--input-optionally-enclosed-by <char>
| Sets a field enclosing character |
When Sqoop imports data to HDFS, it generates a Java class which can
reinterpret the text files that it creates when doing a
delimited-format import. The delimiters are chosen with arguments such
as --fields-terminated-by; this controls both how the data is
written to disk, and how the generated parse() method reinterprets
this data. The delimiters used by the parse() method can be chosen
independently of the output arguments, by using
--input-fields-terminated-by, and so on. This is useful, for example, to
generate classes which can parse records created with one set of
delimiters, and emit the records to a different set of files using a
separate set of delimiters.
Table 5. Hive arguments:
| Argument | Description |
|---|---|
--hive-home <dir>
|
Override $HIVE_HOME
|
--hive-import
| Import tables into Hive (Uses Hive’s default delimiters if none are set.) |
--hive-overwrite
| Overwrite existing data in the Hive table. |
--hive-table <table-name>
| Sets the table name to use when importing to Hive. |
Sqoop’s import tool’s main function is to upload your data into files
in HDFS. If you have a Hive metastore associated with your HDFS
cluster, Sqoop can also import the data into Hive by generating and
executing a CREATE TABLE statement to define the data’s layout in
Hive. Importing data into Hive is as simple as adding the
--hive-import option to your Sqoop command line.
If the Hive table already exists, you can specify the
--hive-overwrite option to indicate that existing table in hive must
be replaced. After your data is imported into HDFS or this step is
omitted, Sqoop will generate a Hive script containing a CREATE TABLE
operation defining your columns using Hive’s types, and a LOAD DATA INPATH
statement to move the data files into Hive’s warehouse directory.
The script will be executed by calling
the installed copy of hive on the machine where Sqoop is run. If you have
multiple Hive installations, or hive is not in your $PATH, use the
--hive-home option to identify the Hive installation directory.
Sqoop will use $HIVE_HOME/bin/hive from here.
![]() | Note |
|---|---|
This function is incompatible with |
Hive’s text parser does not support escaping or enclosing
characters. Sqoop will print a warning if you use --escaped-by,
--enclosed-by, or --optionally-enclosed-by because Hive does not know
how to parse these. It will pass the field and record delimiters through
to Hive. If you do not set any delimiters and do use --hive-import,
the field delimiter will be set to ^A and the record delimiter will
be set to \n to be consistent with Hive’s defaults. It is important when
importing data to Hive to choose unambiguous field and record delimiters
due to the lack of escape and enclosing characters.
The table name used in Hive is, by default, the same as that of the
source table. You can control the output table name with the --hive-table
option.
Table 6. Code generation arguments:
| Argument | Description |
|---|---|
--bindir <dir>
| Output directory for compiled objects |
--class-name <name>
|
Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
|
--jar-file <file>
| Disable code generation; use specified jar |
--outdir <dir>
| Output directory for generated code |
--package-name <name>
| Put auto-generated classes in this package |
As mentioned earlier, a byproduct of importing a table to HDFS is a class which can manipulate the imported data. If the data is stored in SequenceFiles, this class will be used for the data’s serialization container. Therefore, you should use this class in your subsequent MapReduce processing of the data.
The class is typically named after the table; a table named foo will
generate a class named foo. You may want to override this class
name. For example, if your table is named EMPLOYEES, you may want to
specify --class-name Employee instead. Similarly, you can specify
just the package name with --package-name. The following import
generates a class named com.foocorp.SomeTable:
$ sqoop import --connect <connect-str> --table SomeTable --package-name com.foocorp
The .java source file for your class will be written to the current
working directory when you run sqoop. You can control the output
directory with --outdir. For example, --outdir src/generated/.
The import process compiles the source into .class and .jar files;
these are ordinarily stored under /tmp. You can select an alternate
target directory with --bindir. For example, --bindir /scratch.
If you already have a compiled class that can be used to perform the
import and want to suppress the code-generation aspect of the import
process, you can use an existing jar and class by
providing the --jar-file and --class-name options. For example:
$ sqoop import --table SomeTable --jar-file mydatatypes.jar \
--class-name SomeTableTypeThis command will load the SomeTableType class out of mydatatypes.jar.
The following examples illustrate how to use the import tool in a variety of situations.
A basic import of a table named EMPLOYEES in the corp database:
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES
A basic import requiring a login:
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
--username SomeUser -P
Enter password: (hidden)Selecting specific columns from the EMPLOYEES table:
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
--columns "employee_id,first_name,last_name,job_title"Controlling the import parallelism (using 8 parallel tasks):
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
-m 8Enabling the MySQL "direct mode" fast path:
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
--directStoring data in SequenceFiles, and setting the generated class name to
com.foocorp.Employee:
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
--class-name com.foocorp.Employee --as-sequencefileSpecifying the delimiters to use in a text-mode import:
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
--fields-terminated-by '\t' --lines-terminated-by '\n' \
--optionally-enclosed-by '\"'Importing the data to Hive:
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
--hive-importImporting only new employees:
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
--where "start_date > '2010-01-01'"Changing the splitting column from the default:
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
--split-by dept_idVerifying that an import was successful:
$ hadoop fs -ls EMPLOYEES Found 5 items drwxr-xr-x - someuser somegrp 0 2010-04-27 16:40 /user/someuser/EMPLOYEES/_logs -rw-r--r-- 1 someuser somegrp 2913511 2010-04-27 16:40 /user/someuser/EMPLOYEES/part-m-00000 -rw-r--r-- 1 someuser somegrp 1683938 2010-04-27 16:40 /user/someuser/EMPLOYEES/part-m-00001 -rw-r--r-- 1 someuser somegrp 7245839 2010-04-27 16:40 /user/someuser/EMPLOYEES/part-m-00002 -rw-r--r-- 1 someuser somegrp 7842523 2010-04-27 16:40 /user/someuser/EMPLOYEES/part-m-00003 $ hadoop fs -cat EMPLOYEES/part-m-00000 | head -n 10 0,joe,smith,engineering 1,jane,doe,marketing ...
Performing an incremental import of new data, after having already imported the first 100,000 rows of a table:
$ sqoop import --connect jdbc:mysql://db.foo.com/somedb --table sometable \
--where "id > 100000" --target-dir /incremental_dataset --appendThe import-all-tables tool imports a set of tables from an RDBMS to HDFS.
Data from each table is stored in a separate directory in HDFS.
For the import-all-tables tool to be useful, the following conditions
must be met:
WHERE clause.
$ sqoop import-all-tables (generic-args) (import-args) $ sqoop-import-all-tables (generic-args) (import-args)
Although the Hadoop generic arguments must preceed any import arguments, the import arguments can be entered in any order with respect to one another.
Table 7. Common arguments
| Argument | Description |
|---|---|
--connect <jdbc-uri>
| Specify JDBC connect string |
--driver <class-name>
| Manually specify JDBC driver class to use |
--hadoop-home <dir>
| Override $HADOOP_HOME |
--help
| Print usage instructions |
-P
| Read password from console |
--password <password>
| Set authentication password |
--username <username>
| Set authentication username |
--verbose
| Print more information while working |
Table 8. Import control arguments:
| Argument | Description |
|---|---|
--as-sequencefile
| Imports data to SequenceFiles |
--as-textfile
| Imports data as plain text (default) |
--direct
| Use direct import fast path |
--direct-split-size <n>
| Split the input stream every n bytes when importing in direct mode |
--inline-lob-limit <n>
| Set the maximum size for an inline LOB |
-m,--num-mappers <n>
| Use n map tasks to import in parallel |
--warehouse-dir <dir>
| HDFS parent for table destination |
-z,--compress
| Enable compression |
These arguments behave in the same manner as they do when used for the
sqoop-import tool, but the --table, --split-by, --columns,
and --where arguments are invalid for sqoop-import-all-tables.
Table 9. Output line formatting arguments:
| Argument | Description |
|---|---|
--enclosed-by <char>
| Sets a required field enclosing character |
--escaped-by <char>
| Sets the escape character |
--fields-terminated-by <char>
| Sets the field separator character |
--lines-terminated-by <char>
| Sets the end-of-line character |
--mysql-delimiters
|
Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
|
--optionally-enclosed-by <char>
| Sets a field enclosing character |
Table 10. Input parsing arguments:
| Argument | Description |
|---|---|
--input-enclosed-by <char>
| Sets a required field encloser |
--input-escaped-by <char>
| Sets the input escape character |
--input-fields-terminated-by <char>
| Sets the input field separator |
--input-lines-terminated-by <char>
| Sets the input end-of-line character |
--input-optionally-enclosed-by <char>
| Sets a field enclosing character |
Table 11. Hive arguments:
| Argument | Description |
|---|---|
--hive-home <dir>
|
Override $HIVE_HOME
|
--hive-import
| Import tables into Hive (Uses Hive’s default delimiters if none are set.) |
--hive-overwrite
| Overwrite existing data in the Hive table. |
--hive-table <table-name>
| Sets the table name to use when importing to Hive. |
Table 12. Code generation arguments:
| Argument | Description |
|---|---|
--bindir <dir>
| Output directory for compiled objects |
--jar-file <file>
| Disable code generation; use specified jar |
--outdir <dir>
| Output directory for generated code |
--package-name <name>
| Put auto-generated classes in this package |
The import-all-tables tool does not support the --class-name argument.
You may, however, specify a package with --package-name in which all
generated classes will be placed.
Import all tables from the corp database:
$ sqoop import-all-tables --connect jdbc:mysql://db.foo.com/corp
Verifying that it worked:
$ hadoop fs -ls Found 4 items drwxr-xr-x - someuser somegrp 0 2010-04-27 17:15 /user/someuser/EMPLOYEES drwxr-xr-x - someuser somegrp 0 2010-04-27 17:15 /user/someuser/PAYCHECKS drwxr-xr-x - someuser somegrp 0 2010-04-27 17:15 /user/someuser/DEPARTMENTS drwxr-xr-x - someuser somegrp 0 2010-04-27 17:15 /user/someuser/OFFICE_SUPPLIES
The export tool exports a set of files from HDFS back to an RDBMS.
The target table must already exist in the database. The input files
are read and parsed into a set of records according to the
user-specified delimiters. These are then transformed into a set of
INSERT statements that inject the records into the database.
$ sqoop export (generic-args) (import-args) $ sqoop-export (generic-args) (import-args)
Although the Hadoop generic arguments must preceed any export arguments, the export arguments can be entered in any order with respect to one another.
Table 13. Common arguments
| Argument | Description |
|---|---|
--connect <jdbc-uri>
| Specify JDBC connect string |
--driver <class-name>
| Manually specify JDBC driver class to use |
--hadoop-home <dir>
| Override $HADOOP_HOME |
--help
| Print usage instructions |
-P
| Read password from console |
--password <password>
| Set authentication password |
--username <username>
| Set authentication username |
--verbose
| Print more information while working |
Table 14. Export control arguments:
| Argument | Description |
|---|---|
--direct
| Use direct export fast path |
--export-dir <dir>
| HDFS source path for the export |
-m,--num-mappers <n>
| Use n map tasks to export in parallel |
--table <table-name>
| Table to populate |
The --table and --export-dir arguments are required. These
specify the table to populate in the database, and the
directory in HDFS that contains the source data.
You can control the number of mappers independently from the number of
files present in the directory. Export performance depends on the
degree of parallelism. By default, Sqoop will use four tasks in
parallel for the export process. This may not be optimal; you will
need to experiment with your own particular setup. Additional tasks
may offer better concurrency, but if the database is already
bottlenecked on updating indices, invoking triggers, and so on, then
additional load may decrease performance. The --num-mappers or -m
arguments control the number of map tasks, which is the degree of
parallelism used.
MySQL provides a direct mode for exports as well, using the
mysqlimport tool. When exporting to MySQL, use the --direct argument
to specify this codepath. This may be
higher-performance than the standard JDBC codepath.
Table 15. Input parsing arguments:
| Argument | Description |
|---|---|
--input-enclosed-by <char>
| Sets a required field encloser |
--input-escaped-by <char>
| Sets the input escape character |
--input-fields-terminated-by <char>
| Sets the input field separator |
--input-lines-terminated-by <char>
| Sets the input end-of-line character |
--input-optionally-enclosed-by <char>
| Sets a field enclosing character |
Table 16. Output line formatting arguments:
| Argument | Description |
|---|---|
--enclosed-by <char>
| Sets a required field enclosing character |
--escaped-by <char>
| Sets the escape character |
--fields-terminated-by <char>
| Sets the field separator character |
--lines-terminated-by <char>
| Sets the end-of-line character |
--mysql-delimiters
|
Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
|
--optionally-enclosed-by <char>
| Sets a field enclosing character |
Sqoop automatically generates code to parse and interpret records of the files containing the data to be exported back to the database. If these files were created with non-default delimiters (comma-separated fields with newline-separated records), you should specify the same delimiters again so that Sqoop can parse your files.
If you specify incorrect delimiters, Sqoop will fail to find enough
columns per line. This will cause export map tasks to fail by throwing
ParseExceptions.
Table 17. Code generation arguments:
| Argument | Description |
|---|---|
--bindir <dir>
| Output directory for compiled objects |
--class-name <name>
|
Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
|
--jar-file <file>
| Disable code generation; use specified jar |
--outdir <dir>
| Output directory for generated code |
--package-name <name>
| Put auto-generated classes in this package |
If the records to be exported were generated as the result of a
previous import, then the original generated class can be used to read
the data back. Specifying --jar-file and --class-name obviate
the need to specify delimiters in this case.
Exports are performed by multiple writers in parallel. Each writer
uses a separate connection to the database; these have separate
transactions from one another. Sqoop uses the multi-row INSERT
syntax to insert up to 100 records per statement. Every 100
statements, the current transaction within a writer task is committed,
causing a commit every 10,000 rows. This ensures that transaction
buffers do not grow without bound, and cause out-of-memory conditions.
Therefore, an export is not an atomic process. Partial results from
the export will become visible before the export is complete.
Exports may fail for a number of reasons:
INSERT a row which violates a consistency constraint
(for example, inserting a duplicate primary key value)
If an export map task fails due to these or other reasons, it will
cause the export job to fail. The results of a failed export are
undefined. Each export map task operates in a separate transaction.
Furthermore, individual map tasks commit their current transaction
periodically. If a task fails, the current transaction will be rolled
back. Any previously-committed transactions will remain durable in the
database, leading to a partially-complete export.
A basic export to populate a table named bar:
$ sqoop export --connect jdbc:mysql://db.example.com/foo --table bar \
--export-dir /results/bar_dataThis example takes the files in /results/bar_data and injects their
contents in to the bar table in the foo database on db.example.com.
The target table must already exist in the database. Sqoop performs
a set of INSERT INTO operations, without regard for existing content. If
Sqoop attempts to insert rows which violate constraints in the database
(for example, a particular primary key value already exists), then the export
fails.
The codegen tool generates Java classes which encapsulate and
interpret imported records. The Java definition of a record is
instantiated as part of the import process, but can also be performed
separately. For example, if Java source is lost, it can be recreated.
New versions of a class can be created which use different delimiters
between fields, and so on.
$ sqoop codegen (generic-args) (codegen-args) $ sqoop-codegen (generic-args) (codegen-args)
Although the Hadoop generic arguments must preceed any codegen arguments, the codegen arguments can be entered in any order with respect to one another.
Table 18. Common arguments
| Argument | Description |
|---|---|
--connect <jdbc-uri>
| Specify JDBC connect string |
--driver <class-name>
| Manually specify JDBC driver class to use |
--hadoop-home <dir>
| Override $HADOOP_HOME |
--help
| Print usage instructions |
-P
| Read password from console |
--password <password>
| Set authentication password |
--username <username>
| Set authentication username |
--verbose
| Print more information while working |
Table 19. Code generation arguments:
| Argument | Description |
|---|---|
--bindir <dir>
| Output directory for compiled objects |
--class-name <name>
|
Sets the generated class name. This overrides --package-name.
|
--outdir <dir>
| Output directory for generated code |
--package-name <name>
| Put auto-generated classes in this package |
--table <table-name>
| Name of the table to generate code for. |
Table 20. Output line formatting arguments:
| Argument | Description |
|---|---|
--enclosed-by <char>
| Sets a required field enclosing character |
--escaped-by <char>
| Sets the escape character |
--fields-terminated-by <char>
| Sets the field separator character |
--lines-terminated-by <char>
| Sets the end-of-line character |
--mysql-delimiters
|
Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
|
--optionally-enclosed-by <char>
| Sets a field enclosing character |
Table 21. Input parsing arguments:
| Argument | Description |
|---|---|
--input-enclosed-by <char>
| Sets a required field encloser |
--input-escaped-by <char>
| Sets the input escape character |
--input-fields-terminated-by <char>
| Sets the input field separator |
--input-lines-terminated-by <char>
| Sets the input end-of-line character |
--input-optionally-enclosed-by <char>
| Sets a field enclosing character |
Table 22. Hive arguments:
| Argument | Description |
|---|---|
--hive-home <dir>
|
Override $HIVE_HOME
|
--hive-import
| Import tables into Hive (Uses Hive’s default delimiters if none are set.) |
--hive-overwrite
| Overwrite existing data in the Hive table. |
--hive-table <table-name>
| Sets the table name to use when importing to Hive. |
If Hive arguments are provided to the code generation tool, Sqoop generates a file containing the HQL statements to create a table and load data.
The create-hive-table tool populates a Hive metastore with a
definition for a table based on a database table previously imported
to HDFS, or one planned to be imported. This effectively performs the
"--hive-import" step of sqoop-import without running the
preceeding import.
If data was already loaded to HDFS, you can use this tool to finish the pipeline of importing the data to Hive. You can also create Hive tables with this tool; data then can be imported and populated into the target after a preprocessing step run by the user.
$ sqoop create-hive-table (generic-args) (create-hive-table-args) $ sqoop-create-hive-table (generic-args) (create-hive-table-args)
Although the Hadoop generic arguments must preceed any create-hive-table arguments, the create-hive-table arguments can be entered in any order with respect to one another.
Table 23. Common arguments
| Argument | Description |
|---|---|
--connect <jdbc-uri>
| Specify JDBC connect string |
--driver <class-name>
| Manually specify JDBC driver class to use |
--hadoop-home <dir>
| Override $HADOOP_HOME |
--help
| Print usage instructions |
-P
| Read password from console |
--password <password>
| Set authentication password |
--username <username>
| Set authentication username |
--verbose
| Print more information while working |
Table 24. Hive arguments:
| Argument | Description |
|---|---|
--hive-home <dir>
|
Override $HIVE_HOME
|
--hive-overwrite
| Overwrite existing data in the Hive table. |
--hive-table <table-name>
| Sets the table name to use when importing to Hive. |
--table
| The database table to read the definition from. |
Table 25. Output line formatting arguments:
| Argument | Description |
|---|---|
--enclosed-by <char>
| Sets a required field enclosing character |
--escaped-by <char>
| Sets the escape character |
--fields-terminated-by <char>
| Sets the field separator character |
--lines-terminated-by <char>
| Sets the end-of-line character |
--mysql-delimiters
|
Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
|
--optionally-enclosed-by <char>
| Sets a field enclosing character |
Do not use enclosed-by or escaped-by delimiters with output formatting arguments used to import to Hive. Hive cannot currently parse them.
The eval tool allows users to quickly run simple SQL queries against
a database; results are printed to the console. This allows users to
preview their import queries to ensure they import the data they
expect.
$ sqoop eval (generic-args) (eval-args) $ sqoop-eval (generic-args) (eval-args)
Although the Hadoop generic arguments must preceed any eval arguments, the eval arguments can be entered in any order with respect to one another.
Table 26. Common arguments
| Argument | Description |
|---|---|
--connect <jdbc-uri>
| Specify JDBC connect string |
--driver <class-name>
| Manually specify JDBC driver class to use |
--hadoop-home <dir>
| Override $HADOOP_HOME |
--help
| Print usage instructions |
-P
| Read password from console |
--password <password>
| Set authentication password |
--username <username>
| Set authentication username |
--verbose
| Print more information while working |
Table 27. SQL evaluation arguments:
| Argument | Description |
|---|---|
-e,--query <statement>
|
Execute statement in SQL.
|
$ sqoop list-databases (generic-args) (list-databases-args) $ sqoop-list-databases (generic-args) (list-databases-args)
Although the Hadoop generic arguments must preceed any list-databases arguments, the list-databases arguments can be entered in any order with respect to one another.
Table 28. Common arguments
| Argument | Description |
|---|---|
--connect <jdbc-uri>
| Specify JDBC connect string |
--driver <class-name>
| Manually specify JDBC driver class to use |
--hadoop-home <dir>
| Override $HADOOP_HOME |
--help
| Print usage instructions |
-P
| Read password from console |
--password <password>
| Set authentication password |
--username <username>
| Set authentication username |
--verbose
| Print more information while working |
$ sqoop list-tables (generic-args) (list-tables-args) $ sqoop-list-tables (generic-args) (list-tables-args)
Although the Hadoop generic arguments must preceed any list-tables arguments, the list-tables arguments can be entered in any order with respect to one another.
Table 29. Common arguments
| Argument | Description |
|---|---|
--connect <jdbc-uri>
| Specify JDBC connect string |
--driver <class-name>
| Manually specify JDBC driver class to use |
--hadoop-home <dir>
| Override $HADOOP_HOME |
--help
| Print usage instructions |
-P
| Read password from console |
--password <password>
| Set authentication password |
--username <username>
| Set authentication username |
--verbose
| Print more information while working |
$ sqoop help [tool-name] $ sqoop-help [tool-name]
If no tool name is provided (for example, the user runs sqoop help), then
the available tools are listed. With a tool name, the usage
instructions for that specific tool are presented on the console.
List available tools:
$ sqoop help usage: sqoop COMMAND [ARGS] Available commands: codegen Generate code to interact with database records create-hive-table Import a table definition into Hive eval Evaluate a SQL statement and display the results export Export an HDFS directory to a database table ... See 'sqoop help COMMAND' for information on a specific command.
Display usage instructions for the import tool:
$ bin/sqoop help import usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS] Common arguments: --connect <jdbc-uri> Specify JDBC connect string --driver <class-name> Manually specify JDBC driver class to use --hadoop-home <dir> Override $HADOOP_HOME --help Print usage instructions -P Read password from console --password <password> Set authentication password --username <username> Set authentication username --verbose Print more information while working Import control arguments: --as-sequencefile Imports data to SequenceFiles --as-textfile Imports data as plain text (default) ...
Sqoop uses JDBC to connect to databases and adheres to published standards as much as possible. For databases which do not support standards-compliant SQL, Sqoop uses alternate codepaths to provide functionality. In general, Sqoop is believed to be compatible with a large number of databases, but it is tested with only a few.
Nonetheless, several database-specific decisions were made in the implementation of Sqoop, and some databases offer additional settings which are extensions to the standard.
This section describes the databases tested with Sqoop, any exceptions in Sqoop’s handling of each database relative to the norm, and any database-specific settings available in Sqoop.
While JDBC is a compatibility layer that allows a program to access many different databases through a common API, slight differences in the SQL language spoken by each database may mean that Sqoop can’t use every database out of the box, or that some databases may be used in an inefficient manner.
When you provide a connect string to Sqoop, it inspects the protocol scheme to
determine appropriate vendor-specific logic to use. If Sqoop knows about
a given database, it will work automatically. If not, you may need to
specify the driver class to load via --driver. This will use a generic
code path which will use standard SQL to access the database. Sqoop provides
some databases with faster, non-JDBC-based access mechanisms. These can be
enabled by specfying the --direct parameter.
Sqoop includes vendor-specific support for the following databases:
| Database | version |
--direct support?
| connect string matches |
|---|---|---|---|
| HSQLDB | 1.8.0+ | No |
jdbc:hsqldb:*//
|
| MySQL | 5.0+ | Yes |
jdbc:mysql://
|
| Oracle | 10.2.0+ | No |
jdbc:oracle:*//
|
| PostgreSQL | 8.3+ | Yes (import only) |
jdbc:postgresql://
|
Sqoop may work with older versions of the databases listed, but we have only tested it with the versions specified above.
Even if Sqoop supports a database internally, you may still need to
install the database vendor’s JDBC driver in your $HADOOP_HOME/lib
path.
JDBC Driver: MySQL Connector/J
MySQL v5.0 and above offers very thorough coverage by Sqoop. In builds of Sqoop included with Cloudera’s Distribution for Hadoop, the Connector/J JDBC driver is included with the installation.
MySQL allows values of '0000-00-00\' for DATE columns, which is a
non-standard extension to SQL. When communicated via JDBC, these
values are handled in one of three different ways:
NULL.
'0001-01-01\').
You specify the behavior by using the zeroDateTimeBehavior
property of the connect string. If a zeroDateTimeBehavior property
is not specified, Sqoop uses the convertToNull behavior.
You can override this behavior. For example:
$ sqoop import --table foo \
--connect jdbc:mysql://db.example.com/someDb?zeroDateTimeBehavior=roundColumns with type UNSIGNED in MySQL can hold values between 0 and
2^32 (4294967295), but the database will report the data type to Sqoop
as INTEGER, which will can hold values between -2147483648 and
\+2147483647. Sqoop cannot currently import UNSIGNED values above
2147483647.
Sqoop’s direct mode does not support imports of BLOB, CLOB, or
LONGVARBINARY columns. Use JDBC-based imports for these
columns; do not supply the --direct argument to the import tool.
For performance, each writer will commit the current transaction
approximately every 32 MB of exported data. You can control this
by specifying the following argument before any tool-specific arguments: -D
sqoop.mysql.export.checkpoint.bytes=size, where size is a value in
bytes. Set size to 0 to disable intermediate checkpoints,
but individual files being exported will continue to be committed
independently of one another.
![]() | Important |
|---|---|
Note that any arguments to Sqoop that are of the form |
JDBC Driver:
Oracle
JDBC Thin Driver - Sqoop is compatible with ojdbc6.jar.
Sqoop has been tested with Oracle 10.2.0 Express Edition. Oracle is notable in its different approach to SQL from the ANSI standard, and its non-standard JDBC driver. Therefore, several features work differently.
Oracle JDBC represents DATE and TIME SQL types as TIMESTAMP
values. Any DATE columns in an Oracle database will be imported as a
TIMESTAMP in Sqoop, and Sqoop-generated code will store these values
in java.sql.Timestamp fields.
When exporting data back to a database, Sqoop parses text fields as
TIMESTAMP types (with the form yyyy-mm-dd HH:MM:SS.ffffffff) even
if you expect these fields to be formatted with the JDBC date escape
format of yyyy-mm-dd. Dates exported to Oracle should be formatted
as full timestamps.
Oracle also includes the additional date/time types TIMESTAMP WITH
TIMEZONE and TIMESTAMP WITH LOCAL TIMEZONE. To support these types,
the user’s session timezone must be specified. By default, Sqoop will
specify the timezone "GMT" to Oracle. You can override this setting
by specifying a Hadoop property oracle.sessionTimeZone on the
command-line when running a Sqoop job. For example:
$ sqoop import -D oracle.sessionTimeZone=America/Los_Angeles \
--connect jdbc:oracle:thin:@//db.example.com/foo --table barNote that Hadoop parameters (-D …) are generic arguments and
must appear before the tool-specific arguments (--connect,
--table, and so on).
Legal values for the session timezone string are enumerated at http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14225/applocaledata.htm#i637736.
Hive users will note that there is not a one-to-one mapping between
SQL types and Hive types. In general, SQL types that do not have a
direct mapping (for example, DATE, TIME, and TIMESTAMP) will be coerced to
STRING in Hive. The NUMERIC and DECIMAL SQL types will be coerced to
DOUBLE. In these cases, Sqoop will emit a warning in its log messages
informing you of the loss of precision.
Some general information is available at the Sqoop wiki.
Report bugs in Sqoop to the issue tracker at http://github.com/cloudera/sqoop/issues.
Questions and discussion regarding the usage of Sqoop should be directed to the sqoop-user mailing list.
Before contacting either forum, run your Sqoop job with the
--verbose flag to acquire as much debugging information as
possible. Also report the string returned by sqoop version as
well as the version of Hadoop you are running (hadoop version).