Repo      Name  Release Date  Tarballs      Apt      Yum
Stable    CDH1  March 2009    /cdh/stable   /debian  /redhat/cdh/stable
Testing   CDH2  August 2009   /cdh/testing  /debian  /redhat/cdh/testing

1. Introduction

Sqoop is a tool designed to help users with large data sets import data from existing relational databases into their Hadoop clusters. Sqoop uses JDBC to connect to a database, examines each table's schema, and auto-generates the classes needed to import the data into HDFS. It then launches a MapReduce job that reads each table from the database through DBInputFormat, a JDBC-based InputFormat. Each table is written to a set of files in HDFS, in either SequenceFile or text format. Sqoop also supports high-performance imports from selected databases, including MySQL.
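To make the code-generation step more concrete, here is a minimal sketch of the idea: map each column's JDBC type code (from java.sql.Types) to a Java field type, then emit the source of a class with one field per column. This is not Sqoop's implementation. The class name RecordClassGenerator, the hypothetical "employees" schema, and the exact type mapping are assumptions for illustration. In a real import the column names and type codes would come from JDBC metadata (for example ResultSetMetaData.getColumnType()), and Sqoop's generated classes also contain the code that reads rows from the database and serializes them, which this sketch omits.

```java
import java.sql.Types;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of generating a record class from a table schema.
public class RecordClassGenerator {

    // Map a JDBC type code (java.sql.Types) to a Java field type.
    // Only a handful of common types are covered in this sketch.
    static String toJavaType(int sqlType) {
        switch (sqlType) {
            case Types.INTEGER:
            case Types.SMALLINT:
            case Types.TINYINT:
                return "Integer";
            case Types.BIGINT:
                return "Long";
            case Types.DOUBLE:
            case Types.FLOAT:
                return "Double";
            case Types.REAL:
                return "Float";
            case Types.DECIMAL:
            case Types.NUMERIC:
                return "java.math.BigDecimal";
            case Types.BIT:
            case Types.BOOLEAN:
                return "Boolean";
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.LONGVARCHAR:
                return "String";
            case Types.DATE:
                return "java.sql.Date";
            case Types.TIMESTAMP:
                return "java.sql.Timestamp";
            default:
                throw new IllegalArgumentException("Unsupported SQL type: " + sqlType);
        }
    }

    // Emit Java source for a class with one private field per column,
    // preserving the column order of the supplied map.
    static String generate(String className, Map<String, Integer> columns) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className).append(" {\n");
        for (Map.Entry<String, Integer> col : columns.entrySet()) {
            src.append("    private ")
               .append(toJavaType(col.getValue()))
               .append(' ')
               .append(col.getKey())
               .append(";\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        // Hypothetical schema for an "employees" table, in column order.
        Map<String, Integer> columns = new LinkedHashMap<>();
        columns.put("id", Types.INTEGER);
        columns.put("name", Types.VARCHAR);
        columns.put("salary", Types.DECIMAL);
        System.out.print(generate("Employees", columns));
    }
}
```

Running main prints a class with an Integer id, a String name, and a java.math.BigDecimal salary field. A LinkedHashMap is used so that the generated fields follow the table's column order, which matters if the class is later used to read rows positionally.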

This document describes how to get started using Sqoop to import your data into Hadoop.