RepoNameRelease DateTarballsAptYum
StableCDH1March 2009/cdh/stable/debian/redhat/cdh/stable
TestingCDH2August 2009/cdh/testing/debian/redhat/cdh/testing
Cloudera logo
1.3.2.4. Installing Pig

To install pig, run the following:

Example 58. Installing pig on Debian-based systems

# apt-get -y install pig

Example 59. Installing pig on Redhat-based systems

# yum install hadoop-pig -y

[Note]Note

Pig automatically uses the active Hadoop configuration (whether standalone or pseudo or other). Once you have the Pig package installed, you can just jump into the grub shell.

Example 60. Starting the grub shell

$ pig
0    [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting to hadoop file system at: hdfs://localhost:8020
352  [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting to map-reduce job tracker at: localhost:9001
grunt>

We can run the same grep example we did before from the grunt commandline.

First, let's verify that the input and output directories from our previous example grep job exist:

Example 61. Listing HDFS directory from grub shell

grunt> ls
hdfs://localhost/user/matt/input        <dir>
hdfs://localhost/user/matt/output       <dir>

Let's run the same grep example job using Pig

Example 62. Grep inputs using Pig

grunt> A = LOAD 'input';
grunt> B = FILTER A BY $0 MATCHES '.*dfs[a-z.]+.*';
grunt> DUMP B;

You might want to take a look at the JobTracker web console http://localhost:50030/ while your job is running to check its status.

[Tip]Tip

To learn more about Pig, visit the Pig page at http://hadoop.apache.org/pig/.