Running Cloudera Desktop on EC2


Table of Contents
1. The Cloudera Distribution AMI for Hadoop
2. Updating the configuration
3. Start the cluster

1. The Cloudera Distribution AMI for Hadoop

Running Cloudera Desktop on Amazon EC2 requires only a small customization of Cloudera's existing scripts for starting Hadoop clusters on EC2. Before proceeding, familiarize yourself with the documentation for the Cloudera Distribution AMI for Hadoop.


2. Updating the configuration

When working through the Getting Started section, you created an ec2-clusters.cfg file. To use Cloudera Desktop, you also need a custom user_data_file script. First, download it:

# wget http://archive.cloudera.com/desktop/hadoop-and-desktop-ec2-init-remote-0.3.0.sh

Then update the ec2-clusters.cfg file so that the cluster's section references the downloaded script. For example:

[my-hadoop-cluster]
ami=ami-6159bf08
instance_type=c1.medium
key_name=tom
availability_zone=us-east-1c
private_key=PATH_TO_PRIVATE_KEY
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
# Insert the following line!
user_data_file=/path/to/hadoop-and-desktop-ec2-init-remote-0.3.0.sh
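Before launching, it can help to confirm that the edited file parses and that user_data_file is actually set. The following is a minimal sketch, not part of Cloudera's scripts. It assumes the file follows the INI conventions of Python's standard configparser module, which the %(private_key)s syntax above suggests. The function name read_cluster_settings is hypothetical, and the sample text simply repeats the example section.

```python
import configparser

# Sample text mirroring the example section above; in practice you would
# read the contents of your own ec2-clusters.cfg instead.
SAMPLE_CFG = """
[my-hadoop-cluster]
ami=ami-6159bf08
instance_type=c1.medium
key_name=tom
availability_zone=us-east-1c
private_key=PATH_TO_PRIVATE_KEY
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
# Insert the following line!
user_data_file=/path/to/hadoop-and-desktop-ec2-init-remote-0.3.0.sh
"""


def read_cluster_settings(cfg_text, cluster_name):
    """Hypothetical helper: return one cluster's settings as a dict.

    Raises KeyError if the section or the user_data_file option is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section(cluster_name):
        raise KeyError("no section [%s] in configuration" % cluster_name)
    # items() applies %(name)s interpolation within the section.
    settings = dict(parser.items(cluster_name))
    if "user_data_file" not in settings:
        raise KeyError("user_data_file is not set for %s" % cluster_name)
    return settings


settings = read_cluster_settings(SAMPLE_CFG, "my-hadoop-cluster")
print(settings["user_data_file"])
print(settings["ssh_options"])
```

If the user_data_file line is missing, the helper raises KeyError instead of letting the cluster start without the Cloudera Desktop init script.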

3. Start the cluster

To start the cluster with 3 slave nodes, run hadoop-ec2 launch-cluster my-hadoop-cluster 3. Once the cluster is up, the Cloudera Desktop web page is available on port 8088. To reach it, either use an SSH proxy, as described in "Launching Cluster", or open port 8088 in the my-hadoop-cluster-master security group. Because jobs can be launched on the cluster through Cloudera Desktop, avoid overly permissive firewall rules.
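
After opening the port or setting up the proxy, a quick reachability check can tell you whether port 8088 accepts connections from your machine. This is a minimal sketch using only Python's standard socket module. The function name is_port_open is hypothetical, and the host name in the usage comment is a placeholder for your own master node's address.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Usage (placeholder host; substitute your master node's public DNS name):
#   is_port_open("MASTER_PUBLIC_DNS", 8088)
```

A result of False from a machine that should have access usually means the security group rule or the proxy is not yet in place. A result of True from a machine that should not have access suggests the rule is broader than intended.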