Cloudera Desktop Manual

Table of Contents

1. Introduction
2. Installation (Quickstart): Hadoop and Cloudera Desktop in one go
2.1. Getting RPM packages via yum (for Red Hat/CentOS systems)
2.2. Getting Debian packages via apt (for Debian/Ubuntu systems)
2.3. Start your cluster
3. Installation: Complete Instructions
3.1. Install Hadoop First!
3.2. Install Cloudera Desktop
3.3. Configuring Hadoop for Cloudera Desktop
3.4. Further Hadoop Configuration and Caveats
3.5. Restart Your Hadoop Cluster
4. Configuring Cloudera Desktop
4.1. Webserver Configuration
4.2. Authentication
4.3. Configuring for SSL
4.4. Pointing Cloudera Desktop to Your Master Nodes
5. Starting Cloudera Desktop
6. Administering Cloudera Desktop
6.1. Cloudera Desktop Processes
6.2. Cloudera Desktop Logging
6.3. The Cloudera Desktop Database
7. Conclusion: Using Cloudera Desktop
7.1. Supported Browsers
7.2. Feedback
7.3. Reporting Bugs

1. Introduction

Cloudera Desktop is a graphical user interface for operating Hadoop and developing for it. Cloudera Desktop "applications" are collected into a desktop environment and delivered as a Web application, requiring no additional installation for individual users. With Cloudera Desktop, the world-class performance and scalability of Apache Hadoop are now accessible to anyone in your organization.

This document will help you install and configure Cloudera Desktop.

To use Cloudera Desktop, you’ll need to install both Cloudera’s Distribution of Hadoop (we recommend CDH2, the latest stable version as of March 2010), and Cloudera Desktop itself.

If you’re just experimenting, we recommend running everything on one machine; this is called a "pseudo-distributed cluster". The QuickStart instructions will guide you there. For installation on your production cluster, you’ll need the Complete Instructions.

2. Installation (Quickstart): Hadoop and Cloudera Desktop in one go

If you just want to get started as quickly as possible, these instructions describe how to install Cloudera Desktop on one machine, a so-called "pseudo-distributed cluster". If you already have a multi-node cluster running, skip ahead to Complete Instructions.


  • All commands that need to be run as root have a # command prompt.
  • All commands that do not require root have a $ command prompt.
[Tip]What is pseudo-distributed mode?

"Pseudo-distributed mode" refers to an installation of Hadoop that contains all of the same pieces as a real production Hadoop cluster, just running on a single node. This mode is ideal if you’re just learning about Hadoop before you go on to set up a multinode cluster.

In order to install Cloudera Desktop and Hadoop in pseudo-distributed mode, you’ll need to add Cloudera Desktop’s apt or yum repository to your package manager’s configuration.

2.1. Getting RPM packages via yum (for Red Hat/CentOS systems)

If you are running on a Redhat-based system, you can easily download our software using yum. You can also just download the RPMs directly from

  1. Add the Yum repository. Download the cloudera-desktop.repo file and copy it into your /etc/yum.repos.d/ directory.

After you’ve set up the Cloudera rpm repository, you can install the hadoop-0.20-conf-pseudo-desktop package to automatically set up Desktop on a new pseudo-distributed Hadoop cluster:

# yum install hadoop-0.20-conf-pseudo-desktop

On 64-bit machines, please run:

# yum install hadoop-0.20-conf-pseudo-desktop cloudera-desktop.x86_64

2.2. Getting Debian packages via apt (for Debian/Ubuntu systems)

If you are running on a Debian-based platform, you can download our packages using apt.

  1. Add repository. Create a new file /etc/apt/sources.list.d/cloudera.list with the following contents, taking care to replace DISTRO with the name of your distribution (find out by running lsb_release -c):
deb DISTRO-testing contrib
deb-src DISTRO-testing contrib
  2. Add repository key (optional). Add the Cloudera Public GPG Key to your repository by executing the following command:
# curl -s | apt-key add -

This allows you to verify that you are downloading genuine packages.

  3. Update APT package index. Simply run:
# apt-get update
  4. Install the Hadoop Pseudo Distributed Package. After you’ve set up the Cloudera apt repository, you can install the hadoop-0.20-conf-pseudo-desktop package to automatically set up Desktop on a new pseudo-distributed Hadoop cluster:
# apt-get install hadoop-0.20-conf-pseudo-desktop

2.3. Start your cluster

At this point, you should have Hadoop installed and configured for pseudo-distributed operation. To start your cluster, run the following commands:

# /etc/init.d/hadoop-0.20-namenode start
# /etc/init.d/hadoop-0.20-datanode start
# /etc/init.d/hadoop-0.20-secondarynamenode start
# /etc/init.d/hadoop-0.20-jobtracker start
# /etc/init.d/hadoop-0.20-tasktracker start

To verify that your cluster is running, visit http://localhost:50030/ in your web browser. You should see the Hadoop JobTracker Web Interface. To verify that the namenode is running, visit http://localhost:50070/.

Additionally, you can now run a simple example job to verify that your cluster can execute MapReduce jobs:

$ hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop*examples*jar pi 2 10000

After a minute or so, this example job should print out a very rough approximation of the value of Pi.

Finally, start up the Cloudera Desktop server:

# /etc/init.d/cloudera-desktop start

Cloudera Desktop should now be running at http://localhost:8088/.

At this point, skip ahead to Using Cloudera Desktop.

3. Installation: Complete Instructions

These instructions will guide you through installing Cloudera Desktop on a multi-node cluster. You will need to update some Hadoop configuration as well as install Cloudera Desktop.


You’ll need to install the Cloudera Desktop Plug-ins on every machine that’s running Hadoop daemons.

3.1. Install Hadoop First!

Cloudera Desktop requires an installation of the Cloudera Distribution of Hadoop (version 2). See the CDH installation instructions.

To use Cloudera Desktop, you must be running Cloudera’s Distribution for Hadoop with a version number of at least 0.20.1+133. If you are not running at least this version of Hadoop, please upgrade your cluster before proceeding.
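
The version requirement above can be checked with a small shell helper. This is only an illustrative sketch: the function name cdh_version_ok is hypothetical (it is not part of CDH or Cloudera Desktop), and it assumes the version string has the form 0.20.1+N used in this manual. Any other base version is rejected so that you check it by hand.

```shell
# Hypothetical helper (not shipped with CDH or Cloudera Desktop).
# Succeeds when the given version string is 0.20.1+N with N >= 133.
cdh_version_ok() {
    version="$1"
    case "$version" in
        0.20.1+*) patch="${version#0.20.1+}" ;;   # keep only the part after '+'
        *) return 1 ;;                            # unknown base version: check manually
    esac
    # Reject an empty or non-numeric patch level.
    case "$patch" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$patch" -ge 133 ]
}
```

For example, cdh_version_ok "0.20.1+152" succeeds, while cdh_version_ok "0.20.1+100" fails. The version string itself should be available from the output of hadoop-0.20 version.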

3.2. Install Cloudera Desktop

Cloudera Desktop consists of a web service that runs on a special node in your cluster. You should pick one node to run Cloudera Desktop. From this point on, we’ll refer to that node as the "Cloudera Desktop Server". This should be one of the nodes within your cluster for optimal performance, though it can be a remote node as long as there are no overly restrictive firewalls.

For small clusters of fewer than 10 nodes, it is fine to choose your existing master node as the Cloudera Desktop Server.

You can install Cloudera Desktop via package repositories (yum or apt) or by using a tarball.

3.2.1. Redhat-based Repositories (yum)

If you are running on a Redhat-based system, you can easily download our software using yum. You can also just download the RPMs directly from

  1. Add the Yum repository. Download the cloudera-desktop.repo file and copy it into your /etc/yum.repos.d/ directory.
  2. Find and install packages. For example:
# yum search hadoop
# yum install cloudera-desktop

On your Cloudera Desktop Server, run:

# yum install cloudera-desktop

If your server is running a 64-bit Linux, instead run:

# yum install cloudera-desktop.x86_64

On every node in your cluster that is running Hadoop daemons, run:

# yum install cloudera-desktop-plugins

3.2.2. Debian-based Repositories (apt)

If you are running on a Debian-based platform, you can download our packages using apt.

  1. Add repository. Create a new file /etc/apt/sources.list.d/cloudera-desktop.list with the following contents, taking care to replace DISTRO with the name of your distribution (find out by running lsb_release -c):
deb DISTRO-testing contrib
deb-src DISTRO-testing contrib
  2. Add repository key (optional). Add the Cloudera Public GPG Key to your repository by executing the following command:
# curl -s | apt-key add -

This allows you to verify that you are downloading genuine packages.

  3. Update APT package index. Simply run:
# apt-get update
  4. Find and install packages. You may now find and install packages from the Cloudera repository using your favorite APT package manager (e.g. apt-get, aptitude, or dselect). For example:
# apt-cache search cloudera-desktop

On your Cloudera Desktop Server, run:

# apt-get install cloudera-desktop

On every node in your cluster that is running Hadoop daemons, run:

# apt-get install cloudera-desktop-plugins

3.2.3. Installing Cloudera Desktop via the tarball

You can also install Cloudera Desktop via a tarball.

Installing Dependencies

Cloudera Desktop employs some Python modules which use native code and depend on certain development libraries being on your system. To install from the tarball you must have the following installed:

Table 1. Required Dependencies

Redhat             Debian
libxslt-devel      libxslt-dev
libxml2-devel      libxml2-dev
sqlite-devel       libsqlite3-dev
python-devel       python2.4-dev (or python2.5-dev)
gcc                gcc
python-setuptools  python-setuptools

Additionally, you must have Cloudera’s Distribution for Hadoop installed and available. Specifically, your Hadoop version must be at least 0.20.1+133. If you have a previous release, upgrade Hadoop before continuing with Cloudera Desktop.

Build

With $HADOOP_HOME and $PREFIX configured to the path of your Hadoop installation and the path where you’d like to install Cloudera Desktop, respectively, run:

$ HADOOP_HOME=/path/to/hadoop-0.20.1+152 PREFIX=/path/to/install/into make desktop install

You may install Cloudera Desktop anywhere on your system - it does not need root access. We recommend that you create a new user for Cloudera Desktop and either install in that user’s home directory or in a directory within /usr/local.

Install Hadoop Plugins

In order to communicate with Hadoop, Cloudera Desktop requires that you install and configure a plugin JAR. This jar is libs/hadoop/java-lib/cloudera-desktop-plugins-0.4.1.jar.

Symlink this jar into your Hadoop lib directory (/usr/lib/hadoop-0.20/lib if you’ve installed CDH via a Debian or RPM package):

$ cd /usr/lib/hadoop-0.20/lib
$ ln -s /usr/local/cloudera-desktop/libs/hadoop/java-lib/cloudera*jar .

Then restart Hadoop, as described below.

On a multi-node cluster, you will need to install the plug-ins on every node. You do not need to install the entirety of Cloudera Desktop everywhere, but the plug-in jars need to be available on every machine.

Restart Hadoop

Once you have made these changes in your Hadoop configuration, go ahead and restart your Hadoop daemons.

Running Cloudera Desktop

To start Cloudera Desktop, you’ll use env/bin/supervisor. This will start several subprocesses, corresponding to different bits of Cloudera Desktop.

FAQ about Tarball Installation

I moved my Cloudera Desktop installation from one directory to another and it no longer functions correctly.

Because some Python packages use absolute paths, you’ll have to rebuild if you relocate your Cloudera Desktop installation. From within the new location, run:

$ make desktop

This should solve the problem.

Why does "make install" compile all of these other pieces of software?

In order to ensure that Cloudera Desktop is stable on a variety of distributions and architectures, it installs a Python "virtual environment" which includes its dependencies. This ensures that the software can depend on specific versions of various Python libraries and not have to worry about what might or might not be installed already on your particular system.

3.3. Configuring Hadoop for Cloudera Desktop

Cloudera Desktop requires that you install and configure some plugins in your Hadoop installation.

When you installed the cloudera-desktop-plugins package above, your package manager automatically added the required plugin jar to your Hadoop installation’s lib directory, making it available to Hadoop. In order to enable the plugins, you’ll need to make some small additions to your configuration.

These configuration changes should be made on each node in your cluster by editing the files in /etc/hadoop-0.20/conf/.

3.3.1. hdfs-site.xml

You’ll want to add the following configuration options to hdfs-site.xml.

  <description>Comma-separated list of namenode plug-ins to be activated.
  <description>Comma-separated list of datanode plug-ins to be activated.

3.3.2. mapred-site.xml

Add the following to mapred-site.xml:

  <description>Comma-separated list of jobtracker plug-ins to be activated.


To enable full monitoring in the Health application, the metrics contexts must not be NullContext. You might configure them like so:

# Exposes /metrics URL endpoint for metrics information.

3.4. Further Hadoop Configuration and Caveats


3.4.1. HADOOP_CLASSPATH

If you set $HADOOP_CLASSPATH in your own configuration, be sure to set it in such a way that user-specified options are preserved. For example:

# Good
HADOOP_CLASSPATH=<your_additions>:$HADOOP_CLASSPATH

# Bad
# HADOOP_CLASSPATH=<your_additions>

This will enable certain portions of Cloudera Desktop to add to Hadoop’s classpath using the environment variable.
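
One subtlety when preserving an existing value: if $HADOOP_CLASSPATH starts out empty, joining with a bare colon leaves an empty classpath entry, which Java interprets as the current directory. The following is a minimal sketch of one way to avoid that. The helper name prepend_classpath and the jar path in the usage comment are hypothetical placeholders, not part of Hadoop or Cloudera Desktop.

```shell
# Hypothetical helper: prints "<additions>:<existing>", or just "<additions>"
# when the existing value is empty, so no stray colon is left behind.
prepend_classpath() {
    additions="$1"
    existing="$2"
    # ${existing:+:$existing} expands to ":<existing>" only if existing is non-empty.
    printf '%s\n' "$additions${existing:+:$existing}"
}

# Example use (the jar path is a placeholder):
# HADOOP_CLASSPATH=$(prepend_classpath /path/to/extra.jar "${HADOOP_CLASSPATH:-}")
# export HADOOP_CLASSPATH
```

Either way, the important point is the same as above: whatever was already in the variable survives the assignment.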

3.4.2. hadoop.tmp.dir

If your users are likely to be submitting jobs both using Cloudera Desktop and from the command line on the same machine, they will be doing so as the cdesktop user if they’re using Cloudera Desktop and as their own username if they’re using the command line. This leads to some contention on the directory specified by hadoop.tmp.dir, which defaults to /tmp/hadoop-${}. Specifically, hadoop.tmp.dir is used to unpack jars in bin/hadoop jar. One workaround is to set hadoop.tmp.dir to /tmp/hadoop-${}-${cdesktop.suffix}:


Unfortunately, when the variable is unset, you’ll end up with directories named /tmp/hadoop-user_name-${cdesktop.suffix} in /tmp. The job submission daemon, however, will

3.5. Restart Your Hadoop Cluster

At this point you should restart all of the daemons in your cluster so that the plugins can be loaded.

You can confirm that the plugins are running correctly by tailing the daemon logs:

$ tail --lines=500 /var/log/hadoop-0.20/hadoop*namenode*.log | grep ThriftPlugin
2009-09-28 16:30:44,337 INFO org.apache.hadoop.thriftfs.ThriftPluginServer: Starting Thrift server
2009-09-28 16:30:44,419 INFO org.apache.hadoop.thriftfs.ThriftPluginServer: Thrift server listening on
[Tip]Configuring Your Firewall for Cloudera Desktop

Cloudera Desktop currently requires that the machines within your cluster can speak to each other freely over TCP. Machines outside your cluster only need to be able to reach TCP port 8088 on the Cloudera Desktop Server to interact with the system.

4. Configuring Cloudera Desktop

Cloudera Desktop ships with a default configuration that will work for pseudo-distributed clusters. If you are running on a real cluster, you’ll need to make a few small changes to its configuration file. Here we go through the key configuration options.

Edit /etc/cloudera-desktop/cloudera-desktop.ini in your favorite editor.

[Tip]Full Listing of Configuration Options

In order to see a full listing of the available configuration options, you can run:

$ /usr/share/cloudera-desktop/env/bin/desktop config_help | less

This will outline the various sections and options in the configuration, as well as provide help and information on the default values.

[Tip]View Current Configuration Options

You can also view the current configuration from within Cloudera Desktop, at:

[Tip]Using Multiple Files to Store Your Configuration

Cloudera Desktop will load and merge all of the files with extension .ini located in the /etc/cloudera-desktop/conf/ directory. Files that are alphabetically later will take precedence.
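
To see which file would win when several are present, you can list them in sorted order. The sketch below is illustrative only: the function name list_ini_by_precedence is hypothetical, and it assumes that "alphabetically later" corresponds to plain byte-wise (C locale) sort order, which this manual does not spell out.

```shell
# Hypothetical helper: prints the .ini files in a directory in sorted order.
# Under the assumption stated above, the last file printed takes precedence.
list_ini_by_precedence() {
    conf_dir="${1:-/etc/cloudera-desktop/conf}"
    for f in "$conf_dir"/*.ini; do
        # Skip the unexpanded pattern when no .ini files exist.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done | LC_ALL=C sort
}
```

Running list_ini_by_precedence | tail -n 1 then shows the file whose settings would override the others.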

4.1. Webserver Configuration

Cloudera Desktop uses the CherryPy web server. By default it listens on port 8088 on all configured IP addresses. To change the IP address or port, use the http_host and http_port options.

4.2. Authentication

By default, the first user who logs in to Cloudera Desktop may choose any username and password, and becomes an administrator automatically. This user may create other user and administrator accounts. User information is stored in the Django backend, in the Django database.

To use PAM authentication, follow the instructions under the [[auth]] section of the configuration file. Note that under PAM authentication mode, the User Manager application does not update your system accounts.

[Tip]Password Security and PAM Authentication

Cloudera Desktop does not ship with SSL enabled by default. If you choose to enable PAM authentication, be aware that HTTP is not an encrypted protocol and users' authentication details may be revealed over the network.

4.3. Configuring for SSL

You can configure Cloudera Desktop to serve over HTTPS. To do so, you’ll need to install "pyOpenSSL" within Desktop’s context and configure your keys.

To install pyOpenSSL, from the root of your Desktop installation (/usr/share/cloudera-desktop if you installed from packages), execute:

$ ./env/bin/easy_install pyOpenSSL

Then, configure Cloudera Desktop to use your private key by putting the following inside your /etc/cloudera-desktop/cloudera-desktop.ini:


Ideally, you would have an appropriate key signed by a Certificate Authority. If you’re just testing, you can create a key and a self-signed certificate using the openssl command, which may already be installed on your system:

# Create a key
$ openssl genrsa 1024 > host.key
# Create a self-signed certificate
$ openssl req -new -x509 -nodes -sha1 -key host.key > host.cert

4.4. Pointing Cloudera Desktop to Your Master Nodes

If your Hadoop cluster is made up of multiple nodes, you should configure Cloudera Desktop to point to the external hostnames of your Namenode and JobTracker. To do so, simply change the namenode_host and jobtracker_host lines in the configuration file. The inline comments in the existing file will guide you.

5. Starting Cloudera Desktop

Once your cluster is up and running with the plugins enabled, you can start Cloudera Desktop. On your Cloudera Desktop Server, run:

# service cloudera-desktop restart

Congratulations! Your Cloudera Desktop installation is now up and running!

6. Administering Cloudera Desktop

Now that you’ve installed and started Cloudera Desktop, you can feel free to skip ahead to Using Cloudera Desktop. Administrators may want to refer to this section for more details about managing and operating a Cloudera Desktop installation.

6.1. Cloudera Desktop Processes

6.1.1. Process Hierarchy

Cloudera Desktop runs several processes under the hood, all managed by a script called the supervisor. The supervisor is a watchdog process — its only purpose is to spawn and monitor other processes.

A stock Desktop installation will spawn and monitor the following processes:

  • runcpserver - a web server based on CherryPy that provides the core web functionality of Cloudera Desktop
  • jobsubd - a daemon which handles submission of jobs to Hadoop
  • run_healthd - a daemon which constantly monitors the health of nodes in your Hadoop cluster

If you have installed other applications into your Desktop instance, you may see other daemons running under the supervisor as well.

We can see the supervised processes running in the output of ps:

[todd@monster01 ~]$ ps -f -u cdesktop
cdesktop  7899     1  0 12:14 ?        00:00:00 /usr/share/cloudera-desktop/env/bin/python2.4 \
  /usr/share/cloudera-desktop/env/bin/supervisor -p /var/run/cloudera-desktop/s
cdesktop  7903  7899  1 12:14 ?        00:02:59 /usr/share/cloudera-desktop/env/bin/python2.4 \
  /usr/share/cloudera-desktop/env/bin/desktop runcpserver
cdesktop  7906  7899  0 12:14 ?        00:00:00 /usr/share/cloudera-desktop/env/bin/python2.4 \
  /usr/share/cloudera-desktop/env/bin/desktop jobsubd
cdesktop  7907  7899  0 12:14 ?        00:00:12 /usr/share/cloudera-desktop/env/bin/python2.4 \
  /usr/share/cloudera-desktop/env/bin/desktop run_healthd

Note that the supervisor will automatically restart these processes should they fail for any reason. If the processes fail repeatedly within a small time window, the supervisor itself will shut down.

6.1.2. Managing Cloudera Desktop Processes

Cloudera Desktop RPMs and Debian packages ship with an init.d script to manage the Cloudera Desktop processes. You can start and stop the Desktop Supervisor using this init script, for example:

# /etc/init.d/cloudera-desktop stop

If for some reason the init scripts are unable to stop the process, you can kill the daemon manually by locating and killing the supervisor process as described above.

6.2. Cloudera Desktop Logging

The Cloudera Desktop logs are found in /var/log/cloudera-desktop if you have installed via your package manager, or /path/to/cloudera-desktop/logs if you’ve installed via a tarball. Inside the log directory you will find stderr and stdout logs for each of the processes described above.

If users on your cluster experience issues, you can often find error messages in these log files.

If you are unable to start Cloudera Desktop from the init script, the supervisor.stderr log file often contains clues.

The cpserver.stderr log file provides an access log for all requests against the Desktop web server.

We recommend that production installations configure log rotation software like logrotate to periodically rotate and archive these logs.

6.2.1. Viewing recent log messages through your browser

In addition to logging INFO level messages to the log directory, the Desktop web server keeps a small buffer of log messages at all levels in memory. You can view these logs by visiting http://myserver:8088/logs. The DEBUG level messages present here can sometimes be helpful in troubleshooting issues.

6.3. The Cloudera Desktop Database

Cloudera Desktop requires a SQL database to store small amounts of data, including user account information as well as history of job submissions and Hive queries. By default, Desktop is configured to use the embedded database SQLite for this purpose, and should require no configuration or management by the administrator.

6.3.1. Inspecting the Desktop Database

The default SQLite database used by Cloudera Desktop is located in /usr/share/cloudera-desktop/desktop/desktop.db. You can inspect this database from the command line using the sqlite3 program. For example:

# sqlite3 /usr/share/cloudera-desktop/desktop/desktop.db
SQLite version 3.3.6
Enter ".help" for instructions
sqlite> select username from auth_user;

We strongly advise you not to make any modifications to the database directly using SQLite, though this trick may be handy for management or troubleshooting.

6.3.2. Backing up the Desktop Database

To back up the Desktop Database, you can simply copy the desktop.db file to another node. We recommend that you back it up on a regular schedule, and also that you back it up before any upgrade to a new version of Cloudera Desktop.
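
A minimal sketch of a timestamped local copy, which you could then ship to another node, is shown below. The function name backup_desktop_db and the default destination /var/backups/cloudera-desktop are hypothetical; the source path is the default location given in this manual. Because copying a SQLite file while it is being written can produce an inconsistent copy, it may be safest to run this while Desktop is stopped.

```shell
# Hypothetical helper: copies the Desktop SQLite file to a timestamped backup
# and prints the path of the copy.
backup_desktop_db() {
    src="${1:-/usr/share/cloudera-desktop/desktop/desktop.db}"
    dest_dir="${2:-/var/backups/cloudera-desktop}"   # placeholder location
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/desktop.db.$stamp"
    # -p preserves the file's timestamps and permissions where possible.
    cp -p "$src" "$dest" || return 1
    printf '%s\n' "$dest"
}
```

Called with no arguments, it backs up the default database; a scheduler such as cron could invoke it on whatever interval suits your installation.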

6.3.3. Configuring Desktop to access another Database

Although SQLite is the default database type, some advanced users may prefer to have Desktop access an alternate database type. Please note that, if you elect to configure Desktop to use an external database, upgrades may require more manual steps in the future.

Here we provide instructions for MySQL, though Desktop may also be made to work with other common databases including PostgreSQL, Oracle, etc.

Configuring Desktop to store data in MySQL

First, you must create a new database in MySQL and grant privileges to a desktop user to manage this database.

mysql> create database desktop_example;
Query OK, 1 row affected (0.01 sec)

mysql> grant all on desktop_example.* to 'desktop_user'@'localhost' identified by 'secretpassword';
Query OK, 0 rows affected (0.00 sec)

Next, shut down Desktop if it is running, and edit cloudera-desktop.ini. Directly below the [desktop] line, add the following:


Next, we need to install the Python drivers for MySQL into Desktop’s environment:

# su - cdesktop -s /bin/bash
$ /usr/share/cloudera-desktop/env/bin/easy_install MySQL-python

Now, still as the cdesktop user, we instruct Desktop to create the necessary database tables:

$ /usr/share/cloudera-desktop/env/bin/desktop syncdb --noinput

Now you are all set up and can start the Cloudera Desktop server as normal.

7. Conclusion: Using Cloudera Desktop

After installation, you use Cloudera Desktop by simply navigating to http://myserver:8088/. You’ll be greeted with a login screen.

After login, you’ll be greeted with a welcome screen.

Launch applications on the top-right.

The Help application guides users through the various installed applications.

7.1. Supported Browsers

Cloudera Desktop is primarily tested on Firefox 3.5 and Firefox 3.6, on Windows, Mac, and Linux. Google Chrome and Safari work as well. Support for Internet Explorer is forthcoming.

7.2. Feedback

Cloudera Desktop 0.4.1 is currently in private beta. We’re excited to receive your feedback and criticisms. We expect that you may uncover some bugs and mishaps. The best way to send feedback is to send us e-mail, at

7.3. Reporting Bugs

If you find that something doesn’t work, it’ll often be helpful to include logs from your server. These are available at the /logs URL on Desktop’s webserver (not part of the graphical Desktop UI). Please download the logs as a zip (or cut and paste the ones that look relevant) and send those with your bug reports.