The Job Designer application enables you to create and submit Hadoop Map/Reduce jobs to the Hadoop cluster. You can include variables in your jobs so that you and other users can supply values for them when the job runs. The Job Designer supports streaming and JAR jobs. For more information about Hadoop Map/Reduce, see the Hadoop Map/Reduce Tutorial.
**Note:** A job's input files must be uploaded to the cluster before you can submit the job.
Job Designer is one of the applications that can be installed as part of Hue. For more information about installing Hue, see https://ccp.cloudera.com/display/CDHDOC/Hue+Installation.
The following sections describe how to start and use Job Designer.
To start Job Designer, click the Job Designer icon in the application bar at the bottom of the Hue web page. The Job Design List window opens in the Hue web page.
The Job Designer sample jobs can help you learn how to use Job Designer. To install them, click Install Samples in the Job Design List window and then click Ok. The sample jobs appear in the Job Design List window. After the samples are installed, Job Designer removes the Install Samples button, so you can install the samples only once.
In the Job Designer, a job design specifies several meta-level properties of a Map/Reduce job, including the job design name, description, the Map/Reduce executable scripts or classes, and any parameters for those scripts or classes. You can create two types of job designs: a streaming job design and a JAR job design.
Hadoop streaming jobs enable you to write Map/Reduce functions in any non-Java language: the mapper and reducer are executables that read standard Unix input and write standard Unix output. For more information about Hadoop streaming jobs, see http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320/streaming.html.
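For example, a streaming word count can be written as a single small script that serves as both the mapper and the reducer. The following Python sketch (a hypothetical wordcount_streaming.py, not shipped with Hue) illustrates the streaming contract: the mapper reads lines from standard input and emits tab-separated key/value pairs, and the reducer relies on Hadoop streaming sorting the mapper output by key before delivering it.

```python
#!/usr/bin/env python
# wordcount_streaming.py -- hypothetical example for a Hadoop streaming job.
# Run with "map" to act as the mapper, or "reduce" to act as the reducer.
import sys

def do_map():
    # Emit "word<TAB>1" for every word read from standard input.
    for line in sys.stdin:
        for word in line.split():
            print("%s\t%d" % (word, 1))

def do_reduce():
    # Hadoop streaming sorts mapper output by key, so all counts for a
    # given word arrive on consecutive lines of standard input.
    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))

if __name__ == "__main__":
    do_reduce() if sys.argv[1:] == ["reduce"] else do_map()
```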
To create a streaming job design:
**Note:** You can use variables of the form $variable_name for the Input, Output, Mapper Cmd, and Reducer Cmd settings described in the following table. When the streaming job is run, a dialog box appears so that you or other users can specify the values of the variables.
Setting | Description |
---|---|
Name | The name identifies the streaming job design and its associated properties and parameters. |
Description | Specify a description of the streaming job design. The description is displayed in the dialog box that appears if you specify variables for the job. |
Input | Specify the path to the file or directory you want to use as the input data for the streaming job. If you specify a directory, all files in that directory are used for input. Equivalent to the Hadoop -input option. |
Output | Specify the path to the directory where you want to save the output of the streaming job. The directory must not already exist when you run the job; if it does, the job will not run. (This requirement is a precaution to prevent overwriting data from other jobs.) Equivalent to the Hadoop -output option. |
Mapper Cmd | Specify the path to the mapper script or class. If the mapper file is not already on the cluster machines, use the Required Files setting to package it as part of the job submission. Equivalent to the Hadoop -mapper option. |
Reducer Cmd | Specify the path to the reducer script or class. If the reducer file is not already on the cluster machines, use the Required Files setting to package it as part of the job submission. Equivalent to the Hadoop -reducer option. |
Num Reduce Tasks | Specify the number of reduce tasks you want to use, or zero to run no reduce tasks. If you do not specify a value, the default from your cluster configuration takes effect. A good rule of thumb for the number of reduce tasks is a factor of 0.95 or 1.75 multiplied by (the number of nodes in your cluster × the mapred.tasktracker.reduce.tasks.maximum property). If your reduce tasks are not very big, use a factor of 0.95 so that slightly fewer reduce tasks are scheduled than the cluster can run at once; this allows a small number of reduce tasks to fail without increasing the time required to run the job. If your reduce tasks are very big, use a factor of 1.75 to run more, smaller reduce tasks; this allows for better load balancing, and failed reduce tasks do not significantly increase the time required to run the job. For example, on a 10-node cluster with mapred.tasktracker.reduce.tasks.maximum set to 2, a factor of 0.95 gives 0.95 × 10 × 2 = 19 reduce tasks. |
Required Files | Specify any executable files that are not already on the cluster machines so that they are packaged as part of the job submission. |
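With the word count sketch above, you might set Mapper Cmd to `python wordcount_streaming.py map`, Reducer Cmd to `python wordcount_streaming.py reduce`, and add wordcount_streaming.py to Required Files so the script is shipped to the cluster machines along with the job.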
A Hadoop JAR job runs Map/Reduce functions written in Java and packaged in a JAR file.
To create a JAR job design:
**Note:** You can use variables of the form $variable_name for the Arguments setting described in the following table. When the JAR job is run, a dialog box appears so that you or other users can specify the values of the variables.
Setting | Description |
---|---|
Name | The name identifies the JAR job design and its associated parameters. |
Description | Specify a description of the JAR job. The description is displayed in the dialog box that appears if you specify variables for the job. |
Jarfile | Specify the name of the JAR file, including the path. |
Arguments | Specify the arguments you want to pass to the running JAR job. |
To submit a job to a cluster:
If you want to edit and use a job design that you do not own, you can make a copy of it and then edit and use the copy.
To copy a job design:
To edit a job design:
To delete a job design:
You can filter the Job Design List by owner, by job name, or both.
To filter the Job Design list:
To display job results: