Introducing Oozie Editor/Dashboard

The Oozie Editor/Dashboard application allows you to define Oozie workflow and coordinator applications, run workflow and coordinator jobs, and view the status of jobs. 

A workflow application is a collection of actions arranged in a directed acyclic graph (DAG). It includes control flow nodes (start, end, fork, join, and kill) and action nodes (MapReduce, streaming, Java, Pig, Hive, Sqoop, Shell, and ssh actions). The current release does not support the decision control flow node or the fs and Oozie sub-workflow action nodes.
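
To make the DAG structure concrete, here is a minimal Python sketch. It is not Oozie's own representation: the node names, the dictionary layout, and the `is_acyclic` helper are assumptions chosen for illustration. The sketch models a workflow as a mapping from each node to the nodes it transitions to, and checks that the graph contains no cycles.

```python
from collections import deque

# Hypothetical workflow: each node maps to the nodes it transitions to.
WORKFLOW = {
    "start": ["fork"],
    "fork": ["pig-action", "hive-action"],
    "pig-action": ["join"],
    "hive-action": ["join"],
    "join": ["end"],
    "end": [],
}


def is_acyclic(graph):
    """Return True if the graph has no cycles (Kahn's algorithm)."""
    # Count incoming edges for every node.
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for succ in successors:
            indegree[succ] += 1

    # Repeatedly remove nodes that have no remaining incoming edges.
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for succ in graph[node]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)

    # If some node was never removed, it sits on a cycle.
    return visited == len(graph)
```

A fork followed by a matching join, as in `WORKFLOW`, stays acyclic. A transition that pointed back to an earlier node would make the check fail.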

A coordinator application allows you to define and execute recurrent and interdependent workflow jobs. The coordinator application defines the conditions under which the execution of workflows can occur.

Oozie Editor/Dashboard Installation and Configuration

Oozie Editor/Dashboard is one of the applications that is installed as part of Hue. For more information about installing Hue, see Hue Installation. For information about Oozie, see Oozie Documentation.

Note: In order to run streaming or Pig jobs as part of a workflow, Oozie must be configured to use the Oozie ShareLib. If this is not the case, Pig and streaming actions will not run. See Hue Installation for more information.

Starting Oozie Editor/Dashboard

To start Oozie Editor/Dashboard, click the Oozie Editor/Dashboard icon in the navigation bar at the top of the Hue browser page. Oozie Editor/Dashboard opens with the following screens:

Filtering Lists in Oozie Editor/Dashboard

The Dashboard, Workflows, Coordinators, and History screens contain lists of workflows, coordinators, and jobs. When you type in the Filter field on these screens, the lists are dynamically filtered to display only those rows containing text that matches the specified substring.
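
The filtering behavior can be sketched in a few lines of Python. This is an illustration only, not the application's code: the `filter_rows` helper and the sample rows are hypothetical, and case-insensitive matching is an assumption.

```python
def filter_rows(rows, text):
    """Keep only the rows in which some cell contains `text`.

    Matching is case-insensitive here, which is an assumption.
    """
    needle = text.lower()
    return [
        row
        for row in rows
        if any(needle in str(cell).lower() for cell in row)
    ]


# Hypothetical list rows: (name, status).
rows = [("wordcount", "Running"), ("pig-etl", "Succeeded")]
matching = filter_rows(rows, "pig")
```

An empty filter string matches every row, so clearing the Filter field restores the full list in this sketch.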

Permissions in Oozie Editor/Dashboard

In the Dashboard, workflows and coordinators can be viewed, submitted, and modified only by their owner or a superuser.

Editor permissions for performing actions on workflows and coordinators are summarized in the following table:

Action    Superuser or Owner    All
View      Y                     Only if "Is shared" is set
Submit    Y                     Only if "Is shared" is set
Modify    Y                     N
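
The editor permission rules in the table can be expressed as a small Python function. This is a sketch of the rules as stated above, not the application's implementation. The function name and its parameters are hypothetical.

```python
def can_perform(action, is_owner_or_superuser, is_shared):
    """Return True if the user may perform `action` in the editor.

    `action` is one of "view", "submit", or "modify".
    """
    if action not in ("view", "submit", "modify"):
        raise ValueError(f"unknown action: {action!r}")
    if is_owner_or_superuser:
        # Owners and superusers can do everything.
        return True
    if action == "modify":
        # Other users can never modify, even when the item is shared.
        return False
    # Other users can view and submit only when "Is shared" is set.
    return is_shared
```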

Oozie Dashboard

Oozie Dashboard shows a summary of the running and completed workflow and coordinator jobs.

You can view jobs for a period up to the last 30 days.

You can filter the list by date (1, 7, 15, or 30 days) or status (Succeeded, Running, or Killed). The date and status buttons are toggles.

Workflows

Click the Workflows tab to view the running and completed workflows for the filters you have specified.

Click a workflow row in the Running or Completed table to view detailed information about that workflow.

For the selected workflow, the following tabs and information are available.

Coordinators

Click the Coordinators tab to view the running and completed coordinator jobs for the filters you have specified.

For the selected coordinator, the following tabs and information are available.

Workflow Editor

The Workflow Editor is where you create or edit Oozie workflows and submit them for execution. The Workflow Editor comes with several preinstalled sample workflows.

In the Workflow Editor, you can create workflows that include MapReduce, streaming, Java, Pig, Hive, Sqoop, Shell, and ssh actions. You can create these actions in the Workflow Editor, or you can import job designs from Job Designer to be used as actions in your workflow.

Click the Workflows tab to open the Workflow editor.

The main page of the workflow editor shows the current set of workflow designs.

Each row shows a workflow design: its name, its description, and the timestamp of its last modification. It also shows:

Opening a Workflow

To open a workflow, click the workflow. Proceed with Editing a Workflow.

Creating a Workflow

To create a workflow:

  1. Click the Create button at the top right of the Action Chooser. 
  2. In the Name field, type a name.
  3. To specify the HDFS deployment directory and Oozie schema version, click advanced.
  4. Click Save. The workflow editor opens. Proceed with Editing a Workflow.

Editing a Workflow

In the workflow editor you can add and delete actions, clone actions, create and remove fork and join control nodes, and move actions as follows:

Each action must have a unique name.
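
The uniqueness rule is easy to check before saving. The following Python sketch is illustrative only. The `duplicate_names` helper and the sample action names are hypothetical.

```python
from collections import Counter


def duplicate_names(action_names):
    """Return the action names that occur more than once, sorted."""
    counts = Counter(action_names)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical action names from a workflow being edited.
names = ["load-data", "aggregate", "load-data", "export"]
clashes = duplicate_names(names)
```

An empty result means every action name in the workflow is unique.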

Editing Workflow Properties

  1. In the workflow editor, click the Properties tab.
  2. To share the workflow with all users, check the Is shared checkbox.
  3. To set advanced execution options, click advanced and edit the deployment directory, add parameters and job properties, or change the Oozie schema version.
  4. Click Save.

Submitting a Workflow

To submit a workflow for execution, click the radio button next to the workflow and click the Submit button.

Scheduling a Workflow

To schedule a workflow for recurring execution, click the radio button next to the workflow and click the Schedule button. A coordinator is created and opened in the coordinator editor.

Coordinator Editor

The Coordinator Editor is where you create or edit Oozie coordinator applications and submit them for execution. The Coordinator Editor contains one preinstalled sample coordinator.

Opening a Coordinator

To open a coordinator, click the coordinator. Proceed with Editing a Coordinator.

Creating a Coordinator

To create a coordinator:

  1. Click the Create button at the top right of the Action Chooser. 
  2. In the Name field, type a name.
  3. In the Workflow drop-down list, choose a workflow that the coordinator will schedule.
  4. In the Frequency area, specify how often the workflow will be scheduled and how many times it will run.
  5. Click Save. The coordinator editor opens. Proceed with Editing a Coordinator.

Editing a Coordinator

Note: Most workflows require either an input dataset, an output dataset, or both.

In the coordinator editor you specify coordinator properties and the datasets on which the workflow scheduled by the coordinator will operate as follows:

  1. To share the coordinator with all users, check the Is shared checkbox.
  2. To set advanced execution options, click advanced and fill in properties that determine how long a coordinator will wait before timing out, how many coordinators can wait and run concurrently, the coordinator scheduling policy, and the coordinator schema version.
  3. In the Frequency area, set how many times the coordinator will run for each specified unit, the start and end time of the coordinator, and the timezone of the start and end times.
  4. The inputs and outputs of the workflow must be mapped to some data. Click Add and select a dataset from the Dataset drop-down menu and map it to one variable of your workflow.
    If no datasets exist, follow the procedure in Creating a Dataset.
  5. Select a dataset from the Dataset drop-down menu.
  6. Click Save.
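
To illustrate how a frequency, start time, and end time translate into scheduled runs, here is a minimal Python sketch. It is not Oozie's scheduler: the `nominal_times` helper is hypothetical, and treating the end time as exclusive is an assumption made for this example.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency):
    """Yield each scheduled time from `start` (inclusive) to `end` (exclusive)."""
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    current = start
    while current < end:
        yield current
        current += frequency


# Hypothetical coordinator: run every 6 hours for one day.
runs = list(
    nominal_times(
        start=datetime(2013, 1, 1, 0, 0),
        end=datetime(2013, 1, 2, 0, 0),
        frequency=timedelta(hours=6),
    )
)
```

With these values the sketch produces four scheduled times: 00:00, 06:00, 12:00, and 18:00 on the start date.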

Creating a Dataset

  1. In the coordinator editor, do one of the following:
  2. Click Create.
  3. In the Start and Frequency fields, specify when and how often input datasets will be available.
  4. In the Uri field, specify a URI template for the location of input and output datasets. You can use the variables ${YEAR}, ${MONTH}, ${DAY}, ${HOUR}, and ${MINUTE} to construct URIs and URI paths containing dates and timestamps. For example:

    hdfs://foo:9000/usr/app/stats/${YEAR}/${MONTH}/data
  5. Specify the timezone of the start date.
  6. In the Done flag field, specify the flag that identifies when input datasets are ready.
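
The way a URI template resolves to a concrete path can be sketched with Python's `string.Template`, which happens to use the same `${NAME}` placeholder syntax. This only illustrates the substitution and is not how Oozie resolves datasets. The `resolve_uri` helper is hypothetical, and zero-padding of the date fields is an assumption.

```python
from datetime import datetime
from string import Template


def resolve_uri(uri_template, when):
    """Substitute the date variables in `uri_template` using `when`."""
    values = {
        "YEAR": f"{when.year:04d}",
        "MONTH": f"{when.month:02d}",    # zero-padded (assumption)
        "DAY": f"{when.day:02d}",
        "HOUR": f"{when.hour:02d}",
        "MINUTE": f"{when.minute:02d}",
    }
    return Template(uri_template).substitute(values)


uri = resolve_uri(
    "hdfs://foo:9000/usr/app/stats/${YEAR}/${MONTH}/data",
    datetime(2013, 1, 15),
)
# uri == "hdfs://foo:9000/usr/app/stats/2013/01/data"
```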

Submitting a Coordinator

To submit a coordinator for execution, click the radio button next to the coordinator and click the Submit button.

Submissions History

The Submissions History is where you view the history of workflow and coordinator jobs. Clicking a link in the Name column opens the workflow or coordinator in an editor. Clicking a link in the Submission Id column opens the job in the Dashboard.