Chapter 10. Launch Jobs

After creating a job, you can launch it to process data registered with the OpenStack Data Processing service. Jobs require a cluster; you can launch the job on an existing cluster (Section 10.1, “Launch a Job on an Existing Cluster”) or on an entirely new one (Section 10.2, “Launch a Job on a New Cluster”).

Note

Launching a job involves specifying an input data source and an output data destination. Both objects must first be registered with the OpenStack Data Processing service. For more information, see Section 5.1, “Register Input and Output Data Sources”.

10.1. Launch a Job on an Existing Cluster

To view a list of existing clusters in the dashboard, select Project > Data Processing > Clusters. For information on how to launch a cluster, see Chapter 8, Launch a Cluster.

To launch a job on an existing cluster:

  1. In the dashboard, select Project > Data Processing > Jobs. The Jobs table displays all available job templates; see Section 9.2, “Create a Job Template” for details on creating new job templates.
  2. Choose which job template to use; then, select Launch On Existing Cluster from the job template’s Actions drop-down menu.
  3. In the Launch Job wizard, select your input data source from the Input drop-down menu. Then, select your output destination from the Output drop-down menu.

    If needed, you can also register your input data source or output destination from here. To do so, click + beside either the Input or Output drop-down menu. Doing so opens the Create Data Source wizard; for more information, see Section 5.1, “Register Input and Output Data Sources”.

  4. From the Cluster drop-down menu, select which cluster the job should run on.
  5. If you need to set any special job properties for this job, click the Configure tab. From there, click Add under either Configuration or Parameters to specify any special name/value pairs. You can specify multiple name/value pairs through this tab.

    For more information about supported job properties, consult your chosen Hadoop plug-in’s documentation.

  6. Click Launch.
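
The choices made in the steps above (cluster, input, output, and any name/value pairs) can be pictured as a single launch request. The following is a minimal sketch only: the function name is hypothetical, the field names (`cluster_id`, `input_id`, `output_id`, `job_configs`) are an assumption about how the OpenStack Data Processing REST API represents a job execution, and the IDs and property names are placeholders. Verify the exact request format against the API reference for your release before relying on it.

```python
import json


def build_job_execution_request(cluster_id, input_id, output_id,
                                configs=None, params=None):
    """Assemble a launch request for a job on an existing cluster.

    Hypothetical helper: the key names below are assumed, not taken
    from this guide.
    """
    if not cluster_id:
        raise ValueError("a job requires a cluster to run on")
    if not input_id or not output_id:
        raise ValueError("register the input and output data sources first")
    return {
        "cluster_id": cluster_id,
        "input_id": input_id,
        "output_id": output_id,
        "job_configs": {
            # Name/value pairs added under Configuration in the Configure tab.
            "configs": dict(configs or {}),
            # Name/value pairs added under Parameters in the Configure tab.
            "params": dict(params or {}),
        },
    }


# Placeholder IDs and illustrative name/value pairs.
body = build_job_execution_request(
    cluster_id="CLUSTER_ID",
    input_id="INPUT_DATA_SOURCE_ID",
    output_id="OUTPUT_DATA_SOURCE_ID",
    configs={"mapreduce.job.reduces": "2"},
    params={"EXAMPLE_PARAM": "value"},
)
print(json.dumps(body, indent=2, sort_keys=True))
```

The sketch only builds and prints the request body; it does not contact any service. Keeping Configuration and Parameters in separate dictionaries mirrors the two Add buttons on the Configure tab.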

To view the status of launched jobs, select Project > Data Processing > Jobs. See Section 10.3, “Delete or Re-Launch Launched Jobs” for instructions on how to re-launch or delete a launched job.

10.2. Launch a Job on a New Cluster

After creating a job template, you can also use it to launch a job on an entirely new cluster. Doing so gives you the option to automatically delete the cluster after the job finishes.

  1. In the dashboard, select Project > Data Processing > Jobs. The Jobs table displays all available job templates; see Section 9.2, “Create a Job Template” for details on creating new job templates.
  2. Choose which job template to use; then, select Launch On New Cluster from the job template’s Actions drop-down menu.
  3. Use the plug-in Name and Version drop-down menus to select the name and version of the Hadoop plug-in that the job will use.
  4. Click Create.
  5. Enter a name for your cluster in the Cluster Name field.
  6. Select a Hadoop image that the cluster should use from the Base Image drop-down menu. For details on creating and registering a Hadoop image, see the corresponding chapters of this guide.
  7. If needed, select a key pair from the Keypair drop-down menu. You can also click + beside this menu to create a new key pair. While key pairs are not required to launch a cluster, you will need them to log into cluster nodes (for example, through SSH).

    For information on key pairs, see Manage Key Pairs.

  8. Select which network the cluster should use from the Neutron Management Network drop-down menu. For more details on adding and managing networks in OpenStack, see Common Administrative Tasks.
  9. By default, the OpenStack Data Processing service will delete the cluster as soon as the job finishes. To prevent this from happening, select the Persist cluster after job exit check box.
  10. Next, click the Job tab. From there, select your input data source from the Input drop-down menu. Then, select your output destination from the Output drop-down menu.

    If needed, you can also register your input data source or output destination from here. To do so, click + beside either the Input or Output drop-down menu. Doing so opens the Create Data Source wizard; for more information, see Section 5.1, “Register Input and Output Data Sources”.

  11. If you need to set any special job properties for this job, click the Configure tab. From there, click Add under either Configuration or Parameters to specify any special name/value pairs. You can specify multiple name/value pairs through this tab.

    For more information about supported job properties, consult your chosen Hadoop plug-in’s documentation.

  12. Click Launch.
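
To make the defaults in this procedure easier to see, the wizard's fields can be modeled as a small record. This is a sketch under stated assumptions: the class and field names are hypothetical (they mirror the wizard labels, not any real client library), and all values are placeholders. It shows the two behaviors the steps describe: the key pair is optional, and the cluster is deleted after the job unless you choose to persist it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NewClusterJobLaunch:
    """Hypothetical record of the values entered in the wizard."""
    plugin_name: str
    plugin_version: str
    cluster_name: str
    base_image: str
    management_network: str
    input_source: str
    output_destination: str
    keypair: Optional[str] = None   # optional, but needed to log in over SSH
    persist_cluster: bool = False   # check box is cleared by default
    configuration: Dict[str, str] = field(default_factory=dict)
    parameters: Dict[str, str] = field(default_factory=dict)

    def cluster_deleted_after_job(self) -> bool:
        """The cluster is deleted unless persistence was selected."""
        return not self.persist_cluster

    def notes(self) -> List[str]:
        """Collect reminders about the consequences of the chosen values."""
        notes = []
        if self.keypair is None:
            notes.append("No key pair: you cannot log in to cluster nodes.")
        if self.cluster_deleted_after_job():
            notes.append("The cluster is deleted as soon as the job finishes.")
        return notes


# Placeholder values only; substitute the names used in your deployment.
launch = NewClusterJobLaunch(
    plugin_name="PLUGIN_NAME",
    plugin_version="PLUGIN_VERSION",
    cluster_name="example-cluster",
    base_image="HADOOP_IMAGE",
    management_network="NETWORK_NAME",
    input_source="INPUT_DATA_SOURCE",
    output_destination="OUTPUT_DATA_SOURCE",
)
for note in launch.notes():
    print(note)
```

With the defaults left untouched, both reminders are printed. Setting `persist_cluster=True` and supplying a `keypair` clears them.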

To view the status of launched jobs, select Project > Data Processing > Jobs. See Section 10.3, “Delete or Re-Launch Launched Jobs” for instructions on how to re-launch or delete a launched job.

10.3. Delete or Re-Launch Launched Jobs

To view the status of launched jobs, select Project > Data Processing > Jobs. From here, you can delete or re-launch a job.

To delete a launched job, select Delete job execution from its Actions drop-down menu. You can also delete multiple launched jobs by selecting their check boxes and clicking the Delete job executions button.
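
Deleting several launched jobs at once can be thought of as one delete request per selected job execution. The sketch below illustrates that idea only: the helper name is hypothetical, the IDs are placeholders, and the `/v1.1/{project_id}/job-executions/{id}` path is an assumption about the OpenStack Data Processing REST API that you should confirm against the API reference for your release.

```python
def build_delete_requests(project_id, job_execution_ids):
    """Return one (method, path) pair per selected job execution.

    Hypothetical helper; the URL layout is assumed, not taken from
    this guide. Duplicate selections are collapsed so that each job
    execution is deleted at most once.
    """
    seen = set()
    requests = []
    for execution_id in job_execution_ids:
        if execution_id in seen:
            continue
        seen.add(execution_id)
        path = f"/v1.1/{project_id}/job-executions/{execution_id}"
        requests.append(("DELETE", path))
    return requests


# Placeholder project and job execution IDs.
selected = ["EXECUTION_A", "EXECUTION_B", "EXECUTION_A"]
for method, path in build_delete_requests("PROJECT_ID", selected):
    print(method, path)
```

The sketch only builds the request list; it does not send anything to the service.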

To re-launch a job on an existing cluster, select Relaunch on Existing Cluster from its Actions drop-down menu. For instructions on how to continue, see Section 10.1, “Launch a Job on an Existing Cluster”.

Alternatively, you can re-launch a job on a completely new cluster. To do so, select Relaunch on New Cluster from its Actions drop-down menu. For instructions on how to continue, see Section 10.2, “Launch a Job on a New Cluster”.