Chapter 9. Configure Jobs

With the OpenStack Data Processing service, jobs define the actual data processing tasks. Each job specifies a job type (for example, Pig, Hive, or MapReduce) along with the binaries, scripts, and libraries it uses. A job can only use binaries, scripts, or libraries that are registered with the OpenStack Data Processing service.

After creating a job, you can then launch it on a cluster and run it against an input data source. Input and output data sources, like job binaries, must also be registered first with the OpenStack Data Processing service (see Section 5.1, “Register Input and Output Data Sources”).

9.1. Register Job Binaries, Scripts, or Libraries

The process for registering job binaries, scripts, and libraries is similar to image and data source registration. You can register them directly from the Object Storage service; for instructions on how to upload objects to the Object Storage service, see Upload an Object. Alternatively, you can upload binaries and libraries directly from your local file system into the OpenStack Data Processing service.

  1. In the dashboard, select Project > Data Processing > Job Binaries.
  2. Click Create Job Binary.
  3. Enter a name for the object (that is, your script, binary, or library). This name will be used when selecting the object. If your object requires a particular name or extension (for example, .jar), include it here.
  4. Use the Description field to describe the script, binary, or library you are registering (optional).
  5. Configure the object depending on its storage type.

    1. If the object is available through the Object Storage service, select Swift from the Storage type drop-down menu. Then:

      • Provide the container and object name of your script, binary, or library as swift://CONTAINER/OBJECT in the URL field.
      • If your script, binary, or library requires a login, supply the necessary credentials in the Username and Password fields.
    2. If the object is an S3 job binary, select S3 from the Storage type drop-down menu. Then:

      • Enter the S3 URL in the following format: s3://bucket/path/to/object. You must also enter the access key, secret key, and endpoint. The endpoint is the URL of the S3 service, including the http or https protocol.
    3. Otherwise, select Internal database from the Storage type drop-down menu. Then, use the Internal binary drop-down menu to either:

      • Select an available binary, library, or script from the OpenStack Data Processing service, or
      • Input a script directly into the dashboard (Create a script), or
      • Upload a binary, library, or script directly from your local file system (Upload a new file).
  6. Click Create. The binary, library, or script should now be available in the Job Binaries table.
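The URL formats used in the steps above can be sketched as small helpers. This is an illustrative sketch only; the function names are hypothetical and not part of the OpenStack Data Processing API.

```python
# Hypothetical helpers that build the job binary URL formats described
# above (swift://CONTAINER/OBJECT and s3://bucket/path/to/object).

def swift_binary_url(container: str, obj: str) -> str:
    """Build a swift://CONTAINER/OBJECT URL for an Object Storage binary."""
    return f"swift://{container}/{obj}"

def s3_binary_url(bucket: str, path: str) -> str:
    """Build an s3://bucket/path/to/object URL for an S3 job binary."""
    return f"s3://{bucket}/{path.lstrip('/')}"

print(swift_binary_url("my-container", "wordcount.jar"))
# swift://my-container/wordcount.jar
print(s3_binary_url("my-bucket", "jobs/wordcount.jar"))
# s3://my-bucket/jobs/wordcount.jar
```

Whichever storage type you choose, the resulting URL is what the Job Binaries form stores and later resolves when the job runs.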

9.2. Create a Job Template

Once the required binaries, scripts, and libraries are registered with the OpenStack Data Processing service, perform the following steps:

  1. In the dashboard, select Project > Data Processing > Job Templates.
  2. Click Create Job Template.
  3. Enter a name for your job in the Name field.
  4. Select the correct type from the Job Type drop-down menu. For more information about job types, consult your chosen plug-in’s documentation regarding supported job types.
  5. Select the binary that should be used for this job from the Choose a main binary drop-down menu. The options in this menu are populated with job binaries and scripts registered with the OpenStack Data Processing service; for more information, see Section 9.1, “Register Job Binaries, Scripts, or Libraries”.
  6. Use the Description field to describe the job you are creating (optional).
  7. If the job binary you specified requires libraries, add them. To do so, click the Libs tab and select a library from the Choose libraries drop-down menu. Then, click Choose to add the library to the job; the library should be included in the Chosen libraries list. Repeat this for every library required by the job binary. Like binaries, the options in the Choose libraries drop-down menu are populated with libraries registered with the OpenStack Data Processing service. For more information, see Section 9.1, “Register Job Binaries, Scripts, or Libraries”.
  8. Click Create. The job template should now be available in the Job Templates table.
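The information gathered by the Create Job Template form above can be modeled as a plain data structure. This is an illustrative sketch only; the class and method names are hypothetical and not part of the OpenStack Data Processing API.

```python
# Hypothetical model of the Create Job Template form fields: a name,
# a job type, one main binary, and zero or more libraries (Section 9.1).
from dataclasses import dataclass, field

@dataclass
class JobTemplate:
    name: str
    job_type: str          # e.g. "Pig", "Hive", or "MapReduce"
    main_binary: str       # a registered job binary (Section 9.1)
    description: str = ""
    libraries: list = field(default_factory=list)

    def add_library(self, lib: str) -> None:
        """Mirror the Libs tab: add one registered library at a time."""
        if lib not in self.libraries:
            self.libraries.append(lib)

template = JobTemplate(name="wordcount", job_type="MapReduce",
                       main_binary="wordcount.jar")
template.add_library("helper-lib.jar")
print(template.libraries)
# ['helper-lib.jar']
```

As in the dashboard, the main binary is required while the description and libraries are optional, and each library is added individually.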