Chapter 8. Configuring custom models

As an organization administrator, you can create and use fine-tuned, custom models that are trained on your organization’s existing Ansible content. With this capability, you can tune the models to your organization’s automation patterns and improve the code recommendation experience.

You can configure multiple custom models for your organization. For example, you can create a custom model for your corporate IT automation team and a different one for your engineering team’s infrastructure. You can also configure a custom model to make it available for all Ansible users or select Ansible users in your organization.

8.1. Process for configuring custom models

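To configure a custom model, perform the following tasks:

  1. Create a training data set by using the content parser tool
  2. Create and deploy a custom model in IBM watsonx Code Assistant
  3. Configure Red Hat Ansible Lightspeed to use the custom model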

8.2. Creating a training data set by using the content parser tool

Use the content parser tool, a command-line interface (CLI) tool, to scan your existing Ansible files and generate a custom model training data set. The training data set includes a list of Ansible files and their paths relative to the project root. You can then upload this data set to IBM watsonx Code Assistant, and use it to create a custom model that is trained on your organization’s existing Ansible content.

8.2.1. Methods of creating training data sets

You can generate a training data set by using one of the following methods:

  • With ansible-lint preprocessing

    By default, the content parser tool generates training data sets by using ansible-lint preprocessing. The content parser tool uses ansible-lint rules to scan your Ansible files and ensure that the content adheres to Ansible best practices. If rule violations are found, the content parser tool excludes these files from the generated output. In such scenarios, you must resolve the rule violations, and run the content parser tool once again so that the generated output includes all your Ansible files.

  • Without ansible-lint preprocessing

    You can generate a training data set without ansible-lint preprocessing. In this method, the content parser tool does not scan your Ansible files for ansible-lint rule violations; therefore, the training data set includes all files. Although the training data set includes all files, it might not adhere to Ansible best practices and could affect the quality of your code recommendation experience.

8.2.2. Supported data sources

The content parser tool scans the following directories and file formats:

  • Local directories
  • Archived files, such as .zip, .tar, .tar.gz, .tar.bz2, and .tar.xz files
  • Git repository URLs (includes both private and public repositories)

8.2.3. Process of creating a training data set

To create a custom model training data set, perform the following tasks:

  1. Install the content parser tool on your computer
  2. Generate a custom model training data set
  3. View the generated training data set
  4. Optional: If you generated a training data set with ansible-lint preprocessing and the tool detected ansible-lint rule violations, resolve the rule violations
  5. Optional: If you generated multiple training data sets, merge them into a single JSONL file

8.2.4. Installing the content parser tool

Install the content parser tool, a command-line interface (CLI) tool, on your computer.

Prerequisites

Ensure that your computer meets the following requirements:

  • Python version 3.10 or later
  • A UNIX OS, such as Linux or macOS

    Note

    Installation of the content parser tool on Microsoft Windows OS is not supported.
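
The prerequisites above can be checked from Python before you install the tool. The following is a minimal sketch, not part of the content parser tool: the function name meets_prerequisites is hypothetical, and treating every non-Windows system as a supported UNIX OS is a simplifying assumption.

```python
import platform
import sys


def meets_prerequisites(version_info=None, system_name=None):
    """Return True if the Python version and OS match the stated prerequisites.

    Hypothetical helper for illustration; not part of ansible-content-parser.
    """
    # Default to the running interpreter and operating system.
    version = tuple(version_info or sys.version_info)[:2]
    system = system_name or platform.system()
    # Python 3.10 or later is required. Assumption: any OS other than
    # Microsoft Windows is treated as a supported UNIX OS.
    return version >= (3, 10) and system != "Windows"


if __name__ == "__main__":
    print("Prerequisites met:", meets_prerequisites())
```

The optional arguments exist only so that the check can be exercised with values other than those of the current machine.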

    Procedure

    1. Create a working directory and set up a venv Python virtual environment:

      $ python -m venv ./venv

      $ source ./venv/bin/activate

    2. Install the latest version of the content parser tool from the Python Package Index (PyPI) by using pip:

      $ pip install --upgrade pip

      $ pip install --upgrade ansible-content-parser

    3. Perform one of the following tasks:

      • To generate a training data set without ansible-lint preprocessing, go to section Generating a custom model training data set.
      • To generate a training data set with ansible-lint preprocessing, ensure that you have the latest version of ansible-lint installed on your computer:

        1. View the ansible-lint versions that are installed on your computer.

          $ ansible-content-parser --version

          $ ansible-lint --version

          A list of application versions and their dependencies is displayed.

        2. In the output, verify that the version of ansible-lint that was installed with the content parser tool is the same as that of the previously installed ansible-lint. A mismatch in the installed ansible-lint versions causes inconsistent results from the content parser tool and ansible-lint.

          For example, in the following output, the content parser tool installation includes ansible-lint version 6.20.0, which does not match the previously installed ansible-lint version 6.13.1:

          $ ansible-content-parser --version
          ansible-content-parser 0.0.1 using ansible-lint:6.20.0 ansible-core:2.15.4
          $ ansible-lint --version
          ansible-lint 6.13.1 using ansible 2.15.4
          A new release of ansible-lint is available: 6.13.1 → 6.20.0

        3. If there is a mismatch in the ansible-lint versions, deactivate and reactivate the venv Python virtual environment:

          $ deactivate

          $ source ./venv/bin/activate

        4. Verify that the version of ansible-lint that is installed with the content parser tool is the same as that of the previously-installed ansible-lint:

          $ ansible-content-parser --version

          $ ansible-lint --version

          For example, the following output shows that both ansible-lint installations on your computer are of version 6.20.0:

          $ ansible-content-parser --version
          ansible-content-parser 0.0.1 using ansible-lint:6.20.0 ansible-core:2.15.4
          $ ansible-lint --version
          ansible-lint 6.20.0 using ansible-core:2.15.4
          ansible-compat:4.1.10 ruamel-yaml:0.17.32 ruamel-yaml-clib:0.2.7
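
The version comparison in the steps above can also be scripted. The following sketch parses the two version strings in the format shown in the example output and reports whether the ansible-lint versions match. The function names are hypothetical, and the regular expressions assume the output format shown above, which might differ in other releases.

```python
import re
from typing import Optional

# Sample outputs copied from the examples above.
PARSER_OUTPUT = "ansible-content-parser 0.0.1 using ansible-lint:6.20.0 ansible-core:2.15.4"
OLD_LINT_OUTPUT = "ansible-lint 6.13.1 using ansible 2.15.4"
NEW_LINT_OUTPUT = "ansible-lint 6.20.0 using ansible-core:2.15.4"


def lint_version_from_parser(output: str) -> Optional[str]:
    """Extract the ansible-lint version from `ansible-content-parser --version` output."""
    match = re.search(r"ansible-lint:(\d+(?:\.\d+)*)", output)
    return match.group(1) if match else None


def lint_version_from_lint(output: str) -> Optional[str]:
    """Extract the version from the `ansible-lint --version` output."""
    match = re.search(r"^ansible-lint (\d+(?:\.\d+)*)", output, flags=re.MULTILINE)
    return match.group(1) if match else None


def versions_match(parser_output: str, lint_output: str) -> bool:
    """Return True only if both versions were found and are identical."""
    parser_version = lint_version_from_parser(parser_output)
    lint_version = lint_version_from_lint(lint_output)
    return parser_version is not None and parser_version == lint_version


if __name__ == "__main__":
    print("Before reactivating venv:", versions_match(PARSER_OUTPUT, OLD_LINT_OUTPUT))
    print("After reactivating venv:", versions_match(PARSER_OUTPUT, NEW_LINT_OUTPUT))
```

In practice, the two strings would come from running the commands, for example with subprocess.run(..., capture_output=True, text=True).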

8.2.5. Generating a custom model training data set

After you install the content parser tool, run it to scan your custom Ansible files and generate a custom model training data set. The training data set includes a list of Ansible files and their paths relative to the project root. You can then upload the training data set to IBM watsonx Code Assistant and create a custom model for your organization. If you used ansible-lint preprocessing and encountered rule violations, you must resolve the rule violations before uploading the training data set to IBM watsonx Code Assistant.

Prerequisites

  • You must have installed the content parser tool on your computer.
  • You must have verified that the version of ansible-lint that is installed with the content parser tool is the same as that of the previously-installed ansible-lint.

Procedure

  1. Run the content parser tool to generate a training data set:

    • With ansible-lint preprocessing: $ ansible-content-parser source output
    • Without ansible-lint preprocessing: $ ansible-content-parser source output -S

      The following table describes the parameters.

      Parameter                   Description
      source                      Specifies the source of the training data set: a local directory, an archive file, or a Git repository URL.
      output                      Specifies the output directory for the training data set.
      -S or --skip-ansible-lint   Skips ansible-lint preprocessing while generating the training data set.

    For example, if the source is the GitHub URL https://github.com/ansible/ansible-tower-samples.git and the output directory is /tmp/out, the command is as follows:

    $ ansible-content-parser https://github.com/ansible/ansible-tower-samples.git /tmp/out

  2. Optional: To generate a training data set with additional information, specify the following parameters while running the content parser tool.

    Parameter              Description
    --source-license       Includes the licensing information of the source directory in the training data set.
    --source-description   Includes the description of the source directory in the training data set.
    --repo-name            Includes the repository name in the training data set. If you do not specify the repository name, the content parser tool automatically generates it from the source name.
    --repo-url             Includes the repository URL in the training data set. If you do not specify the repository URL, the content parser tool automatically generates it from the source URL.
    -v or --verbose        Displays the console logging information.

    Example command for the GitHub repository ansible-tower-samples

    $ ansible-content-parser --profile min \
    --source-license undefined \
    --source-description Samples \
    --repo-name ansible-tower-samples \
    --repo-url 'https://github.com/ansible/ansible-tower-samples' \
    git@github.com:ansible/ansible-tower-samples.git /var/tmp/out_dir

    Example of a generated training data set for the GitHub repository ansible-tower-samples

    The training data set is formatted with jq, a command-line JSON processor.

    $ cat out_dir/ftdata.jsonl | jq
    {
      "data_source_description": "Samples",
      "input": "---\n- name: Hello World Sample\n hosts: all\n tasks:\n - name: Hello Message",
      "license": "undefined",
      "module": "debug",
      "output": " debug:\n msg: Hello World!",
      "path": "hello_world.yml",
      "repo_name": "ansible-tower-samples",
      "repo_url": "https://github.com/ansible/ansible-tower-samples"
    }
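
If you run the content parser tool from a script, the command line in this procedure can be assembled programmatically. The following sketch builds the argument list from the options documented above. The function name build_parser_command and its keyword names are hypothetical, and placing the options before the source and output arguments is a choice made here, following the example command. The sketch does not run the tool.

```python
import shlex

# Keyword names accepted by this sketch, mapped to the documented flags
# by replacing "_" with "-" (for example, repo_name -> --repo-name).
ALLOWED_OPTIONS = {"source_license", "source_description", "repo_name", "repo_url"}


def build_parser_command(source, output, skip_lint=False, **metadata):
    """Build an ansible-content-parser argument list (hypothetical helper)."""
    command = ["ansible-content-parser"]
    for key, value in metadata.items():
        if key not in ALLOWED_OPTIONS:
            raise ValueError(f"Unsupported option: {key}")
        command += ["--" + key.replace("_", "-"), value]
    if skip_lint:
        # Generate the data set without ansible-lint preprocessing.
        command.append("--skip-ansible-lint")
    command += [source, output]
    return command


if __name__ == "__main__":
    cmd = build_parser_command(
        "https://github.com/ansible/ansible-tower-samples.git",
        "/tmp/out",
        repo_name="ansible-tower-samples",
    )
    # Print a shell-safe version of the command for review.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) on a computer where the content parser tool is installed.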

8.2.6. Viewing the generated training data set

After the content parser tool scans your Ansible files, it generates the training data set in an output subdirectory within your local directory. The training data set includes an ftdata.jsonl file, which is the main output of the content parser tool. The file is in JSON Lines format, where each line represents a JSON object. You must upload this JSONL file to IBM watsonx Code Assistant to create a custom model.
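
Because each line of a JSON Lines file is a complete JSON object, you can inspect the file one record at a time. The following sketch writes a sample record to a temporary ftdata.jsonl file and reads it back. The sample uses a subset of the fields shown in the example output, and the helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# A subset of the fields shown in the example training data set.
SAMPLE_RECORD = {
    "license": "undefined",
    "module": "debug",
    "path": "hello_world.yml",
    "repo_name": "ansible-tower-samples",
}


def read_training_records(jsonl_path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def write_sample(directory):
    """Write SAMPLE_RECORD as a one-line ftdata.jsonl file and return its path."""
    path = Path(directory) / "ftdata.jsonl"
    path.write_text(json.dumps(SAMPLE_RECORD) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for record in read_training_records(write_sample(tmp)):
            print(record["path"], record["module"])
```

To inspect a real data set, pass the path of the generated ftdata.jsonl file to read_training_records instead of the temporary sample.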

8.2.6.1. Structure of custom model training data set

The following is the file structure of an output subdirectory:

output/
  |-- ftdata.jsonl  # Training data set (1)
  |-- report.txt    # A human-readable report (2)
  |
  |-- repository/   (3)
  |     |-- (files copied from the source repository)
  |
  |-- metadata/     (4)
        |-- (metadata files generated during the execution)

Where:

(1) ftdata.jsonl: A training data set file, which is the main output of the content parser tool. The file is in JSON Lines format, where each line represents a JSON object. You must upload this JSONL file to IBM watsonx Code Assistant to create a custom model.
(2) report.txt: A human-readable text file that provides a summary of all content parser tool executions.
(3) repository: A directory that contains files from the source repository. Sometimes, ansible-lint updates the directory according to the configured rules, so the file contents of the output directory might differ from the source repository.
(4) metadata: A directory that contains multiple metadata files that are generated during each content parser tool execution.

8.2.6.1.1. Using the report.txt file to resolve ansible-lint rule violations

The report.txt file, which you can use to resolve ansible-lint rule violations, contains the following information:

  • File counts per type: A list of files according to their file types, such as playbooks, tasks, handlers, and jinja2.
  • List of Ansible files that were identified: A list of files identified by ansible-lint with a file name, a file type, and whether the file was excluded from further processing, or automatically fixed by ansible-lint.
  • List of Ansible modules found in tasks: A list of modules identified by ansible-lint with a module name, a module type, and whether the file was excluded from further processing, or automatically fixed by ansible-lint.
  • Issues found by ansible-lint: A list of issues along with a brief summary of ansible-lint execution results. If ansible-lint encounters files with syntax-check errors in the first execution, the content parser tool initiates a second execution of ansible-lint and excludes the files with errors from the scan. You can use this information to resolve ansible-lint rule violations.

8.2.7. Resolving ansible-lint rule violations

By default, the content parser tool uses ansible-lint rules to scan your Ansible files and ensure that the content adheres to Ansible best practices. If rule violations are found, the content parser tool excludes these files from the generated output. In such scenarios, it is recommended that you fix the files with rule violations before uploading the training data set to IBM watsonx Code Assistant.

By default, ansible-lint applies the rules that are configured in ansible-lint/src/ansiblelint/rules while scanning your Ansible files. For more information about ansible-lint rules, see the Ansible Lint documentation.

8.2.7.1. How does the content parser tool handle rule violations?

  • Using autofixes

    The content parser tool runs ansible-lint with the --fix=all option to perform autofixes, which can fix, or simplify fixing, the issues that a rule identifies.

    If ansible-lint identifies rule violations that have an associated autofix, it automatically fixes or simplifies the issues that violate the rules. If ansible-lint identifies rule violations that do not have an associated autofix, it reports these instances as rule violations which you must fix manually. For more information about autofixes, see Autofix in Ansible Lint Documentation.

  • Using syntax-checks

    Ansible-lint also performs syntax checks while scanning your Ansible files. If any syntax-check errors are found, ansible-lint stops processing the files. For more information about syntax-check errors, see syntax-check in Ansible Lint Documentation.

    The content parser tool handles syntax-check rule violations in the following manner:

    • If syntax-check errors are found in the first execution of ansible-lint, the content parser tool generates a list of the files that caused the errors.
    • In that case, the content parser tool runs ansible-lint again but excludes the files with syntax-check errors. After the scan is completed, the content parser tool generates a list of files that contain rule violations. The list includes all files that caused syntax-check errors as well as files with other rule violations. The content parser tool excludes files with rule violations in all future scans, and the final training data set does not include data from the excluded files.
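
The exclusion flow described above can be illustrated with a small simulation. This is a sketch of the control flow only, under stated assumptions: it does not run ansible-lint, the function name two_pass_scan is hypothetical, and the two callables are stand-ins for the syntax-check results and for the rule violations that remain after autofixes.

```python
def two_pass_scan(files, has_syntax_error, has_rule_violation):
    """Simulate the two-pass exclusion flow with stand-in check functions.

    Returns (included, excluded). Hypothetical helper for illustration;
    the real tool obtains these results from ansible-lint.
    """
    # First execution: collect the files that cause syntax-check errors.
    syntax_failures = [name for name in files if has_syntax_error(name)]
    # Second execution: scan again without the files that failed the syntax check.
    remaining = [name for name in files if name not in syntax_failures]
    rule_failures = [name for name in remaining if has_rule_violation(name)]
    # The excluded list covers syntax-check errors and other rule violations.
    excluded = syntax_failures + rule_failures
    included = [name for name in remaining if name not in rule_failures]
    return included, excluded


if __name__ == "__main__":
    sample_files = ["ok.yml", "broken.yml", "style.yml"]
    included, excluded = two_pass_scan(
        sample_files,
        has_syntax_error=lambda name: name == "broken.yml",
        has_rule_violation=lambda name: name == "style.yml",
    )
    print("Included in training data set:", included)
    print("Excluded:", excluded)
```

Only the files in the included list would contribute data to the final training data set.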

Procedure

Use one of the following methods to resolve ansible-lint rule violations:

  • Run the content parser tool with the --no-exclude option

    If any rule violations, including syntax-check errors, are found, the execution is aborted with an error and no training data set is created.

  • Limit the set of rules that ansible-lint uses to scan your data with the --profile option

    It is recommended that you fix the files with rule violations. However, if you do not want to modify the source files, you can limit the set of rules that ansible-lint uses to scan your data. To do so, specify the --profile option with a predefined profile (for example, the min, basic, moderate, safety, shared, or production profile), or use ansible-lint configuration files. For more information, see the Ansible Lint documentation.

  • Run the content parser tool by skipping ansible-lint preprocessing

    You can run the content parser tool without ansible-lint preprocessing. The content parser tool generates a training data set without scanning for ansible-lint rule violations.

    To run the content parser tool without ansible-lint preprocessing, run the following command:

    $ ansible-content-parser source output -S

    Where:

    • source: Specifies the source of the training data set.
    • output: Specifies the output of the training data set.
    • -S or --skip-ansible-lint: Specifies to skip ansible-lint preprocessing while generating the training data set.

8.2.8. Merging multiple training data sets into a single file

For every execution, the content parser tool creates a training data set JSONL file named ftdata.jsonl that you upload to IBM watsonx Code Assistant for creating a custom model. If the content parser tool runs multiple times, multiple JSONL files are created. IBM watsonx Code Assistant supports a single JSONL file upload only; therefore, if you have multiple JSONL files, you must merge them into a single, concatenated file. You can also merge the multiple JSONL files that are generated in multiple subdirectories within a parent directory into a single file.

Procedure

  1. Using the command prompt, go to the parent directory.
  2. Run the following command to create a single, concatenated file:
    $ find . -name ftdata.jsonl | xargs cat > concatenated.jsonl
  3. Optional: Rename the concatenated file for easy identification.

You can now upload the merged JSONL file to IBM watsonx Code Assistant and create a custom model.
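
The merge can also be done in Python, which makes it possible to validate each line while concatenating. The following is a minimal sketch under stated assumptions: the function name merge_training_data and the default output name concatenated.jsonl are hypothetical, and the demonstration uses temporary directories with made-up one-field records rather than real training data.

```python
import json
import tempfile
from pathlib import Path


def merge_training_data(parent_dir, merged_name="concatenated.jsonl"):
    """Concatenate every ftdata.jsonl under parent_dir into one JSON Lines file.

    Returns the path of the merged file and the number of records written.
    """
    parent = Path(parent_dir)
    lines = []
    # sorted() makes the merge order deterministic across runs.
    for source in sorted(parent.rglob("ftdata.jsonl")):
        for line in source.read_text(encoding="utf-8").splitlines():
            if line.strip():
                json.loads(line)  # Raises ValueError if a line is not valid JSON.
                lines.append(line)
    merged_path = parent / merged_name
    merged_path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return merged_path, len(lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create two output subdirectories, each with its own ftdata.jsonl.
        for name, module in (("run_a", "debug"), ("run_b", "copy")):
            subdir = Path(tmp) / name
            subdir.mkdir()
            record = json.dumps({"module": module})
            (subdir / "ftdata.jsonl").write_text(record + "\n", encoding="utf-8")
        merged, count = merge_training_data(tmp)
        print(f"Merged {count} records into {merged.name}")
```

Unlike plain concatenation, this approach writes exactly one record per line even if an input file lacks a trailing newline.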

8.3. Creating and deploying a custom model in IBM watsonx Code Assistant

After the content parser tool generates a custom model training data set, upload the JSONL file ftdata.jsonl to IBM watsonx Code Assistant and create a custom model for your organization.

Important

IBM watsonx Code Assistant might take a few hours to create a custom model, depending on the size of your training data set. You must continue monitoring the IBM Tuning Studio for the status of custom model creation.

For information about how to create and deploy a custom model in IBM watsonx Code Assistant, see the IBM watsonx Code Assistant documentation.

8.4. Configuring Red Hat Ansible Lightspeed to use custom models

After you create and deploy a custom model in IBM watsonx Code Assistant, you must configure Red Hat Ansible Lightspeed so that you can use the custom model for your organization.

You can specify one of the following configurations for using the custom model:

  • Enable access for all users in your organization

    You can configure the custom model as the default model for your organization. All users in your organization can use the custom model.

  • Enable access for select Ansible users in your organization

    Using the model-override setting, you can configure the custom model and make it available for select Ansible users only. For example, if you are using Red Hat Ansible Lightspeed both as an organization administrator and an end user, you can test the custom model for select Ansible users before making it available for all users in your organization.

8.4.1. Configuring the custom model for all Ansible users in your organization

You can configure the custom model as the default model for your organization, so that all users in your organization can use the custom model.

Procedure

  1. Log in to the Ansible Lightspeed with IBM watsonx Code Assistant Hybrid Cloud Console as an organization administrator.
  2. Specify the model ID of the custom model:

    1. Click Model Settings.
    2. Under Model ID, click Add Model ID. A screen to enter the Model ID is displayed.
    3. Enter the Model ID of the custom model.
    4. Optional: Click Test model ID to validate the model ID.
    5. Click Save.

8.4.2. Configuring the custom model for select Ansible users in your organization

Using the model-override setting in Ansible Visual Studio (VS) Code, you can configure the custom model and make it available for select Ansible users only. For example, if you are using Red Hat Ansible Lightspeed as both an organization administrator and an end user, you can test the custom model for select Ansible users before making it available for all users in your organization.

Procedure

  1. Log in to the VS Code application using your Red Hat account.
  2. From the Activity bar, click the Extensions icon.
  3. From the Installed Extensions list, select Ansible.
  4. From the Ansible extension page, click the Settings icon and select Extension Settings.
  5. From the list of settings, select Ansible Lightspeed.
  6. In the Model ID Override field, enter the model ID of the custom model.

    Your settings are automatically saved in VS Code, and you can now use the custom model.