Red Hat Training

A Red Hat training course is available for Red Hat Fuse

Chapter 2. Connect to Amazon S3

An integration can retrieve data from an Amazon S3 bucket or copy data into an Amazon S3 bucket. To do this, you create an Amazon S3 connection and then add that Amazon S3 connection to an integration. For details, see:

2.1. Prerequisites for creating an Amazon S3 connection

To create an Amazon S3 connection, you must know the following:

  • Amazon S3 access key ID that is associated with the Amazon Web Services (AWS) account that created, or will create, the bucket that you want the connection to access.

    You can create a connection that accesses a bucket that does not yet exist. In this case, when the integration starts running then it use the AWS account associated with this access key ID to try to create the bucket.

  • Amazon S3 secret access key that is associated with the AWS account that created or will try to create (when the integration starts running) the bucket that you want the connection to access.
  • Name of the bucket that you want to access or its Amazon Resource Name (ARN).

    If the bucket you specify does not yet exist then the connection tries to create a bucket with the name that you specify. Because S3 allows a bucket to be used as a URL that can be accessed publicly, the bucket name that you specify must be globally unique. Also, it must meet S3 bucket naming requirements.

    If the bucket you specify does not exist in the AWS account that is associated with the Amazon S3 access key ID, but it does exist in another AWS account, then the connection does not create the bucket and an integration that uses this connection cannot start running.

  • Region in which the bucket is located or the region in which you want the connection to create the bucket.

A user with the login credentials for the AWS account that created or will create the bucket obtains the Amazon S3 keys as follows:

  1. Go to https://aws.amazon.com/s3/.
  2. Hover over My Account, select AWS Management Console and sign in to the console with the AWS account that created the bucket that you want to access or with the account that you want the connection to use to create the bucket.
  3. In the console, in the upper right, click the down arrow next to the user name and click My Security Credentials.
  4. Expand Access Keys and click Create New Access Keys.
  5. Follow the prompts to obtain the keys.

2.2. Create Amazon S3 connections

You must create an Amazon S3 connection before you can add an Amazon S3 connection to an integration.

Procedure

  1. In Fuse Online, in the left panel, click Connections to display any available connections.
  2. In the upper right, click Create Connection to display Fuse Online connectors.
  3. Click the Amazon S3 connector.
  4. In the Access Key field, enter the Amazon S3 access key ID, provided by AWS, for the AWS account that created the bucket that you want this connection to access. If the bucket you want the connection to access does not already exist then when Fuse Online tries to start running the integration, it uses the AWS account associated with this access key to create the bucket. However, if the bucket already exists in some other AWS account, then the connection cannot create the bucket and the integration cannot start.
  5. In the Bucket Name or Amazon Resource Name field, enter the name of the bucket that you want this connection to access or enter the bucket’s ARN. If the bucket does not already exist in either the AWS account being used or in any other AWS account, then the connection creates it.
  6. In the Region field, select the AWS region in which the bucket resides. If the connection creates the bucket, then it creates it in the selected region.
  7. In the Secret Key field, enter the Amazon S3 secret access key, provided by AWS, for the account that created, or will create, the bucket that you want this connection to access.
  8. Click Validate. Fuse Online immediately tries to validate the connection and displays a message that indicates whether or not validation is successful. If validation fails, revise the configuration details as needed and try again.
  9. When validation is successful, click Next.
  10. In the Connection Name field, enter your choice of a name that helps you distinguish this connection from any other connections. For example, enter Obtain S3 Data.
  11. In the Description field, optionally enter any information that is helpful to know about this connection. For example, enter Sample S3 connection that obtains data from the northeast bucket.
  12. In the upper right, click Create to see that the connection you created is now available. If you entered the example name, you would see that Obtain S3 Data appears as a connection that you can choose to add to an integration.

2.3. Start an integration by obtaining data from Amazon S3

To start an integration by obtaining data from an Amazon S3 bucket, add an Amazon S3 connection as the start connection in an integration.

Prerequisite

You created an Amazon S3 connection.

Procedure

  1. In the Fuse Online panel on the left, click Integrations.
  2. Click Create Integration.
  3. On the Choose a Start Connection page, click the Amazon S3 connection that you want to use to start the integration.
  4. On the Choose an Action page, click the action that you want the connection to perform:

    • Get Object obtains a file from the bucket that the connection accesses. In the File Name field, enter the name of the file that you want to obtain. If the specified file is not in the bucket, it is a runtime error.
    • Poll an Amazon S3 Bucket periodically obtains files from the bucket that the connection accesses. To configure this action:

      1. In the Delay field, accept the default of 500 milliseconds as the time that elapses between polls. Or, to specify a different polling interval, enter a number and select its time unit.
      2. In the Maximum Objects to Retrieve field, enter the largest number of files that one poll operation can obtain. The default is 10.

        To have no limit on the number of files that can be obtained, specify 0 or a negative integer. When Maximum Objects to Retrieve is unlimited, the poll action obtains all files in the bucket.

        If the bucket contains more than the specified maximum number of files then the action obtains the files that were most recently modified or created.

      3. In the Prefix field, optionally specify a regular expression that evaluates to a string. If you specify a prefix then this action retrieves a file only when its name starts with that string.
      4. Indicate whether you want to Obtain files and then delete them from the bucket.
  5. After you configure the action, click Done to specify the action’s output type.
  6. In the Select Type field, if the data type does not need to be known, accept Type specification not required and then, at the bottom, click Done. You do not need to follow the rest of these instructions.

    Otherwise, select one of the following as the schema type:

    • JSON schema is a document that describes the structure of JSON data. The document’s media type is application/schema+json.
    • JSON instance is a document that contains JSON data. The document’s media type is application/json.
    • XML schema is a document that describes the structure of XML data. The document’s file extension is .xsd.
    • XML instance is a document that contains XML data. The document’s file extension is .xml.
  7. In the Definition input box, paste a definition that conforms to the schema type you selected. For example, if you select JSON schema then you would paste the content of a JSON schema file, which has a media type of application/schema+json.
  8. In the Data Type Name field, enter a name that you choose for the data type. For example, suppose you are specifying a JSON schema for vendors. You can specify Vendor as the data type name.

    You will see this data type name when you are creating or editing an integration that uses the connection for which you are specifying this type. Fuse Online displays the type name in the integration visualization panel and in the data mapper.

  9. In the Data Type Description field, provide information that helps you distinguish this type. This description appears in the data mapper when you hover over the step that processes this type.
  10. Click Done.

The connection appears in the integration flow in the location where you added it.

2.4. Finish an integration by adding data to Amazon S3

To finish an integration by copying data to Amazon S3, add an Amazon S3 connection as the finish connection in an integration.

Prerequisite

You created an Amazon S3 connection.

Procedure

  1. Start creating the integration.
  2. Add and configure the start connection.
  3. On the Choose a Finish Connection page, click the Amazon S3 connection that you want to use to finish the integration.
  4. Click the action that you want the connection to perform:

    1. Copy Object adds one or more objects to the bucket.

      To add one file to the bucket, you can enter its name in the File Name field.

      To add multiple files to the bucket, do not specify a file name. In this case, the action adds all objects that it obtains from the previous integration step(s).

      If you used the poll action to obtain multiple files and you specify a file name then the Copy Object action adds only the last file that was received from the poll action.

    2. Delete Object deletes an object from the bucket. In the File Name field, specify the name of the object that you want to delete. If the specified file is not in the bucket, the integration continues with no error.
  5. After you configure the chosen action, click Next to specify the action’s input type.
  6. In the Select Type field, if the data type does not need to be known, accept Type specification not required and then, at the bottom, click Done. You do not need to follow the rest of these instructions.

    Otherwise, select one of the following as the schema type:

    • JSON schema is a document that describes the structure of JSON data. The document’s media type is application/schema+json.
    • JSON instance is a document that contains JSON data. The document’s media type is application/json.
    • XML schema is a document that describes the structure of XML data. The document’s file extension is .xsd.
    • XML instance is a document that contains XML data. The document’s file extension is .xml.
  7. In the Definition input box, paste a definition that conforms to the schema type you selected. For example, if you select JSON schema then you would paste the content of a JSON schema file, which has a media type of application/schema+json.
  8. In the Data Type Name field, enter a name that you choose for the data type. For example, suppose you are specifying a JSON schema for vendors. You can specify Vendor as the data type name.

    You will see this data type name when you are creating or editing an integration that uses the connection for which you are specifying this type. Fuse Online displays the type name in the integration visualization panel and in the data mapper.

  9. In the Data Type Description field, provide information that helps you distinguish this type. This description appears in the data mapper when you hover over the step that processes this type.
  10. Click Done.

The connection appears in the integration flow in the location where you added it.

2.5. Add data to Amazon S3 in the middle of an integration

In the middle of an integration, to add data to Amazon S3, add an Amazon S3 connection between the start and finish connections of an integration.

Prerequisite

You created an Amazon S3 connection.

Procedure

  1. Add the start and finish connections.
  2. In the left panel, hover over the plus sign that is in the location where you want to add the Amazon S3 connection.
  3. In the pop-up, click Add a Connection.
  4. Click the Amazon S3 connection that you want to use as a middle connection in the integration.
  5. Click the action that you want the connection to perform:

    1. Copy Object adds one or more objects to the bucket.

      To add one file to the bucket, you can enter its name in the File Name field.

      To add multiple files to the bucket, do not specify a file name. In this case, the action adds all objects that it obtains from the previous integration step(s).

      If you used the poll action to obtain multiple files and you specify a file name then the Copy Object action adds only the last file that was received from the poll action.

    2. Delete Object deletes an object from the bucket. In the File Name field, specify the name of the object that you want to delete. If the specified file is not in the bucket, the integration continues with no error.
  6. After you configure the chosen action, click Next to specify the action’s input type.
  7. In the Select Type field, if the data type does not need to be known, accept Type specification not required and then, at the bottom, click Done. You do not need to follow the rest of these instructions.

    Otherwise, select one of the following as the schema type:

    • JSON schema is a document that describes the structure of JSON data. The document’s media type is application/schema+json.
    • JSON instance is a document that contains JSON data. The document’s media type is application/json.
    • XML schema is a document that describes the structure of XML data. The document’s file extension is .xsd.
    • XML instance is a document that contains XML data. The document’s file extension is .xml.
  8. In the Definition input box, paste a definition that conforms to the schema type you selected. For example, if you select JSON schema then you would paste the content of a JSON schema file, which has a media type of application/schema+json.
  9. In the Data Type Name field, enter a name that you choose for the data type. For example, suppose you are specifying a JSON schema for vendors. You can specify Vendor as the data type name.

    You will see this data type name when you are creating or editing an integration that uses the connection for which you are specifying this type. Fuse Online displays the type name in the integration visualization panel and in the data mapper.

  10. In the Data Type Description field, provide information that helps you distinguish this type. This description appears in the data mapper when you hover over the step that processes this type.
  11. Click Done.

The connection appears in the integration flow in the location where you added it.