Chapter 16. Connecting to Apache Kudu

Apache Kudu is a columnar storage manager developed for the Apache Hadoop platform. A Fuse Online integration can connect to a Kudu data store to scan a table, which returns all records in the table to the integration, or to insert records into a table. Details are in the following sections.

16.1. Creating a connection to an Apache Kudu data store

In an integration, to obtain records from or insert records into a Kudu table, create a connection to a Kudu master host and then add that connection to an integration.

Prerequisites

  • You must know the IP address or the hostname for the Kudu master host that you want to connect to.
  • You must know the port that Kudu is listening on.
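
Before you create the connection, it can help to confirm that the host and port are reachable from your network. The following is a minimal sketch in Python and is not part of Fuse Online. The function name `can_reach` and the hostname in the comment are hypothetical. A successful TCP connection shows only that something is listening on the port, not that it is a Kudu master.

```python
import socket


def can_reach(host: str, port: int = 7051, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the hostname and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (placeholder hostname, replace with your Kudu master host):
# can_reach("kudu-master.example.com", 7051)
```

If the check returns False, verify the hostname, the port, and any firewall rules between your environment and the Kudu master host before you configure the connection.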

Procedure

  1. In Fuse Online, in the left panel, click Connections to display any available connections.
  2. Click Create Connection to display connectors.
  3. Click the Apache Kudu connector.
  4. To configure the connection:

    1. In the Address of Kudu master host field, enter the hostname or the IP address of the Kudu master host.
    2. In the Port to establish connection to field, enter the port that Kudu is listening on. The default is 7051.
  5. Click Next.
  6. In the Name field, enter a name that helps you distinguish this connection from any other connections. For example, you might enter Kudu North.
  7. In the Description field, optionally enter any information that is helpful to know about this connection.
  8. Click Save. The connection that you created is now available. If you entered the example name, Kudu North appears as a connection that you can add to an integration.

16.2. Triggering an integration when scanning returns records from a Kudu table

To trigger execution of an integration upon obtaining data from a Kudu table, add a Kudu connection to a simple integration as its start connection. When the integration is running, the Kudu connection scans the table that you specified at the interval that you specified, obtains all records in the table, and passes a collection of the records to the next step in the integration.

A Kudu connection can obtain data from only one table. Between scans, if there are no changes to the data in the table that the connection scans, then the next scan returns the same data as the previous scan.
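
The scan behavior described above can be modeled with a short Python sketch. This is an in-memory illustration under stated assumptions, not how Fuse Online or Kudu is implemented: the table is a plain list of dictionaries, and `scan_table`, `run_scans`, and the `next_step` callback are hypothetical names.

```python
import time
from typing import Callable, Dict, List

Record = Dict[str, object]


def scan_table(table: List[Record]) -> List[Record]:
    """Return a copy of every record in the table, as a full scan would."""
    return [dict(row) for row in table]


def run_scans(
    table: List[Record],
    next_step: Callable[[List[Record]], None],
    scans: int,
    period_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Scan the table `scans` times, waiting one period between scans."""
    for i in range(scans):
        # Each scan passes the whole collection to the next step.
        next_step(scan_table(table))
        if i < scans - 1:
            sleep(period_seconds)
```

Because every scan returns the full table, two consecutive scans of an unchanged table hand identical collections to `next_step`. A flow that must not process the same records twice needs to account for that in a later step.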

Prerequisites

  • You created a Kudu connection.
  • The table that you want to obtain records from exists.

Procedure

  1. In the Fuse Online panel on the left, click Integrations.
  2. Click Create Integration.
  3. On the Choose a connection page, click the Kudu connection that you want to use to start the integration.
  4. On the Choose an action page, select the Scan a Kudu table action.
  5. In the Table field, enter the name of the table that you want to obtain records from.
  6. In the Period field, accept the default of one minute, or enter the interval at which you want the connection to scan the table and return the records that are in the table.
  7. Click Next.

Result

The connection is the simple integration’s start connection.

Next steps

Add the integration’s finish connection and any other connections that you want to include in the integration. When the integration contains all the connections that are needed, consider whether you need to split the collection of records that the Kudu connection returns. If you want to execute integration steps for each record that you obtained from the Kudu table, then after the Kudu connection, add a split step. Also, you probably need to follow the Kudu connection with a data mapping step that maps data obtained from Kudu to fields in subsequent connections in the integration.
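
As a rough illustration of what the split and data mapping steps accomplish, the following Python sketch processes a scanned collection one record at a time and renames fields for a later connection. The functions `split` and `map_fields` and all field names are hypothetical. In Fuse Online, you configure these steps in the user interface rather than in code.

```python
from typing import Dict, Iterable, Iterator

Record = Dict[str, object]


def split(records: Iterable[Record]) -> Iterator[Record]:
    """Yield records one at a time, as a split step hands them to later steps."""
    for record in records:
        yield record


def map_fields(record: Record, mapping: Dict[str, str]) -> Record:
    """Copy each source field to the target field name given in `mapping`."""
    return {target: record[source] for source, target in mapping.items()}


# Hypothetical field names, for illustration only.
scanned = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
mapping = {"id": "customerId", "name": "customerName"}
mapped = [map_fields(record, mapping) for record in split(scanned)]
```

Each entry in `mapped` carries the same values as the scanned record, under the field names that the subsequent connection expects.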

16.3. Inserting records into a Kudu table

In an integration, you can add records to a Kudu table in the middle of a flow or to finish a simple integration. To do this, add a Kudu connection to the middle of a flow or as a simple integration’s finish connection.

Prerequisites

  • You created a Kudu connection.
  • You are creating or editing an integration, and Fuse Online is prompting you either to add to the integration or to choose a finish connection.
  • The table that you want to add records to exists.

Procedure

  1. On the Add to Integration page, click the plus sign where you want to add the connection. Skip this step if Fuse Online is prompting you to choose the finish connection.
  2. Click the Kudu connection that you want to use. Note that when a Kudu connection inserts data, the connection does not return anything.
  3. On the Choose an action page, select Insert data into a Kudu table.
  4. To configure the action, in the Table field, specify the name of the table to add records to.

    You need to understand how the Kudu table that you are adding records to is set up. For example, the table might have a unique key. If you try to add a record whose key value is already in the table, the Kudu connection does not add that record.

  5. Click Next.
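
The unique-key behavior mentioned in step 4 can be modeled with a small in-memory sketch in Python. The `KeyedTable` class is hypothetical and exists only to illustrate the rule that a record with an existing key value is not added. It does not model how Kudu or the connection reports the rejected record.

```python
from typing import Dict, Hashable

Record = Dict[str, object]


class KeyedTable:
    """In-memory stand-in for a table with a unique key column."""

    def __init__(self, key_column: str) -> None:
        self.key_column = key_column
        self.rows: Dict[Hashable, Record] = {}

    def insert(self, record: Record) -> bool:
        """Add the record unless its key value already exists."""
        key = record[self.key_column]
        if key in self.rows:
            # Duplicate key: the existing row is left unchanged.
            return False
        self.rows[key] = dict(record)
        return True


table = KeyedTable(key_column="id")
first = table.insert({"id": 1, "name": "Ada"})
duplicate = table.insert({"id": 1, "name": "Grace"})
```

In this sketch the second call leaves the original row in place. If your flow might send records whose keys already exist, check the key values in an earlier step so that records are not silently left out of the table.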

Result

The connection appears in the integration visualization where you added it.

Next steps

Consider whether you need to split a collection of records into individual records that a Kudu connection can add to a table. To do this, add a split step to the integration. The split step executes the subsequent steps in the integration once for each record. Also, you probably need a data mapper step before a Kudu connection that adds records to a table.