Chapter 4. Creating a virtual database as a Maven artifact

You can use a Teiid Maven plugin to convert a DDL file into a Maven artifact. You define the structure of the virtual database in a DDL file and use the file to generate an artifact to deploy to a Maven repository. The Data Virtualization Operator can then deploy the artifact from the Maven repository to an OpenShift project.

This is an advanced method that provides a high level of flexibility and is suitable for complex projects. Using this method, you can create multi-module Maven projects in which you import one or more other virtual databases and incorporate them into your design.

You specify use of the Teiid plugin in your pom.xml file. You can also define other Maven dependencies in the pom.xml file. When you run the build, the plugin reads the file and resolves its contents.

Advantages of creating a virtual database as a Maven artifact
  • Flexible, clean separation between the DDL code that represents the virtual database and other configuration settings.
  • Enables easy deployment into multiple environments.
  • Provides for versioning at the virtual database level.
  • Enables importing of one virtual database into another, by adding IMPORT DATABASE statements to the DDL.
  • Enables a virtual database to be shared across projects and teams in a consistent way.
  • Supports continuous integration and continuous delivery (CI/CD) workflows.
Disadvantages of creating a virtual database as a Maven artifact
  • Requires a working knowledge of Maven.

Prerequisites

  • You have a compatible data source and the OpenShift cluster can access it.
  • You know how to create a pom.xml file to specify the dependencies that are required to build your virtual database.
  • You have information about the connection settings for your data sources, including login credentials.
  • The Data Virtualization Operator has access to the Maven repositories that contain build dependencies for the virtual database.
  • You have Maven 3.2 or later installed.

Procedure

  1. From a text editor, create a POM file to define build dependencies. For example,

    Example: POM file for building a Maven-based virtual database

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>org.teiid</groupId>
      <artifactId>dv-customer</artifactId>
      <name>dv-customer</name>
      <description>Demo project to showcase maven based vdb</description>
      <packaging>vdb</packaging>
      <version>1.0</version>
    
      <build>
        <plugins>
          <plugin>
            <groupId>org.teiid</groupId>
            <artifactId>vdb-plugin</artifactId> 1
            <version>1.2.0</version>
            <extensions>true</extensions>
            <executions>
              <execution>
                <goals>
                  <goal>vdb</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </project>

    1
    For virtual databases that import from other virtual databases, supply a unique artifactId name for each virtual database that you want to import.

    The preceding example can serve as a model for the pom.xml for your virtual database.

  2. Create a Maven project to import the virtual database definition from a DDL file. For example:

    vdb-project
    ├── pom.xml
    └── src
        └── main
            └── vdb
                └── vdb.ddl
  3. If you do not already have one, create a DDL file to specify the structure of the virtual database, and save it to the /src/main/vdb directory of your project. Maven-based virtual databases use the same DDL structure as other virtual databases. The DDL file for a Maven-based virtual database must have the name vdb.ddl.

    The following example shows a sample DDL file for a virtual database that uses a postgreSQL data source:

    Example: vdb.ddl

    CREATE DATABASE customer OPTIONS (ANNOTATION 'Customer VDB');
    USE DATABASE customer;
    
    CREATE FOREIGN DATA WRAPPER postgresql;
    CREATE SERVER sampledb TYPE 'NONE' FOREIGN DATA WRAPPER postgresql;
    
    CREATE SCHEMA accounts SERVER sampledb;
    CREATE VIRTUAL SCHEMA portfolio;
    
    SET SCHEMA accounts;
    IMPORT FOREIGN SCHEMA public FROM SERVER sampledb INTO accounts OPTIONS("importer.useFullSchemaName" 'false');
    
    SET SCHEMA portfolio;
    
    CREATE VIEW CustomerZip(id bigint PRIMARY KEY, name string, ssn string, zip string) AS
       SELECT c.ID as id, c.NAME as name, c.SSN as ssn, a.ZIP as zip
       FROM accounts.CUSTOMER c LEFT OUTER JOIN accounts.ADDRESS a
       ON c.ID = a.CUSTOMER_ID;

Additional resources

  • For information about how to use DDL to define a virtual database, see DDL metadata for schema objects in the Data Virtualization Reference. Defining the complete DDL is beyond the scope of this document.
  • For more information about importing an existing virtual database into the current one, see Section 4.4, “Virtual database import”.

4.1. Building a virtual database artifact

After you have all components of your Maven-based virtual database project, you can build the artifact and deploy it to your Maven repository.

Prerequisites

  • You set up your virtual database Maven project.
  • You have a DDL file, saved as vdb.ddl, that describes the virtual database that you want to build.
  • You have a pom.xml file that defines the dependencies for building the virtual database.

Procedure

  1. Open a terminal window to the root folder of your Maven project, and type the following command:

    mvn deploy

    The command builds the virtual database and deploys it to a local or remote Maven repository. The Maven repository can be public or private. The command generates a PROJECT_NAME-VERSION.vdb file in your target repository.

After the virtual database artifact is available in a Maven repository, you can use a YAML-based custom resource to deploy the virtual database to OpenShift. For information about using YAML to create a custom resource for deploying virtual database Maven artifacts, see Section 4.2, “Creating a custom resource (CR) to deploy a Maven artifact”.

For information about using the Data Virtualization Operator to deploy a virtual database, see Chapter 6, Running the Data Virtualization Operator to deploy a virtual database.

4.2. Creating a custom resource (CR) to deploy a Maven artifact

Before you can deploy a virtualization that you create as a Maven artifact, you must create a CR that defines the location of the Maven repository. When you are ready to deploy the virtualization, you provide this CR to the Data Virtualization Operator.

Prerequisites

  • You created a virtualization according to the instructions in Chapter 4, Creating a virtual database as a Maven artifact.
  • You deployed the virtualization to a Maven repository that the Data Virtualization Operator can access.
  • You have the login credentials to access the data source.
  • You are familiar with the creation of custom resource files in YAML format.

Procedure

  1. Open a text editor, create a file with the name of the virtualization, and save it with the extension .yaml, for example, dv-customer.yaml.
  2. Add information to define the custom resource kind, name, and source. The following annotated example provides guidance on the contents to include in the CR:

    dv-customer.yaml

    apiVersion: teiid.io/v1alpha1
    kind: VirtualDatabase
    metadata:
      name: dv-customer
    spec:
      replicas: 1
      datasources:
        - name: sampledb
          type: postgresql
          properties:
            - name: username 1
              value: user
            - name: password
              value: mypassword
            - name: jdbc-url
              value: jdbc:postgresql://sampledb/sampledb  2
      resources:
        memory: 1024Mi
        cpu: 2.0
      build:
        source:
          maven: com.example:customer-vdb:1.0.0:vdb      3
        mavenRepositories:
          central: https://repo.maven.apache.org/maven2  4

    1
    Specifies the credentials for signing in to the data source. Although this example shows credentials that are defined within the CR, in production use, use secrets to specify credentials, rather than exposing them in plain text. For information about adding credentials to secrets, see Section 7.2, “Using secrets to store data source credentials”.
    2
    Specifies the URL for connecting to the data source.
    3
    Specifies the Maven location of the virtual database by providing the groupId, artifactId, and version (GAV) coordinates.
    4
    If you are using a private Maven repository, specify its URL. You can configure multiple repositories.

After you create the CR YAML file, you can run the Data Virtualization Operator to deploy the virtual database to OpenShift.

Run the Data Virtualization Operator with the CR to generate the virtual database and deploy it to OpenShift.

4.3. Private Maven repositories

When you run the Data Virtualization Operator to build a virtual database, the Operator initiates a Maven build. The Maven build converts the DDL in the custom resource that you provide into a container image that can be deployed to OpenShift. If the Operator requires additional packages to complete the build process (for example, JDBC drivers, client libraries, and other dependency libraries), it retrieves these build dependencies from a Maven repository. By default, the Operator retrieves build dependencies from the public Maven Central repository.

In some environments the network configuration does not permit direct access to the internet, preventing the Operator from connecting to the Maven Central repository. To enable the Operator to build virtual databases when it cannot connect to the Maven Central repository, you can configure the use of a local, private Maven repository.

When you run the Operator to build a virtual database, it searches for a secret or ConfigMap object with a specific name (either teiid-maven-settings or VDB_NAME-maven-settings). If it finds a matching object, the Operator uses the Maven repositories specified in the settings.xml section of the object to resolve any dependencies. As long as the settings.xml key in the named object is correctly specified, no other configuration is required.

Methods for specifying private Maven repositories for virtual database builds

You can use several different methods to specify the local private Maven repository that the Operator uses to retrieve dependencies. The method that you choose depends the following factors:

  • Whether the private Maven repository requires authentication
  • Whether you want to use a single repository to build all of your virtual database.

The following table lists the methods that are available.

Table 4.1. Methods for specifying a private Maven repository

MethodDescriptionLimitations

Specify a global repository.

Applies to all virtual database in a namespace.

List repositories in a settings.xml key that you add to an OpenShift secret or ConfigMap.

Secret or ConfigMap must use the name teiid-maven-settings

Specify unique repositories for individual virtual databases.

Applies to a single virtual database.

Lists repositories in a settings.xml key that you add to an OpenShift secret or ConfigMap.

The secret or ConfigMap must mirror the name of the virtual database in the format VDB_NAME-maven-settings.xml

Specify repositories in the custom resource.

Lists repositories in the custom resource that you use to deploy the virtual database.

The Maven repository cannot require authentication.

4.3.1. Specifying private Maven repositories to build all virtual databases in a namespace

You can provide a full settings.xml file that specifies a single common Maven repository for the Operator to use in building any virtual database. Use this method if your Maven repository requires authentication, or if you have multiple repositories and you want to use a specific one for all your virtual database builds.

When the Operator builds the image for your virtual database, it checks for a ConfigMap named teiid-maven-settings. If it finds it, it then uses the settings.xml file in the ConfigMap for the build.

Note

You an override the use of a common repository by specifying a settings.xml to apply to a particular virtual database build. For more information, see Section 4.3.2, “Specifying private Maven repositories for building an individual virtual database”.

Procedure

  1. Create a ConfigMap or secret and assign the following value to the metadata/name key:

    teiid-maven-settings

  2. Add a key to the ConfigMap or secret and assign the key the name settings.xml. The value of the key contains a full Maven settings file. The following example provides an excerpt that shows how to include a settings.xml file in a ConfigMap:

    Sample ConfigMap

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: teiid-maven-settings
      namespace: myproject
    data:
      settings.xml: |-
        <?xml version="1.0" encoding="UTF-8"?>
        <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
        <localRepository />
        <profiles>
            <profile>
                <id>maven-settings</id>
                <activation>
                    <activeByDefault>true</activeByDefault>
                </activation>
                <repositories>
                    <repository>
                    <id>private</id>
                    <url>https://myprivate.host.com/maven2</url>
                    <snapshots>
                        <enabled>false</enabled>
                        <checksumPolicy>fail</checksumPolicy>
                    </snapshots>
                    <releases>
                        <enabled>false</enabled>
                        <checksumPolicy>fail</checksumPolicy>
                    </releases>
                    </repository>
                </repositories>
            </profile>
        </profiles>
        <activeProfiles>
          <activeProfile>maven-settings</activeProfile>
        </activeProfiles>
        </settings>

    Note

    The preceding example does not represent a working settings.xml file. You can use the example as the basis for your own ConfigMap, but you must provide details that are are specific to your Maven repository.

  3. Save the ConfigMap YAML file as maven-settings.yaml.
  4. Deploy the ConfigMap to OpenShift by typing following command:

    oc create -f maven-settings.yaml

4.3.2. Specifying private Maven repositories for building an individual virtual database

In some cases, you might want the Data Virtualization Operator to use a specific Maven repository when it builds a particular virtual database. You can specify the Maven repository to use in a settings.xml file that you add to a ConfigMap or secret that applies only to a virtual database with a specific name.

The configuration that you add for a specific virtual database build takes precedence over any global configuration that you set for the namespace.

The repositories that you add designate for use in building an individual virtual database cannot require authentication.

Procedure

  1. Create a ConfigMap or secret and assign the metadata/name key a value that matches the name of the virtual database, using the format VDB_NAME-maven-settings.xml. For example,
metadata:
  name: dv-customer-maven-settings
  1. Add a key to the ConfigMap or secret and assign the key the name settings.xml. The value of the key contains a full Maven settings file.
  2. Save the ConfigMap YAML file as maven-settings.yaml.
  3. Deploy the ConfigMap to OpenShift by typing following command:

    oc create -f maven-settings.yaml

4.3.3. Specifying the private Maven repositories for building an individual virtual database in its custom resource

You can specify the private Maven repositories to use in building an individual virtual database in the custom resource for the virtual database. The method of specifying Maven repositories in a custom resource applies only to repositories that do not require authentication.

The repositories that you add to the CR cannot require authentication.

When you run the Data Virtualization Operator to build a virtual database, it uses the settings in the CR. For example,

dv-customer.yaml

apiVersion: teiid.io/v1alpha1
kind: VirtualDatabase
metadata:
  name: dv-customer
spec:
  replicas: 1
  build:
    source:
      ddl: |
        CREATE DATABASE customer OPTIONS (ANNOTATION 'Customer VDB');
        USE DATABASE customer;
        ...
      mavenRepositories: 1
        private: https://myprivate.host.com/maven2
        private2: https://myprivate.host2.com/maven2

1
Define one or more Maven private repositories to be used with the build.

If you specify Maven repositories in the CR, the Operator uses them in addition to the repositories specified in the settings.xml file.

4.4. Virtual database import

You can use a Maven project to build a special type of virtual database that imports from one or more existing virtual databases. The project structure and build process for these importing virtual databases is the same as for other virtual databases that you create as Maven artifacts.

Virtual database import can be useful in organizations in which departments control access to their own data sources, but must share subsets of their original data with other teams. The data owners might not want to prevent outside groups from viewing details about the physical structure of the data. By setting up a virtual database, the owning group can establish an abstraction layer that exposes only targeted data without directly exposing your data sources.

Consuming groups typically have their own data stores that they use for data received from other groups. Instead of accessing the source data directly, they can create their own virtual database layer to import the data. Through this second abstraction layer, the consuming group can access the data in a way that insulates them from changes to the physical data store schema. When the group that owns the data updates the base virtual databases, changes can be easily pushed to the consuming groups virtual databases.

For example, suppose that a Sales team has the following two postgreSQL databases:

  • Accounts Database (AccountsDB)
  • Sales Database (SalesDB)

Should members of the Operations team require data from the Sales databases, the Sales team can create a virtual database to expose the source data. The Operations team can then consume the Sales data by creating their own virtual database, as in the following example:

Example: Operations team virtual database that imports data from two virtual databases created by the Sales team

CREATE DATABASE OperationsDB;
USE DATABASE OperationsDB;

IMPORT DATABASE AccountsDB VERSION 1.0;
IMPORT DATABASE SalesDB VERSION 1.0;

In the preceding example, the DDL defines the database OperationsDB, which then imports metadata from two source databases, AccountsDB and SalesDB. Users who connect to OperationsDB do not know anything about the two virtual databases that supply the data, but they have full access to the data that they expose. If future changes occur in the source databases, the Operations team can rebuild and redeploy a new version of the OperationsDB database to make the changes available to its users.

4.4.1. How virtual database importing works

A virtual database developer can deploy a virtual database to a Maven repository, where it is assigned an identifier, version number, and a defined location. The resulting Maven artifact can then be defined as a dependency to the build process of a second virtual database. When this second virtual database is built, the build reads the contents of the original virtual database from the Maven repository and incorporates it into current new virtual database.

For example, imagine that a Sales team developer creates the virtual databases AccountsDB and SalesDB, and then deploys them to a Maven repository. When the Operations team developer creates a Maven project for a secondary virtual database, OperationsDB, the project defines the AccountsDB and SalesDB databases as dependencies. The new project can then use an IMPORT statement to extract content from the two original virtual databases. After the developer deploys the OperationsDB database, users can connect to it to access data from both of its source databases.

Because the OperationsDB database is also available as a Maven artifact, it can be used as a data source by other virtual databases.

To deploy the OperationsDB virtual database, you must supply the Maven coordinates of the OperationsDB to the Data Virtualization Operator. During the build, the Operator retrieves the contents from the Maven repository, including the dependency information, and deploys the virtual database to OpenShift.

The Maven build process enables build tasks to be automated as part of an automated CI/CD workflow in which you can configure Maven to rebuild virtual databases automatically with no user intervention after changes occur in the sources.

4.4.2. POM and DDL for virtual databases that import from other virtual databases

You use a Maven build process to create a virtual database that import from other virtual databases.

The process and the project structure for developing virtual databases that use importing is the same as the process for creating any virtual database as a Maven artifact. You use the standard Maven project structure for your virtual database. Resources in the project must be added to the expected locations in the project structure. If you want to include additional metadata files with the virtual database, add them within this structure.

For more information about the structure for creating virtual databases as Maven artifacts, see Chapter 4, Creating a virtual database as a Maven artifact.

POM file

As with any virtual database that you create as a Maven artifact, the project for an importing virtual database must include a POM file. For an example of a POM file for building a virtual database as a Maven artifact, see Chapter 4, Creating a virtual database as a Maven artifact.

Along with the standard entries, the POM for an importing virtual database must list the following entries:

  • The artifactId for each virtual databases to import from.
  • Dependency definitions for each imported virtual database.

For example, if you want to import the virtual database accountsdb into the secondary virtual database OperationsDB, the POM file for the OperationsDB database must include a dependency entry for the accountsdb database as in the following example:

Example: Entry in the POM file for the OperationsDB database, listing the accountsdb virtual database as a dependency

<dependencies>
  <dependency>
    <groupId>org.example</groupId>
    <artifactId>accountsdb</artifactId>
    <version>1.0</version>
  </dependency>
</dependencies>

If you want to import multiple virtual databases, the <dependencies> section of the POM must include separate <dependency> entries for each virtual database. The <dependencies> block is contained within the <projects section of the POM.

DDL

The DDL file for a Maven-based virtual database must have the name vdb.ddl. Maven-based virtual databases use the same DDL structure as other virtual databases.

The DDL for the importing virtual database must contain an IMPORT statement similar to the following statement:

IMPORT DATABASE IMPORTED_VIRTUAL_DATABASE_ VERSION VERSION_NUMBER__;

For example, if you want to import the virtual database accountsdb into a virtual database with the name OperationsDB, you would include the following entry in the DDL for the OperationsDB database:

IMPORT DATABASE AccountsDB VERSION 1.0;

The Maven build reads the contents of the vdb.ddl file for the AccountsDB and merges it into the virtual database that is being built. The result is a virtual database comprised of multiple virtual databases.

After you configure the source files for the Maven build, you can run the build to deploy the Maven artifact to a repository. The process for building the virtual database is the same as with any Maven-based virtual database. For more information, see Section 4.1, “Building a virtual database artifact”

4.4.3. DDL limitations

You should avoid the use of some statements in the DDL for a virtual database.

Foreign schema import

For example:

IMPORT FOREIGN SCHEMA public FROM SERVER sampledb INTO accounts;

The IMPORT FOREIGN SCHEMA statement is an expensive operation, that queries the underlying physical data source every time a pod restarts. Introducing this query places a strain on the underlying physical data source, increasing the time that it takes for the pod to start. The problem is magnified if you have multiple pods trying to access the data source at once.

Another problem with importing the foreign schema from another virtual database is that images deployed to OpenShift are assumed to be in an immutable state. That is, no matter how many times the image is stopped and started, their behavior should persist. However, if you define a SQL IMPORT SCHEMA operation in the DDL, the virtual database loads the schema from the source virtual database every time that the image starts up. As a result, the image contents can be modified, which runs counter to the principle of immutability for this architecture.

If you can guarantee that the underlying data source always returns the same metadata, there is no problem. Problems arise if the data source returns different metadata when the image starts.

To ensure that the contents of an image remain stable, it is best to define all metadata explicitly in the virtual database DDL, including tables, procedures, and any functions that data source represents.

ALTER statements to modify metadata

Do not use ALTER TABLE statements in the DDL for a virtual database if the DDL includes either of the following statements:

  • IMPORT FOREIGN SCHEMA
  • IMPORT DATABASE

ALTER TABLE statements are intended to modify the structure of a table by adding, removing, or modifying columns. However, in the case of imported virtual databases, the actual structure of the imported tables is not available at build time. From the perspective of the virtual database, the runtime metadata does not yet exist to be modified.

When you include an ALTER TABLE statement in the DDL with an IMPORT statement, it attempts to change a table or column that does not exist in the the virtual database metadata. A deployment failure can result, or if the virtual database is deployed, its contents might not include the expected data.