Chapter 6. Building the virtual database

We now have a working postgreSQL database with some sample data in it. The next step is to define a virtual database that can read and write from the postgreSQL data source. We’ll also configure the REST data source that we mentioned earlier, an online live stock quote service.

The design of a virtual database is defined in a custom resource (CR) file. The CR is written as a YAML file, which contains a section of embedded DDL. After the CR YAML file is ready, you provide it as input to the Data Virtualization Operator, and the Operator deploys a virtual database based on the specification.

This tutorial uses a sample CR file, portfolio.yaml, that is available from the Teiid OpenShift examples repository that you cloned earlier. If you want to get started right away, you can use the sample file as-is, and run the Operator. Just skip to Chapter 7, Deploying the virtual database to try it out.

Afterwards you can return here to review the next few sections to learn about how the CR is constructed.

6.1. Virtual database custom resources

To specify the provisioning instructions for a virtual database, you create a custom resource (CR) file that describes how to build the virtual database. The CR is a YAML file that defines the following elements of the virtual database:

  • The configuration of the data sources.
  • The SQL DDL specifies the structure of the database schema.

6.1.1. Data source configuration

For this tutorial the CR defines the following data sources:

accountdb
Represents the postgreSQL database that you created earlier.
quotesvc
Represents a REST API for retrieving stock quotes from the online quote service at https://finnhub.io/api/v1/
Note

You must use lowercase letters to specify data source names.

The following CR excerpt specifies the data sources for the virtual database:

apiVersion: teiid.io/v1alpha1
kind: VirtualDatabase
metadata:
  name: portfolio 1
spec:
  replicas: 1
  datasources:
    - name: accountdb 2
      type: postgresql
      properties:
        - name: username
          value: user
        - name: password
          value: changeit
        - name: jdbc-url
          value: jdbc:postgresql://accounts/accounts
    - name: quotesvc 3
      type: rest
      properties:
        - name: endpoint
          value: https://finnhub.io/api/v1/
1
Defines the resource name for the virtual database when it is deployed to OpenShift.
2
Represents the postgreSQL database and assigns to it the name accountdb.
3
Represents a REST connection to the online stock quote service at https://finnhub.io/api/v1/, and assigns to it the name quotesvc.
Note

The stock quote service that serves as a data source in this tutorial is intended to help demonstrate how virtual databases can integrate data from REST services. The service is not a component of Red Hat Integration data virtualization and is not affiliated with Red Hat in any way.

To use the stock quote service, visit the web site to request a free API key. After you obtain your key, add its value to the portfolio.yaml to the CR file that you obtain from the Teiid OpenShift Examples repository. Add your API key in place of the STOCK_QUOTE_API_KEY variable in the file.

6.1.2. DDL for defining the virtual database

We embed SQL DDL in our CR to define the structure of the virtual database. For example, the DDL specifies the following properties:

  • The translators to use for standardizing the format of the source data.
  • The source schema to use for representing data elements in the data sources.
  • The metadata (tables, procedures, and so forth) that we want to import from the data source.
  • The virtual schema to use to map elements in the data sources to elements in the virtual database.

The next few sections provide more detail about how and where to define these properties in the DDL. After reviewing the individual sections of the file we’ll look at how they are assembled in the final CR.

After the DDL is complete, we insert it into a section of the CR, as shown in the following example:

spec:
  build:
    source:
      ddl: |
       PLACEHOLDER_FOR_VIRTUAL_DATABASE_DDL

6.1.2.1. Virtual database creation

The CREATE DATABASE and USE DATABASE statements specify the name of the virtual database and indicate that subsequent SQL operations in the DDL apply to this database.

The DDL in our CR uses the following statements to indicate that we’ll create and use a database called Portfolio for our virtual database.

CREATE DATABASE Portfolio OPTIONS (ANNOTATION 'The Portfolio VDB');
USE DATABASE Portfolio;

In this case, the CREATE DATABASE statement assigns a database name (Portfolio) that matches the name portfolio that we assigned to the metadata:name property in the CR:

  apiVersion: teiid.io/v1alpha1
  kind: VirtualDatabase
  metadata:
    name: portfolio

While it is not required that the two names match, using the same name makes it easier to understand that the two entities are related.

The annotation in the CREATE DATABASE statement assigns a metadata description to the database. The metadata is purely descriptive and it could be used as a label in a user interface. It is not directly used during the virtual database build or deployment process.

6.1.2.2. Translator definition

A virtual database uses adapters, known as translators or data wrappers, to convert the data that it receives from different types of external data sources into a standard format. Data virtualization uses a wide range of data wrappers. The DDL for our virtual database uses the following SQL statements to define data wrappers for the postgreSQL and REST data sources that we’re using in this tutorial:

CREATE FOREIGN DATA WRAPPER rest;
CREATE FOREIGN DATA WRAPPER postgresql;

6.1.2.3. External data source definitions

Our virtual database connects to external two data sources to retrieve data. In the preceding sections, we defined the translators and the connection properties for these data sources. Now we’ll add the DDL to define two SERVER elements to represent the connections to the data sources. The names that we assign to the SERVER objects tie them to the datasource properties that we specified earlier.

The SQL in this section also defines a FOREIGN DATA WRAPPER (FDW) for each connection. We add these foreign data wrappers to enable the virtual database to manage the data that is stored on these external servers. By defining these FDWs for the virtual database we enable it to query the postgreSQL and REST data sources as if they were local tables.

CREATE SERVER "accountdb" FOREIGN DATA WRAPPER postgresql;
CREATE SERVER "quotesvc" FOREIGN DATA WRAPPER rest;

6.1.2.4. Schema creation

A virtual database is a collection of multiple schemas. These schemas can be physical, representing a foreign data source such as our postgreSQL database, or they can be virtual as with the schema that provides in the data virtualization layer.

CREATE SCHEMA marketdata SERVER "quotesvc";
CREATE SCHEMA accounts SERVER "accountdb";

CREATE VIRTUAL SCHEMA Portfolio;

Physical schema such as the marketdata and accounts schema that we define here, represent data elements in the external data sources. The physical schema definitions in the preceding DDL excerpt refer to the SERVER objects that we defined for those external data sources in Section 6.1.2.3, “External data source definitions”. The final statement in the preceding excerpt establishes a virtual schema for our Portfolio virtual database.

6.1.2.5. Metadata import

In Section 6.1.2.3, “External data source definitions” we defined the foreign schema and associated them with their corresponding SERVER objects. However, we have not provided the data virtualization service with the information that it needs to work with the schema in the foreign data sources. To provide that information, we add the following statements to our DDL:

SET SCHEMA marketdata;
IMPORT FROM SERVER "quotesvc" INTO marketdata;

SET SCHEMA accounts;
IMPORT FROM SERVER "accountdb" INTO accounts OPTIONS (
        "importer.useFullSchemaName" 'false',
        "importer.tableTypes" 'TABLE,VIEW');

In the preceding DDL excerpt we import metadata from our data sources into the schema that we created in Section 6.1.2.4, “Schema creation”.

  • The metadata from the quotesvc data source server is imported into the marketdata foreign schema.
  • The metadata from the accountdb data source server is imported into the accounts foreign schema.
Note

You can specify the full metadata for tables, procedures, and so forth. However, for the purposes of this tutorial, we will dynamically import that information.

6.1.2.6. Virtual schema definition

The final section of the DDL defines the data abstraction layer of the virtual database. Here we define a schema that uses the elements of the metadata that we defined in the preceding PHYSICAL schema. This virtual schema is where the heavy lifting occurs. It defines the logic for combining information from our source tables.

You can define any number of virtual schema layers, but for simplicity, in this example we create a single layer.

In this final portion of the DDL, we create the following virtual views:

StockPrice
This view retrieves stock prices from the quotessvc, our REST service that provides real-time stock quotes.
AccountValues
This view uses the value obtained from the StockPrice view to calculate the portfolio values of the customers listed in the Accounts postgreSQL database.
CustomerHolding
This view shows the value of individual customer accounts based on their stock holdings.
SET SCHEMA Portfolio;

      CREATE VIEW StockPrice (
          symbol string PRIMARY KEY,
          price double,
          CONSTRAINT ACS ACCESSPATTERN (symbol)
      ) AS
          SELECT p.symbol, y.price
          FROM accounts.PRODUCT as p, TABLE(call invokeHttp(action=>'GET', endpoint=>QUERYSTRING('quote', p.symbol as "symbol", `STOCK_QUOTE_API_KEY` as "token"), headers=>jsonObject('application/json' as "Content-Type"))) as x,
          JSONTABLE(JSONPARSE(x.result,true), '$' COLUMNS price double path '@.c') as y

      CREATE VIEW AccountValues (
          LastName string PRIMARY KEY,
          FirstName string,
          StockValue double
      ) AS
          SELECT c.lastname as LastName, c.firstname as FirstName, sum((h.shares_count*sp.price)) as StockValue
          FROM Customer c JOIN Account a on c.SSN=a.SSN
          JOIN Holdings h on a.account_id = h.account_id
          JOIN product p on h.product_id=p.id
          JOIN StockPrice sp on sp.symbol = p.symbol
          WHERE a.type='Active'
          GROUP BY c.lastname, c.firstname;

      CREATE VIEW CustomerHoldings (
          LastName string PRIMARY KEY,
          FirstName string,
          symbol string,
          ShareCount integer,
          StockValue double,
          CONSTRAINT ACS ACCESSPATTERN (LastName)
      ) AS
          SELECT c.lastname as LastName, c.firstname as FirstName, p.symbol as symbol, h.shares_count as ShareCount, (h.shares_count*sp.price) as StockValue
          FROM Customer c JOIN Account a on c.SSN=a.SSN
          JOIN Holdings h on a.account_id = h.account_id
          JOIN product p on h.product_id=p.id
          JOIN StockPrice sp on sp.symbol = p.symbol
          WHERE a.type='Active';

That completes our tour of the virtual database CR. See Section 6.2, “Completed virtual database custom resource file” to review the CR in its entirety.

6.2. Completed virtual database custom resource file

Bringing together the individual sections of the custom resource file that we reviewed in the preceding sections, we can now review the complete portfolio.yaml CR file:

portfolio.yaml

apiVersion: teiid.io/v1alpha1
kind: VirtualDatabase
metadata:
  name: portfolio
spec:
  replicas: 1
  datasources:
    - name: accountdb
      type: postgresql
      properties:
        - name: username
          value: user
        - name: password
          value: password
        - name: jdbc-url
          value: jdbc:postgresql://accounts/accounts
    - name: quotesvc
      type: rest
      properties:
        - name: endpoint
          value: https://finnhub.io/api/v1/
  build:
    source:
      ddl: |
        CREATE DATABASE Portfolio OPTIONS (ANNOTATION 'The Portfolio VDB');
        USE DATABASE Portfolio;

        -- translators 
        CREATE FOREIGN DATA WRAPPER rest;
        CREATE FOREIGN DATA WRAPPER postgresql;

        -- Servers 
        CREATE SERVER "accountdb" FOREIGN DATA WRAPPER postgresql;
        CREATE SERVER "quotesvc" FOREIGN DATA WRAPPER rest;

        -- Schemas 
        CREATE SCHEMA marketdata SERVER "quotesvc";
        CREATE SCHEMA accounts SERVER "accountdb";

        CREATE VIRTUAL SCHEMA Portfolio;

        -- Schema:marketdata 
        SET SCHEMA marketdata;

        IMPORT FROM SERVER "quotesvc" INTO marketdata;

        -- Schema:accounts 
        SET SCHEMA accounts;

        IMPORT FROM SERVER "accountdb" INTO accounts OPTIONS (
                "importer.useFullSchemaName" 'false',
                "importer.tableTypes" 'TABLE,VIEW');

        -- Schema:Portfolio 
        SET SCHEMA Portfolio;

        CREATE VIEW StockPrice (
            symbol string,
            price double,
            CONSTRAINT ACS ACCESSPATTERN (symbol)
        ) AS
            SELECT p.symbol, y.price
            FROM accounts.PRODUCT as p, TABLE(call invokeHttp(action=>'GET', endpoint=>QUERYSTRING('quote', p.symbol as "symbol", `STOCK_QUOTE_API_KEY` as "token"), headers=>jsonObject('application/json' as "Content-Type"))) as x,
            JSONTABLE(JSONPARSE(x.result,true), '$' COLUMNS price double path '@.c') as y

        CREATE VIEW AccountValues (
            LastName string PRIMARY KEY,
            FirstName string,
            StockValue double
        ) AS
            SELECT c.lastname as LastName, c.firstname as FirstName, sum((h.shares_count*sp.price)) as StockValue
            FROM Customer c JOIN Account a on c.SSN=a.SSN
            JOIN Holdings h on a.account_id = h.account_id
            JOIN product p on h.product_id=p.id
            JOIN StockPrice sp on sp.symbol = p.symbol
            WHERE a.type='Active'
            GROUP BY c.lastname, c.firstname;

We can now provide the CR to the Data Virtualization Operator to build and deploy the virtual database.