Red Hat Training

A Red Hat training course is available for Red Hat JBoss Data Virtualization

13.21. MongoDB Translator

13.21.1. MongoDB

MongoDB is a document based "schema-less" database with it own query language. It does not map perfectly with relational concepts or the SQL query language. More and more systems are using this type of NoSQL store for scalability and performance. For example, applications like storing audit logs or managing web site data fits well with MongoDB, and does not require using a structural database like Oracle, Postgres ect. MongoDB uses JSON documents as its primary storage unit, and it can have additional embedded documents inside the parent document. By using embedded documents it co-locates the related information to achieve de-normalization that typically requires either duplicate data or joins to achieve in a relational database.
For MongoDB to work with JBoss Data Virtualization, the challenge for the MongoDB translator is to design a MongoDB store that can achieve the balance between relational and document based storage. In our opinion the advantages of "schema-less" design are great at development time. "Schema-less" can also be a problem with migration of application versions and the ability to query and make use of returned information effectively.
Since it is hard and may be impossible in certain situations to derive a schema based on existing the MongoDB collection(s), JBoss Data Virtualization approaches the problem in reverse compared to other translators. When working with MongoDB, JBoss Data Virtualization requires the user to define the MongoDB schema upfront using JBoss Data Virtualization metadata. Since JBoss Data Virtualization only allows relational schema as its metadata, the user needs to define their MongoDB schema in relational terms using tables, procedures, and functions. For the purposes of MongoDB, the JBoss Data Virtualization metadata has been extended to support extension properties that can be defined on the table to convert it into a MongoDB based document. These extension properties let users define how a MongoDB document is structured and stored. Based on the relationships (primary-key, foreign-key) defined on a table and the cardinality (ONE-to-ONE, ONE-to-MANY, MANY-to-ONE), relations between tables are mapped such that related information can be embedded along with the parent document for co-location (see the de-normalization comment above). Thus a relational schema based design, but document based storage in MongoDB.

13.21.2. MongoDB Translator

The MongoDB translator, known by the type name mongodb, provides a relational view of data that resides in a MongoDB database. This translator is capable of converting JBoss Data Virtualization SQL queries into MongoDB based queries. It supports a full range of SELECT, INSERT, UPDATE and DELETE calls.
The document structure in MongoDB can be more complex than what JBoss Data Virtualization can currently define. This translator is currently designed for:
  • Users that are using relational databases and would like to move/migrate their data to MongoDB to take advantage of scaling and performance, without modifying end user applications that are currently running.
  • Users that are starting out with MongoDB and do not have experience with MongoDB, but are seasoned SQL developers. This provides a low barrier of entry compared to using MongoDB directly as an application developer.
  • Integrating other enterprise data sources with MongoDB based data.

Note

The MongoDB translator does not currently support native queries.

Note

The resource adapter for this translator is provided by configuring the "mongodb" data source in the JBoss EAP instance. See the Red Hat JBoss Data Virtualization Administration Guide for more configuration information. An example configuration file is found at EAP_HOME/docs/teiid/datasources/mongodb.

13.21.3. MongoDB Translator: Example DDL

The name of the translator to use in vdb.xml is "mongodb":
<vdb name="nothwind" version="1">
    <model name="northwind">
        <source name="local" translator-name="mongodb" connection-jndi-name="java:/mongoDS"/>
    </model>
<vdb>
The translator does not provide a connection to the MongoDB. For that purpose, Teiid has a JCA adapter that provides a connection to MongoDB using the MongoDB Java Driver. To define such connector, use the following XML fragment in standalone-teiid.xml.
<resource-adapters>
    <resource-adapter id="mongodb">
        <module slot="main" id="org.jboss.teiid.resource-adapter.mongodb"/>
        <transaction-support>NoTransaction</transaction-support>
        <connection-definitions>
            <connection-definition class-name="org.teiid.resource.adapter.mongodb.MongoDBManagedConnectionFactory"
                    jndi-name="java:/mongoDS"
                    enabled="true"
                    use-java-context="true"
                    pool-name="teiid-mongodb-ds">
                     
                  <!-- MongoDB server list (host:port[;host:port...]) -->
                  <config-property name="RemoteServerList">localhost:27017</config-property>
                  <!-- Database Name in the MongoDB -->
                  <config-property name="Database">test</config-property>
                    <!--
                        Uncomment these properties to supply user name and password
                    <config-property name="Username">user</config-property>
                    <config-property name="Password">user</config-property>
                    --> 
            </connection-definition>
        </connection-definitions>
    </resource-adapter>
</resource-adapters>
MongoDB translator can derive the metadata based on existing document collections in some scenarios, however when working with complex documents the interpretation of metadata may be accurate, in those situations the user MUST define the metadata. For example, you can define a schema using DDL:
<vdb name="nothwind" version="1">
    <model name="northwind">
        <source name="local" translator-name="mongodb" connection-jndi-name="java:/mongoDS"/>
            <metadata type="DDL"><![CDATA[
                CREATE FOREIGN TABLE  Customer (
                    customer_id integer,
                    FirstName varchar(25),
                    LastName varchar(25)
                ) OPTIONS(UPDATABLE 'TRUE');
            ]> </metadata>
    </model>
<vdb>
When this INSERT operation is executed against table using Teiid, MongoDB translator will create a document in the MongoDB.
INSERT INTO Customer(customer_id, FirstName, LastName) VALUES (1, 'John', 'Doe');
{
  _id: ObjectID("509a8fb2f3f4948bd2f983a0"),
  customer_id: 1,
  FirstName: "John",
  LastName: "Doe"
}
If a PRIMARY KEY is defined on the table, then that column name is automatically used as "_id" field in the MongoDB collection, then document structure is stored in the MongoDB.
CREATE FOREIGN TABLE  Customer (
    customer_id integer PRIMARY KEY,
    FirstName varchar(25),
    LastName varchar(25)
) OPTIONS(UPDATABLE 'TRUE');
{
  _id: 1,
  FirstName: "John",
  LastName: "Doe"
}
If you defined the composite PRIMARY KEY on Customer table, the document structure will look like this:
CREATE FOREIGN TABLE  Customer (
    customer_id integer,
    FirstName varchar(25),
    LastName varchar(25),
    PRIMARY KEY (FirstName, LastName)
) OPTIONS(UPDATABLE 'TRUE');
{
  _id: {
         FirstName: "John",
         LastName:  "Doe"
       },
  customer_id: 1,
}
MongoDB translator supports automatic mapping of Teiid data types into MongoDB data types, including the support for Blobs, Clobs and XML. The LOB support is based on GridFS in MongoDB. Arrays are in this form:
{
  _id: 1,
  FirstName: "John",
  LastName: "Doe"
  Score: [89, "ninety", 91.0]
}
User can get individual items in the array using function array_get, or can transform the array into tabular structure using ARRATTABLE.

Note

Note that even though embedded documents can also be in arrays, the handling of embedded documents is different from array with scalar values.

Note

Regular Expressions, MongoDB::Code, MongoDB::MinKey, MongoDB::MaxKey and MongoDB::OID are not supported.

13.21.4. MongoDB Translator: Metadata Extensions

Using the above DDL or any other metadata facility, a user can map a table in a relational store into a document in MongoDB, however to make effective use of MongoDB, you need to be able to build complex documents, that can co-locate related information, so that data can queried in a single MongoDB query. Otherwise, since MongoDB does not support join relationships like relational database, you need to issue multiple queries to retrieve and join data manually. The power of MongoDB comes from its "embedded" documents and its support of complex data types like arrays and use of the aggregation framework to be able to query them. This translator provides way to achieve that goals.
When you do not define the complex embedded documents in MongoDB, Teiid can step in for join processing and provide that functionality, however if you want to make use of the power of MongoDB itself in querying the data and avoid bringing the unnecessary data and improve performance, you need to look into building these complex documents.
MongoDB translator defines two additional metadata properties along with other Teiid metadata properties to aid in building the complex "embedded" documents. You can use the following metadata properties in your DDL.
  • teiid_mongo:EMBEDDABLE - Means that data defined in this table is allowed to be included as an "embeddable" document in any parent document. The parent document is referenced by the foreign key relationships. In this scenario, Teiid maintains more than one copy of the data in MongoDB store, one in its own collection and also a copy in each of the parent tables that have relationship to this table. You can even nest embeddable table inside another embeddable table with some limitations. Use this property on table, where table can exist, encompass all its relations on its own. For example, a "Category" table that defines a "Product"'s category is independent of Product, which can be embeddable in "Products" table.
  • teiid_mongo:MERGE - Means that data of this table is merged with the defined parent table. There is only a single copy of the data that is embedded in the parent document. Parent document is defined using the foreign key relationships.

Important

A given table can contain either the "teiid_mongo:EMBEDDABLE" property or the "teiid_mongo:MERGE" property defining the type of nesting in MongoDB. A table is not allowed to have both properties.
  • EMBEDDABLE - Means that data defined in this table is allowed to be included as an "embeddable" document in a parent document. The parent document is defined by the foreign key relationships. In this situation, JBoss Data Services maintains more than one copy of the data in a MongoDB store: one in its own collection and also a copy in each of the parent tables that have relationship to this table.
  • EMBEDIN - Means that data of this table is embedded in the defined parent table. There is only a single copy of the data that is embedded in the parent document.
These properties behave differently for particular relationship types on the schema:
  • ONE-2-ONE: Here is the DDL structure representing the ONE-2-ONE relationship:
    CREATE FOREIGN TABLE  Customer (
        CustomerId integer PRIMARY KEY,
        FirstName varchar(25),
        LastName varchar(25)
    ) OPTIONS(UPDATABLE 'TRUE');
     
    CREATE FOREIGN TABLE Address (
        CustomerId integer,
        Street varchar(50),
        City varchar(25),
        State varchar(25),
        Zipcode varchar(6),
        FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId)
     ) OPTIONS(UPDATABLE 'TRUE');
    
    By default, this will produce two different collections in MongoDB, like with sample data it will look like this:
    Customer
    {
      _id: 1,
      FirstName: "John",
      LastName: "Doe"
    }
     
    Address
    { 
      _id: ObjectID("..."),
       CustomerId: 1,
       Street: "123 Lane"
       City: "New York",
       State: "NY"
       Zipcode: "12345"
    }
    
    You can enhance the storage in MongoDB to a single collection by using "teiid_mongo:MERGE' extension property on the table's OPTIONS clause:
    CREATE FOREIGN TABLE  Customer (
        CustomerId integer PRIMARY KEY,
        FirstName varchar(25),
        LastName varchar(25)
    ) OPTIONS(UPDATABLE 'TRUE');
     
    CREATE FOREIGN TABLE Address (
        CustomerId integer PRIMARY KEY,
        Street varchar(50),
        City varchar(25),
        State varchar(25),
        Zipcode varchar(6),
        FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId)
     ) OPTIONS(UPDATABLE 'TRUE', "teiid_mongo:MERGE" 'Customer');
    
    This will produce a single collection in the MongoDB:
    Customer
    {
      _id: 1,
      FirstName: "John",
      LastName: "Doe",
      Address:
         { 
            Street: "123 Lane",
            City: "New York",
            State: "NY",
            Zipcode: "12345"
         }
    }
    
    Both tables are merged into a single collection that can be queried together using the JOIN clause in the SQL command. Since the existence of child/additional record has no meaning with out parent table using the "teiid_mongo:MERGE" extension property is right choice in this situation.

    Note

    Note that the Foreign Key defined on child table, must refer to Primary Keys on both parent and child tables to form a One-2-One relationship.
  • ONE-2-MANY: Typically there are only two tables involved in this relationship. If MANY side is only associated one table, then use "EMBEDIN" property on MANY side of table and define the parent. If associated with more than single table, then use "EMBEDDABLE". When MANY side is stored in ONE side, they are stored as array of embedded document. If associated with more than single table then use "teiid_mongo:EMBEDDABLE".
    Here is a sample DDL:
    CREATE FOREIGN TABLE  Customer (
        CustomerId integer PRIMARY KEY,
        FirstName varchar(25),
        LastName varchar(25)
    ) OPTIONS(UPDATABLE 'TRUE');
     
    CREATE FOREIGN TABLE  Order (       
        OrderID integer PRIMARY KEY,
        CustomerId integer,
        OrderDate date,
        Status integer,
        FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId)
    ) OPTIONS(UPDATABLE 'TRUE');
    
    In this sample, a single Customer can have many orders. There are two options to define the how we store the MongoDB document. If in your schema, the Customer table's CustomerId is only referenced in Order table (i.e. Customer information used for only Order purposes), you can use
    CREATE FOREIGN TABLE  Customer (
        CustomerId integer PRIMARY KEY,
        FirstName varchar(25),
        LastName varchar(25)
    ) OPTIONS(UPDATABLE 'TRUE');
     
    CREATE FOREIGN TABLE  Order (       
        OrderID integer PRIMARY KEY,
        CustomerId integer,
        OrderDate date,
        Status integer,
        FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId)
    ) OPTIONS(UPDATABLE 'TRUE', "teiid_mongo:MERGE" 'Customer');
    
    This will produce a single document for the customer table:
    {
      _id: 1,
      FirstName: "John",
      LastName: "Doe",
      Order:
      [
         { 
           _id: 100,
            OrderDate: ISODate("2000-01-01T06:00:00Z")
            Status: 2
         },
         { 
           _id: 101,
            OrderDate: ISODate("2001-03-06T06:00:00Z")
            Status: 5
         }
         ...
       ]
    }
    
    If the customer table is referenced in more tables other than Order table, then use the "teiid_mongo:EMBEDDABLE" property:
    CREATE FOREIGN TABLE Customer (
        CustomerId integer PRIMARY KEY,
        FirstName varchar(25),
        LastName varchar(25)
    ) OPTIONS(UPDATABLE 'TRUE', "teiid_mongo:EMBEDDABLE" 'TRUE');
     
    CREATE FOREIGN TABLE Order (       
        OrderID integer PRIMARY KEY,
        CustomerId integer,
        OrderDate date,
        Status integer,
        FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId)
    ) OPTIONS(UPDATABLE 'TRUE');
     
    CREATE FOREIGN TABLE Comments (       
        CommentID integer PRIMARY KEY,
        CustomerId integer,
        Comment varchar(140),
        FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId)
    ) OPTIONS(UPDATABLE 'TRUE');
    
    This creates three different collections in MongoDB:
    Customer
    {
      _id: 1,
      FirstName: "John",
      LastName: "Doe"
    }
     
    Order
    { 
      _id: 100,
      CustomerId: 1,
      OrderDate: ISODate("2000-01-01T06:00:00Z")
      Status: 2
      Customer:
       {
         FirstName: "John",
         LastName: "Doe"
       }
    }
     
    Comment
    {
      _id: 12,
      CustomerId: 1,
      Comment: "This works!!!"
      Customer:
       {
         FirstName: "John",
         LastName: "Doe"
       }
    }
    
    Here the Customer table contents are embedded along with other table's data where they were referenced. This creates duplicated data where multiple of these embedded documents are managed automatically in the MongoDB translator.

    Warning

    All the SELECT, INSERT, DELETE operations that are generated against the tables with "teiid_mongo:EMBEDDABLE" property are atomic, except for UPDATES, as there can be multiple operations involved to update all the copies.
  • MANY-2-ONE: This is the same as ONE-2-MANY. Apply them in reverse.

    Note

    A parent table can have multiple "embedded" and as well as "merge" documents inside it, it not limited so either one or other. However, please note that MongoDB imposes document size is limited can not exceed 16MB.
  • Many-to-Many: This can also mapped with combination of "teiid_mongo:MERGE" and "teiid_mongo:EMBEDDABLE" properties (partially). Here is a sample DDL:
    CREATE FOREIGN TABLE Order (       
        OrderID integer PRIMARY KEY,
        OrderDate date,
        Status integer
    ) OPTIONS(UPDATABLE 'TRUE');
     
    CREATE FOREIGN TABLE OrderDetail (               
        OrderID integer,
        ProductID integer,
        PRIMARY KEY (OrderID,ProductID),
        FOREIGN KEY (OrderID) REFERENCES Order (OrderID),
        FOREIGN KEY (ProductID) REFERENCES Product (ProductID)
    ) OPTIONS(UPDATABLE 'TRUE');
     
    CREATE FOREIGN TABLE Products (
       ProductID integer PRIMARY KEY,
       ProductName varchar(40)
    ) OPTIONS(UPDATABLE 'TRUE');
    
    Modify the DDL so that it looks like this:
    CREATE FOREIGN TABLE Order (       
        OrderID integer PRIMARY KEY,
        OrderDate date,
        Status integer
    ) OPTIONS(UPDATABLE 'TRUE');
     
    CREATE FOREIGN TABLE OrderDetail (               
        OrderID integer,
        ProductID integer,
        PRIMARY KEY (OrderID,ProductID),
        FOREIGN KEY (OrderID) REFERENCES Order (OrderID),
        FOREIGN KEY (ProductID) REFERENCES Product (ProductID)
    ) OPTIONS(UPDATABLE 'TRUE', "teiid_mongo:MERGE" 'Order');
     
    CREATE FOREIGN TABLE Products (
       ProductID integer PRIMARY KEY,
       ProductName varchar(40)
    ) OPTIONS(UPDATABLE 'TRUE',  "teiid_mongo:EMBEDDABLE" 'TRUE');
    
    A document that looks like this is produced:
    {
       _id : 10248,
       OrderDate : ISODate("1996-07-04T05:00:00Z"),
       Status : 5
       OrderDetails : [
         {
           _id : {
                   OrderID : 10248,
                   ProductID : 11
                   Products : {
                      ProductID: 11
                      ProductName: "Hammer"
                   }
           }
         },
         {
           _id : {
             OrderID : 10248,
             ProductID : 14
             Products : {
                 ProductID: 14
                 ProductName: "Screw Driver"
             }
           }
         }
       ]
    }
     
    Products
    {
        {
          ProductID: 11
          ProductName: "Hammer"
        }
        {
          ProductID: 14
          ProductName: "Screw Driver"
        }
    }
    

    Warning

    • Currently nested embedding of documents has limited support due to capabilities of handling nested arrays is limited in the MongoDB. Nesting of "EMBEDDABLE" property with multiple levels is allowed but more than one level with MERGE is not. Also, be careful not to exceed the document size of 16 MB for a single row, (hence deep nesting is not recommended).
    • JOINS between related tables, must use either the "EMBEDDABLE" or "MERGE" property, otherwise the query will result in error. In order for Teiid to correctly plan and support the JOINS, in the case that any two tables are NOT embedded in each other, use allow-joins=false property on the Foreign Key that represents the relation. Here is an example:
      REATE FOREIGN TABLE  Customer (
          CustomerId integer PRIMARY KEY,
          FirstName varchar(25),
          LastName varchar(25)
      ) OPTIONS(UPDATABLE 'TRUE');
       
      CREATE FOREIGN TABLE  Order (       
          OrderID integer PRIMARY KEY,
          CustomerId integer,
          OrderDate date,
          Status integer,
          FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId) OPTIONS (allow-join 'FALSE')
      ) OPTIONS(UPDATABLE 'TRUE');
      
      In this case, Teiid will create two collections. However when a user issues query such as this, instead of resulting in error, the JOIN processing will happen in the Teiid engine, without the above property it will result in an error:
      SELECT OrderID, LastName FROM Order JOIN Customer ON Order.CustomerId = Customer.CustomerId;
      
MongoDB translator supports geo-spatial query operators in the "WHERE" clause, when the data is stored in the GeoJSon format in the MongoDB Document. These functions are supported:
CREATE FOREIGN FUNCTION geoIntersects (columnRef string,  type string, coordinates double[][]) RETURNS boolean;
CREATE FOREIGN FUNCTION geoWithin (ccolumnRef string,  type string, coordinates double[][]) RETURNS boolean;
CREATE FOREIGN FUNCTION near (ccolumnRef string,  coordinates double[], maxdistance integer) RETURNS boolean;
CREATE FOREIGN FUNCTION nearSphere (ccolumnRef string, coordinates double[], maxdistance integer) RETURNS boolean;
CREATE FOREIGN FUNCTION geoPolygonIntersects (ref string, north double, east double, west double, south double) RETURNS boolean;
CREATE FOREIGN FUNCTION geoPolygonWithin (ref string, north double, east double, west double, south double) RETURNS boolean;
Here is a sample query:
SELECT loc FROM maps where mongo.geoWithin(loc, 'LineString', ((cast(1.0 as double), cast(2.0 as double)), (cast(1.0 as double), cast(2.0 as double))))
MongoDB translator designed on top of the MongoDB aggregation framework, use of MongoDB version that supports this framework is mandatory. Apart from SELECT queries, this translator also supports INSERT, UPDATE and DELETE queries. It also supports grouping, matching, sorting, filtering, limit, support for LOBs using GridFS and composite primary and foreign keys.
MongoDB source procedures may be created using the teiid_rel:native-query extension. The procedure will invoke the native-query similar to a direct procedure call with the benefits that the query is predetermined and that result column types are known, rather than requiring the use of ARRAYTABLE or similar functionality.

Warning

This feature is turned off by default because of the security risk this exposes to execute any command against the source. To enable this feature, override the execution property called SupportsDirectQueryProcedure to true.

Note

By default the name of the procedure that executes the queries directly is called native. Override the execution property DirectQueryProcedureName to change it to another name.
The MongoDB translator provides a procedure to execute any ad-hoc aggregate query directly against the source without Teiid parsing or resolving. Since the metadata of this procedure's results are not known to Teiid, they are returned as an object array containing single blob at array location one(1). This blob contains the JSON document. XMLTABLE can be used construct tabular output for consumption by client applications.
select x.* from TABLE(call native('city;{$match:{"city":"FREEDOM"}}')) t,
      xmltable('/city' PASSING JSONTOXML('city', cast(array_get(t.tuple, 1) as BLOB)) COLUMNS city string, state string) x
In this example, a collection called "city" is looked up with filter that matches the "city" name with "FREEDOM", using "native" procedure and then using the nested tables feature the output is passed to a XMLTABLE construct, where the output from the procedure is sent to a JSONTOXML function to construct a XML then the results of that are exposed in tabular form.

Important

The direct query must be in this format:
"collectionName;{$pipeline instr}+"
MongoDB translator also allows to execute Shell type java script commands like remove, drop, createIndex.
The commands need to be in this format:
"$ShellCmd;collectionName;operationName;{$instr}+"
Here is an example:
"$ShellCmd;MyTable;remove;{ qty: { $gt: 20 }}"