Chapter 19. DDL File Sequencer

19.1. DDL File Sequencer

The DDL file sequencer included in ModeShape is capable of parsing the more important DDL statements from SQL-92, Oracle, Derby, and PostgreSQL, and constructing a graph structure containing a structured representation of these statements. The resulting graph structure is largely the same for all dialects, though some dialects have non-standard additions to their grammar, and thus require dialect-specific additions to the graph structure.
The sequencer is designed to behave as intelligently as possible with as little configuration. Thus, the sequencer automatically determines the dialect used by a given DDL stream. This can be tricky, of course, since most dialects are very similar and the distinguishing features of a dialect may only be apparent in some of the statements.
To get around this, the sequencer uses a "best fit" algorithm: run the DDL stream through the parser for each of the dialects, and determine which parser was able to successfully read the greatest number of statements and tokens.

Note

It is possible to define which DDL dialects (or grammars) should be considered during sequencing using the "grammars" property in the sequencer configuration. Set the values of this property to the names of the grammars (e.g., "oracle", "postgres", "standard", or "derby"), specified in the order they should be used. To use a custom DDL parser not provided by ModeShape, provide the fully-qualified class name of the DdlParser implementation class.
One very interesting capability of this sequencer is that, although only a subset of the (more common) DDL statements are supported, the sequencer is still extremely functional since it does still add all statements into the output graph, just without much detail other than just the statement text and the position in the DDL file. Thus, if a DDL file contains statements the sequencer understands and statements the sequencer does not understand, the graph will still contain all statements, where those statements understood by the sequencer will have full detail. Since the underlying parsers are able to operate upon a single statement, it is possible to go back later (after the parsers have been enhanced to support additional DDL statements) and re-parse only those incomplete statements in the graph.
At this time, the sequencer supports SQL-92 standard DDL as well as dialects from Oracle, Derby, and PostgreSQL. It supports:
  • Detailed parsing of CREATE SCHEMA, CREATE TABLE and ALTER TABLE.
  • Partial parsing of DROP statements
  • General parsing of remaining schema definition statements (i.e. CREATE VIEW, CREATE DOMAIN, etc.
Note that the sequencer does not perform detailed parsing of SQL (i.e. SELECT, INSERT, UPDATE, etc....) statements.

19.2. DDL File Sequencer Properties

For information about configurable properties relating to the DDL File Sequencer, refer to the org.modeshape.sequencer.ddl.DdlSequencer class in the Data Services JavaDoc.

19.3. DDL File Sequencer Example

Sequencing results in graph nodes basically representing the BNF structure of each DDL statement. Below is an example DDL schema definition statement containing table and view definition statements.
CREATE SCHEMA hollywood
	  CREATE TABLE films (title varchar(255), release date, producerName varchar(255))
	  CREATE VIEW winners AS SELECT title, release FROM films WHERE producerName IS NOT NULL;
The resulting graph structure contains the raw statement expression, pertinent table, column and key reference information and position of the statement in the text stream (e.g., line number, column number and character index) so the statement can be tied back to the original DDL:
<nt:unstructured jcr:name="statements" 
                 jcr:mixinTypes = "mode:derived" 
                 mode:derivedAt="2011-05-13T13:12:03.925Z" 
                 mode:derivedFrom="/files/foo.sql"
                 ddl:parserId="POSTGRES">
  <nt:unstructured jcr:name="hollywood" jcr:mixinTypes="ddl:createSchemaStatement" 
	                 ddl:startLineNumber="1"
                   ddl:startColumnNumber="1"
                   ddl:expression="CREATE SCHEMA hollywood"
                   ddl:startCharIndex="0">
    <nt:unstructured jcr:name="films" jcr:mixinTypes="ddl:createTableStatement"
                   ddl:startLineNumber="2"
                   ddl:startColumnNumber="5"
                   ddl:expression="CREATE TABLE films (title varchar(255), release date, producerName varchar(255))"
                   ddl:startCharIndex="28"/>
    <nt:unstructured jcr:name="title" jcr:mixinTypes="ddl:columnDefinition"
                   ddl:datatypeName="VARCHAR"
                   ddl:datatypeLength="255"/>
    <nt:unstructured jcr:name="release" jcr:mixinTypes="ddl:columnDefinition"
                   ddl:datatypeName="DATE"/>
    <nt:unstructured jcr:name="producerName" jcr:mixinTypes="ddl:columnDefinition"
                   ddl:datatypeName="VARCHAR"
                   ddl:datatypeLength="255"/>
  <nt:unstructured jcr:name="winners" jcr:mixinTypes="ddl:createViewStatement"
                   ddl:startLineNumber="3"
                   ddl:startColumnNumber="5"
                   ddl:expression="CREATE VIEW winners AS SELECT title, release FROM films WHERE producerName IS NOT NULL;"
                   ddl:queryExpression="SELECT title, release FROM films WHERE producerName IS NOT NULL"
                   ddl:startCharIndex="113"/>
</nt:unstructured>
Note that all nodes are of type nt:unstructured while the type of statement is identified using mixins. Also, each of the nodes representing a statement contain: a ddl:expression property with the exact statement as it appeared in the original DDL stream; a ddl:startLineNumber and ddl:startColumnNumber property defining the position in the original DDL stream of the first character in the expression; and a ddl:startCharIndex property that defines the integral index of the first character in the expression as found in the DDL stream. All of these properties make sure the statement can be traced back to its location in the original DDL.

19.4. Configuring a DDL File Sequencer

  1. Include the relevant libraries

    Include modeshape-sequencer-ddl-VERSION.jar in your application.
  2. Choose one of the following for sequencing configuration

    • Define sequencing configuration based on standard example provided in SOA-ROOT/eds/modeshape/resources/modeshape-config-standard.xml:
      <mode:sequencer jcr:name="DDL File Sequencer" mode:classname="org.modeshape.sequencer.ddl.DdlSequencer">
        <mode:description>
          Sequences DDL files loaded under '/files', extracting the structured abstract syntax tree of the DDL commands and expressions.
        </mode:description>
        <mode:pathExpression>
          eds-store:default:/files(//(*.ddl[*]))/jcr:content[@jcr:data] => eds-store:default:/sequenced/ddl/$1
        </mode:pathExpression>
      </mode:sequencer>
      
    • Configure via org.modeshape.jcr.JcrConfiguration:
      JcrConfiguration config = ...
      
      config.sequencer("DDL File Sequencer")
            .usingClass("org.modeshape.sequencer.ddl.DdlSequencer")
            .loadedFromClasspath()
            .setDescription("Sequences DDL files loaded under '/files', extracting the structured abstract syntax tree of the DDL commands and expressions.")
            .sequencingFrom("/files(//(*.ddl[*]))/jcr:content[@jcr:data]")
            .andOutputtingTo("/sequenced/ddl/$1");
      This will use all of the built-in grammars (e.g., "standard", "oracle", "postgres", and "derby"). To specify a different order or subset of the grammars, use the setProperty(...) method. The following example uses the standard grammar followed by the PostgreSQL grammar:
      config.sequencer("DDL File Sequencer")
            .usingClass("org.modeshape.sequencer.ddl.DdlSequencer")
            .loadedFromClasspath()
            .setDescription("Sequences DDL files loaded under '/files', extracting the structured abstract syntax tree of the DDL commands and expressions.")
            .setProperty("grammar","standard","postgres")
            .sequencingFrom("/files(//(*.ddl[*]))/jcr:content[@jcr:data]")
            .andOutputtingTo("/sequenced/ddl/$1");
      To use a custom implementation of DdlParser, use the fully-qualified name of the implementation class (which must have a no-arg constructor) as the name of the grammar:
      config.sequencer("DDL File Sequencer")
            .usingClass("org.modeshape.sequencer.ddl.DdlSequencer")
            .loadedFromClasspath()
            .setDescription("Sequences DDL files loaded under '/files', extracting the structured abstract syntax tree of the DDL commands and expressions.")
            .setProperty("grammar","standard","postgres","org.example.ddl.MyCustomDdlParser")
            .sequencingFrom("/files(//(*.ddl[*]))/jcr:content[@jcr:data]")
            .andOutputtingTo("/sequenced/ddl/$1");

    Note

    Refer to SOA-ROOT/eds/modeshape/resources/modeshape-config-standard.xml for more information.