Red Hat Training

A Red Hat training course is available for Red Hat JBoss Data Virtualization

14.8. Query Plans

14.8.1. Query Plans

When integrating information using a federated query planner, it is useful to be able to view the query plans that are created, to better understand how information is being accessed and processed, and to troubleshoot problems.
A query plan is a set of instructions created by a query engine for executing a command submitted by a user or application. The purpose of the query plan is to execute the user's query in as efficient a way as possible.

14.8.2. Getting a Query Plan

You can get a query plan any time you execute a command. The SQL options available are as follows:
  • SET SHOWPLAN [ON|DEBUG]- Returns the processing plan or the plan and the full planner debug log.
With the above options, the query plan is available from the Statement object by casting to the org.teiid.jdbc.TeiidStatement interface or by using the "SHOW PLAN" statement.

Example 14.13. Retrieving a Query Plan

statement.execute("set showplan on");
ResultSet rs = statement.executeQuery("select ...");
TeiidStatement tstatement = statement.unwrap(TeiidStatement.class);
PlanNode queryPlan = tstatement.getPlanDescription();
System.out.println(queryPlan);
The query plan is made available automatically in several JBoss Data Virtualization tools.

14.8.3. Analyzing a Query Plan

Once a query plan has been obtained you will most commonly be looking for:
  • Source pushdown - what parts of the query were pushed to each source? Ensure that any predicates, especially against, indexes are pushed.
  • Join ordering - as federated joins can be quite expensive. They are typically influenced by costing.
  • Join criteria type mismatches.
  • Join algorithm used - merge, enhanced merge, nested loop and so forth.
  • Presence of federated optimizations, such as dependent joins.
  • Join criteria type mismatches.
All of these issues presented above will be present subsections of the plan that are specific to relational queries. If you are executing a procedure or generating an XML document, the overall query plan will contain additional information related the surrounding procedural execution.
A query plan consists of a set of nodes organized in a tree structure. As with the above example, you will typically be interested in analyzing the textual form of the plan.
In a procedural context the ordering of child nodes implies the order of execution. In most other situation, child nodes may be executed in any order even in parallel. Only in specific optimizations, such as dependent join, will the children of a join execute serially.

14.8.4. Relational Plans

Relational plans represent the actually processing plan that is composed of nodes that are the basic building blocks of logical relational operations. Physical relational plans differ from logical relational plans in that they will contain additional operations and execution specifics that were chosen by the optimizer.
The nodes for a relational query plan are:
Access
Access a source. A source query is sent to the connection factory associated with the source. [For a dependent join, this node is called Dependent Access.]
Dependent Procedure Access
Access a stored procedure on a source using multiple sets of input values.
Batched Update
Processes a set of updates as a batch.
Project
Defines the columns returned from the node. This does not alter the number of records returned.
Project Into
Like a normal project, but outputs rows into a target table.
Select
Select is a criteria evaluation filter node (WHERE / HAVING). When there is a subquery in the criteria, this node is called Dependent Select.
Insert Plan Execution
Similar to a project into, but executes a plan rather than a source query. Typically created when executing an insert into view with a query expression.
Window Function Project
Like a normal project, but includes window functions.
Join
Defines the join type, join criteria, and join strategy (merge or nested loop).
Union All
There are no properties for this node; it just passes rows through from its children. Depending upon other factors, such as if there is a transaction or the source query concurrency allowed, not all of the union children will execute in parallel.
Sort
Defines the columns to sort on, the sort direction for each column, and whether to remove duplicates or not.
Dup Remove
Removes duplicate rows. The processing uses a tree structure to detect duplicates so that results will effectively stream at the cost of IO operations.
Grouping
Groups sets of rows into groups and evaluates aggregate functions.
Null
A node that produces no rows. Usually replaces a Select node where the criteria is always false (and whatever tree is underneath). There are no properties for this node.
Plan Execution
Executes another sub plan. Typically the sub plan will be a non-relational plan.
Dependent Procedure Execution
Executes a sub plan using multiple sets of input values.
Limit
Returns a specified number of rows, then stops processing. Also processes an offset if present.
XML Table
Evaluates XMLTABLE. The debug plan will contain more information about the XQuery/XPath with regards to their optimization - see the XQuery section below or XQuery Optimization.
Text Table
Evaluates TEXTTABLE
Array Table
Evaluates ARRAYTABLE
Object Table
Evaluates OBJECTTABLE

14.8.5. Relational Plans: Node Statistics

Every node has a set of statistics that are output. These can be used to determine the amount of data flowing through the node. Before execution a processor plan will not contain node statistics. Also the statistics are updated as the plan is processed, so typically you will want the final statistics after all rows have been processed by the client.

Table 14.1. Node Statistics

Statistic
Description
Units
Node Output Rows
Number of records output from the node
count
Node Next Batch Process Time
Time processing in this node only
millisec
Node Cumulative Process Time
Elapsed time from beginning of processing to end
millisec
Node Cumulative Next Batch Process Time
Time processing in this node + child nodes
millisec
Node Next Batch Calls
Number of times a node was called for processing
count
Node Blocks
Number of times a blocked exception was thrown by this node or a child
count
In addition to node statistics, some nodes display cost estimates computed at the node.

Table 14.2. Node Cost Estimates

Cost Estimates
Description
Units
Estimated Node Cardinality
Estimated number of records that will be output from the node; -1 if unknown
count
The root node will display additional information.

14.8.6. Source Hints

Table 14.3. Registry Properties

Top level Statistics Description Units
Data Bytes Sent The size of the serialized data result (row and lob values) sent to the client bytes
The query processor plan can be obtained in a plain text or xml format. The plan text format is typically easier to read, while the xml format is easier to process by tooling. When possible tooling should be used to examine the plans as the tree structures can be deeply nested.
Data flows from the leafs of the tree to the root. Sub plans for procedure execution can be shown inline, and are differentiated by different indentation. Given a user query of "SELECT pm1.g1.e1, pm1.g2.e2, pm1.g3.e3 from pm1.g1 inner join (pm1.g2 left outer join pm1.g3 on pm1.g2.e1=pm1.g3.e1) on pm1.g1.e1=pm1.g3.e1" the text for a processor plan that does not push down the joins would look like:
ProjectNode
  + Output Columns:
    0: e1 (string)
    1: e2 (integer)
    2: e3 (boolean)
  + Cost Estimates:Estimated Node Cardinality: -1.0
  + Child 0:
    JoinNode
      + Output Columns:
        0: e1 (string)
        1: e2 (integer)
        2: e3 (boolean)
      + Cost Estimates:Estimated Node Cardinality: -1.0
      + Child 0:
        JoinNode
          + Output Columns:
            0: e1 (string)
            1: e1 (string)
            2: e3 (boolean)
          + Cost Estimates:Estimated Node Cardinality: -1.0
          + Child 0:
            AccessNode
              + Output Columns:e1 (string)
              + Cost Estimates:Estimated Node Cardinality: -1.0
              + Query:SELECT g_0.e1 AS c_0 FROM pm1.g1 AS g_0 ORDER BY c_0
              + Model Name:pm1
          + Child 1:
            AccessNode
              + Output Columns:
                0: e1 (string)
                1: e3 (boolean)
              + Cost Estimates:Estimated Node Cardinality: -1.0
              + Query:SELECT g_0.e1 AS c_0, g_0.e3 AS c_1 FROM pm1.g3 AS g_0 ORDER BY c_0
              + Model Name:pm1
          + Join Strategy:MERGE JOIN (ALREADY_SORTED/ALREADY_SORTED)
          + Join Type:INNER JOIN
          + Join Criteria:pm1.g1.e1=pm1.g3.e1
      + Child 1:
        AccessNode
          + Output Columns:
            0: e1 (string)
            1: e2 (integer)
          + Cost Estimates:Estimated Node Cardinality: -1.0
          + Query:SELECT g_0.e1 AS c_0, g_0.e2 AS c_1 FROM pm1.g2 AS g_0 ORDER BY c_0
          + Model Name:pm1
      + Join Strategy:ENHANCED SORT JOIN (SORT/ALREADY_SORTED)
      + Join Type:INNER JOIN
      + Join Criteria:pm1.g3.e1=pm1.g2.e1
  + Select Columns:
    0: pm1.g1.e1
    1: pm1.g2.e2
    2: pm1.g3.e3
Note that the nested join node is using a merge join and expects the source queries from each side to produce the expected ordering for the join. The parent join is an enhanced sort join which can delay the decision to perform sorting based upon the incoming rows. Note that the outer join from the user query has been modified to an inner join since none of the null inner values can be present in the query result.
The same plan in xml form looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<node name="ProjectNode">
    <property name="Output Columns">
        <value>e1 (string)</value>
        <value>e2 (integer)</value>
        <value>e3 (boolean)</value>
    </property>
    <property name="Cost Estimates">
        <value>Estimated Node Cardinality: -1.0</value>
    </property>
    <property name="Child 0">
        <node name="JoinNode">
            <property name="Output Columns">
                <value>e1 (string)</value>
                <value>e2 (integer)</value>
                <value>e3 (boolean)</value>
            </property>
            <property name="Cost Estimates">
                <value>Estimated Node Cardinality: -1.0</value>
            </property>
            <property name="Child 0">
                <node name="JoinNode">
                    <property name="Output Columns">
                        <value>e1 (string)</value>
                        <value>e1 (string)</value>
                        <value>e3 (boolean)</value>
                    </property>
                    <property name="Cost Estimates">
                        <value>Estimated Node Cardinality: -1.0</value>
                    </property>
                    <property name="Child 0">
                        <node name="AccessNode">
                            <property name="Output Columns">
                                <value>e1 (string)</value>
                            </property>
                            <property name="Cost Estimates">
                                <value>Estimated Node Cardinality: -1.0</value>
                            </property>
                            <property name="Query">
                                <value>SELECT g_0.e1 AS c_0 FROM pm1.g1 AS g_0 ORDER BY c_0</value>
                            </property>
                            <property name="Model Name">
                                <value>pm1</value>
                            </property>
                        </node>
                    </property>
                    <property name="Child 1">
                        <node name="AccessNode">
                            <property name="Output Columns">
                                <value>e1 (string)</value>
                                <value>e3 (boolean)</value>
                            </property>
                            <property name="Cost Estimates">
                                <value>Estimated Node Cardinality: -1.0</value>
                            </property>
                            <property name="Query">
                                <value>SELECT g_0.e1 AS c_0, g_0.e3 AS c_1 FROM pm1.g3 AS g_0
                                    ORDER BY c_0</value>
                            </property>
                            <property name="Model Name">
                                <value>pm1</value>
                            </property>
                        </node>
                    </property>
                    <property name="Join Strategy">
                        <value>MERGE JOIN (ALREADY_SORTED/ALREADY_SORTED)</value>
                    </property>
                    <property name="Join Type">
                        <value>INNER JOIN</value>
                    </property>
                    <property name="Join Criteria">
                        <value>pm1.g1.e1=pm1.g3.e1</value>
                    </property>
                </node>
            </property>
            <property name="Child 1">
                <node name="AccessNode">
                    <property name="Output Columns">
                        <value>e1 (string)</value>
                        <value>e2 (integer)</value>
                    </property>
                    <property name="Cost Estimates">
                        <value>Estimated Node Cardinality: -1.0</value>
                    </property>
                    <property name="Query">
                        <value>SELECT g_0.e1 AS c_0, g_0.e2 AS c_1 FROM pm1.g2 AS g_0
                            ORDER BY c_0</value>
                    </property>
                    <property name="Model Name">
                        <value>pm1</value>
                    </property>
                </node>
            </property>
            <property name="Join Strategy">
                <value>ENHANCED SORT JOIN (SORT/ALREADY_SORTED)</value>
            </property>
            <property name="Join Type">
                <value>INNER JOIN</value>
            </property>
            <property name="Join Criteria">
                <value>pm1.g3.e1=pm1.g2.e1</value>
            </property>
        </node>
    </property>
    <property name="Select Columns">
        <value>pm1.g1.e1</value>
        <value>pm1.g2.e2</value>
        <value>pm1.g3.e3</value>
    </property>
</node>
Note that the same information appears in each of the plan forms. In some cases it can actually be easier to follow the simplified format of the debug plan final processor plan. From the Debug Log the same plan as above would appear as:
OPTIMIZATION COMPLETE:
PROCESSOR PLAN:
ProjectNode(0) output=[pm1.g1.e1, pm1.g2.e2, pm1.g3.e3] [pm1.g1.e1, pm1.g2.e2, pm1.g3.e3]
  JoinNode(1) [ENHANCED SORT JOIN (SORT/ALREADY_SORTED)] [INNER JOIN] criteria=[pm1.g3.e1=pm1.g2.e1] output=[pm1.g1.e1, pm1.g2.e2, pm1.g3.e3]
    JoinNode(2) [MERGE JOIN (ALREADY_SORTED/ALREADY_SORTED)] [INNER JOIN] criteria=[pm1.g1.e1=pm1.g3.e1] output=[pm1.g3.e1, pm1.g1.e1, pm1.g3.e3]
      AccessNode(3) output=[pm1.g1.e1] SELECT g_0.e1 AS c_0 FROM pm1.g1 AS g_0 ORDER BY c_0
      AccessNode(4) output=[pm1.g3.e1, pm1.g3.e3] SELECT g_0.e1 AS c_0, g_0.e3 AS c_1 FROM pm1.g3 AS g_0 ORDER BY c_0
    AccessNode(5) output=[pm1.g2.e1, pm1.g2.e2] SELECT g_0.e1 AS c_0, g_0.e2 AS c_1 FROM pm1.g2 AS g_0 ORDER BY c_0
    
These are the node properties:
Common
  • Output Columns - what columns make up the tuples returned by this node
  • Data Bytes Sent - how many data byte, not including messaging overhead, were sent by this query
  • Planning Time - the amount of time in milliseconds spent planning the query
Relational
  • Relational Node ID - matches the node ids seen in the debug log Node(id)
  • Criteria - the boolean expression used for filtering
  • Select Columns - the columns that define the projection
  • Grouping Columns - the columns used for grouping
  • Query - the source query
  • Model Name - the model name
  • Sharing ID - nodes sharing the same source results will have the same sharing id
  • Dependent Join - if a dependent join is being used
  • Join Strategy - the join strategy (Nested Loop, Sort Merge, Enhanced Sort, etc.)
  • Join Type - the join type (Left Outer Join, Inner Join, Cross Join)
  • Join Criteria - the join predicates
  • Execution Plan - the nested execution plan
  • Into Target - the insertion target
  • Sort Columns - the columns for sorting
  • Sort Mode - if the sort performs another function as well, such as distinct removal
  • Rollup - if the group by has the rollup option
  • Statistics - the processing statistics
  • Cost Estimates - the cost/cardinality estimates including dependent join cost estimates
  • Row Offset - the row offset expression
  • Row Limit - the row limit expression
  • With - the with clause
  • Window Functions - the window functions being computed
  • Table Function - the table function (XMLTABLE, OBJECTTABLE, TEXTTABLE, etc.)
XML
  • Message
  • Tag
  • Namespace
  • Data Column
  • Namespace Declarations
  • Optional Flag
  • Default Value
  • Recursion Direction
  • Bindings
  • Is Staging Flag
  • Source In Memory Flag
  • Condition
  • Default Program
  • Encoding
  • Formatted Flag
Procedure
  • Expression
  • Result Set
  • Program
  • Variable
  • Then
  • Else
XML document model queries and procedure execution (including instead of triggers) use intermediate and final plan forms that include relational plans. Generally the structure of the xml/procedure plans will closely match their logical forms. It is the nested relational plans that will be of interest when analyzing performance issues.

14.8.7. Statistics Gathering and Single Partitions

The statistics-gathering feature in the Red Hat JBoss Data Virtualization engine does not take partition statistics into account. For most queries, using the global statistics will not provide accurate results for a single partition.
Currently, there is a manual approach that will require modeling each partition as a table. Here is an example:
CREATE FOREIGN TABLE q1 (id integer primary key, company varchar(10), order_date timestamp) OPTIONS (NAMEINSOURCE 'dvqe_order_partitioned partition(dvqe_order_partitioned_q1, CARDINALITY '20000')');

CREATE FOREIGN TABLE q2 (id integer primary key, company varchar(10), order_date timestamp) OPTIONS (NAMEINSOURCE 'dvqe_order_partitioned partition(dvqe_order_partitioned_q2, CARDINALITY '10000')');

CREATE FOREIGN TABLE q3 (id integer primary key, company varchar(10), order_date timestamp) OPTIONS (NAMEINSOURCE 'dvqe_order_partitioned partition(dvqe_order_partitioned_q3, CARDINALITY '1000')');

CREATE FOREIGN TABLE q4 (id integer primary key, company varchar(10), order_date timestamp) OPTIONS (NAMEINSOURCE 'dvqe_order_partitioned partition(dvqe_order_partitioned_q4, CARDINALITY '3000000')');

CREATE VIEW orders (id integer primary key, company varchar(10), order_date timestamp) AS SELECT * FROM q1 UNION SELECT * FROM q2 UNION SELECT * FROM q3 UNION SELECT * FROM q4;
The statistics can be updated using Teiid Designer, by setting the cardinality on the table or, alternatively you can use the System procedure:
SYSADMIN.setTableStats(IN tableName string NOT NULL, IN cardinality long NOT NULL)