Development Guide Volume 5: Caching Guide
This guide is intended for developers
Red Hat Customer Content Services
Abstract
Chapter 1. Some Key Definitions
1.1. Result Set Caching
1.2. Internal Materialization
1.3. External Materialization
1.4. Materialized Views
1.5. Materialization Table
1.6. NOCACHE Option
Chapter 2. Using Caching
2.1. Caching
Note
2.2. Create Materialized Views
insert into target_table select * from matview option nocache matview
When designing views, you can define additional metadata on them to control how the external materialization cache is loaded and refreshed. This option provides a limited but powerful way to manage materialized views. For this purpose, the SYSADMIN schema in your VDB defines three stored procedures: loadMatView, updateMatView, and matViewStatus. Based on the metadata defined on the view and these SYSADMIN procedures, a simple scheduler starts automatically during VDB deployment, loads the cache, and keeps it fresh.
CREATE TABLE status (
    VDBName varchar(50) not null,
    VDBVersion integer not null,
    SchemaName varchar(50) not null,
    Name varchar(256) not null,
    TargetSchemaName varchar(50),
    TargetName varchar(256) not null,
    Valid boolean not null,
    LoadState varchar(25) not null,
    Cardinality long,
    Updated timestamp not null,
    LoadNumber long not null,
    PRIMARY KEY (VDBName, VDBVersion, SchemaName, Name)
);
Note
Table 2.1. Extension Properties
Property | Description | Optional | Default |
---|---|---|---|
teiid_rel:ALLOW_MATVIEW_MANAGEMENT | Allow Red Hat JBoss Data Virtualization-based management | False | False |
teiid_rel:MATVIEW_STATUS_TABLE | Fully qualified name of the status table defined above | False | NA |
teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT | Semicolon (;) separated DDL/DML commands to run before the actual load of the cache, typically used to truncate the staging table | True | When not defined, no script will be run |
teiid_rel:MATVIEW_LOAD_SCRIPT | Semicolon (;) separated DDL/DML commands to run for loading of the cache | True | Determined based on the view transformation |
teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT | Semicolon (;) separated DDL/DML commands to run after the actual load of the cache. Typically used to rename the staging table to the actual cache table. Required when MATVIEW_LOAD_SCRIPT is not defined, in order to copy data from the teiid_rel:MATERIALIZED_STAGE_TABLE to the MATVIEW table. | True | When not defined, no script will be run |
teiid_rel:MATVIEW_SHARE_SCOPE | Allowed values are {NONE, VDB, SCHEMA}, which define whether the cached contents are shared among different VDB versions and different VDBs as long as schema names match | True | NONE |
teiid_rel:MATERIALIZED_STAGE_TABLE | When the MATVIEW_LOAD_SCRIPT property is not defined, Red Hat JBoss Data Virtualization loads the cache contents into this table. Required when MATVIEW_LOAD_SCRIPT is not defined | False | NA |
teiid_rel:ON_VDB_START_SCRIPT | DML commands to run at the start of the VDB | True | NA |
teiid_rel:ON_VDB_DROP_SCRIPT | DML commands to run at VDB un-deploy; typically used for cleaning the cache/status tables | True | NA |
teiid_rel:MATVIEW_ONERROR_ACTION | Action to be taken when materialized view contents are requested but the cache is invalid. Allowed values are THROW_EXCEPTION (throws an exception), IGNORE (ignores the warning and supplies invalidated data), and WAIT (waits until the data is refreshed and valid, then provides the updated data) | True | WAIT |
teiid_rel:MATVIEW_TTL | Time to live in milliseconds. Provide the property or a cache hint on the view transformation; the property takes precedence. | True | 2^63 milliseconds - effectively the table will not refresh, but will be loaded a single time initially |
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<vdb name="sakila" version="1">
    <description>Shows how to call JPA entities</description>
    <model name="pg">
        <source name="pg" translator-name="postgresql-override" connection-jndi-name="java:/sakila-ds"/>
    </model>
    <model name="sakila" type="VIRTUAL">
        <metadata type="DDL"><![CDATA[
        CREATE VIEW actor (
            actor_id integer,
            first_name varchar(45) NOT NULL,
            last_name varchar(45) NOT NULL,
            last_update timestamp NOT NULL
        ) OPTIONS (MATERIALIZED 'TRUE', UPDATABLE 'TRUE',
            MATERIALIZED_TABLE 'pg.public.mat_actor',
            "teiid_rel:MATERIALIZED_STAGE_TABLE" 'pg.public.mat_actor_staging',
            "teiid_rel:ALLOW_MATVIEW_MANAGEMENT" 'true',
            "teiid_rel:MATVIEW_STATUS_TABLE" 'pg.public.status',
            "teiid_rel:MATVIEW_BEFORE_LOAD_SCRIPT" 'execute pg.native(''truncate table mat_actor_staging'');',
            "teiid_rel:MATVIEW_AFTER_LOAD_SCRIPT" 'execute pg.native(''ALTER TABLE mat_actor RENAME TO mat_actor_temp'');execute pg.native(''ALTER TABLE mat_actor_staging RENAME TO mat_actor'');execute pg.native(''ALTER TABLE mat_actor_temp RENAME TO mat_actor_staging;'')',
            "teiid_rel:MATVIEW_SHARE_SCOPE" 'NONE',
            "teiid_rel:MATVIEW_ONERROR_ACTION" 'THROW_EXCEPTION',
            "teiid_rel:MATVIEW_TTL" 300000,
            "teiid_rel:ON_VDB_DROP_SCRIPT" 'DELETE FROM pg.public.status WHERE Name=''actor'' AND schemaname = ''sakila''')
        AS SELECT actor_id, first_name, last_name, last_update from pg."public".actor;
        ]]>
        </metadata>
    </model>
    <translator name="postgresql-override" type="postgresql">
        <property name="SupportsNativeQueries" value="true"/>
    </translator>
</vdb>
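Once a VDB like the one above is deployed, the status table named in teiid_rel:MATVIEW_STATUS_TABLE can be read to see whether the cache is currently valid. The following is a minimal JDBC sketch, not a definitive implementation. The class and method names are hypothetical, the column names come from the CREATE TABLE status statement shown earlier, and the sketch assumes the status table is visible to the connected user under the name passed in (for example pg.public.status).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.regex.Pattern;

/** Hypothetical helper: reads the materialization status row for one view. */
final class MatViewStatusCheck {

    // Only plain dotted identifiers are accepted, because the table name is
    // concatenated into the SQL text rather than bound as a parameter.
    private static final Pattern QUALIFIED_NAME =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)*");

    /** Builds the SELECT against the status table (columns as defined in this guide). */
    static String statusQuery(String statusTable) {
        if (!QUALIFIED_NAME.matcher(statusTable).matches()) {
            throw new IllegalArgumentException("Not a plain qualified name: " + statusTable);
        }
        return "SELECT Valid, LoadState FROM " + statusTable
                + " WHERE SchemaName = ? AND Name = ?";
    }

    /** Returns a short description of the first matching status row, if any. */
    static String describe(Connection conn, String statusTable, String schema, String view)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(statusQuery(statusTable))) {
            ps.setString(1, schema);
            ps.setString(2, view);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return "no status row";
                }
                return "valid=" + rs.getBoolean("Valid")
                        + ", loadState=" + rs.getString("LoadState");
            }
        }
    }
}
```

For the sakila example this would be called as describe(conn, "pg.public.status", "sakila", "actor"). If several VDB versions share the status table, the query returns one row per version and this sketch only reports the first; add VDBName and VDBVersion predicates if that matters.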
2.2.1. Configure External Materialization In Teiid Designer
- Build a VDB using the Teiid Designer for your use case.
- Identify all the "Virtual Tables" that you think can use caching.
- Click on the table, then in the Properties panel, switch the Materialized property to "true".
- Right click on each materialized table, then choose Modeling - Create Materialized Views.
- Click on ... button on the Materialization Model input box.
- Select a "physical model" that already exists or create a new name for "physical model".
- Click Finish. This will create the new model (if applicable) and a table with the exact schema of your selected virtual table.
- Verify that the "Materialization Table" property is now updated with name of table that has just been created.
- Navigate to the new materialized table that has been created, click on the "Name In Source" property, and change it from "MV1000001" to "mv_{your_table_name}". Typically this could be the same name as your virtual table name (for example, you might name it "mv_UpdateProduct").
- Save your model.
Note
The data source this materialized view physical model represents will be the data source for storing the materialized tables. You can select different "physical models" for different materialized tables, creating multiple places to store your materialized tables.
- Once you have finished creating all materialized tables, right click on each model (in most cases this will be a single physical model used for all the materialized views), then use Export - Teiid Designer - Data Definition Language (DDL) File to generate the DDL for the physical model.
- Select the type of the database and DDL file name and click Finish. A DDL file that contains all of the "create table" commands is generated.
- Use your favorite "client" tool for your database and create the database using the DDL file created.
- Go back to your VDB and configure the data source and translator for the "materialized" physical model to the database you just created.
- Once finished, deploy the VDB to the Red Hat JBoss Data Virtualization Server and make sure that it is correctly configured and active.
Important
2.3. External Materialization Options
INSERT INTO mv_view.mv_UpdateProduct SELECT * FROM Portfolio.UpdateProduct OPTION NOCACHE
sql=connect(${url}, ${user}, ${password});
sql.execute("DELETE FROM mv_view.mv_UpdateProduct");
sql.execute("INSERT INTO mv_view.mv_UpdateProduct SELECT * FROM Portfolio.UpdateProduct OPTION NOCACHE");
sql.close();
adminshell.sh . load.groovy
Note
- If you want to set up a job to run this script frequently at regular intervals, then on Red Hat Enterprise Linux use "cron tab" or on Microsoft Windows use "Windows Scheduler" to refresh the rows in the materialized table. Every time the script runs it will refresh the contents.
- This job needs to be run only when user access is restricted.
Important
- If it is updating all the rows in the materialized table, and you only need to update a few rows to avoid a long refresh time.
- If it takes an hour to reload your materialized table, queries executed during that time will fail to provide correct results.
- Also ensure that you create indexes on your materialization table after the data is loaded, as having indexes during the load process slows down the loading of data, especially when you are dealing with a large number of rows.
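The refresh script shown in this section can also be expressed over plain JDBC, which may be convenient when the refresh is triggered from application code rather than AdminShell. This is a sketch under assumptions: the class and method names are hypothetical, the table and view names are passed in by the caller, and obtaining the Connection to the VDB is left to the caller. It runs the same two statements in the same order as the Groovy script and has the same limitations listed above.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: full reload of an externally materialized table. */
final class MatTableRefresh {

    /** Returns the statements of the refresh, in execution order. */
    static List<String> refreshStatements(String matTable, String view) {
        return Arrays.asList(
                // Empty the materialized table first.
                "DELETE FROM " + matTable,
                // Reload from the view; OPTION NOCACHE makes the query bypass
                // the materialized table and read the underlying sources.
                "INSERT INTO " + matTable + " SELECT * FROM " + view + " OPTION NOCACHE");
    }

    /** Executes the refresh on an already open connection to the VDB. */
    static void refresh(Connection conn, String matTable, String view) throws SQLException {
        try (Statement st = conn.createStatement()) {
            for (String sql : refreshStatements(matTable, view)) {
                st.execute(sql);
            }
        }
    }
}
```

With the names used in this section, the call would be refresh(conn, "mv_view.mv_UpdateProduct", "Portfolio.UpdateProduct"). The names are concatenated into the SQL text, so they should come from configuration, not from user input.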
2.4. External Materialization and Red Hat JBoss Data Grid
Note
[EAP_HOME]/quickstarts/jdg7.1-remote-cache-materialization
quick start.
Important
Important
2.4.1. Materializing a View
You must have two caches and the teiid-alias-naming-cache in Red Hat JBoss Data Grid. (The teiid-alias-naming-cache only needs to be created once because it is shared across all the materializations stored in the Red Hat JBoss Data Grid instance.)
- Using Teiid Designer, click on a view that is to be materialized.
Note
Make sure the view has a primary key defined, as the Red Hat JBoss Data Grid source table needs one for updates. Otherwise, you will need to manually create a primary key on each of the new JDG source tables.
- Right-click on the view and click Modeling->Materialize.
- Enter the primary and staging cache names. Optionally, you can change the JDGSource model name and the directory in which the model is saved.
Note
Red Hat JBoss Data Grid restricts the name of the source model because the protobuf code is based on the Java package naming constraints. The model name becomes the package name in the .proto file. This is due to a limitation in the way that the protobuf is defined. Because Red Hat JBoss Data Grid uses Java, the package name must follow the Java package naming standards. Dashes, for instance, are not allowed.
- Click Finish.
- To control the materialization process, update the materialized view extension properties on the above selected view:
- MATVIEW_TTL - to set the refresh rate, in milliseconds
If the materialization management status table is used, then set the following extension properties:
- ALLOW_MATVIEW_MANAGEMENT = true
- MATVIEW_STATUS_TABLE = {status table name}
- Create the VDB, using the models needed for materialization. For the JDGSource model, be sure the JNDI is mapped to the JDG data source. Also, enable native queries. To do this, create a translator override for the infinispan-hotrod translator. Click the supportsDirectQueryProcedure property and set the value to true.
- Deploy the VDB.
2.5. Internal Materialization
Table 2.2. Mapping
Property Name | Description |
---|---|
teiid_rel:ALLOW_MATVIEW_MANAGEMENT | Allow Teiid based management of the ttl and initial load rather than the implicit behavior |
teiid_rel:MATVIEW_PREFER_MEMORY | Same as the pref_mem cache hint option |
teiid_rel:MATVIEW_TTL | Same as the ttl cache hint option |
teiid_rel:MATVIEW_UPDATABLE | Same as the updatable cache hint option |
teiid_rel:MATVIEW_SCOPE | Same as the scope cache hint option |
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<vdb name="sakila" version="1">
    <model name="pg">
        <source name="pg" translator-name="postgresql" connection-jndi-name="java:/sakila-ds"/>
    </model>
    <model name="sakila" type="VIRTUAL">
        <metadata type="DDL"><![CDATA[
        CREATE VIEW actor (
            actor_id integer,
            first_name varchar(45) NOT NULL,
            last_name varchar(45) NOT NULL,
            last_update timestamp NOT NULL
        ) OPTIONS (materialized true,
            "teiid_rel:MATVIEW_TTL" 120000,
            "teiid_rel:MATVIEW_PREFER_MEMORY" 'true',
            "teiid_rel:MATVIEW_UPDATABLE" 'true',
            "teiid_rel:MATVIEW_SCOPE" 'vdb')
        AS SELECT actor_id, first_name, last_name, last_update from pg."public".actor;
        ]]>
        </metadata>
    </model>
</vdb>
CALL SYSADMIN.refreshMatView(viewname=>'schema.matview', invalidate=>true)
/*+ cache(ttl:3600000) */ select t.col, t1.col from t, t1 where t.id = t1.id
- The automatic ttl refresh may not be suitable for complex loading scenarios as nested materialized views will be used by the refresh query.
- The non-managed ttl refresh is performed lazily; that is, it is only triggered by using the table after the ttl has expired. For infrequently used tables with long load times, this means that data may be used well past the intended ttl.
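Where the lazy ttl refresh is not a good fit, one option is to issue the SYSADMIN.refreshMatView call shown above explicitly, for example from a scheduled job. The following is a minimal sketch, assuming a JDBC Connection to the VDB is already available. The class and method names are hypothetical; the call text follows the named-parameter form used in this section.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper: explicit refresh of an internal materialized view. */
final class MatViewRefresh {

    /** Builds the CALL text; single quotes in the view name are doubled. */
    static String refreshCall(String viewName, boolean invalidate) {
        String quoted = viewName.replace("'", "''");
        return "CALL SYSADMIN.refreshMatView(viewname=>'" + quoted
                + "', invalidate=>" + invalidate + ")";
    }

    /** Runs the refresh on an already open connection to the VDB. */
    static void refresh(Connection conn, String viewName, boolean invalidate)
            throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(refreshCall(viewName, invalidate));
        }
    }
}
```

Calling refresh(conn, "schema.matview", true) sends the same statement as the example above. How often to call it depends on the load time of the view and on how stale the data is allowed to become.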
/*+ cache(updatable) */ select t.col, t1.col from t, t1 where t.id = t1.id
CALL SYSADMIN.refreshMatViewRow(viewname=>'schema.matview', key=>5)
- Function-based indexes are supported, but can only be specified through DDL metadata. If you are not using DDL metadata, consider adding another column to the view that projects the function expression, then place an index on that new column. Queries to the view will need to be modified as appropriate to make use of the new column/index.
- If additional covered columns are needed, they may simply be added to the index columns. This however is only applicable to comparable types. Adding additional columns will increase the amount of space used by the index, but may allow its usage to result in higher performance when only the covered columns are used and the main table is not consulted.
2.6. Code Table Caching
lookup('ISOCountryCodes', 'CountryCode', 'CountryName', 'United States')
- The use of the lookup function automatically performs caching; there is no option to use the lookup function and not perform caching.
- No mechanism is provided to refresh code tables
- Only a single key/return column is cached - values will not be session/user specific.
SELECT (SELECT CountryCode From MatISOCountryCodes WHERE CountryName = tbl.CountryName) as cc FROM tbl
- More control of the possible return columns. Code tables will create a materialized view for each key/value pair. If there are multiple return columns it would be better to have a single materialized view.
- Proper materialized views have built-in system procedure/table support.
- More control via the cache hint.
- The ability to use OPTION NOCACHE.
- There is almost no performance difference.
2.7. Create a Materialized View for Code Table Caching
Procedure 2.1. Create a Materialized View for Code Table Caching
- Create a view selecting the appropriate columns from the desired table. In general, this view may have an arbitrarily complicated transformation query.
- Designate the appropriate column(s) as the primary key. Additional indexes can be added if needed.
- Set the materialized property to true.
- Add a cache hint to the transformation query. To mimic the behavior of the implicit internal materialized view created by the lookup function, use the Hints and Options /*+ cache(pref_mem) */ to indicate that the table data pages should prefer to remain in memory.
Just as with the lookup function, the materialized view table will be created on first use and reused subsequently.
2.8. Programmatic Control
dataModification
(which affects result set caching) or updateMatViewRow
(which affects internal materialization) to alert the Teiid engine that the underlying source data has been modified. These operations, which work cluster-wide, will invalidate the cache entries appropriately and reload the new cache contents.
Note
EventDistributor
interface in their own code that is deployed in the same JBoss EAP virtual machine using a Pojo/MDB/Session Bean:
import javax.naming.InitialContext;
import javax.naming.NamingException;

import org.teiid.events.EventDistributor;
import org.teiid.events.EventDistributorFactory;

public class ChangeDataCapture {
    public void invalidate() throws NamingException {
        InitialContext ic = new InitialContext();
        EventDistributor ed = ((EventDistributorFactory) ic.lookup("teiid/event-distributor-factory")).getEventDistributor();
        // The line below indicates that the Customer table in the "model-name" schema has changed.
        // This results in a cache reload.
        ed.dataModification("vdb-name", "version", "model-name", "Customer");
    }
}
Important
Chapter 3. Result Set Caching
3.1. User Query Cache
Properties info = new Properties();
...
info.setProperty("ResultSetCacheMode", "true");
Connection conn = DriverManager.getConnection(url, info);
Note
...
PreparedStatement ps = connection.prepareStatement("/*+ cache */ select col from t where col2 = ?");
ps.setInt(1, 5);
ps.execute();
...
/*+ cache(pref_mem ttl:60000) */ select col from t
Important
3.2. Procedure Result Caching
/*+ cache */ BEGIN ... END
3.3. Cache Configuration
Important
3.4. Extension Metadata
vdb.xml
:
<vdb name="vdbname" version="1">
    <model name="Customers">
        <property name="teiid_rel:data-ttl" value="0"/>
        ...
3.5. Cache Administration
connectAsAdmin()
clearCache("QUERY_SERVICE_RESULT_SET_CACHE")
...
3.6. Caching Limitations
- XML, BLOB, CLOB, and OBJECT types cannot be used as part of the cache key for prepared statement or procedure cache keys.
- The exact SQL string, including the cache hint if present, must match the cached entry for the results to be reused. This allows cache usage to skip parsing and resolving for faster responses.
- Result set caching is transactional by default using the NON_XA transaction mode. To use full XA support, change the configuration to use NON_DURABLE_XA.
- Clearing the results cache clears all cache entries for all VDBs.
3.7. Translator Result Caching
3.8. Cache Hints and Options
- Indicate that a user query is eligible for result set caching and set the cache entry memory preference, time to live and so forth.
- Set the materialized view memory preference, time to live, or updatability.
- Indicate that a virtual procedure should be cacheable and set the cache entry memory preference, time to live and so on.
/*+ cache[([pref_mem] [ttl:n] [updatable])] [scope:(session|user|vdb)] */ sql ...
- The cache hint should appear at the beginning of the SQL. It will not have any effect on INSERT/UPDATE/DELETE statements or INSTEAD OF TRIGGERS.
- pref_mem- if present indicates that the cached results should prefer to remain in memory. The results may still be paged out based upon memory pressure.
Important
Care should be taken not to overuse the pref_mem option. The memory preference is implemented with Java soft references. While soft references are effective at preventing out-of-memory conditions, too much memory held by soft references can limit the effective working memory. Consult your JVM options for clearing soft references if you need to tune their behavior.
- ttl:n- if present n indicates the time to live value in milliseconds. The default value for result set caching is the default expiration for the corresponding Infinispan cache. There is no default time to live for materialized views.
- updatable- if present indicates that the cached results can be updated. This defaults to false for materialized views and to true for result set cache entries.
- scope- There are three different cache scopes: session - cached only for the current session, user - cached for any session by the current user, vdb - cached for any user connected to the same VDB. For cached queries the presence of the scope overrides the computed scope. Materialized views, on the other hand, default to the vdb scope. For materialized views, explicitly setting the session or user scope will result in a non-replicated session-scoped materialized view.
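Because cached results are reused only when the SQL text matches, it can help to generate the hint in one place instead of typing it by hand. The following sketch assembles the pref_mem, ttl, and updatable options described above into a hint string. The class and method names are hypothetical, and the scope option is deliberately left out of this sketch.

```java
import java.util.StringJoiner;

/** Hypothetical helper: builds a cache hint from the options listed above. */
final class CacheHint {

    /**
     * @param prefMem   add pref_mem when true
     * @param ttlMillis time to live in milliseconds, or null for no ttl option
     * @param updatable add updatable when true
     */
    static String of(boolean prefMem, Long ttlMillis, boolean updatable) {
        // Options are emitted in the order used by the syntax line above.
        StringJoiner opts = new StringJoiner(" ");
        if (prefMem) {
            opts.add("pref_mem");
        }
        if (ttlMillis != null) {
            if (ttlMillis < 0) {
                throw new IllegalArgumentException("ttl must not be negative");
            }
            opts.add("ttl:" + ttlMillis);
        }
        if (updatable) {
            opts.add("updatable");
        }
        // With no options the plain form of the hint is used.
        return opts.length() == 0 ? "/*+ cache */" : "/*+ cache(" + opts + ") */";
    }

    /** Places the hint at the beginning of the SQL, where it is expected. */
    static String apply(String hint, String sql) {
        return hint + " " + sql;
    }
}
```

For example, CacheHint.apply(CacheHint.of(true, 60000L, false), "select col from t") yields the same text as the pref_mem example in the User Query Cache section.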
Note
SELECT * from vg1, vg2, vg3 WHERE … OPTION NOCACHE
SELECT * from vg1, vg2, vg3 WHERE … OPTION NOCACHE vg1, vg3
Appendix A. Revision History
Revision History | |||
---|---|---|---|
Revision 6.4.0-20 | Thu Jun 06 2017 | David Le Sage | |
|