
Chapter 3. Platform requirements

3.1. Evaluating your architecture and your needs

Minimum sizing recommendations
The following minimum requirements are a starting point and should be adjusted based on expected usage.
JBDS (Teiid Designer) – without an application server
  • 2 GB of RAM will get you started, but more is needed for large models
  • Modern processor
  • 500 MB disk space for installed product files
  • 2+ GB for model projects and related artifacts
The goal of the following sizing recommendations is to provide a starting point (minimum size) for the server. This minimum recommendation should also be used when no client information can be obtained on which to base sizing.
The minimum sizing for the DV server is:
  • 16 GB JVM memory size
  • Modern multi-core (dual or better) processor or multi-socket system with modern multi-core processors
  • 20+ GB disk space, needed for the JBoss server product and DV components:
      • 1 GB for installed product files
      • 5+ GB for log files and deployed artifacts
      • 15 GB (default) for the BufferManager maxBufferSpace
  • If ModeShape (repository) will be used, increase the disk space by a minimum of 5 GB.
There are three considerations that help determine the minimal JVM footprint: concurrency, data volume, and plan processing.
  • Concurrency – takes into account max sessions, the transport thread pool, the engine thread pool and engine settings (especially max active plans), and connection pool sizes.
  • Data Volume – considers the amount of data read from the data source(s) based on the batch sizes. The default processor batch size is 256 rows with a target of ~2 KB per row, so batches flow through the system at ~512 KB each. However, on machines with more memory it is recommended to increase the batch size to 512, making each batch ~1 MB.
  • Plan Processing – considers the additional processing done on the data based on the query plan, which generally requires additional memory (e.g., for sorting).
The following assumptions are used in determining size:
  • The server is tuned (i.e., thread pools, connection pools, etc.) so that each query executes without waiting, for maximum throughput.
  • There is 1 source query per data source in the plan (more complex queries increase the need for memory).
  • No other applications are running in the same JVM as Teiid (if other applications will share the JVM, their additional memory requirements must be accounted for).
  • Queries are straight, non-transactional reads (Teiid does a proactive batch fetch, which increases the memory requirement; this is why the batch bytes are doubled).
  • The processor batch size is configured at 512 (changed from the default of 256), which is recommended on machines with more memory to reduce batch overhead.
This is the formula to estimate the minimum JVM size (a sketch implementing it follows the definitions below):
(concurrent queries) * ((4 * batch bytes) + (2 * (#source queries per plan * approximate source bytes))) + overhead
where:
  • concurrent queries – the maximum number of queries executing at the same time.
  • batch bytes – represents the batches flowing through the system. Using the default batch size of 256, the byte size is ~512 KB; using the recommended batch size of 512 gives ~1 MB per batch. The doubling of the batch bytes accounts for storing a batch on the work item in case a partial batch is retrieved while another batch is in process.
  • 4 – represents 1) the doubling of batch bytes and 2) plan operations, that is, the number of additional copies of the batches needed to fulfill plan operations (e.g., sorting, joining); 2 copies are used.
  • source queries per plan – the number of data sources in the query, limited per the assumptions above.
  • approximate source bytes – (((32-bit ? 5 : 7) + (4 * avg raw bytes per column)) * #columns), using 10 as the average raw bytes per column and assuming a 64-bit machine.
  • overhead – includes the adjustment for the application server (~300 MB), additional Teiid overhead (caching, plans, etc.), and connection pool overhead. Only ~300 MB is used here because the other contributions are harder to estimate, but know that your server will need to take them into account for better performance.
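A minimal Java sketch of this formula, under the stated assumptions (64-bit machine, 10 average raw bytes per column, 2 plan-operation copies, ~300 MB overhead). The class and method names are illustrative only, not a Teiid API:

    // Illustrative sizing sketch; names are hypothetical, not Teiid code.
    public class MinJvmSizeEstimator {

        static final long KB = 1024;
        static final long MB = 1024 * KB;

        // Approximate source bytes per row, per the definition above:
        // ((32-bit ? 5 : 7) + 4 * avg raw bytes per column) * #columns
        static long sourceBytesPerRow(int columns, int avgRawBytesPerColumn,
                                      boolean is64bit) {
            return ((is64bit ? 7L : 5L) + 4L * avgRawBytesPerColumn) * columns;
        }

        // (concurrent queries) * ((4 * batch bytes)
        //     + 2 * (#source queries per plan * approximate source bytes)) + overhead
        static long minJvmBytes(int concurrentQueries, long batchBytes,
                                int sourceQueriesPerPlan, long sourceBytes,
                                long overheadBytes) {
            return concurrentQueries
                    * (4 * batchBytes + 2 * sourceQueriesPerPlan * sourceBytes)
                    + overheadBytes;
        }

        public static void main(String[] args) {
            long batchBytes = MB;        // recommended 512-row batch, ~1 MB
            long sourceBytes = 512 * KB; // ~512 KB per source query, as in the refined formula below
            // (7 + 4 * 10) * 10 = 470 bytes per row for 10 columns of 10 raw bytes each
            System.out.println("source bytes per row: " + sourceBytesPerRow(10, 10, true));
            // 100 * (4 MB + 2 * 2 * 512 KB) + 300 MB = 900 MB; Table 3.1 rounds this
            // up via the simplified form concurrency * 5 MB * #source queries + 300 MB.
            long estimate = minJvmBytes(100, batchBytes, 2, sourceBytes, 300 * MB);
            System.out.printf("minimum JVM size: ~%.1f GB%n", estimate / (double) (1024 * MB));
        }
    }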
The refined formula that will be used is:
  • (concurrent queries) * ((4 * batch bytes) + (2 * source bytes) * #source queries) + 300 MB
  • (concurrent queries) * ((4 * 1 MB) + (2 * 512 KB) * #source queries) + 300 MB
  • (concurrent queries) * ((4 MB) + (1 MB) * #source queries) + 300 MB
  • #concurrency * (5 MB) * #source queries + 300 MB
The last step folds the 4 MB of batch bytes into the per-source-query term; since 5 * #source queries >= 4 + #source queries whenever there is at least one source query, this rounds the estimate up, and it is the form used for Table 3.1 below.
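For example, with 100 concurrent queries and 2 source queries per plan, the estimate is 100 * 5 MB * 2 + 300 MB = 1300 MB, or roughly the 1.3 GB shown in the first cell of Table 3.1.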

Table 3.1. Estimated minimum JVM size by concurrency and # source queries

# source queries    concurrency 100    concurrency 200
2                   1.3 GB             2.3 GB
5                   2.8 GB             5.3 GB
10                  5.3 GB             10.3 GB
Based on the max concurrent queries, start with the following to tune the Teiid engine (a sketch of these rules follows this list):
  • Set maxActivePlans to the max concurrent queries.
  • Set maxThreads = maxActivePlans * 2 (if transactions will be used, then * 3).
  • Set each data source max pool size to the max concurrent source queries. The minimum would be the max concurrent queries, but if the majority of queries are complex, with subqueries that cause multiple source queries to be spawned, then the max pool size should be increased accordingly.
  • After the above adjustments are done and the server has memory room, consider increasing processBatchSize and connectorBatchSize (e.g., to 512 and 1024, respectively) to increase throughput from the data source and through the engine. If you run out of memory, increase the JVM size. On a machine with less than 6 GB of memory, stick with 512; larger machines can use higher sizes.
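A rough Java sketch of these starting-point rules. maxActivePlans, maxThreads, and the data source max pool size are the settings named in the list above; the class itself and its field names are hypothetical, and the derived values should still be validated against the actual workload:

    // Illustrative starting-point tuning; this class is not a Teiid API.
    public class EngineTuningStart {

        final int maxActivePlans;        // = max concurrent queries
        final int maxThreads;            // = maxActivePlans * 2 (or * 3 with transactions)
        final int dataSourceMaxPoolSize; // >= max concurrent source queries

        EngineTuningStart(int maxConcurrentQueries, boolean usesTransactions,
                          int maxConcurrentSourceQueries) {
            this.maxActivePlans = maxConcurrentQueries;
            this.maxThreads = maxActivePlans * (usesTransactions ? 3 : 2);
            // The minimum is the max concurrent queries; complex plans that
            // spawn several source queries need a proportionally larger pool.
            this.dataSourceMaxPoolSize =
                    Math.max(maxConcurrentQueries, maxConcurrentSourceQueries);
        }

        public static void main(String[] args) {
            EngineTuningStart t = new EngineTuningStart(100, false, 200);
            System.out.println("maxActivePlans=" + t.maxActivePlans
                    + ", maxThreads=" + t.maxThreads
                    + ", data source max pool size=" + t.dataSourceMaxPoolSize);
            // Keep maxThreads within ~5x maxActivePlans (see the Important
            // note below); the 2x-3x starting point already satisfies that.
        }
    }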

Important

If maxThreads is more than 5 to 1 against maxActivePlans, then consider making adjustments. We have seen 10 to 1 cause the server to throttle down processing; the initial recommendations above are 2 or 3 to 1.