Chapter 1. Data sets authoring
A data set is a collection of related sets of information. It can be stored in many ways, such as in a database, in a Microsoft Excel file, in memory, and so on. A data set definition instructs Decision Central how to access, read, and parse a data set. Decision Central does not store data. It enables you to define access to a data set regardless of where the data is stored.
For example, if data is stored in a database, a valid data set could contain the entire database or a subset of the database returned by an SQL query. In both cases, the data is used as input for the reporting components of Decision Central, which then display the information.
To access a data set, you must create and register a data set definition. The definition specifies where the data set is stored, how it is accessed, read, and parsed, and which columns it contains.
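To make the four parts of a definition concrete (where the data is, how it is accessed, how it is parsed, which columns it has), the following is a minimal Python sketch. The class names, field names, and the file URL are hypothetical and do not reflect Decision Central's internal model; the point is only that a definition is metadata that points at data stored elsewhere.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    """One column of a data set: an identifier plus a coarse type."""
    column_id: str
    column_type: str  # hypothetical type names such as "LABEL" or "NUMBER"


@dataclass
class DataSetDefinition:
    """Hypothetical model of a data set definition.

    Only this metadata is registered; the rows themselves stay in the
    external source that `location` points to.
    """
    uuid: str
    name: str
    provider: str  # how the data is accessed: "Bean", "CSV", "SQL", ...
    location: str  # where it is stored: file URL, data source name, class name, ...
    parse_options: Dict[str, str] = field(default_factory=dict)  # how to read and parse it
    columns: List[Column] = field(default_factory=list)  # which columns it contains

    def column_ids(self) -> List[str]:
        return [c.column_id for c in self.columns]


mortgages = DataSetDefinition(
    uuid="mortgages-csv",
    name="Mortgage applications",
    provider="CSV",
    location="file:///data/mortgages.csv",  # hypothetical path
    parse_options={"separator": ",", "quote": '"'},
    columns=[Column("id", "NUMBER"), Column("locale", "LABEL"), Column("amount", "NUMBER")],
)
print(mortgages.column_ids())  # ['id', 'locale', 'amount']
```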
The Data Sets page is visible only to users with the admin role.
1.1. Adding data sets
You can create a new data set to fetch data from an external data source and use that data for the reporting components.
- Log in to Decision Central and click the gear icon.
- Click Data Sets → Data Set Explorer → New Data Set.
- Select one of the following provider types:
- Bean: Use to generate a data set from a Java class
- CSV: Use to generate a data set from a remote or local CSV file
- SQL: Use to generate a data set from an ANSI-SQL compliant database
- Elastic Search: Use to generate a data set from Elastic Search nodes
- Execution Server: Use to generate a data set using the custom query feature of an Execution Server
Note: KIE Server must be configured with this option.
- Complete the Data Set Creation Wizard and click Test.
Note: The configuration steps differ depending on the provider you chose.
- Click Save.
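The choice of provider type decides how the lookup behind the Test step fetches rows. The following Python sketch models that idea with a small registry that maps a provider type to a loader and a helper that runs the lookup and returns a preview. It is illustrative only: the registry, the `content` and `rows` configuration keys, and the helper names are assumptions, only two of the five provider types are stubbed in, and the CSV text is passed inline instead of being read from a file.

```python
import csv
import io
from typing import Callable, Dict, List

Row = Dict[str, str]

# Hypothetical registry mapping a provider type to a function that fetches rows.
PROVIDERS: Dict[str, Callable[[dict], List[Row]]] = {}


def register_provider(name: str):
    def register(fn: Callable[[dict], List[Row]]):
        PROVIDERS[name] = fn
        return fn
    return register


@register_provider("CSV")
def load_csv(config: dict) -> List[Row]:
    # In this sketch the CSV text is passed inline as config["content"];
    # a real provider would read the local or remote file instead.
    reader = csv.DictReader(io.StringIO(config["content"]),
                            delimiter=config.get("separator", ","))
    return list(reader)


@register_provider("Bean")
def load_bean(config: dict) -> List[Row]:
    # Stand-in for a Java class that generates rows in code.
    return [{"n": str(i)} for i in range(config.get("rows", 3))]


def run_test_lookup(config: dict, preview_rows: int = 5) -> List[Row]:
    """Mimic the wizard's Test step: run the lookup and return a preview."""
    loader = PROVIDERS.get(config["provider"])
    if loader is None:
        raise ValueError(f"unsupported provider: {config['provider']}")
    rows = loader(config)
    if not rows:
        raise ValueError("the lookup returned no data")
    return rows[:preview_rows]


preview = run_test_lookup({"provider": "CSV", "separator": ";",
                           "content": "id;locale\n1;Urban\n2;Rural\n"})
print(preview)  # [{'id': '1', 'locale': 'Urban'}, {'id': '2', 'locale': 'Rural'}]
```

Failing early on an unknown provider or an empty result mirrors the behavior described for the Preview tab, where a preview appears only if the data is available.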
1.2. Editing data sets
You can edit existing data sets to ensure that the data fetched to the reporting components is up to date.
- Log in to Decision Central, click the gear icon, and then click Data Sets.
- In the Data Set Explorer pane, search for the data set you want to edit and click Edit.
- In the Data Set Editor pane, use the appropriate tab to edit the data as required. The tabs differ based on the data set provider type you chose.
For example, the following tabs are available when you edit a data set that uses the CSV data provider:
- CSV Configuration: Enables you to change the name of the data set definition, the source file, the separator, and other properties.
- Preview: Enables you to preview the data. After you click Test in the CSV Configuration tab, the system executes the data set lookup call and, if the data is available, displays a preview. The Preview tab has two sub-tabs:
- Data columns: Enables you to specify what columns are part of your data set definition.
- Filter: Enables you to add a new filter.
- Advanced: Enables you to manage the caching and cache life-cycle (data refresh) settings of the data set.
- After making the required changes, click Validate.
- Click Save.
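The CSV Configuration and Preview settings described above interact: the separator decides how each line is split, the Data columns sub-tab narrows which columns are kept, and a filter narrows which rows are kept. The following Python sketch shows that interaction on inline CSV text. The `preview` function and its parameters are hypothetical; applying the filter before the column selection is a choice made in this sketch so that a filter can use a column that is not displayed, not a documented Decision Central rule.

```python
import csv
import io
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


def preview(csv_text: str, separator: str = ",",
            columns: Optional[List[str]] = None,
            row_filter: Optional[Callable[[Row], bool]] = None) -> List[Row]:
    """Parse CSV text with the configured separator, apply an optional
    filter, then keep only the selected data columns."""
    result = []
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=separator):
        if row_filter is not None and not row_filter(row):
            continue  # row rejected by the filter
        if columns is not None:
            row = {c: row[c] for c in columns}  # keep only the selected columns
        result.append(row)
    return result


data = "id|locale|amount\n1|Urban|250000\n2|Rural|180000\n3|Urban|320000\n"
print(preview(data, separator="|", columns=["id", "amount"],
              row_filter=lambda r: r["locale"] == "Urban"))
# [{'id': '1', 'amount': '250000'}, {'id': '3', 'amount': '320000'}]
```

With the wrong separator (for example, the default comma for this pipe-separated text), every line parses as a single column, which is a quick way to spot a misconfigured separator in the preview.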
1.3. Data refresh
The data refresh feature enables you to specify an interval of time after which a data set (or data) is refreshed. The Refresh on stale data feature refreshes the cached data when the back-end data changes.
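One way to read the two refresh options together is sketched below in Python: the refresh interval decides when a refresh is due, and Refresh on stale data decides whether a due refresh actually re-fetches or first checks whether the back-end data changed. This is an interpretation, not Decision Central's implementation. The `CachedDataSet` class, the `version_of` callback (standing in for something like a last-modified stamp), and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, List, Optional


class CachedDataSet:
    """Hypothetical model of data refresh for one cached data set.

    refresh_interval: seconds after which the cache is due for a refresh.
    refresh_on_stale: when True, a due refresh re-fetches only if the
    back end reports that the data changed; when False, it always re-fetches.
    """

    def __init__(self, fetch: Callable[[], List], version_of: Callable[[], int],
                 refresh_interval: float, refresh_on_stale: bool,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch
        self.version_of = version_of  # e.g. a last-modified stamp from the back end
        self.refresh_interval = refresh_interval
        self.refresh_on_stale = refresh_on_stale
        self.clock = clock
        self.rows: Optional[List] = None
        self.version = None
        self.checked_at = 0.0
        self.fetch_count = 0

    def _reload(self) -> None:
        self.rows = self.fetch()
        self.version = self.version_of()
        self.fetch_count += 1

    def lookup(self) -> List:
        now = self.clock()
        if self.rows is None:
            self._reload()
            self.checked_at = now
        elif now - self.checked_at >= self.refresh_interval:
            if not self.refresh_on_stale or self.version_of() != self.version:
                self._reload()
            self.checked_at = now  # start a new interval either way
        return self.rows
```

Injecting the clock keeps the sketch deterministic to test: with a 60-second interval and Refresh on stale data enabled, a lookup after 61 seconds re-fetches only if the back-end version has changed.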
Decision Central provides caching mechanisms for storing data sets and performing data operations using in-memory data. Caching data reduces network traffic, remote system payload, and processing time. To avoid performance issues, configure the cache settings in Decision Central.
For any data lookup call that results in a data set, the caching technique determines where the lookup call is executed and where the resulting data set is stored. For example, a data lookup call might retrieve all the mortgage applications whose locale parameter is set to "Urban".
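The following Python sketch expresses the mortgage example as a small lookup description and pairs it with a simplified view of which tier resolves it for each cache level. The `LookupCall` class, the `executed_at` and `resolve` helpers, and the sample rows are hypothetical. The tier mapping is a deliberate simplification that assumes the data set has already been loaded: with client caching the browser resolves it, with back-end caching the decision engine does, and without caching the lookup goes to the remote storage system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LookupCall:
    """Hypothetical description of one data lookup call."""
    data_set: str
    column: str
    value: str
    sort_by: Optional[str] = None


def executed_at(client_cache: bool, backend_cache: bool) -> str:
    """Simplified view of which tier resolves a lookup once the data set
    has been loaded, depending on the enabled cache level."""
    if client_cache:
        return "web browser"
    if backend_cache:
        return "decision engine (in memory)"
    return "remote storage system"


def resolve(call: LookupCall, rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    matches = [r for r in rows if r[call.column] == call.value]
    if call.sort_by is not None:
        matches.sort(key=lambda r: r[call.sort_by])
    return matches


urban = LookupCall("mortgage-applications", column="locale", value="Urban", sort_by="id")
rows = [{"id": "2", "locale": "Urban"}, {"id": "1", "locale": "Urban"},
        {"id": "3", "locale": "Rural"}]
print(resolve(urban, rows))      # [{'id': '1', 'locale': 'Urban'}, {'id': '2', 'locale': 'Urban'}]
print(executed_at(False, True))  # decision engine (in memory)
```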
Decision Central data set functionality provides two cache levels:
- Client level
- Back end level
Client cache
When the cache is enabled, the data set is cached in the web browser during the lookup operation, and further lookup operations do not send requests to the back end. Data set operations such as grouping, aggregation, filtering, and sorting are processed in the web browser. Enable client caching only if the data set is small, for example, less than 10 MB. For large data sets, browser issues such as slow performance or intermittent freezing can occur. Client caching reduces the number of back-end requests, including requests to the storage system.
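The client cache behavior described above can be sketched in Python as an object that fetches the whole data set from the back end once and then runs filtering, sorting, and grouping with aggregation on its local copy. The `BrowserDataSet` class, the request counter, and the size check are hypothetical illustrations; the sketch also reads the 10 MB guideline as 10 MiB, which is an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# The text suggests client caching only for data sets under about 10 MB;
# this sketch reads that as 10 MiB.
CLIENT_CACHE_LIMIT_BYTES = 10 * 1024 * 1024


def client_cache_recommended(size_bytes: int) -> bool:
    return size_bytes < CLIENT_CACHE_LIMIT_BYTES


class BrowserDataSet:
    """Hypothetical client-side cache: the first operation fetches the whole
    data set from the back end, and later operations reuse the local copy."""

    def __init__(self, fetch_from_backend: Callable[[], List[dict]]):
        self._fetch = fetch_from_backend
        self._rows = None
        self.backend_requests = 0

    def _local_rows(self) -> List[dict]:
        if self._rows is None:
            self._rows = self._fetch()
            self.backend_requests += 1
        return self._rows

    def filter(self, column: str, value) -> List[dict]:
        return [r for r in self._local_rows() if r[column] == value]

    def sort(self, column: str) -> List[dict]:
        return sorted(self._local_rows(), key=lambda r: r[column])

    def sum_by(self, group_column: str, value_column: str) -> Dict[str, float]:
        totals: Dict[str, float] = defaultdict(float)
        for r in self._local_rows():
            totals[r[group_column]] += r[value_column]
        return dict(totals)


rows = [{"locale": "Urban", "amount": 100.0}, {"locale": "Rural", "amount": 50.0},
        {"locale": "Urban", "amount": 25.0}]
ds = BrowserDataSet(lambda: rows)
ds.filter("locale", "Urban")
ds.sort("amount")
print(ds.sum_by("locale", "amount"))  # {'Urban': 125.0, 'Rural': 50.0}
print(ds.backend_requests)            # 1
```

Three operations cost one back-end request, which is the saving client caching provides; the trade-off is that the whole data set has to fit comfortably in the browser.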
Back end cache
When the cache is enabled, the decision engine caches the data set. This reduces the number of requests to the remote storage system. All data set operations are performed in the decision engine using in-memory data. Enable back-end caching only if the data set is not updated frequently and can be stored and processed in memory. Back-end caching is also useful when the connection to the remote storage has latency issues.
Back-end cache settings are not always visible in the Advanced tab of the Data Set Editor, because the Java (Bean) and CSV data providers always rely on back-end caching: the data set must be in memory so that the in-memory decision engine can resolve any data lookup operation.
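The back-end cache and the special case for the Java (Bean) and CSV providers can be sketched in Python as a cache keyed by data set UUID that is always on for those two providers and optional for the others. The `BackendCache` class, the `cache_enabled` definition key, and the request counter are hypothetical; the sketch only illustrates why repeated lookups against a cached data set stop reaching the remote storage.

```python
from typing import Callable, Dict, List


class BackendCache:
    """Hypothetical back-end cache keyed by data set UUID."""

    # Per the text, these providers always keep the data set in memory.
    ALWAYS_CACHED = {"Bean", "CSV"}

    def __init__(self):
        self._store: Dict[str, List] = {}
        self.remote_requests = 0

    def cache_enabled(self, definition: dict) -> bool:
        return (definition["provider"] in self.ALWAYS_CACHED
                or definition.get("cache_enabled", False))

    def rows(self, definition: dict, fetch_remote: Callable[[], List]) -> List:
        if not self.cache_enabled(definition):
            self.remote_requests += 1  # every lookup goes to the remote storage
            return fetch_remote()
        uuid = definition["uuid"]
        if uuid not in self._store:
            self._store[uuid] = fetch_remote()
            self.remote_requests += 1
        return self._store[uuid]


cache = BackendCache()
csv_def = {"uuid": "mortgages-csv", "provider": "CSV"}
sql_def = {"uuid": "mortgages-sql", "provider": "SQL"}
for _ in range(3):
    cache.rows(csv_def, lambda: ["row"])
    cache.rows(sql_def, lambda: ["row"])
print(cache.remote_requests)  # 4: the CSV data set once, the uncached SQL data set three times
```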