How to keep the database tables from growing infinitely in RHPAM/BPM Suite?

Solution Unverified - Updated -

Environment

  • Red Hat JBoss BRMS (BRMS)
    • 5.3.x
  • Red Hat JBoss BPM Suite (BPMS)
    • 6.x
  • Red Hat Process Automation Manager (RHPAM)
    • 7.x

Issue

  • We are developing an application which creates a very high number of process instances. We need to ensure that the tables where jBPM 5 stores the runtime information are not getting filled by information pertaining to ended process instances. How can this be done?
  • Are there any database clean-up tasks available in jBPM 5?
  • How to enable history logging?
  • What is the significance of SessionInfo table of jBPM5 schema?
  • Would like to clean up SessionInfo table at regular intervals of time. Is it advisable and would it have any impact?
  • Is there any API available to clean up the records from tables like TASK, WORKITEMINFO, PROCESSINSTANCEINFO, PROCESSINSTANCELOG, VARIABLEINSTANCELOG, NODEINSTANCELOG ? If not, then what strategies could be taken to clean up the unwanted/old records manually from these tables?

Resolution

RHPAM 7 / BPM Suite 6

Use LogCleanupCommand. Red Hat BPM Suite - Administration and Configuration Guide - 19.4. Managing log files documentation describes what is needed to prevent unlimited table growth in BPM Suite 6. Note that record deletion by manual SQL is not supported.

BRMS 5.3

  • For the binary persistence of the process instance execution model, the following approaches need to be followed to prevent table growth:

    • Do not enable db logger to persist process instance events
    • Enable db logger, but clear the log tables periodically to keep them from growing too large. To clean the log tables, the JPAProcessInstanceDbLog class offers the method clear(), otherwise the data could be purged from the database directly.
  • For the human task related tables, the following approaches could be followed to prevent table growth:

    • Create a listener similar to TaskCleanUpProcessEventListener1 - due to signed jars which are shipped with the platform, use a different package name. This listener will be included in an upcoming BRMS version.
    • Use the TasksAdmin2 API to periodically remove ended tasks, by calling it's removeTasks method.
    • Delete the records directly in the database - the TASK table has a status field which could be used to query for tasks which could be removed. It is needed to create a script that takes care of properly handling the entries in the referenced tables.

      • SessionInfo table contains all the persistent sessions. As soon as a persistent-knowledge session is created using the JPAKnowledgeService helper class, a new row is inserted inside this SessionInfo table. Please note that this table not only contains the sessions having active process instances but all the sessions which have been created over time (unless records have not been explicitly deleted).

      • Records from the SessionInfo table can be manually removed from the database itself since there is no API for doing that. But keep in mind couple of factors before doing that.

        • As per the business need people may want to keep sessions with a STARTDATE column value which is less than [(current date) - x days] . Unless they have timers, rules data, globals etc. More details could be found in Kris's comment in JBPM-3582 JIRA3.
        • The other factor to be kept under consideration could be, when are the ksessions being reloaded from the database. If it is also done from within the code, then obviously, when ksessions are being reloaded needs to be taken into account.
  • There are no direct API available for removing old records from most of the following tables. However, here are a few of the strategies which can be considered (still it might differ as per business need) while removing records from some of the jBPM 5/Human Task related tables:

    1. TASK table

    When a process instance is aborted the tasks are put into the Exited status but they are not cleaned up in the DB. There have also been cases where tasks with status = "Completed" are orphaned (= process instance of TASK.PROCESSINSTANCEID finished and outdated like LOG tables below). For these scenarios jBPM 5.2 contains a org.jbpm.task.admin.TaskAdmin class [2][3] that has a public int removeTasks(List<TaskSummary> tasks); method which can delete tasks. This is the easiest way to go about deleting tasks.

    2. WORKITEMINFO table

    There aren't any direct API available for removing these records but users can clean up the records using this approach manually:

    When CREATIONDATE column value < (current date - x days) and the
    PROCESSINSTANCEID value does not exist in the PROCESSINSTANCEINFO table
    

    Please note the relationship to the process instance is a determining factor here.

    3. PROCESSINSTANCEINFO table

    Again for this as well there aren't any direct API but manually deleting records can be done. However, if there process instances in limbo that have state 1 (active), then before manually removing those records users need to determine whether or not records that should be deleted is the age (perhaps looking at both the lastmodificationdate and the lastreaddate).

    4. PROCESSINSTANCELOG table

    No direct API available but manually it can done by the following strategy

    When END_DATE column value is < (current date - x days)
    

    5. VARIABLEINSTANCELOG table

    No direct API available but manually it can done by the following strategy

    When LOG_DATE column value is < (current date - x days) and the
    PROCESSINSTANCEID value does not exist in the PROCESSINSTANCELOG table
    

    6. NODEINSTANCELOG table

    No direct API available but manually it can done by the following strategy.

    When LOG_DATE column value is < (current date - x days) and the
    PROCESSINSTANCEID value does not exist in the PROCESSINSTANCELOG table
    

Root Cause

  • By default, jBPM 5 only keeps records about active process instances in its tables. For instance, after starting a process instance, a new record will be inserted into the ProcessInstanceInfo table. After ending the process instance, the record in the ProcessInstanceInfo table will be deleted again.

  • Only if you have instructed jBPM 5 to create a database history log, by registering JPAWorkingMemoryDbLogger, then additional records are created during process execution in the ProcessInstanceLog, NodeInstanceLog and VariableInstanceLog tables:

    StatefulKnowledgeSession ksession = ...;
    JPAWorkingMemoryDbLogger logger = new JPAWorkingMemoryDbLogger(ksession);
    
    // invoke methods one your session here
    
    logger.dispose();
    

    Note: These log records are not cleaned automatically when the process instance ends.

  • Refers to the documentation links provided to get information about the DB schema, both for the binary persistence of the process instance execution model1, as well as the history log2.

  • While it is true that jBPM 5 does not keep information about ended process instances, except for the log tables, this is not the case for the human task component. Therefore, if there are processes which use human tasks, the human task related tables will grow in size. For an overview of these tables, please refer to the newly added chapter in the jBPM 5.3 User Guide3.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments