Chapter 16. Execution error management

When an execution error occurs for a business process, the process stops and reverts to the most recent stable state (the closest safe point) and continues its execution. If an error of any kind is not handled by the process the entire transaction rolls back, leaving the process instance in the previous wait state. Any trace of this is only visible in the logs, and usually displayed to the caller who sent the request to the process engine.

Users with process administrator (process-admin) or administrator (admin) roles are able to access error messages in Business Central. Execution error messaging provides the following primary benefits:

  • Better traceability
  • Visibility in case of critical processes
  • Reporting and analytics based on error situations
  • External system error handling and compensation

Configurable error handling is responsible for receiving any technical errors thrown throughout the process engine execution (including task service). The following technical exceptions apply:

  • Anything that extends java.lang.Throwable
  • Process level error handling and any other exceptions not previously handled

There are several components that make up the error handling mechanism and allow a pluggable approach to extend its capabilities.

The process engine entry point for error handling is the ExecutionErrorManager. This is integrated with RuntimeManager, which is then responsible for providing it to the underlying KieSession and TaskService.

From an API point of view, ExecutionErrorManager provides access to the following components:

  • ExecutionErrorHandler: The primary mechanism for error handling
  • ExecutionErrorStorage: Pluggable storage for execution error information

16.1. Managing execution errors

By definition, every process error that is detected and stored is unacknowledged and must be handled by someone or something (in case of automatic error recovery). Errors are filtered on the basis of whether or not they have been acknowledged. Acknowledging an error saves the user information and time stamp for traceability. You can access the Error Management view at any time.

Procedure

  1. In Business Central, go to MenuManageExecution Errors.
  2. Select an error from the list to open the Details tab. This displays information about the error or errors.
  3. Click the Acknowledge button to acknowledge and clear the error. You can view the error later by selecting Yes on the Acknowledged filter in the Manage Execution Errors page.

    If the error was related to a task, a Go to Task button is displayed.

  4. Click the Go to Task button, if applicable, to view the associated job information in the Manage Tasks page.

    In the Manage Tasks page, you can restart, reschedule, or retry the corresponding task.

16.2. The ExecutionErrorHandler

The ExecutionErrorHandler is the primary mechanism for all process error handling. It is bound to the life cycle of RuntimeEngine; meaning it is created when a new runtime engine is created, and is destroyed when RuntimeEngine is disposed. A single instance of the ExecutionErrorHandler is used within a given execution context or transaction. Both KieSession and TaskService use that instance to inform the error handling about processed nodes/tasks. ExecutionErrorHandler is informed about:

  • Starting of processing of a given node instance.
  • Completion of processing of a given node instance.
  • Starting of processing of a given task instance.
  • Completion of processing of a given task instance.

This information is mainly used for errors that are of unknown type; that is, errors that do not provide information about the process context. For example, upon commit time, database exceptions do not carry any process information.

16.3. Execution error storage

ExecutionErrorStorage is a pluggable strategy that permits various ways of persisting information about execution errors. Storage is used directly by the handler that gets an instance of the store when it is created (when RuntimeEngine is created). Default storage implementation is based on the database table, which stores every error and includes all of the available information. Some errors may not contain details, as this depends on the type of error and whether or not it is possible to extract specific information.

16.4. Error types and filters

Error handling attempts to catch and handle any kind of error, therefore it needs a way to categorize errors. By doing this, it is able to properly extract information from the error and make it pluggable, as some users may require specific types of errors to be thrown and handled in different ways than what is provided by default.

Error categorization and filtering is based on ExecutionErrorFilters. This interface is solely responsible for building instances of ExecutionError, which are later stored by way of the ExecutionErrorStorage strategy. It has following methods:

  • accept: indicates if given error can be handled by the filter.
  • filter: where the actual filtering, handling, and so on happens.
  • getPriority: indicates the priority that is used when calling filters.

Filters process one error at a time and use a priority system to avoid having multiple filters returning alternative “views” of the same error. Priority enables more specialized filters to see if the error can be accepted, or otherwise allow another filter to handle it.

ExecutionErrorFilter can be provided using the ServiceLoader mechanism, which enables the capability of error handling to be easily extended.

Red Hat Process Automation Manager ships with the following ExecutionErrorFilters :

Table 16.1. ExecutionErrorFilters

Class nameTypePriority

org.jbpm.runtime.manager.impl.error.filters.ProcessExecutionErrorFilter

Process

100

org.jbpm.runtime.manager.impl.error.filters.TaskExecutionErrorFilter

Task

80

org.jbpm.runtime.manager.impl.error.filters.DBExecutionErrorFilter

DB

200

org.jbpm.executor.impl.error.JobExecutionErrorFilter

Job

100

Filters are given a higher execution order based on the lowest value of the priority. In above table, filters are invoked in following order:

  1. Task
  2. Process
  3. Job
  4. DB

16.5. Auto acknowledging execution errors

When executions errors occur they are unacknowledged by default, and require a manual acknowledgment to be performed otherwise they are always seen as information that requires attention. In case of larger volumes, manual actions can be time consuming and not suitable in some situations.

Auto acknowledgment resolves this issue. It is based on scheduled jobs by way of the jbpm-executor, with the following three types of jobs available:

org.jbpm.executor.commands.error.JobAutoAckErrorCommand
Responsible for finding jobs that previously failed but now are either canceled, completed, or rescheduled for another execution. This job only acknowledges execution errors of type Job.
org.jbpm.executor.commands.error.TaskAutoAckErrorCommand
Responsible for auto acknowledgment of user task execution errors for tasks that previously failed but now are in one of the exit states (completed, failed, exited, obsolete). This job only acknowledges execution errors of type Task.
org.jbpm.executor.commands.error.ProcessAutoAckErrorCommand
Responsible for auto acknowledgment of process instances that have errors attached. It acknowledges errors where the process instance is already finished (completed or aborted), or the task that the error originated from is already finished. This is based on init_activity_id value. This job acknowledges any type of execution error that matches the above criteria.

Jobs can be registered on the Process Server. In Business Central you can configure auto acknowledge jobs for errors:

Prerequisite

  1. Execution errors of one or more type have accumulated during processes execution but require no further attention.

Procedure

  1. In Business Central, click MenuManageJobs.
  2. In the top right of the screen, click New Job.
  3. Type the process correlation key into the Business Key field.
  4. In the Type field, add type of the auto acknowledge job type from the list above.
  5. Select a Due On time for the job to be completed:

    1. To run the job immediately, select the Run now option.
    2. To run the job at a specific time, select Run later. A date and time field appears next to the Run later option. Click the field to open the calendar and schedule a specific time and date for the job.

      auto acknowledge error job1
  6. Click Create to create the job and return to the Manage Jobs page.

The following steps are optional, and allow you to configure auto acknowledge jobs to run either once (SingleRun), on specific time intervals (NextRun), or using the custom name of an entity manager factory to search for jobs to acknowledge (EmfName).

  1. Click on the Advanced tab.
  2. Click the Add Parameter button.
  3. Enter the configuration parameter you want to apply to the job:

    1. SingleRun: true or false
    2. NextRun: time expression, such as 2h, 5d, 1m, and so on.
    3. EmfName: custom entity manager factory name.

      auto acknowledge error job2

16.6. Cleaning up the error list

The ExecutionErrorInfo error list table can be cleaned up to remove redundant information. Depending on the life cycle of the process, errors may remain in the list for some time, and there is no direct API with which to clean up the list. Instead, the ExecutionErrorCleanupCommand command can be scheduled to periodically clean up errors.

The following parameters can be set for the clean up command. The command is restricted to deleting execution errors of already completed or aborted process instances:

  • DateFormat

    • Date format for further date related parameters - if not given yyyy-MM-dd is used (pattern of SimpleDateFormat class).
  • EmfName

    • Name of the entity manager factory to be used for queries (valid persistence unit name).
  • SingleRun

    • Indicates if execution should be single run only (true|false).
  • NextRun

    • Provides next execution time (valid time expression, for example: 1d, 5h, and so on)
  • OlderThan

    • Indicates what errors should be deleted - older than given date.
  • OlderThanPeriod

    • Indicated what errors should be deleted older than given time expression (valid time expression e.g. 1d, 5h, and so on)
  • ForProcess

    • Indicates errors to be deleted only for given process definition.
  • ForProcessInstance

    • Indicates errors to be deleted only for given process instance.
  • ForDeployment

    • Indicates errors to be deleted that are from given deployment ID.