Chapter 11. Troubleshooting Data Grid Servers

Gather diagnostic information about Data Grid server deployments and perform troubleshooting steps to resolve issues.

11.1. Getting Diagnostic Reports for Data Grid Servers

Data Grid servers provide aggregated reports in tar.gz archives that contain diagnostic information about both the Data Grid server and the host. Each report provides details about CPU, memory, open files, network sockets and routing, and threads, as well as configuration and log files.

Procedure

  1. Create a CLI connection to Data Grid.
  2. Use the server report command to download a tar.gz archive:

    [//containers/default]> server report
    Downloaded report 'infinispan-<hostname>-<timestamp>-report.tar.gz'

  3. Move the tar.gz file to a suitable location on your filesystem.
  4. Extract the tar.gz file with any archiving tool.
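
If you prefer to script the extraction step, the following is a minimal sketch that uses the Python standard library instead of a desktop archiving tool. The function name extract_report and the paths in the usage note are hypothetical; only the tar.gz format comes from the procedure above.

```python
import tarfile
from pathlib import Path


def extract_report(archive_path, dest_dir):
    """Extract a report archive into dest_dir and return the sorted member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = sorted(tar.getnames())
        # Use the safer "data" extraction filter when this Python version provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

For example, extract_report("infinispan-<hostname>-<timestamp>-report.tar.gz", "report") would unpack the archive into a report directory, with the placeholder file name replaced by the name that the CLI printed.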

11.2. Changing Data Grid Server Logging Configuration at Runtime

Modify the logging configuration for Data Grid servers at runtime to temporarily adjust logging to troubleshoot issues and perform root cause analysis.

Modifying the logging configuration through the CLI is a runtime-only operation, which means that changes:

  • Are not saved to the log4j2.xml file. Restarting server nodes or the entire cluster resets the logging configuration to the default properties in the log4j2.xml file.
  • Apply only to the nodes that are in the cluster when you invoke the CLI. Nodes that join the cluster after you change the logging configuration use the default properties.

Procedure

  1. Create a CLI connection to Data Grid.
  2. Use the logging command to make the required adjustments.

    • List all appenders defined on the server:
[//containers/default]> logging list-appenders

The preceding command returns:

{
  "STDOUT" : {
    "name" : "STDOUT"
  },
  "JSON-FILE" : {
    "name" : "JSON-FILE"
  },
  "HR-ACCESS-FILE" : {
    "name" : "HR-ACCESS-FILE"
  },
  "FILE" : {
    "name" : "FILE"
  },
  "REST-ACCESS-FILE" : {
    "name" : "REST-ACCESS-FILE"
  }
}
  • List all logger configurations defined on the server:
[//containers/default]> logging list-loggers

The preceding command returns:

[ {
  "name" : "",
  "level" : "INFO",
  "appenders" : [ "STDOUT", "FILE" ]
}, {
  "name" : "org.infinispan.HOTROD_ACCESS_LOG",
  "level" : "INFO",
  "appenders" : [ "HR-ACCESS-FILE" ]
}, {
  "name" : "com.arjuna",
  "level" : "WARN",
  "appenders" : [ ]
}, {
  "name" : "org.infinispan.REST_ACCESS_LOG",
  "level" : "INFO",
  "appenders" : [ "REST-ACCESS-FILE" ]
} ]
  • Add and modify logger configurations with the set subcommand.

For example, the following command sets the logging level for the org.infinispan package to DEBUG:

[//containers/default]> logging set --level=DEBUG org.infinispan
  • Remove existing logger configurations with the remove subcommand.

For example, the following command removes the org.infinispan logger configuration, which means the root configuration is used instead:

[//containers/default]> logging remove org.infinispan
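
To confirm that a runtime change took effect, you can compare the logger levels before and after the change. The following is a minimal sketch that assumes you saved the JSON returned by logging list-loggers as text. The function name logger_levels and the "<root>" label for the logger with an empty name are illustrative choices, not part of the CLI.

```python
import json


def logger_levels(list_loggers_output):
    """Map each logger name to its level; an empty name is shown as "<root>"."""
    loggers = json.loads(list_loggers_output)
    return {entry["name"] or "<root>": entry["level"] for entry in loggers}


# Shortened sample in the same shape as the list-loggers output shown above.
sample = """[ {
  "name" : "",
  "level" : "INFO",
  "appenders" : [ "STDOUT", "FILE" ]
}, {
  "name" : "com.arjuna",
  "level" : "WARN",
  "appenders" : [ ]
} ]"""

levels = logger_levels(sample)
print(levels)  # {'<root>': 'INFO', 'com.arjuna': 'WARN'}
```

After running logging set --level=DEBUG org.infinispan, the same function applied to fresh output should include an org.infinispan entry, assuming the CLI reports the new logger in the same format.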

11.3. Resource Statistics

Use the stats command to inspect the statistics that a Data Grid server collects for some of its resources.

Run the stats command either from the context of a resource that collects statistics, such as a container or a cache, or with the path to such a resource:

[//containers/default]> stats
{
  "statistics_enabled" : true,
  "number_of_entries" : 0,
  "hit_ratio" : 0.0,
  "read_write_ratio" : 0.0,
  "time_since_start" : 0,
  "time_since_reset" : 49,
  "current_number_of_entries" : 0,
  "current_number_of_entries_in_memory" : 0,
  "total_number_of_entries" : 0,
  "off_heap_memory_used" : 0,
  "data_memory_used" : 0,
  "stores" : 0,
  "retrievals" : 0,
  "hits" : 0,
  "misses" : 0,
  "remove_hits" : 0,
  "remove_misses" : 0,
  "evictions" : 0,
  "average_read_time" : 0,
  "average_read_time_nanos" : 0,
  "average_write_time" : 0,
  "average_write_time_nanos" : 0,
  "average_remove_time" : 0,
  "average_remove_time_nanos" : 0,
  "required_minimum_number_of_nodes" : -1
}

[//containers/default]> stats /containers/default/caches/mycache
{
  "time_since_start" : -1,
  "time_since_reset" : -1,
  "current_number_of_entries" : -1,
  "current_number_of_entries_in_memory" : -1,
  "total_number_of_entries" : -1,
  "off_heap_memory_used" : -1,
  "data_memory_used" : -1,
  "stores" : -1,
  "retrievals" : -1,
  "hits" : -1,
  "misses" : -1,
  "remove_hits" : -1,
  "remove_misses" : -1,
  "evictions" : -1,
  "average_read_time" : -1,
  "average_read_time_nanos" : -1,
  "average_write_time" : -1,
  "average_write_time_nanos" : -1,
  "average_remove_time" : -1,
  "average_remove_time_nanos" : -1,
  "required_minimum_number_of_nodes" : -1
}
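
In the second example every field reports -1, while in the first example only required_minimum_number_of_nodes does. The following is a minimal sketch for listing such fields from saved stats output. The function name fields_with_value is hypothetical, and treating -1 as "no value reported" is an assumption, because the output itself does not say what -1 means.

```python
import json


def fields_with_value(stats_output, value=-1):
    """Return the sorted names of statistics whose value equals the given number."""
    stats = json.loads(stats_output)
    return sorted(name for name, current in stats.items() if current == value)


# Shortened sample in the same shape as the container statistics shown above.
sample = """{
  "statistics_enabled" : true,
  "hits" : 0,
  "misses" : 0,
  "required_minimum_number_of_nodes" : -1
}"""

flagged = fields_with_value(sample)
print(flagged)  # ['required_minimum_number_of_nodes']
```

If every field of a cache is listed, as with mycache above, it is worth checking whether that cache is configured to collect statistics before you rely on the numbers.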