21.8. Structured Logging with Rsyslog

On systems that produce large amounts of log data, it can be convenient to maintain log messages in a structured format. With structured messages, it is easier to search for particular information, to produce statistics and to cope with changes and inconsistencies in message structure. Rsyslog uses the JSON (JavaScript Object Notation) format to provide structure for log messages.
Compare the following unstructured log message:
Oct 25 10:20:37 localhost anacron[1395]: Jobs will be executed sequentially
with a structured one:
{"timestamp":"2013-10-25T10:20:37", "host":"localhost", "program":"anacron", "pid":"1395", "msg":"Jobs will be executed sequentially"}
Searching structured data by key-value pairs is faster and more precise than searching text files with regular expressions. The structure also lets you search for the same entry in messages produced by various applications. In addition, JSON files can be stored in a document database such as MongoDB, which provides additional performance and analysis capabilities. On the other hand, a structured message requires more disk space than its unstructured equivalent.
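The difference between the two search styles can be illustrated with a short Python sketch that extracts the process ID from both forms of the message shown above. The regular expression and the function names are assumptions made for this illustration; they are not part of rsyslog.

```python
import json
import re

unstructured = "Oct 25 10:20:37 localhost anacron[1395]: Jobs will be executed sequentially"
structured = ('{"timestamp":"2013-10-25T10:20:37", "host":"localhost", '
              '"program":"anacron", "pid":"1395", "msg":"Jobs will be executed sequentially"}')

# Hypothetical pattern for the traditional line format; it breaks as soon as
# an application deviates from the "program[pid]: text" convention.
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+ [\d:]{8}) (?P<host>\S+) "
    r"(?P<program>[^\[:]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)

def pid_from_text(line):
    """Extract the PID by pattern matching on the raw text."""
    match = SYSLOG_LINE.match(line)
    return match.group("pid") if match else None

def pid_from_json(line):
    """Extract the PID by key lookup in the structured message."""
    return json.loads(line).get("pid")

print(pid_from_text(unstructured))
print(pid_from_json(structured))
```

The key lookup does not depend on the position of the field in the message, which is why it copes better with inconsistencies in message layout.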
In rsyslog, log messages with metadata are pulled from Journal by the imjournal module. With the mmjsonparse module, you can parse data imported from Journal and from other sources and process it further, for example as database output. For parsing to be successful, mmjsonparse requires input messages to be structured in the way defined by the Lumberjack project.
The Lumberjack project aims to add structured logging to rsyslog in a backward-compatible way. To identify a structured message, Lumberjack specifies the @cee: string that prepends the actual JSON structure. Also, Lumberjack defines the list of standard field names that should be used for entities in the JSON string. For more information on Lumberjack, see the section called “Online Documentation”.
The following is an example of a lumberjack-formatted message:
 @cee: {"pid":17055, "uid":1000, "gid":1000, "appname":"logger", "msg":"Message text."} 
To build this structure inside Rsyslog, use a template; see Section 21.8.2, “Filtering Structured Messages”. Applications and servers can employ the libumberlog library to generate messages in the lumberjack-compliant form. For more information on libumberlog, see the section called “Online Documentation”.
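For an application that does not use libumberlog, the message layout described above can be approximated by hand. The following Python sketch is only an illustration: the format_cee helper is a hypothetical name, and the field values are copied from the example message.

```python
import json

CEE_COOKIE = "@cee:"  # marker that identifies a lumberjack-formatted message

def format_cee(fields):
    """Prepend the @cee: marker to the JSON-encoded fields (hypothetical helper)."""
    return f"{CEE_COOKIE} {json.dumps(fields)}"

fields = {"pid": 17055, "uid": 1000, "gid": 1000,
          "appname": "logger", "msg": "Message text."}
message = format_cee(fields)
print(message)
```

The resulting string can be handed to any syslog client as the message text; the field names used here should follow the standard names that Lumberjack defines.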

21.8.1. Importing Data from Journal

The imjournal module is Rsyslog's input module for reading the journal files natively (see Section 21.7, “Interaction of Rsyslog and Journal”). Journal messages are then logged in text format like other rsyslog messages. However, with further processing, it is possible to translate the metadata provided by Journal into a structured message.
To import data from Journal to Rsyslog, use the following configuration in /etc/rsyslog.conf:
        
module(load="imjournal"
    PersistStateInterval="number_of_messages"
    StateFile="path"
    ratelimit.interval="seconds"
    ratelimit.burst="burst_number"
    IgnorePreviousMessages="off/on")
  • With number_of_messages, you specify how often the journal state is saved to the state file. The state is saved each time the specified number of messages has been processed.
  • Replace path with a path to the state file. This file tracks the journal entry that was the last one processed.
  • With seconds, you set the length of the rate limit interval. The number of messages processed during this interval cannot exceed the value specified in burst_number. The default setting is 20,000 messages per 600 seconds. Rsyslog discards messages that arrive after the maximum burst has been reached within the specified time frame.
  • With IgnorePreviousMessages you can ignore messages that are currently in Journal and import only new messages, which is used when there is no state file specified. The default setting is off. Please note that if this setting is off and there is no state file, all messages in the Journal are processed, even if they were already processed in a previous rsyslog session.
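The interaction between the rate limit interval and the burst value can be sketched in Python as a fixed-window counter. This is a simplified model of the behavior described in the list above, not rsyslog's actual implementation; the class name and the small burst value are assumptions chosen to keep the example short.

```python
class IntervalRateLimiter:
    """Accept at most `burst` messages per `interval` seconds; discard the rest."""

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.window_start = None  # time at which the current interval began
        self.count = 0            # messages seen in the current interval

    def allow(self, now):
        # Open a new interval on the first message or once the old one expired.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.count = 0
        self.count += 1
        return self.count <= self.burst

# 600-second interval as in the default; burst reduced to 3 for illustration.
limiter = IntervalRateLimiter(interval=600, burst=3)
arrival_times = [0, 1, 2, 3, 599, 600]
decisions = [limiter.allow(t) for t in arrival_times]
print(decisions)
```

In this model, the fourth and fifth messages are discarded because the burst of three was already used up within the first 600 seconds, and the message arriving at second 600 opens a new interval.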

Note

You can use imjournal simultaneously with the imuxsock module, which is the traditional system log input. However, to avoid message duplication, you must prevent imuxsock from reading the Journal's system socket. To do so, use the SysSock.Use directive:
         
module(load="imjournal")
module(load="imuxsock"
    SysSock.Use="off"
    Socket="/run/systemd/journal/syslog")
You can translate all data and metadata stored by Journal into structured messages. Some of these metadata entries are listed in Example 21.19, “Verbose journalctl Output”. For a complete list of journal fields, see the systemd.journal-fields(7) manual page. For example, it is possible to focus on kernel journal fields, which are used by messages originating in the kernel.

21.8.2. Filtering Structured Messages

To create a lumberjack-formatted message that is required by rsyslog's parsing module, use the following template:
template(name="CEETemplate" type="string" string="%TIMESTAMP% %HOSTNAME% %syslogtag% @cee: %$!all-json%\n")
This template prepends the @cee: string to the JSON string and can be applied, for example, when creating an output file with the omfile module. To access JSON field names, use the $! prefix. For example, the following filter condition searches for messages with a specific host name and UID:
($!hostname == "hostname" && $!UID == "UID")
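The same selection can be reproduced outside rsyslog, for example when post-processing an output file. The following Python sketch applies an equivalent condition to already parsed JSON messages; the sample records, the matches function, and the values web01 and 1000 are hypothetical.

```python
import json

def matches(record, hostname, uid):
    """Equivalent of comparing $!hostname and $!UID on one parsed message."""
    return record.get("hostname") == hostname and record.get("UID") == uid

# Hypothetical sample messages, already stripped of the @cee: prefix.
lines = [
    '{"hostname": "web01", "UID": "1000", "msg": "login"}',
    '{"hostname": "web01", "UID": "0", "msg": "cron"}',
    '{"hostname": "db01", "UID": "1000", "msg": "backup"}',
]

selected = [r for r in map(json.loads, lines) if matches(r, "web01", "1000")]
print([r["msg"] for r in selected])
```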

21.8.3. Parsing JSON

The mmjsonparse module is used for parsing structured messages. These messages can come from Journal or from other input sources, and must be formatted in the way defined by the Lumberjack project. The module identifies such messages by the presence of the @cee: string, checks that the JSON structure is valid, and then parses the message.
To parse lumberjack-formatted JSON messages with mmjsonparse, use the following configuration in /etc/rsyslog.conf:
module(load="mmjsonparse")

*.* :mmjsonparse:
In this example, the mmjsonparse module is loaded on the first line, then all messages are forwarded to it. Currently, there are no configuration parameters available for mmjsonparse.
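The three steps that mmjsonparse performs (detecting the @cee: string, validating the JSON structure, and parsing the message) can be mirrored in a few lines of Python. This is a rough sketch of the described behavior under simplifying assumptions, not the module's code; parse_cee is a hypothetical name.

```python
import json

CEE_COOKIE = "@cee:"

def parse_cee(msg):
    """Return the parsed fields, or None if msg is not a valid @cee: message."""
    text = msg.lstrip()
    if not text.startswith(CEE_COOKIE):
        return None  # no marker: treat as an ordinary unstructured message
    try:
        parsed = json.loads(text[len(CEE_COOKIE):])
    except json.JSONDecodeError:
        return None  # marker present, but the JSON structure is invalid
    # Only a JSON object carries named fields; reject arrays and scalars.
    return parsed if isinstance(parsed, dict) else None

good = parse_cee(' @cee: {"pid":17055, "uid":1000, "gid":1000, '
                 '"appname":"logger", "msg":"Message text."} ')
plain = parse_cee("Jobs will be executed sequentially")
broken = parse_cee('@cee: {"pid":17055,')
print(good, plain, broken)
```

Messages for which the function returns None would simply continue through the pipeline as plain text in this model.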

21.8.4. Storing Messages in MongoDB

Rsyslog supports storing JSON logs in the MongoDB document database through the ommongodb output module.
To forward log messages to MongoDB, use the following syntax in /etc/rsyslog.conf (configuration parameters for ommongodb are available only in the new configuration format; see Section 21.3, “Using the New Configuration Format”):
module(load="ommongodb")

*.* action(type="ommongodb" server="DB_server" serverport="port" db="DB_name" collection="collection_name" uid="UID" pwd="password")
  • Replace DB_server with the name or address of the MongoDB server. Specify port to select a non-standard port on the MongoDB server. The default port value is 0 and usually there is no need to change this parameter.
  • With DB_name, you identify the database on the MongoDB server to which you want to direct the output. Replace collection_name with the name of a collection in this database. In MongoDB, a collection is a group of documents, the equivalent of an RDBMS table.
  • You can set your login details by replacing UID and password.
You can shape the form of the final database output by using templates. By default, rsyslog uses a template based on standard lumberjack field names.