Chapter 4. Troubleshooting Monitors

This chapter contains information on how to fix the most common errors related to the Ceph Monitors.

4.1. The Most Common Error Messages Related to Monitors

The following tables list the most common error messages that are returned by the ceph health detail command, or included in the Ceph logs. The tables provide links to corresponding sections that explain the errors and point to specific procedures to fix the problems.

Table 4.1. Error Messages Related to Monitors

Error message                        See

HEALTH_WARN
  mon.X is down (out of quorum)      Section 4.1.1, “A Monitor Is Out of Quorum”
  clock skew                         Section 4.1.2, “Clock Skew”
  store is getting too big!          Section 4.1.3, “The Monitor Store is Getting Too Big”

Table 4.2. Common Error Messages in Ceph Logs Related to Monitors

Error message                          Log file          See

clock skew                             Main cluster log  Section 4.1.2, “Clock Skew”
clocks not synchronized                Main cluster log  Section 4.1.2, “Clock Skew”
Corruption: error in middle of record  Monitor log       Section 4.1.1, “A Monitor Is Out of Quorum”
                                                         Section 4.3, “Recovering the Monitor Store”
Corruption: 1 missing files            Monitor log       Section 4.1.1, “A Monitor Is Out of Quorum”
                                                         Section 4.3, “Recovering the Monitor Store”
Caught signal (Bus error)              Monitor log       Section 4.1.1, “A Monitor Is Out of Quorum”

4.1.1. A Monitor Is Out of Quorum

One or more Monitors are marked as down but the other Monitors are still able to form a quorum. In addition, the ceph health detail command returns an error message similar to the following one:

HEALTH_WARN 1 mons down, quorum 1,2 mon.b,mon.c
mon.a (rank 0) addr 127.0.0.1:6789/0 is down (out of quorum)
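
As a quick way to pull the affected Monitor names out of output like this, the following sketch filters for the "is down (out of quorum)" lines. The function name down_mons is hypothetical, and the pattern assumes that the message format matches the example above, with the Monitor name as the first word of the line.

```shell
# Print the names of Monitors reported as down (out of quorum).
# Reads `ceph health detail` output on standard input.
# Assumption: the Monitor name is the first word of each matching line.
down_mons() {
  awk '/is down \(out of quorum\)/ { print $1 }'
}
```

A possible invocation is `ceph health detail | down_mons`, which for the example output prints mon.a.
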
What This Means

Ceph marks a Monitor as down for various reasons.

If the ceph-mon daemon is not running, it might have a corrupted store, or some other error might be preventing the daemon from starting. Also, the /var/ partition might be full. As a consequence, ceph-mon cannot perform any operations on the store, located by default at /var/lib/ceph/mon/<cluster-name>-<short-host-name>/store.db, and terminates.

If the ceph-mon daemon is running but the Monitor is out of quorum and marked as down, the cause of the problem depends on the Monitor state:

  • If the Monitor is in the probing state longer than expected, it cannot find the other Monitors. This problem can be caused by networking issues, or the Monitor can have an outdated Monitor map (monmap) and be trying to reach the other Monitors on incorrect IP addresses. Alternatively, if the monmap is up-to-date, the Monitor’s clock might not be synchronized.
  • If the Monitor is in the electing state longer than expected, the Monitor’s clock might not be synchronized.
  • If the Monitor changes its state from synchronizing to electing and back, the cluster state is advancing. This means that the cluster is generating new maps faster than the synchronization process can handle.
  • If the Monitor marks itself as the leader or a peon, then it believes that it is in a quorum, while the rest of the cluster is sure that it is not. This problem can be caused by failed clock synchronization.
To Troubleshoot This Problem
  1. Verify that the ceph-mon daemon is running. If not, start it:

    systemctl status ceph-mon@<host-name>
    systemctl start ceph-mon@<host-name>

    Replace <host-name> with the short name of the host where the daemon is running. Use the hostname -s command when unsure.

  2. If you are not able to start ceph-mon, follow the steps in The ceph-mon Daemon Cannot Start.
  3. If you are able to start the ceph-mon daemon but it is marked as down, follow the steps in The ceph-mon Daemon Is Running, but Still Marked as down.
The ceph-mon Daemon Cannot Start
  1. Check the corresponding Monitor log, by default located at /var/log/ceph/ceph-mon.<host-name>.log.
  2. If the log contains error messages similar to the following ones, the Monitor might have a corrupted store.

    Corruption: error in middle of record
    Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.0/store.db/1234567.ldb

    To fix this problem, replace the Monitor. See Section 4.4, “Replacing a Failed Monitor”.

  3. If the log contains an error message similar to the following one, the /var/ partition might be full. Delete any unnecessary data from /var/.

    Caught signal (Bus error)
    Important

    Do not delete any data from the Monitor directory manually. Instead, use the ceph-monstore-tool to compact it. See Section 4.5, “Compacting the Monitor Store” for details.

  4. If you see any other error messages, open a support ticket. See Chapter 10, Contacting Red Hat Support Service for details.
The ceph-mon Daemon Is Running, but Still Marked as down
  1. From the Monitor host that is out of the quorum, use the mon_status command to check its state:

    ceph daemon <id> mon_status

    Replace <id> with the ID of the Monitor, for example:

    # ceph daemon mon.a mon_status
  2. If the status is probing, verify the locations of the other Monitors in the mon_status output.

    1. If the addresses are incorrect, the Monitor has an incorrect Monitor map (monmap). To fix this problem, see Section 4.2, “Injecting a Monitor Map”.
    2. If the addresses are correct, verify that the Monitor clocks are synchronized. See Section 4.1.2, “Clock Skew” for details. In addition, troubleshoot any networking issues. See Chapter 3, Troubleshooting Networking Issues.
  3. If the status is electing, verify that the Monitor clocks are synchronized. See Section 4.1.2, “Clock Skew”.
  4. If the status keeps changing between synchronizing and electing, open a support ticket. See Chapter 10, Contacting Red Hat Support Service for details.
  5. If the Monitor is the leader or a peon, verify that the Monitor clocks are synchronized. See Section 4.1.2, “Clock Skew”. Open a support ticket if synchronizing the clocks does not solve the problem. See Chapter 10, Contacting Red Hat Support Service for details.
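
The decision steps above can be condensed into a small lookup, shown here as a minimal sketch. The function name next_step is hypothetical; it takes the state string reported by mon_status as its only argument and only prints a hint, it does not run any Ceph command.

```shell
# Map a Monitor state, as reported by mon_status, to the next
# troubleshooting step from the procedure above.
next_step() {
  case "$1" in
    probing)
      echo "check monmap addresses (Section 4.2), then clocks (Section 4.1.2) and network (Chapter 3)" ;;
    electing)
      echo "check clock synchronization (Section 4.1.2)" ;;
    synchronizing)
      echo "if the state keeps alternating with electing, open a support ticket (Chapter 10)" ;;
    leader|peon)
      echo "check clock synchronization (Section 4.1.2); open a support ticket if that does not help" ;;
    *)
      echo "unrecognized state: $1" ;;
  esac
}
```

For example, `next_step electing` prints the clock synchronization hint.
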

4.1.2. Clock Skew

A Ceph Monitor is out of quorum, and the ceph health detail command output contains error messages similar to these:

mon.a (rank 0) addr 127.0.0.1:6789/0 is down (out of quorum)
mon.a addr 127.0.0.1:6789/0 clock skew 0.08235s > max 0.05s (latency 0.0045s)

In addition, Ceph logs contain error messages similar to these:

2015-06-04 07:28:32.035795 7f806062e700 0 log [WRN] : mon.a 127.0.0.1:6789/0 clock skew 0.14s > max 0.05s
2015-06-04 04:31:25.773235 7f4997663700 0 log [WRN] : message from mon.1 was stamped 0.186257s in the future, clocks not synchronized
What This Means

The clock skew error message indicates that Monitors' clocks are not synchronized. Clock synchronization is important because Monitors depend on time precision and behave unpredictably if their clocks are not synchronized.

The mon_clock_drift_allowed parameter determines what disparity between the clocks is tolerated. By default, this parameter is set to 0.05 seconds.

Important

Do not change the default value of mon_clock_drift_allowed without previous testing. Changing this value might affect the stability of the Monitors and the Ceph Storage Cluster in general.
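
To make the comparison in these messages concrete, the sketch below extracts the reported skew and the allowed maximum from a clock skew line and compares them numerically. The function names parse_skew and skew_exceeds are hypothetical, and the pattern assumes that the message format matches the examples above.

```shell
# Extract "<skew> <max>" (both in seconds) from a clock skew message
# read on standard input.
# Assumption: the text has the form "clock skew <n>s > max <m>s".
parse_skew() {
  sed -n 's/.*clock skew \([0-9.]*\)s > max \([0-9.]*\)s.*/\1 \2/p'
}

# Succeed (exit status 0) when the skew in $1 is greater than
# the allowed drift in $2.
skew_exceeds() {
  awk -v s="$1" -v m="$2" 'BEGIN { exit !(s + 0 > m + 0) }'
}
```

For the first example line, parse_skew prints "0.08235 0.05", and `skew_exceeds 0.08235 0.05` succeeds because the skew is above the default tolerance of 0.05 seconds.
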

Possible causes of the clock skew error include network problems or problems with Network Time Protocol (NTP) synchronization if that is configured. In addition, time synchronization does not work properly on Monitors deployed on virtual machines.

To Troubleshoot This Problem
  1. Verify that your network works correctly. For details, see Chapter 3, Troubleshooting Networking Issues. In particular, troubleshoot any problems with NTP clients if you use NTP. See Section 3.2, “Basic NTP Troubleshooting” for more information.
  2. If you use a remote NTP server, consider deploying your own NTP server on your network. For details, see the Configuring NTP Using ntpd chapter in the System Administrator’s Guide for Red Hat Enterprise Linux 7.
  3. If you do not use an NTP client, set one up. For details, see the Configuring the Network Time Protocol for Red Hat Ceph Storage section in the Red Hat Ceph Storage 3 Installation Guide for Red Hat Enterprise Linux or Ubuntu.
  4. If you use virtual machines for hosting the Monitors, move them to bare metal hosts. Using virtual machines for hosting Monitors is not supported. For details, see the Red Hat Ceph Storage: Supported configurations article on the Red Hat Customer Portal.
Note

Ceph evaluates time synchronization only every five minutes, so there will be a delay between fixing the problem and clearing the clock skew messages.

4.1.3. The Monitor Store is Getting Too Big

The ceph health command returns an error message similar to the following one:

mon.ceph1 store is getting too big! 48031 MB >= 15360 MB -- 62% avail
What This Means

The Ceph Monitor store is a LevelDB database that stores entries as key–value pairs. The database includes a cluster map and is located by default at /var/lib/ceph/mon/<cluster-name>-<short-host-name>/store.db.

Querying a large Monitor store can take time. As a consequence, the Monitor can be delayed in responding to client queries.

In addition, if the /var/ partition is full, the Monitor cannot perform any write operations to the store and terminates. See Section 4.1.1, “A Monitor Is Out of Quorum” for details on troubleshooting this issue.

To Troubleshoot This Problem
  1. Check the size of the database:

    du -sch /var/lib/ceph/mon/<cluster-name>-<short-host-name>/store.db

    Specify the name of the cluster and the short host name of the host where the ceph-mon is running, for example:

    # du -sch /var/lib/ceph/mon/ceph-ceph1/store.db
    47G     /var/lib/ceph/mon/ceph-ceph1/store.db
    47G     total
  2. Compact the Monitor store. For details, see Section 4.5, “Compacting the Monitor Store”.
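
The size check in step 1 can also be scripted. The following is a minimal sketch: the function name store_too_big is hypothetical, and the default limit of 15360 MB simply mirrors the threshold shown in the example message above.

```shell
# Succeed (exit status 0) when the directory in $1 is at least
# <limit_mb> megabytes in size. The limit in $2 is optional and
# defaults to 15360, the value from the example message.
store_too_big() {
  dir=$1
  limit_mb=${2:-15360}
  # du -sm prints the total size in megabytes, followed by the path
  size_mb=$(du -sm "$dir" | awk '{ print $1 }')
  [ "$size_mb" -ge "$limit_mb" ]
}
```

A possible use is `store_too_big /var/lib/ceph/mon/ceph-ceph1/store.db && echo "compact the store"`.
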

4.1.4. Understanding Monitor Status

The mon_status command returns information about a Monitor, such as:

  • State
  • Rank
  • Elections epoch
  • Monitor map (monmap)

If Monitors are able to form a quorum, use mon_status with the ceph command-line utility.

If Monitors are not able to form a quorum, but the ceph-mon daemon is running, use the administration socket to execute mon_status. For details, see the Using the Administration Socket section in the Administration Guide for Red Hat Ceph Storage 3.

An example output of mon_status

{
    "name": "mon.3",
    "rank": 2,
    "state": "peon",
    "election_epoch": 96,
    "quorum": [
        1,
        2
    ],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 1,
        "fsid": "d5552d32-9d1d-436c-8db1-ab5fc2c63cd0",
        "modified": "0.000000",
        "created": "0.000000",
        "mons": [
            {
                "rank": 0,
                "name": "mon.1",
                "addr": "172.25.1.10:6789\/0"
            },
            {
                "rank": 1,
                "name": "mon.2",
                "addr": "172.25.1.12:6789\/0"
            },
            {
                "rank": 2,
                "name": "mon.3",
                "addr": "172.25.1.13:6789\/0"
            }
        ]
    }
}
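
When only one value from this output is needed, a small text filter can pick it out. The sketch below assumes the pretty-printed layout shown above, with one key per line, and returns the first occurrence of the key, which for name, rank, state and election_epoch is the top-level one. The function name mon_field is hypothetical, and the filter handles scalar values only, not arrays or nested objects.

```shell
# Print the value of the first line whose key matches $1.
# Reads pretty-printed mon_status output on standard input.
mon_field() {
  awk -v key="\"$1\":" '
    $1 == key {
      value = $2
      gsub(/[",]/, "", value)   # strip quotes and the trailing comma
      print value
      exit
    }'
}
```

For example, `ceph daemon mon.a mon_status | mon_field state` would print the current state of that Monitor.
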

Monitor States
Leader
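During the electing phase, Monitors are electing a leader. The leader is the Monitor in the quorum with the highest rank, that is, the one with the lowest rank value. In the example above, mon.1 has the highest rank overall (rank 0), but the quorum array lists only ranks 1 and 2, so the leader of that quorum is mon.2.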
Peon
Peons are the Monitors in the quorum that are not leaders. If the leader fails, the peon with the highest rank becomes the new leader.
Probing
A Monitor is in the probing state if it is looking for other Monitors. For example, after you start the Monitors, they are probing until they find enough Monitors specified in the Monitor map (monmap) to form a quorum.
Electing
A Monitor is in the electing state if it is in the process of electing the leader. Usually, this status changes quickly.
Synchronizing
A Monitor is in the synchronizing state if it is synchronizing with the other Monitors to join the quorum. The smaller the Monitor store is, the faster the synchronization process. Therefore, if you have a large store, synchronization takes longer.

4.2. Injecting a Monitor Map

If a Monitor has an outdated or corrupted Monitor map (monmap), it cannot join a quorum because it is trying to reach the other Monitors on incorrect IP addresses.

The safest way to fix this problem is to obtain the current Monitor map from the other Monitors and inject it. Note that this action overwrites the existing Monitor map kept by the Monitor.

This procedure shows how to inject the Monitor map when the other Monitors are able to form a quorum, or when at least one Monitor has a correct Monitor map. If all Monitors have a corrupted store and therefore also a corrupted Monitor map, see Section 4.3, “Recovering the Monitor Store”.

Procedure: Injecting a Monitor Map

  1. If the remaining Monitors are able to form a quorum, get the Monitor map by using the ceph mon getmap command:

    # ceph mon getmap -o /tmp/monmap
  2. If the remaining Monitors are not able to form a quorum and you have at least one Monitor with a correct Monitor map, copy it from that Monitor:

    1. Stop the Monitor which you want to copy the Monitor map from:

      systemctl stop ceph-mon@<host-name>

      For example, to stop the Monitor running on a host with the host1 short host name:

      # systemctl stop ceph-mon@host1
    2. Copy the Monitor map:

      ceph-mon -i <id> --extract-monmap /tmp/monmap

      Replace <id> with the ID of the Monitor which you want to copy the Monitor map from, for example:

      # ceph-mon -i mon.a --extract-monmap /tmp/monmap
  3. Stop the Monitor with the corrupted or outdated Monitor map:

    systemctl stop ceph-mon@<host-name>

    For example, to stop a Monitor running on a host with the host2 short host name:

    # systemctl stop ceph-mon@host2
  4. Inject the Monitor map:

    ceph-mon -i <id> --inject-monmap /tmp/monmap

    Replace <id> with the ID of the Monitor with the corrupted or outdated Monitor map, for example:

    # ceph-mon -i mon.c --inject-monmap /tmp/monmap
  5. Start the Monitor, for example:

    # systemctl start ceph-mon@host2

    If you copied the Monitor map from another Monitor, start that Monitor, too, for example:

    # systemctl start ceph-mon@host1
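
For the case where the Monitors cannot form a quorum, the order of operations matters, so it can help to print the full command sequence before running anything. The sketch below only echoes the commands from the procedure above. The function name inject_monmap_plan and its argument order are hypothetical.

```shell
# Print, without executing anything, the commands for copying a Monitor map
# from a healthy Monitor and injecting it into a broken one.
# $1, $2: host and ID of the Monitor with the correct map
# $3, $4: host and ID of the Monitor with the corrupted or outdated map
inject_monmap_plan() {
  good_host=$1
  good_id=$2
  bad_host=$3
  bad_id=$4
  map=/tmp/monmap
  cat <<EOF
systemctl stop ceph-mon@$good_host
ceph-mon -i $good_id --extract-monmap $map
systemctl stop ceph-mon@$bad_host
ceph-mon -i $bad_id --inject-monmap $map
systemctl start ceph-mon@$bad_host
systemctl start ceph-mon@$good_host
EOF
}
```

Calling `inject_monmap_plan host1 mon.a host2 mon.c` prints the six commands used in the examples above, in order, so that they can be reviewed and then run by hand.
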

4.3. Recovering the Monitor Store

Ceph Monitors store the cluster map in a key–value store such as LevelDB. If the store is corrupted on a Monitor, the Monitor terminates unexpectedly and fails to start again. The Ceph logs might include the following errors:

Corruption: error in middle of record
Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.0/store.db/1234567.ldb

Production clusters must use at least three Monitors so that if one fails, it can be replaced with another one. However, under certain circumstances, all Monitors can have corrupted stores. For example, when the Monitor nodes have incorrectly configured disk or file system settings, a power outage can corrupt the underlying file system.

If the store is corrupted on all Monitors, you can recover it with information stored on the OSD nodes by using utilities called ceph-monstore-tool and ceph-objectstore-tool.

Important

This procedure cannot recover the following information:

  • Metadata Server (MDS) keyrings and maps
  • Placement Group settings:

    • full ratio set by using the ceph pg set_full_ratio command
    • nearfull ratio set by using the ceph pg set_nearfull_ratio command

Before You Start

  • Ensure that you have the rsync utility and the ceph-test package installed.
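
A quick pre-flight check for the required utilities might look like the following sketch. The function name missing_tools is hypothetical; it relies only on the standard `command -v` lookup and prints every name that is not found on the PATH.

```shell
# Print each named command that cannot be found on the PATH.
# Prints nothing when every command is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For this procedure, a possible call is `missing_tools rsync ceph-objectstore-tool ceph-monstore-tool`. Empty output means that all three utilities are installed.
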

Procedure: Recovering the Monitor Store

Use the following commands from the Monitor node with the corrupted store.

  1. Collect the cluster map from all OSD nodes:

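    ms=<directory>
    mkdir $ms
    
    for host in $host_list; do
      # Push the store collected so far to the OSD host and remove the local copy
      rsync -avz "$ms" root@$host:"$ms"; rm -rf "$ms"
      # On the OSD host, add the cluster map data held by every OSD to the store
      ssh root@$host <<EOF
      for osd in /var/lib/ceph/osd/ceph-*; do
        ceph-objectstore-tool --data-path \$osd --op update-mon-db --mon-store-path $ms
      done
    EOF
      # Pull the updated store back to the Monitor node
      rsync -avz root@$host:$ms $ms
    done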

    Replace <directory> with a temporary directory to store the collected cluster map. The host_list variable must already contain the space-separated host names of the OSD nodes. For example:

    $ ms=/tmp/mon-store/
    $ mkdir $ms
    $ for host in $host_list; do
      rsync -avz "$ms" root@$host:"$ms"; rm -rf "$ms"
      ssh root@$host <<EOF
      for osd in  /var/lib/ceph/osd/ceph-*; do
        ceph-objectstore-tool --data-path \$osd --op update-mon-db --mon-store-path $ms
      done
    EOF
    rsync -avz root@$host:$ms $ms; done
  2. Set appropriate capabilities:

    ceph-authtool <keyring>  -n mon. --cap mon 'allow *'
    ceph-authtool <keyring>  -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'

    Replace <keyring> with the path to the client administration keyring, for example:

    $ ceph-authtool /etc/ceph/ceph.client.admin.keyring  -n mon. --cap mon 'allow *'
    $ ceph-authtool /etc/ceph/ceph.client.admin.keyring  -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
  3. Rebuild the Monitor store from the collected map:

    ceph-monstore-tool <directory> rebuild -- --keyring <keyring>

    Replace <directory> with the temporary directory from the first step and <keyring> with the path to the client administration keyring, for example:

    $ ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring
    Note

    If you do not use the cephx authentication, omit the --keyring option:

    $ ceph-monstore-tool /tmp/mon-store rebuild
  4. Back up the corrupted store:

    mv /var/lib/ceph/mon/<mon-ID>/store.db \
       /var/lib/ceph/mon/<mon-ID>/store.db.corrupted

    Replace <mon-ID> with the Monitor ID, for example mon.0:

    # mv /var/lib/ceph/mon/mon.0/store.db \
         /var/lib/ceph/mon/mon.0/store.db.corrupted
  5. Replace the corrupted store:

    mv /tmp/mon-store/store.db /var/lib/ceph/mon/<mon-ID>/store.db

    Replace <mon-ID> with the Monitor ID, for example mon.0:

    # mv /tmp/mon-store/store.db /var/lib/ceph/mon/mon.0/store.db

    Repeat this step for all Monitors with a corrupted store.

  6. Change the owner of the new store:

    chown -R ceph:ceph /var/lib/ceph/mon/<mon-ID>/store.db

    Replace <mon-ID> with the Monitor ID, for example mon.0:

    # chown -R ceph:ceph /var/lib/ceph/mon/mon.0/store.db

    Repeat this step for all Monitors with a corrupted store.
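
The backup and replace steps can be wrapped in one guarded function so that an existing backup is never overwritten. This is a minimal sketch: the function name swap_store and its argument order are hypothetical, and it deliberately leaves out the ownership change, which still has to be done as root as described in the last step.

```shell
# Move the existing store aside and put the rebuilt one in its place.
# $1: path to the rebuilt store.db directory
# $2: the Monitor data directory that contains the corrupted store.db
swap_store() {
  new_store=$1
  mon_dir=$2
  if [ ! -d "$new_store" ]; then
    echo "no rebuilt store at $new_store" >&2
    return 1
  fi
  if [ -e "$mon_dir/store.db.corrupted" ]; then
    echo "backup $mon_dir/store.db.corrupted already exists" >&2
    return 1
  fi
  mv "$mon_dir/store.db" "$mon_dir/store.db.corrupted" &&
    mv "$new_store" "$mon_dir/store.db"
}
```

With the example paths from this procedure, the call would be `swap_store /tmp/mon-store/store.db /var/lib/ceph/mon/mon.0`.
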

4.4. Replacing a Failed Monitor

When a Monitor has a corrupted store, the recommended way to fix this problem is to replace the Monitor by using the Ansible automation application.

Before You Start

  • Before removing a Monitor, ensure that the other Monitors are running and able to form a quorum.

Procedure: Replacing a Failed Monitor

  1. From the Monitor host, remove the Monitor store, located by default at /var/lib/ceph/mon/<cluster-name>-<short-host-name>:

    rm -rf /var/lib/ceph/mon/<cluster-name>-<short-host-name>

    Specify the short host name of the Monitor host and the cluster name. For example, to remove the Monitor store of a Monitor running on host1 from a cluster called remote:

    # rm -rf /var/lib/ceph/mon/remote-host1
  2. Remove the Monitor from the Monitor map (monmap):

    ceph mon remove <short-host-name> --cluster <cluster-name>

    Specify the short host name of the Monitor host and the cluster name. For example, to remove the Monitor running on host1 from a cluster called remote:

    # ceph mon remove host1 --cluster remote
  3. Troubleshoot and fix any problems related to the underlying file system or hardware of the Monitor host.
  4. From the Ansible administration node, redeploy the Monitor by running the ceph-ansible playbook:

    $ cd /usr/share/ceph-ansible
    $ ansible-playbook site.yml

4.5. Compacting the Monitor Store

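When the Monitor store has grown large, you can compact it in one of three ways, each described in a procedure below: dynamically while the ceph-mon daemon is running, at daemon startup, or with the ceph-monstore-tool utility while the daemon is stopped.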

Important

The Monitor store size changes when the cluster is not in the active+clean state or during the rebalancing process. For this reason, compact the Monitor store only after rebalancing is completed. Also, ensure that the placement groups are in the active+clean state.

Procedure: Compacting the Monitor Store Dynamically

To compact the Monitor store when the ceph-mon daemon is running:

ceph tell mon.<host-name> compact

Replace <host-name> with the short host name of the host where the ceph-mon is running. Use the hostname -s command when unsure.

# ceph tell mon.host1 compact

Procedure: Compacting the Monitor Store at Startup

  1. Add the following parameter to the Ceph configuration under the [mon] section:

    [mon]
    mon_compact_on_start = true
  2. Restart the ceph-mon daemon:

    systemctl restart ceph-mon@<host-name>

    Replace <host-name> with the short name of the host where the daemon is running. Use the hostname -s command when unsure.

    # systemctl restart ceph-mon@host1
  3. Ensure that Monitors have formed a quorum:

    # ceph mon stat
  4. Repeat these steps on other Monitors if needed.
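
Before restarting the daemon, it can be worth confirming that the parameter really sits under the [mon] section and not under another one. The sketch below checks a configuration file for that. The function name has_compact_on_start is hypothetical, and the check assumes that the option is written with underscores on a single line, exactly as in step 1.

```shell
# Succeed (exit status 0) when the configuration file in $1 sets
# mon_compact_on_start = true inside the [mon] section.
has_compact_on_start() {
  awk '
    # Every section header switches the in_mon flag on or off
    /^[ \t]*\[/ { in_mon = ($0 ~ /^[ \t]*\[mon\][ \t]*$/) }
    in_mon && /^[ \t]*mon_compact_on_start[ \t]*=[ \t]*true[ \t]*$/ { found = 1 }
    END { exit !found }
  ' "$1"
}
```

A possible call is `has_compact_on_start /etc/ceph/ceph.conf`, where the path is an assumption about where the cluster configuration file is kept.
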

Procedure: Compacting Monitor Store with ceph-monstore-tool

Note

Before you start, ensure that you have the ceph-test package installed.

  1. Verify that the ceph-mon daemon with the large store is not running. Stop the daemon if needed.

    systemctl status ceph-mon@<host-name>
    systemctl stop ceph-mon@<host-name>

    Replace <host-name> with the short name of the host where the daemon is running. Use the hostname -s command when unsure.

    # systemctl status ceph-mon@host1
    # systemctl stop ceph-mon@host1
  2. Compact the Monitor store:

    ceph-monstore-tool /var/lib/ceph/mon/<cluster-name>-<host-name> compact

    Replace <cluster-name> with the name of the cluster and <host-name> with the short host name of the Monitor host, for example:

    # ceph-monstore-tool /var/lib/ceph/mon/ceph-host1 compact
  3. Start ceph-mon again:

    systemctl start ceph-mon@<host-name>

    For example:

    # systemctl start ceph-mon@host1

4.6. Opening Ports for Ceph Manager

The ceph-mgr daemons receive placement group (PG) information from OSDs on the same range of ports as the ceph-osd daemons. If these ports are not open, the cluster status changes from HEALTH_OK to HEALTH_WARN, and the status output reports the percentage of PGs that are in the unknown state.

To resolve this situation, for each host running ceph-mgr daemons, open ports 6800-7300. For example:

[root@ceph-mgr] # firewall-cmd --add-port 6800-7300/tcp
[root@ceph-mgr] # firewall-cmd --add-port 6800-7300/tcp --permanent

Then, restart the ceph-mgr daemons.