Chapter 3. Troubleshooting networking issues

This chapter lists basic troubleshooting procedures for networking and for chrony, which implements the Network Time Protocol (NTP).

3.1. Prerequisites

A running Red Hat Ceph Storage cluster.

3.2. Basic networking troubleshooting

Red Hat Ceph Storage depends heavily on a reliable network connection, because the storage cluster nodes use the network to communicate with each other. Networking issues can cause many problems with Ceph OSDs, such as flapping or being incorrectly reported as down. Networking issues can also cause Ceph Monitor clock skew errors. In addition, packet loss, high latency, or limited bandwidth can degrade cluster performance and stability.

Prerequisites

Root-level access to the node.

Procedure

Install the net-tools and telnet packages, which help when troubleshooting network issues in a Ceph storage cluster:
Example
```
[root@host01 ~]# dnf install net-tools
[root@host01 ~]# dnf install telnet
```

Log into the cephadm shell and verify that the public_network parameters in the Ceph configuration file include the correct values:

Example

[ceph: root@host01 /]# cat /etc/ceph/ceph.conf
# minimal ceph.conf for 57bddb48-ee04-11eb-9962-001a4a000672
[global]
	fsid = 57bddb48-ee04-11eb-9962-001a4a000672
	mon_host = [v2:10.74.249.26:3300/0,v1:10.74.249.26:6789/0] [v2:10.74.249.163:3300/0,v1:10.74.249.163:6789/0] [v2:10.74.254.129:3300/0,v1:10.74.254.129:6789/0]
[mon.host01]
public network = 10.74.248.0/21
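
When several nodes need checking, the value can be extracted by script instead of read by eye. The following is a minimal sketch, not part of the documented procedure: the function name public_network_of is hypothetical, and the sketch assumes the option is spelled either public network or public_network, as in the example above.

```shell
# Hypothetical helper: print every public network value found in a
# ceph.conf-style file read from standard input.
# Usage (inside the cephadm shell): public_network_of < /etc/ceph/ceph.conf
public_network_of() {
    awk -F'=' '
        # Accept both spellings: "public network" and "public_network".
        /^[[:space:]]*public[ _]network[[:space:]]*=/ {
            gsub(/[[:space:]]/, "", $2)   # strip whitespace around the value
            print $2
        }
    '
}
```

Compare the printed subnet with the addresses assigned to the node's interfaces.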

Exit the shell and verify that the network interfaces are up:

Example

[root@host01 ~]# ip link list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:1a:4a:00:06:72 brd ff:ff:ff:ff:ff:ff
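
To condense this output to one line per interface, a small filter can help. This is a sketch under the assumption that the output has the layout shown above; the function name link_states is hypothetical.

```shell
# Hypothetical helper: read "ip link" output on standard input and print
# "<interface> <state>" for every interface.
# Usage: ip link list | link_states
link_states() {
    awk '
        # Interface header lines start with an index such as "2:".
        /^[0-9]+:/ {
            name = $2
            sub(/:$/, "", name)            # "ens3:" becomes "ens3"
            for (i = 3; i <= NF; i++)
                if ($i == "state") print name, $(i + 1)
        }
    '
}
```

The loopback interface normally reports the UNKNOWN state, as in the example, so only the physical interfaces are expected to show UP.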

Verify that the Ceph nodes are able to reach each other using their short host names. Verify this on each node in the storage cluster:
Syntax
```
ping SHORT_HOST_NAME
```
Example
```
[root@host01 ~]# ping host02
```
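
Because the check must be repeated for every node, a loop can save time. The following is a minimal sketch rather than a supported tool: the function name check_reachable and the host names in the usage comment are hypothetical.

```shell
# Hypothetical helper: ping each host name given as an argument and
# report whether it answered. -c 3 sends three probes; -W 2 waits at
# most two seconds for each reply.
# Usage: check_reachable host01 host02 host03
check_reachable() {
    for host in "$@"; do
        if ping -c 3 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host reachable"
        else
            echo "$host UNREACHABLE"
        fi
    done
}
```

Run the function on each node in turn, because name resolution and routing can differ from node to node.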

If you use a firewall, ensure that Ceph nodes are able to reach each other on their appropriate ports. The firewall-cmd tool shows which services and ports a zone allows, and the telnet tool tests whether a specific port is reachable:

Syntax

firewall-cmd --info-zone=ZONE
telnet IP_ADDRESS PORT

Example

[root@host01 ~]# firewall-cmd --info-zone=public
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens3
  sources:
  services: ceph ceph-mon cockpit dhcpv6-client ssh
  ports: 9283/tcp 8443/tcp 9093/tcp 9094/tcp 3000/tcp 9100/tcp 9095/tcp
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

[root@host01 ~]# telnet 192.168.0.22 9100
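
The telnet session is interactive, which is awkward when many ports need testing. As an alternative sketch, the function below uses the /dev/tcp pseudo-device provided by bash. The function name port_status is hypothetical, and the sketch assumes that bash and the coreutils timeout command are available on the node.

```shell
# Hypothetical helper: report whether a TCP port accepts connections.
# Relies on the /dev/tcp pseudo-device provided by bash and on the
# coreutils timeout command; both are assumptions about the host.
# Usage: port_status 192.168.0.22 9100
port_status() {
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null; then
        echo "$1:$2 open"
    else
        echo "$1:$2 closed or filtered"
    fi
}
```

A closed or filtered result does not distinguish between a firewall rule and a service that is not listening, so check the firewall-cmd output as well.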

Verify that there are no errors on the interface counters. Verify that the network connectivity between nodes has expected latency, and that there is no packet loss.

Using the ethtool command:

Syntax

ethtool -S INTERFACE

Example

[root@host01 ~]# ethtool -S ens3 | grep errors
     rx_fcs_errors: 0
     rx_align_errors: 0
     rx_frame_too_long_errors: 0
     rx_in_length_errors: 0
     rx_out_length_errors: 0
     tx_mac_errors: 0
     tx_carrier_sense_errors: 0
     tx_errors: 0
     rx_errors: 0

Using the ifconfig command:

Example

[root@host01 ~]# ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.74.249.26  netmask 255.255.248.0  broadcast 10.74.255.255
        inet6 fe80::21a:4aff:fe00:672  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:4af8:21a:4aff:fe00:672  prefixlen 64  scopeid 0x0<global>
        ether 00:1a:4a:00:06:72  txqueuelen 1000  (Ethernet)
        RX packets 150549316  bytes 56759897541 (52.8 GiB)
        RX errors 0  dropped 176924  overruns 0  frame 0
        TX packets 55584046  bytes 62111365424 (57.8 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 9373290  bytes 16044697815 (14.9 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 9373290  bytes 16044697815 (14.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Using the netstat command:

Example

[root@host01 ~]# netstat -ai
Kernel Interface table
Iface             MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
ens3             1500 311847720      0 364903 0      114341918      0      0      0 BMRU
lo              65536 19577001      0      0 0      19577001      0      0      0 LRU
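
The counters that matter are easy to miss in a wide table. The following sketch lists only the interfaces that have a non-zero error or drop counter. The function name nonzero_counters is hypothetical, and the sketch assumes the column order shown in the netstat example above.

```shell
# Hypothetical helper: read "netstat -i" output on standard input and
# print the interfaces whose error or drop counters are not zero.
# Assumed columns: Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
# Usage: netstat -ai | nonzero_counters
nonzero_counters() {
    awk '
        # Data rows have a numeric MTU in column 2; header rows do not.
        $2 ~ /^[0-9]+$/ && ($4 + $5 + $8 + $9) > 0 {
            printf "%s rx_err=%s rx_drop=%s tx_err=%s tx_drop=%s\n", $1, $4, $5, $8, $9
        }
    '
}
```

An interface that appears in the result is a candidate for closer inspection with ethtool -S.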

For performance issues, in addition to the latency checks, use the iperf3 tool to verify the network bandwidth between all nodes of the storage cluster. The iperf3 tool runs a simple point-to-point network bandwidth test between a server and a client.

Install the iperf3 package on the Red Hat Ceph Storage nodes whose bandwidth you want to check:
Example
```
[root@host01 ~]# dnf install iperf3
```

On a Red Hat Ceph Storage node, start the iperf3 server:

Example

[root@host01 ~]# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

Note

The default port is 5201, but you can set a different port with the -p command argument. The uppercase -P argument sets the number of parallel client streams instead.

On a different Red Hat Ceph Storage node, start the iperf3 client:

Example

[root@host02 ~]# iperf3 -c mon
Connecting to host mon, port 5201
[  4] local xx.x.xxx.xx port 52270 connected to xx.x.xxx.xx port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   114 MBytes   954 Mbits/sec    0    409 KBytes
[  4]   1.00-2.00   sec   113 MBytes   945 Mbits/sec    0    409 KBytes
[  4]   2.00-3.00   sec   112 MBytes   943 Mbits/sec    0    454 KBytes
[  4]   3.00-4.00   sec   112 MBytes   941 Mbits/sec    0    471 KBytes
[  4]   4.00-5.00   sec   112 MBytes   940 Mbits/sec    0    471 KBytes
[  4]   5.00-6.00   sec   113 MBytes   945 Mbits/sec    0    471 KBytes
[  4]   6.00-7.00   sec   112 MBytes   937 Mbits/sec    0    488 KBytes
[  4]   7.00-8.00   sec   113 MBytes   947 Mbits/sec    0    520 KBytes
[  4]   8.00-9.00   sec   112 MBytes   939 Mbits/sec    0    520 KBytes
[  4]   9.00-10.00  sec   112 MBytes   939 Mbits/sec    0    520 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver

iperf Done.

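This output shows a network bandwidth of approximately 940 Mbits/second between the Red Hat Ceph Storage nodes, with no retransmissions (Retr) during the test. The 1.10 GBytes figure in the summary is the total amount of data transferred during the 10-second run, not the bandwidth.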
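
When you test many node pairs, it can help to extract only the summary bitrate. The following is a minimal sketch that assumes the text output format shown above; the function name sender_bitrate is hypothetical.

```shell
# Hypothetical helper: read iperf3 client output on standard input and
# print the bitrate from the final "sender" summary line.
# Usage: iperf3 -c host01 | sender_bitrate
sender_bitrate() {
    awk '
        /sender[[:space:]]*$/ {
            # The bitrate is the number just before the "...bits/sec" unit.
            for (i = 2; i <= NF; i++)
                if ($i ~ /bits\/sec$/) print $(i - 1), $i
        }
    '
}
```

The unit is printed together with the number because iperf3 scales it to the measured rate.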

Red Hat recommends you validate the network bandwidth between all the nodes in the storage cluster.

Ensure that all nodes have the same network interconnect speed. Nodes attached at a slower speed might slow down the nodes attached at a faster speed. Also, ensure that the inter-switch links can handle the aggregated bandwidth of the attached nodes:

Syntax

ethtool INTERFACE

Example

[root@host01 ~]# ethtool ens3
Settings for ens3:
Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full
                        100baseT/Half 100baseT/Full
                        1000baseT/Half 1000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes:  10baseT/Half 10baseT/Full
                        100baseT/Half 100baseT/Full
                        1000baseT/Half 1000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                     100baseT/Half 100baseT/Full
                                     1000baseT/Full
Link partner advertised pause frame use: Symmetric
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: off
Supports Wake-on: g
Wake-on: d
Current message level: 0x000000ff (255)
       drv probe link timer ifdown ifup rx_err tx_err
Link detected: yes
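
To compare nodes, only the speed, duplex, and link fields are needed. The following sketch condenses the ethtool output to one line. The function name link_summary is hypothetical, and the sketch assumes the field labels shown in the example above.

```shell
# Hypothetical helper: read "ethtool INTERFACE" output on standard input
# and print the negotiated speed, duplex mode, and link state on one line.
# Usage: ethtool ens3 | link_summary
link_summary() {
    awk '
        /^[[:space:]]*Speed:/         { speed = $2 }
        /^[[:space:]]*Duplex:/        { duplex = $2 }
        /^[[:space:]]*Link detected:/ { link = $3 }
        END { printf "speed=%s duplex=%s link=%s\n", speed, duplex, link }
    '
}
```

Run it on every node and confirm that the resulting lines are identical.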

Additional Resources

See the Basic Network troubleshooting solution on the Customer Portal for details.
See the What is the "ethtool" command and how can I use it to obtain information about my network devices and interfaces solution on the Customer Portal for details.
See the RHEL network interface dropping packets solution on the Customer Portal for details.
For details, see the What are the performance benchmarking tools available for Red Hat Ceph Storage? solution on the Customer Portal.
For more information, see Knowledgebase articles and solutions related to troubleshooting networking issues on the Customer Portal.

3.3. Basic chrony NTP troubleshooting

This section includes basic chrony NTP troubleshooting steps.

Prerequisites

A running Red Hat Ceph Storage cluster.
Root-level access to the Ceph Monitor node.

Procedure

Verify that the chronyd daemon is running on the Ceph Monitor hosts:
Example
```
[root@mon ~]# systemctl status chronyd
```

If chronyd is not running, enable and start it:

Example

[root@mon ~]# systemctl enable chronyd
[root@mon ~]# systemctl start chronyd

Ensure that chronyd is synchronizing the clocks correctly:

Example

[root@mon ~]# chronyc sources
[root@mon ~]# chronyc sourcestats
[root@mon ~]# chronyc tracking
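
The System time line of the chronyc tracking output reports how far the local clock is from NTP time. The following sketch compares that offset with a limit. The function name clock_offset_status is hypothetical, and the 0.05-second default limit is an assumption for illustration; compare it with the mon_clock_drift_allowed value configured on your cluster.

```shell
# Hypothetical helper: read "chronyc tracking" output on standard input
# and report whether the system clock offset is within a limit.
# Optional first argument: allowed offset in seconds (assumed default 0.05).
# Usage: chronyc tracking | clock_offset_status
clock_offset_status() {
    awk -F': ' -v max="${1:-0.05}" '
        /^System time/ {
            split($2, parts, " ")          # parts[1] is the offset in seconds
            verdict = (parts[1] + 0 <= max + 0) ? "ok" : "exceeds limit"
            print parts[1], verdict
        }
    '
}
```

Run the check on every Ceph Monitor host, because clock skew is measured between the monitors.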

Additional Resources

See the How to troubleshoot chrony issues solution on the Red Hat Customer Portal for advanced chrony NTP troubleshooting steps.
See the Clock skew section in the Red Hat Ceph Storage Troubleshooting Guide for further details.
See the Checking if chrony is synchronized section for further details.
