Why do my ssh connections hang in the Microsoft Azure cloud?


Environment

  • Red Hat Enterprise Linux 7
  • Microsoft Azure cloud
  • SSH connections with OpenSSH

Issue

  • Interactive ssh connections to or from hosts in the Microsoft Azure cloud freeze, while non-interactive or batch-mode connections complete successfully.

Resolution

There are two ways to resolve this issue: with OpenSSH configuration directives, or with firewall rules (firewalld or iptables).

Mitigation via OpenSSH Configuration

To address the issue in OpenSSH, add the following directive to both /etc/ssh/sshd_config and /etc/ssh/ssh_config. It causes OpenSSH to tag packets with the default QoS values used by the OpenSSH shipped in RHEL 8 and newer:

IPQoS af21 cs1

After changing /etc/ssh/sshd_config, restart sshd so that the new setting takes effect:

# systemctl restart sshd.service
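
As a rough way to confirm that the directive is present, the following sketch checks a configuration file for an active (uncommented) IPQoS af21 cs1 line. The function names has_ipqos and report_ipqos are hypothetical and not part of OpenSSH, and the check only recognises the simple whitespace-separated form of the directive.

```shell
#!/bin/sh
# has_ipqos FILE (hypothetical helper): succeed if FILE contains an
# uncommented "IPQoS af21 cs1" line. Matching is case-insensitive.
has_ipqos() {
    grep -Eiq '^[[:space:]]*IPQoS[[:space:]]+af21[[:space:]]+cs1[[:space:]]*$' "$1"
}

# report_ipqos FILE... (hypothetical helper): print one status line per file.
report_ipqos() {
    for conf in "$@"; do
        if has_ipqos "$conf" 2>/dev/null; then
            printf '%s: IPQoS af21 cs1 is set\n' "$conf"
        else
            printf '%s: IPQoS af21 cs1 not found\n' "$conf"
        fi
    done
}
```

A typical call would be report_ipqos /etc/ssh/sshd_config /etc/ssh/ssh_config. On a running system, the effective server value should also be visible as root with sshd -T | grep -i ipqos.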

Mitigation via firewalld direct rules

To address the issue using firewalld direct rules, apply the following on both servers and clients. The rules cause the kernel to reset the DSCP value to zero for packets emanating from or destined for port 22, the default ssh port:

# firewall-cmd --direct --add-rule ipv4 mangle OUTPUT 1 -p tcp -m tcp --dport 22 -j DSCP --set-dscp 0
# firewall-cmd --direct --add-rule ipv4 mangle OUTPUT 2 -p tcp -m tcp --sport 22 -j DSCP --set-dscp 0
# firewall-cmd --runtime-to-permanent

Mitigation via iptables rules

Alternatively, to do the same using iptables, apply the following:

# iptables -t mangle -A OUTPUT -p tcp -m tcp --dport 22 -j DSCP --set-dscp 0
# iptables -t mangle -A OUTPUT -p tcp -m tcp --sport 22 -j DSCP --set-dscp 0

Note that these iptables rules will not survive a reboot, so make sure they are applied at boot time using whichever iptables persistence mechanism you prefer.

Also note that if sshd listens on an alternate port and you opt for a firewall-based mitigation, you will need to update the destination and source ports accordingly, or use the configuration-based mitigation instead.
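
For an alternate port, the rules above can be generated rather than edited by hand. The sketch below only prints the firewalld commands for a given port so they can be reviewed before being run as root. The function name print_dscp_rules is hypothetical, and the rule text is copied from the firewalld section above with only the port changed.

```shell
#!/bin/sh
# print_dscp_rules [PORT] (hypothetical helper): print, but do not run,
# the firewalld direct rules that reset DSCP to 0 for the given ssh port.
print_dscp_rules() {
    port=${1:-22}
    # Accept only a decimal port number in the range 1-65535.
    case $port in
        *[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "invalid port: $port" >&2
        return 1
    fi
    printf 'firewall-cmd --direct --add-rule ipv4 mangle OUTPUT 1 -p tcp -m tcp --dport %s -j DSCP --set-dscp 0\n' "$port"
    printf 'firewall-cmd --direct --add-rule ipv4 mangle OUTPUT 2 -p tcp -m tcp --sport %s -j DSCP --set-dscp 0\n' "$port"
    printf 'firewall-cmd --runtime-to-permanent\n'
}
```

For example, print_dscp_rules 2222 prints the three commands with port 2222 in place of 22.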

Root Cause

A bug in the OpenSSH server package shipped in RHEL 7 causes the DSCP field (a hint for how best to apply Quality of Service handling to different kinds of traffic) to be set incorrectly to "4" on packets that belong to an interactive SSH connection. Normally those packets would simply receive no special handling. However, an issue has been identified in the East US 2 region of the Microsoft Azure cloud, in which packets with anomalous DSCP values are dropped rather than handled at the default QoS level.

Red Hat Private Bug 1970173 - OpenSSH sets incorrect RFC 2474 QoS (DSCP) flag
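
Some tools print the whole 8-bit traffic class (former TOS) byte rather than the 6-bit DSCP value, so the same marking can appear as two different numbers. The DSCP occupies the upper six bits of that byte, so a DSCP of 4 corresponds to a byte value of 0x10, which is the legacy "lowdelay" TOS value. The conversion can be sketched as follows; the function names are hypothetical.

```shell
#!/bin/sh
# dscp_to_tos DSCP (hypothetical helper): print the TOS byte, in hex,
# that carries the given DSCP value with the two ECN bits left at zero.
dscp_to_tos() {
    printf '0x%02x\n' "$(( $1 << 2 ))"
}

# tos_to_dscp TOS (hypothetical helper): print the DSCP value carried in
# a TOS byte; the two low-order ECN bits are discarded.
tos_to_dscp() {
    printf '%d\n' "$(( $1 >> 2 ))"
}
```

With these, dscp_to_tos 4 gives 0x10, while the af21 and cs1 classes used by the configuration-based mitigation (DSCP 18 and 8) give 0x48 and 0x20.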

Diagnostic Steps

To observe packets whose QoS field has been set incorrectly by RHEL 7's sshd, capture traffic as root with a display filter such as:

# tshark -i br0 -Y "tcp.port==22 && ip.dsfield.dscp==4"

Replace br0 with the name of your network interface.
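
Where tshark is not installed, a tcpdump capture filter is one possible alternative. The sketch below builds a filter expression that compares the DSCP bits of the IPv4 header (the upper six bits of byte 1) against a given value. The function name dscp_filter is hypothetical, and the expression matches IPv4 packets only.

```shell
#!/bin/sh
# dscp_filter [PORT] [DSCP] (hypothetical helper): print a pcap filter
# matching IPv4 TCP packets on PORT whose DSCP field equals DSCP.
dscp_filter() {
    port=${1:-22}
    dscp=${2:-4}
    # 0xfc masks off the two ECN bits; the DSCP is shifted left by two
    # so that it lines up with the masked byte.
    printf 'tcp port %s and ip[1] & 0xfc = %s\n' "$port" "$(( dscp << 2 ))"
}
```

It could then be used as tcpdump -n -v -i br0 "$(dscp_filter 22 4)", again replacing br0 with your interface. With -v, tcpdump prints the full byte, so affected packets should show tos 0x10.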

