The NVIDIA BlueField-2 DPU SoC requires an errata kernel for kdump to pass
Issue
In the kdump environment, the mlx5 driver tries to use as few resources as possible because RAM is very limited. On a device that supports TLS offload (such as ConnectX-6 Dx or BlueField-2), the stop room then exceeds the send queue (SQ) size, so the network interface fails to come up and the vmcore copy over NFS fails because there is no working interface.
[ 18.477401] mlx5_core 0000:03:00.1 enp3s0f1: Stop room 95 is bigger than the SQ size 64
[ 17.990167] WARNING: CPU: 0 PID: 479 at drivers/net/ethernet/mellanox/mlx5/core/en_main.c:1131 mlx5e_open_sqs+0x49c/0x508 [mlx5_core]
[ 17.727583] Call trace:
[ 17.732526] mlx5e_open_sqs+0x49c/0x508 [mlx5_core]
[ 17.742360] mlx5e_open_channels+0x784/0x938 [mlx5_core]
[ 17.753066] mlx5e_open_locked+0x44/0xb8 [mlx5_core]
[ 17.763075] mlx5e_open+0x38/0x88 [mlx5_core]
[ 17.771827] __dev_open+0xf8/0x190
[ 17.778648] __dev_change_flags+0x1a0/0x208
[ 17.787038] dev_change_flags+0x3c/0x78
[ 17.794730] do_setlink+0x2a0/0xc88
[ 17.801724] __rtnl_newlink+0x5e4/0x700
[ 17.809411] rtnl_newlink+0x58/0x80
[ 17.816401] rtnetlink_rcv_msg+0x230/0x2f8
[ 17.824615] netlink_rcv_skb+0x60/0x120
[ 17.832305] rtnetlink_rcv+0x28/0x38
[ 17.839472] netlink_unicast+0x1d0/0x260
[ 17.847334] netlink_sendmsg+0x1b4/0x358
[ 17.855201] sock_sendmsg+0x4c/0x68
[ 17.862193] ____sys_sendmsg+0x200/0x240
[ 17.870058] ___sys_sendmsg+0x90/0xd0
[ 17.877401] __sys_sendmsg+0x68/0xb0
[ 17.884567] __arm64_sys_sendmsg+0x2c/0x38
[ 17.892785] el0_svc_handler+0xb0/0x180
[ 17.900478] el0_svc+0x8/0xc
[ 17.441498] mount[495]: mount.nfs: Network is unreachable
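To check whether a captured kdump boot log shows this failure, the message can be matched mechanically. The following is a minimal sketch, assuming the log line format shown above; the function name `find_stop_room_failures` and the regular expression are illustrative assumptions, not part of any Red Hat or NVIDIA tool.

```python
import re

# Assumed format, taken from the mlx5_core line in the log above:
#   mlx5_core <pci-address> <ifname>: Stop room <N> is bigger than the SQ size <M>
STOP_ROOM_RE = re.compile(
    r"mlx5_core (?P<pci>\S+) (?P<ifname>\S+): "
    r"Stop room (?P<stop_room>\d+) is bigger than the SQ size (?P<sq_size>\d+)"
)


def find_stop_room_failures(log_text):
    """Return one dict per log line that reports the stop-room condition.

    Hypothetical helper for illustration only.
    """
    failures = []
    for line in log_text.splitlines():
        match = STOP_ROOM_RE.search(line)
        if match:
            failures.append({
                "pci": match.group("pci"),
                "ifname": match.group("ifname"),
                "stop_room": int(match.group("stop_room")),
                "sq_size": int(match.group("sq_size")),
            })
    return failures


# Sample line copied from the log in this article.
sample = (
    "[ 18.477401] mlx5_core 0000:03:00.1 enp3s0f1: "
    "Stop room 95 is bigger than the SQ size 64"
)

for failure in find_stop_room_failures(sample):
    print(failure)
```

A non-empty result on a kdump console or serial log suggests the interface did not open for this reason, which is consistent with the subsequent "mount.nfs: Network is unreachable" error.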
Environment
- Red Hat Enterprise Linux 8.4
- NVIDIA BlueField-2 DPU SoC
- aarch64 platform