NFS server is randomly sending malformed replies to READDIR requests

Solution Verified

Issue

  • The NFS client hangs in the getdents64() system call
  • An NFS server is observed sending NFSv4 owner group attributes with incorrect data
  • In some cases, the NFS client kernel crashes with the following log:
[132015.011271] BUG: unable to handle kernel paging request at ffff88082bf68000 
[132015.046471] IP: [<ffffffff81326696>] memcpy+0x6/0x110 
[132015.072112] PGD 1f9e067 PUD 1fa1067 PMD 82cff8063 PTE 800000082bf68161 
[132015.102296] Oops: 0003 [#1] SMP  
[132015.116789] Modules linked in: loop rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_ssif ipmi_devintf ipmi_si iTCO_wdt pcspkr sb_edac sg ipmi_msghandler hpilo iTCO_vendor_support hpwdt wmi acpi_power_meter edac_core lpc_ich pcc_cpufreq ioatdma nfsd shpchp dca auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_generic pata_acpi ata_piix crct10dif_pclmul crct10dif_common tg3 drm crc32c_intel serio_raw libata hpsa ptp i2c_core scsi_transport_sas pps_core fjes dm_mirror dm_region_hash dm_log dm_mod 
[132015.431515] CPU: 6 PID: 54488 Comm: find Not tainted 3.10.0-510.el7.bz1375457.x86_64 #1 
[132015.468026] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014 
[132015.498181] task: ffff88034f410fb0 ti: ffff8801ae180000 task.ti: ffff8801ae180000 
[132015.532299] RIP: 0010:[<ffffffff81326696>]  [<ffffffff81326696>] memcpy+0x6/0x110 
[132015.569035] RSP: 0018:ffff8801ae183ac0  EFLAGS: 00010282 
[132015.595289] RAX: ffff88082bf4ae42 RBX: ffff8801ae183c28 RCX: fffffffffffe2e41 
[132015.628541] RDX: ffffffffffffffff RSI: ffff8803879e2196 RDI: ffff88082bf68000 
[132015.661145] RBP: ffff8801ae183b48 R08: 0000000000000000 R09: 0000000000000000 
[132015.693802] R10: 0000000000000012 R11: ffff8803879c4eac R12: ffff88042b4fb900 
[132015.726315] R13: ffff88082bf4ae40 R14: 00000000ffffffff R15: ffff8801ae183b6c 
[132015.758918] FS:  00007fd33c2a0800(0000) GS:ffff88042f780000(0000) knlGS:0000000000000000 
[132015.795712] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[132015.822045] CR2: ffff88082bf68000 CR3: 00000005e56c8000 CR4: 00000000001407e0 
[132015.854577] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[132015.887146] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
[132015.919704] Stack: 
[132015.929121]  ffffffffa056c48d ffff8801ae183ad8 ffffffffa05756f3 ffffffffffffffff 
[132015.962946]  ffff88042d64e000 0000000000000000 0000000000000000 ffff8801ae183c28 
[132015.996626]  ffff8801ae183b64 ffff8801ae183b68 ffff88032d98a000 00000000b3363ae6 
[132016.030992] Call Trace: 
[132016.042864]  [<ffffffffa056c48d>] ? decode_getfattr_attrs+0x2cd/0x1510 [nfsv4] 
[132016.078474]  [<ffffffffa05756f3>] ? nfs4_have_delegation+0x13/0x20 [nfsv4] 
[132016.113650]  [<ffffffffa056ffb7>] nfs4_decode_dirent+0x137/0x1c0 [nfsv4] 
[132016.145652]  [<ffffffffa045e945>] nfs_readdir_page_filler+0x135/0x5b0 [nfs] 
[132016.177613]  [<ffffffffa045efcd>] nfs_readdir_xdr_to_array+0x20d/0x3b0 [nfs] 
[132016.209323]  [<ffffffff8118acc6>] ? __alloc_pages_nodemask+0x176/0x420 
[132016.239264]  [<ffffffffa045f170>] ? nfs_readdir_xdr_to_array+0x3b0/0x3b0 [nfs] 
[132016.272644]  [<ffffffffa045f192>] nfs_readdir_filler+0x22/0x90 [nfs] 
[132016.301694]  [<ffffffff8118105f>] do_read_cache_page+0x7f/0x190 
[132016.328718]  [<ffffffff81212270>] ? fillonedir+0xe0/0xe0 
[132016.353029]  [<ffffffff811811ac>] read_cache_page+0x1c/0x30 
[132016.378577]  [<ffffffffa045f3db>] nfs_readdir+0x1db/0x6b0 [nfs] 
[132016.405766]  [<ffffffffa056fe80>] ? nfs4_xdr_dec_layoutget+0x270/0x270 [nfsv4] 
[132016.438268]  [<ffffffff81212270>] ? fillonedir+0xe0/0xe0 
[132016.462088]  [<ffffffff81212160>] vfs_readdir+0xb0/0xe0 
[132016.485961]  [<ffffffff81212585>] SyS_getdents+0x95/0x120 
[132016.510752]  [<ffffffff816962c9>] system_call_fastpath+0x16/0x1b 
[132016.538324] Code: 43 60 48 2b 43 50 88 43 4e 5b 5d c3 66 0f 1f 84 00 00 00 00 00 e8 7b fc ff ff eb e2 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 <f3> a4 c3 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b  
[132016.629326] RIP  [<ffffffff81326696>] memcpy+0x6/0x110 
[132016.653592]  RSP <ffff8801ae183ac0> 
[132016.669725] CR2: ffff88082bf68000 

NOTE: The panic stack may differ depending on the task that triggers the panic, which is not always an NFS-related one. The stack shown above is nevertheless a strong indication of this problem. More details can be found in the "Root Cause" section.
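
To compare another crash log against the stack shown above, a small script can look for the same faulting function and call-trace frames. The following is only a heuristic sketch: the function names `parse_registers` and `looks_like_readdir_oops` are hypothetical, the choice of frames as a "signature" is an assumption based solely on the trace in this article, and a non-matching log does not rule out the problem. Reading RDX as the memcpy length relies on the x86-64 calling convention (third argument in RDX) and is likewise an interpretation, not a documented check.

```python
import re

# Frames copied from the call trace in this article. Using them as a
# signature is an assumption of this sketch, not an official test.
SIGNATURE_FRAMES = ("decode_getfattr_attrs", "nfs4_decode_dirent", "nfs_readdir")

# Matches e.g. "RIP: 0010:[<...>]  [<...>] memcpy+0x6/0x110" and captures
# the symbol name that follows the last bracketed address on the line.
RIP_RE = re.compile(r"\bRIP\b.*\]\s+(\w+)\+0x")

# Matches general-purpose register dumps such as "RDX: ffffffffffffffff".
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}):\s+([0-9a-f]{16})\b")


def parse_registers(oops_text):
    """Return a dict mapping register names to integer values."""
    return {name: int(value, 16) for name, value in REG_RE.findall(oops_text)}


def looks_like_readdir_oops(oops_text):
    """Heuristic: faulting function is memcpy and all signature frames appear."""
    rip = RIP_RE.search(oops_text)
    faulting = rip.group(1) if rip else None
    frames_present = all(frame + "+0x" in oops_text for frame in SIGNATURE_FRAMES)
    return faulting == "memcpy" and frames_present
```

Applied to the log above, `parse_registers` reports RDX as 0xffffffffffffffff. If RDX still holds the length argument at the point of the fault, that suggests memcpy was called with a length of -1, which would fit a reply carrying an invalid string length.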

Environment

  • Red Hat Enterprise Linux 7 or 8 as the NFS server
  • Red Hat Enterprise Linux as an NFS client
  • NFS clients from other vendors
  • NFSv4.0
