Upgrade to nfs-utils-1.2.3-7.el6_1.1 breaks NFSv4 with Kerberos

Hello,

I had NFSv4 working with Kerberos authentication on my Red Hat 6.1 NFS server. Everything worked properly with nfs-utils-1.2.3-7.el6.x86_64. After I upgraded to nfs-utils-1.2.3-7.el6_1.1.x86_64, rpc.svcgssd stopped working correctly. I didn't change anything on the workstations (Fedora 15 and Red Hat 6.1), just this one package.

After a more detailed investigation I found that rpc.svcgssd caused the problem. When I replace /usr/sbin/rpc.svcgssd with the one from nfs-utils-1.2.3-7.el6.x86_64, everything works fine again.

Here are some details about the problem. When I start my NFSv4 server, everything seems to work correctly. But when a client tries to mount an NFS volume with Kerberos authentication, an error appears (without Kerberos security everything works fine):

[root@client ~]# mount -t nfs4 -o sec=krb5 nfs.wszib.edu.pl:/ /nfs
mount.nfs4: access denied by server while mounting nfs.wszib.edu.pl:/

On the server, /var/log/messages shows:

Nov 10 20:57:41 nfs rpc.svcgssd[1786]: ERROR: GSS-API: error in gss_export_lucid_sec_context(): GSS_S_NO_CONTEXT (No context has been established) - (0x00007f26)
Nov 10 20:57:41 nfs rpc.svcgssd[1786]: ERROR: failed serializing krb5 context for kernel
Nov 10 20:57:41 nfs rpc.svcgssd[1786]: WARNING: handle_nullreq: serialize_context_for_kernel failed
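
To confirm that another server is hitting the same failure, one option is to search the syslog for the error signature quoted above. This is only a minimal sketch: the function name count_svcgssd_ctx_errors is hypothetical, and the default log path is assumed to be /var/log/messages as in this post.

```shell
# Hypothetical helper (not part of nfs-utils): count rpc.svcgssd
# context-export failures recorded in a syslog file.
# Assumption: the log path defaults to /var/log/messages, as in this post.
count_svcgssd_ctx_errors() {
    logfile="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'rpc\.svcgssd.*gss_export_lucid_sec_context(): GSS_S_NO_CONTEXT' "$logfile"
}
```

Calling count_svcgssd_ctx_errors with no argument reads the default log; a non-zero count suggests the server is failing in the same way when it hands the krb5 context to the kernel.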

 

When I run /usr/sbin/rpc.svcgssd in verbose mode, I get:

[root@nfs sbin]# /usr/sbin/rpc.svcgssd -f -vvvv
entering poll
leaving poll
handling null request
sname = nfs/client.wszib.edu.pl@WSZIB.EDU.PL
DEBUG: serialize_krb5_ctx: lucid version!
ERROR: GSS-API: error in gss_export_lucid_sec_context(): GSS_S_NO_CONTEXT (No context has been established) - (0x00007f80)
ERROR: failed serializing krb5 context for kernel
WARNING: handle_nullreq: serialize_context_for_kernel failed

sending null reply
...

On the client, running rpc.gssd in verbose mode gives:

[root@client ~]# /usr/sbin/rpc.gssd -f -vvvv
beginning poll
dir_notify_handler: sig 37 si 0x7fff49700870 data 0x7fff49700740
dir_notify_handler: sig 37 si 0x7fff49700870 data 0x7fff49700740
dir_notify_handler: sig 37 si 0x7fff496fad30 data 0x7fff496fac00
dir_notify_handler: sig 37 si 0x7fff49700870 data 0x7fff49700740
dir_notify_handler: sig 37 si 0x7fff49700870 data 0x7fff49700740
handling gssd upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt50)
handle_gssd_upcall: 'mech=krb5 uid=0 enctypes=18,17,16,23,3,1,2 '
handling krb5 upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt50)
process_krb5_upcall: service is '<null>'
Full hostname for 'nfs.wszib.edu.pl' is 'nfs.wszib.edu.pl'
Full hostname for 'client.wszib.edu.pl' is 'client.wszib.edu.pl'
No key table entry found for CLIENT.WSZIB.EDU.PL$@WSZIB.EDU.PL while getting keytab entry for 'CLIENT.WSZIB.EDU.PL$@WSZIB.EDU.PL'
No key table entry found for root/client.wszib.edu.pl@WSZIB.EDU.PL while getting keytab entry for 'root/client.wszib.edu.pl@WSZIB.EDU.PL'
Success getting keytab entry for 'nfs/client.wszib.edu.pl@WSZIB.EDU.PL'
Successfully obtained machine credentials for principal 'nfs/clientwszib.edu.pl@WSZIB.EDU.PL' stored in ccache 'FILE:/tmp/krb5cc_machine_WSZIB.EDU.PL'
INFO: Credentials in CC 'FILE:/tmp/krb5cc_machine_WSZIB.EDU.PL' are good until 1320994315

using FILE:/tmp/krb5cc_machine_WSZIB.EDU.PL as credentials cache for machine creds
using environment variable to select krb5 ccache FILE:/tmp/krb5cc_machine_WSZIB.EDU.PL
creating context using fsuid 0 (save_uid 0)
creating tcp client for server nfs.wszib.edu.pl
DEBUG: port already set to 2049
creating context with server nfs@nfs.wszib.edu.pl
WARNING: Failed to create krb5 context for user with uid 0 for server nfs3.dydaktyka.wszib.edu.pl
WARNING: Failed to create machine krb5 context with credentials cache FILE:/tmp/krb5cc_machine_WSZIB.EDU.PL for server nfs.wszib.edu.pl
WARNING: Machine cache is prematurely expired or corrupted trying to recreate cache for server nfs.wszib.edu.pl

I have no idea what is wrong with /usr/sbin/rpc.svcgssd or how to resolve this problem. The only solution for me is to go back to the previous version of nfs-utils.
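
For anyone checking whether a machine carries the build discussed here, the package name can be compared against the version reported as broken. This is a sketch under the assumption that only the 1.2.3-7.el6_1.1 build is affected, which is all this thread establishes; the function name is_affected_nfs_utils is hypothetical.

```shell
# Hypothetical helper: flag the nfs-utils build reported as broken in this thread.
# Assumption: only the 1.2.3-7.el6_1.1 build is affected.
is_affected_nfs_utils() {
    case "$1" in
        nfs-utils-1.2.3-7.el6_1.1*) echo "affected" ;;
        *) echo "not affected" ;;
    esac
}
```

On a live system the argument could come from the package database, for example is_affected_nfs_utils "$(rpm -q nfs-utils)". If the older package is still available in the configured repositories, yum downgrade nfs-utils is one possible way to return to it.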

 

 

Thanks in advance for sharing any related experience,

Remigiusz

Responses

Yes, we have experienced the same problem after the recent update of nfs-utils to 1.2.3-7.el6_1.1. The symptom was caused by rpc.gssd dumping core while trying to do an NFSv4 mount with Kerberos. See the Bugzilla entry:

https://bugzilla.redhat.com/show_bug.cgi?id=751353

The weird part is that the RH engineer (and the changelog) says that the only change from 1.2.3-7 to 1.2.3-7.1 was:

* Tue Sep 27 2011 Steve Dickson <steved@redhat.com> 1.2.3-7.1
- umount: allow spaces in unmount paths (bz 731309)

So out of curiosity we rebuilt an RPM of 1.2.3-7 from RH's source RPM, and sure enough the locally compiled rpc.gssd dumps core as well (while the binary from RH's RPM does not)! This leads us to wonder whether the bug was already there before 1.2.3-7.1 and was somehow masked by the compiler, shared libraries, etc.