The Linux/NFS 16 group limit
I just spent a couple of hours wondering why I suddenly couldn’t use files belonging to a particular group on my Linux system at work. For example, a directory is group-owned by a group called rivetgun, it’s group-writeable, and running groups tells me that I’m a member of that group. But I can’t write to it… what’s up? The permissions of the group look identical to those of other groups that are behaving as normal. Eventually, I tried logging in as another member of the same group… and it worked fine! Okay, so the problem is with my account, not the group.
buckley@h6:~$ groups
users rivet hepforge hepdata professor cedar lhapdf jimmy ktjet hztool jetweb hepml hzsteer heptex feynml pyfeyn rivetgun hepjet statpattrec rr
buckley@h6:~$ touch hepjet/foo
touch: cannot touch 'hepjet/foo': Permission denied
buckley@h6:~$ touch pyfeyn/foo
buckley@h6:~$
Let’s count from the left: 1 is users, 2 is rivet, 3 is hepforge, … 16 is pyfeyn, 17 is rivetgun. Aha, suspicious. Google delivers a handy link to this blog entry:
Synopsis: UNIX users can belong to UNIX groups and for many years the maximum number of groups in Solaris has been limited to 16. Increasing it sounds easy and of obvious benefit. It turns out to be neither, read on.
The same logic applies to Linux - the 2.6 kernel can handle up to 64k groups per user, but an NFS client using the AUTH_SYS authentication mechanism still sends at most 16 of them, because the AUTH_SYS credential only has room for 16 group IDs. Bah!
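To make the truncation concrete, here is a rough Python sketch of how an AUTH_SYS credential body is laid out on the wire. The field order and the `gids<16>` bound follow the `authsys_parms` definition in the ONC RPC specification (RFC 5531); everything else is an assumption for illustration: the function names, the stamp, the numeric uid and gids, and the choice to keep the first 16 groups in list order. It is not the kernel's code.

```python
import struct

# Upper bound on the gids array in authsys_parms (RFC 5531).
AUTH_SYS_MAX_GIDS = 16


def xdr_string(data: bytes) -> bytes:
    """XDR string: 4-byte big-endian length, bytes, zero padding to 4."""
    pad = (-len(data)) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def pack_authsys(stamp, machinename, uid, gid, gids):
    """Pack an authsys_parms body; return (bytes, gids actually sent).

    Assumption: groups beyond the limit are simply dropped from the end.
    """
    sent = list(gids)[:AUTH_SYS_MAX_GIDS]
    body = struct.pack(">I", stamp)
    body += xdr_string(machinename.encode("ascii"))
    body += struct.pack(">II", uid, gid)
    body += struct.pack(">I", len(sent))           # array length
    body += struct.pack(f">{len(sent)}I", *sent)   # array elements
    return body, sent


# Hypothetical user with 20 supplementary groups, gids 1000..1019.
all_gids = list(range(1000, 1020))
body, sent = pack_authsys(0, "h6", 500, 100, all_gids)
missing = [g for g in all_gids if g not in sent]
print(f"sent {len(sent)} gids, server never sees {missing}")
```

The server only ever sees the groups that fit in the credential, so from its point of view the user is simply not a member of the rest, which would explain the "Permission denied".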
Okay, so the weird error is understood and fortunately I can remove myself from some of the groups in the first 16, so the immediate problem can be worked around for now. But this is a real problem - it looks like more of my non-existent admin time will have to be devoted to finding a sensible ACL system next time I upgrade the server. Sigh.
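For anyone wanting to check their own account, the counting I did by hand can be sketched in a few lines of Python. This assumes, as I observed here, that the groups which survive are the first 16 in the order that groups prints them; the function name `split_groups` is just made up for this example.

```python
AUTH_SYS_LIMIT = 16  # number of groups that survive over AUTH_SYS


def split_groups(groups_output, limit=AUTH_SYS_LIMIT):
    """Split the output of `groups` into (kept, dropped) name lists."""
    names = groups_output.split()
    return names[:limit], names[limit:]


# The output of `groups` from my session above.
output = ("users rivet hepforge hepdata professor cedar lhapdf jimmy "
          "ktjet hztool jetweb hepml hzsteer heptex feynml pyfeyn "
          "rivetgun hepjet statpattrec rr")

kept, dropped = split_groups(output)
print("last group that still works:", kept[-1])
print("groups silently ignored over NFS:", " ".join(dropped))
```

Anything in the second list is a candidate for the "Permission denied" treatment, and leaving a group in the first list frees up a slot for one of them.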
Update: I’ve also found this blog entry which if anything is even more useful. It’s quite definite and up-front with the opinion that NFS users should be using ACLs and ditching AUTH_SYS for RPCSEC_GSS… so when am I going to find the time and resources to convert HepForge to such a system? Hum. At least this issue came up before I re-wrote all the management scripts!
Posted by Andy Buckley on Tuesday, May 08, 2007