Diffstat (limited to 'Documentation/filesystems/nfs')
 Documentation/filesystems/nfs/00-INDEX                  |  6
 Documentation/filesystems/nfs/Exporting                 |  9
 Documentation/filesystems/nfs/fault_injection.txt       | 69
 Documentation/filesystems/nfs/idmapper.txt              | 24
 Documentation/filesystems/nfs/nfs.txt                   | 44
 Documentation/filesystems/nfs/nfs41-server.txt          | 85
 Documentation/filesystems/nfs/nfsd-admin-interfaces.txt | 41
 Documentation/filesystems/nfs/nfsroot.txt               | 12
 Documentation/filesystems/nfs/pnfs.txt                  | 63
 Documentation/filesystems/nfs/rpc-server-gss.txt        | 91
10 files changed, 366 insertions, 78 deletions
diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX index a57e12411d2..53f3b596ac0 100644 --- a/Documentation/filesystems/nfs/00-INDEX +++ b/Documentation/filesystems/nfs/00-INDEX @@ -2,6 +2,8 @@  	- this file (nfs-related documentation).  Exporting  	- explanation of how to make filesystems exportable. +fault_injection.txt +	- information for using fault injection on the server  knfsd-stats.txt  	- statistics which the NFS server makes available to user space.  nfs.txt @@ -10,6 +12,8 @@ nfs41-server.txt  	- info on the Linux server implementation of NFSv4 minor version 1.  nfs-rdma.txt  	- how to install and setup the Linux NFS/RDMA client and server software +nfsd-admin-interfaces.txt +	- Administrative interfaces for nfsd.  nfsroot.txt  	- short guide on setting up a diskless box with NFS root filesystem.  pnfs.txt @@ -18,3 +22,5 @@ rpc-cache.txt  	- introduction to the caching mechanisms in the sunrpc layer.  idmapper.txt  	- information for configuring request-keys to be used by idmapper +rpc-server-gss.txt +	- Information on GSS authentication support in the NFS Server diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting index 87019d2b598..e543b1a619c 100644 --- a/Documentation/filesystems/nfs/Exporting +++ b/Documentation/filesystems/nfs/Exporting @@ -92,7 +92,14 @@ For a filesystem to be exportable it must:     1/ provide the filehandle fragment routines described below.     2/ make sure that d_splice_alias is used rather than d_add        when ->lookup finds an inode for a given parent and name. 
-      Typically the ->lookup routine will end with a: + +      If inode is NULL, d_splice_alias(inode, dentry) is equivalent to + +		d_add(dentry, inode), NULL + +      Similarly, d_splice_alias(ERR_PTR(err), dentry) = ERR_PTR(err) + +      Typically the ->lookup routine will simply end with a:  		return d_splice_alias(inode, dentry);  	} diff --git a/Documentation/filesystems/nfs/fault_injection.txt b/Documentation/filesystems/nfs/fault_injection.txt new file mode 100644 index 00000000000..426d166089a --- /dev/null +++ b/Documentation/filesystems/nfs/fault_injection.txt @@ -0,0 +1,69 @@ + +Fault Injection +=============== +Fault injection is a method for forcing errors that may not normally occur, or +may be difficult to reproduce.  Forcing these errors in a controlled environment +can help the developer find and fix bugs before their code is shipped in a +production system.  Injecting an error on the Linux NFS server will allow us to +observe how the client reacts and if it manages to recover its state correctly. + +NFSD_FAULT_INJECTION must be selected when configuring the kernel to use this +feature. + + +Using Fault Injection +===================== +On the client, mount the fault injection server through NFS v4.0+ and do some +work over NFS (open files, take locks, ...). + +On the server, mount the debugfs filesystem to <debug_dir> and ls +<debug_dir>/nfsd.  This will show a list of files that will be used for +injecting faults on the NFS server.  As root, write a number n to the file +corresponding to the action you want the server to take.  The server will then +process the first n items it finds.  So if you want to forget 5 locks, echo '5' +to <debug_dir>/nfsd/forget_locks.  A value of 0 will tell the server to forget +all corresponding items.  A log message will be created containing the number +of items forgotten (check dmesg). + +Go back to work on the client and check if the client recovered from the error +correctly. 
+ + +Available Faults +================ +forget_clients: +     The NFS server keeps a list of clients that have placed a mount call.  If +     this list is cleared, the server will have no knowledge of who the client +     is, forcing the client to reauthenticate with the server. + +forget_openowners: +     The NFS server keeps a list of what files are currently opened and who +     they were opened by.  Clearing this list will force the client to reopen +     its files. + +forget_locks: +     The NFS server keeps a list of what files are currently locked in the VFS. +     Clearing this list will force the client to reclaim its locks (files are +     unlocked through the VFS as they are cleared from this list). + +forget_delegations: +     A delegation is used to assure the client that a file, or part of a file, +     has not changed since the delegation was awarded.  Clearing this list will +     force the client to reacquire its delegation before accessing the file +     again. + +recall_delegations: +     Delegations can be recalled by the server when another client attempts to +     access a file.  This test will notify the client that its delegation has +     been revoked, forcing the client to reacquire the delegation before using +     the file again. + + +tools/nfs/inject_faults.sh script +================================= +This script has been created to ease the fault injection process.  This script +will detect the mounted debugfs directory and write to the files located there +based on the arguments passed by the user.  For example, running +`inject_faults.sh forget_locks 1` as root will instruct the server to forget +one lock.  Running `inject_faults.sh forget_locks` will instruct the server to +forget all locks. 
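The write-a-count procedure described above can be sketched as a few shell commands. On a real server DEBUG_DIR would be the debugfs mount point (e.g. /sys/kernel/debug); the sketch substitutes a scratch directory so it can be dry-run on a machine without NFSD_FAULT_INJECTION:

```shell
# Sketch of the fault-injection steps above.  On a real server, set
# DEBUG_DIR to the debugfs mount point; a scratch directory stands in
# here so the commands are safe to dry-run.
DEBUG_DIR=$(mktemp -d)
mkdir -p "$DEBUG_DIR/nfsd"
touch "$DEBUG_DIR/nfsd/forget_locks"

# Ask the server to forget 5 locks (writing 0 would mean: forget all):
echo 5 > "$DEBUG_DIR/nfsd/forget_locks"

# On a real server, check dmesg for the number of items forgotten.
cat "$DEBUG_DIR/nfsd/forget_locks"
```

The same pattern applies to each of the files listed under Available Faults.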
diff --git a/Documentation/filesystems/nfs/idmapper.txt b/Documentation/filesystems/nfs/idmapper.txt index b9b4192ea8b..fe03d10bb79 100644 --- a/Documentation/filesystems/nfs/idmapper.txt +++ b/Documentation/filesystems/nfs/idmapper.txt @@ -4,13 +4,21 @@ ID Mapper  =========  Id mapper is used by NFS to translate user and group ids into names, and to  translate user and group names into ids.  Part of this translation involves -performing an upcall to userspace to request the information.  Id mapper will -user request-key to perform this upcall and cache the result.  The program -/usr/sbin/nfs.idmap should be called by request-key, and will perform the -translation and initialize a key with the resulting information. +performing an upcall to userspace to request the information.  There are two +ways NFS could obtain this information: placing a call to /sbin/request-key +or by placing a call to the rpc.idmap daemon. + +NFS will attempt to call /sbin/request-key first.  If this succeeds, the +result will be cached using the generic request-key cache.  This call should +only fail if /etc/request-key.conf is not configured for the id_resolver key +type, see the "Configuring" section below if you wish to use the request-key +method. + +If the call to /sbin/request-key fails (if /etc/request-key.conf is not +configured with the id_resolver key type), then the idmapper will ask the +legacy rpc.idmap daemon for the id mapping.  This result will be stored +in a custom NFS idmap cache. - NFS_USE_NEW_IDMAPPER must be selected when configuring the kernel to use this - feature.  ===========  Configuring @@ -47,8 +55,8 @@ request-key will find the first matching line and corresponding program.  In  this case, /some/other/program will handle all uid lookups and  /usr/sbin/nfs.idmap will handle gid, user, and group lookups. -See <file:Documentation/keys-request-keys.txt> for more information about the -request-key function. 
+See <file:Documentation/security/keys-request-key.txt> for more information +about the request-key function.  ========= diff --git a/Documentation/filesystems/nfs/nfs.txt b/Documentation/filesystems/nfs/nfs.txt index f50f26ce6cd..f2571c8bef7 100644 --- a/Documentation/filesystems/nfs/nfs.txt +++ b/Documentation/filesystems/nfs/nfs.txt @@ -12,9 +12,47 @@ and work is in progress on adding support for minor version 1 of the NFSv4  protocol.  The purpose of this document is to provide information on some of the -upcall interfaces that are used in order to provide the NFS client with -some of the information that it requires in order to fully comply with -the NFS spec. +special features of the NFS client that can be configured by system +administrators. + + +The nfs4_unique_id parameter +============================ + +NFSv4 requires clients to identify themselves to servers with a unique +string.  File open and lock state shared between one client and one server +is associated with this identity.  To support robust NFSv4 state recovery +and transparent state migration, this identity string must not change +across client reboots. + +Without any other intervention, the Linux client uses a string that contains +the local system's node name.  System administrators, however, often do not +take care to ensure that node names are fully qualified and do not change +over the lifetime of a client system.  Node names can have other +administrative requirements that require particular behavior that does not +work well as part of an nfs_client_id4 string. + +The nfs.nfs4_unique_id boot parameter specifies a unique string that can be +used instead of a system's node name when an NFS client identifies itself to +a server.  Thus, if the system's node name is not unique, or it changes, its +nfs.nfs4_unique_id stays the same, preventing collision with other clients +or loss of state during NFS reboot recovery or transparent state migration. 
+ +The nfs.nfs4_unique_id string is typically a UUID, though it can contain +anything that is believed to be unique across all NFS clients.  An +nfs4_unique_id string should be chosen when a client system is installed, +just as a system's root file system gets a fresh UUID in its label at +install time. + +The string should remain fixed for the lifetime of the client.  It can be +changed safely if care is taken that the client shuts down cleanly and all +outstanding NFSv4 state has expired, to prevent loss of NFSv4 state. + +This string can be stored in an NFS client's grub.conf, or it can be provided +via a net boot facility such as PXE.  It may also be specified as an nfs.ko +module parameter.  Specifying a uniquifier string is not supported for NFS +clients running in containers. +  The DNS resolver  ================ diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt index 04884914a1c..c49cd7e796e 100644 --- a/Documentation/filesystems/nfs/nfs41-server.txt +++ b/Documentation/filesystems/nfs/nfs41-server.txt @@ -5,11 +5,11 @@ Server support for minorversion 1 can be controlled using the  by reading this file will contain either "+4.1" or "-4.1"  correspondingly. -Currently, server support for minorversion 1 is disabled by default. -It can be enabled at run time by writing the string "+4.1" to +Currently, server support for minorversion 1 is enabled by default. +It can be disabled at run time by writing the string "-4.1" to  the /proc/fs/nfsd/versions control file.  Note that to write this -control file, the nfsd service must be taken down.  Use your user-mode -nfs-utils to set this up; see rpc.nfsd(8) +control file, the nfsd service must be taken down.  You can use rpc.nfsd +for this; see rpc.nfsd(8).  (Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and  "-4", respectively.  Therefore, code meant to work on both new and old @@ -29,49 +29,6 @@ are still under development out of tree.  
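The nfs.nfs4_unique_id parameter described above might be provisioned at install time like this. The /etc/modprobe.d destination is an assumption (any modprobe.d fragment works); the option line is only printed here so the sketch is safe to run:

```shell
# Sketch: generate a uniquifier once, at install time, and emit the
# nfs.ko module option that would record it.  The modprobe.d path is an
# assumption; the line is printed rather than installed.
uuid=$(cat /proc/sys/kernel/random/uuid)
echo "options nfs nfs4_unique_id=${uuid}"   # for /etc/modprobe.d/nfs.conf
echo "nfs.nfs4_unique_id=${uuid}"           # or as a kernel boot parameter
```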
See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design  for more information. -The current implementation is intended for developers only: while it -does support ordinary file operations on clients we have tested against -(including the linux client), it is incomplete in ways which may limit -features unexpectedly, cause known bugs in rare cases, or cause -interoperability problems with future clients.  Known issues: - -	- gss support is questionable: currently mounts with kerberos -	  from a linux client are possible, but we aren't really -	  conformant with the spec (for example, we don't use kerberos -	  on the backchannel correctly). -	- no trunking support: no clients currently take advantage of -	  trunking, but this is a mandatory feature, and its use is -	  recommended to clients in a number of places.  (E.g. to ensure -	  timely renewal in case an existing connection's retry timeouts -	  have gotten too long; see section 8.3 of the RFC.) -	  Therefore, lack of this feature may cause future clients to -	  fail. -	- Incomplete backchannel support: incomplete backchannel gss -	  support and no support for BACKCHANNEL_CTL mean that -	  callbacks (hence delegations and layouts) may not be -	  available and clients confused by the incomplete -	  implementation may fail. -	- Server reboot recovery is unsupported; if the server reboots, -	  clients may fail. -	- We do not support SSV, which provides security for shared -	  client-server state (thus preventing unauthorized tampering -	  with locks and opens, for example).  It is mandatory for -	  servers to support this, though no clients use it yet. -	- Mandatory operations which we do not support, such as -	  DESTROY_CLIENTID, FREE_STATEID, SECINFO_NO_NAME, and -	  TEST_STATEID, are not currently used by clients, but will be -	  (and the spec recommends their uses in common cases), and -	  clients should not be expected to know how to recover from the -	  case where they are not supported.  
This will eventually cause -	  interoperability failures. - -In addition, some limitations are inherited from the current NFSv4 -implementation: - -	- Incomplete delegation enforcement: if a file is renamed or -	  unlinked, a client holding a delegation may continue to -	  indefinitely allow opens of the file under the old name. -  The table below, taken from the NFSv4.1 document, lists  the operations that are mandatory to implement (REQ), optional  (OPT), and NFSv4.0 operations that are required not to implement (MNI) @@ -98,8 +55,8 @@ Operations     |                      | MNI        | or OPT)      |                |     +----------------------+------------+--------------+----------------+     | ACCESS               | REQ        |              | Section 18.1   | -NS | BACKCHANNEL_CTL      | REQ        |              | Section 18.33  | -NS | BIND_CONN_TO_SESSION | REQ        |              | Section 18.34  | +I  | BACKCHANNEL_CTL      | REQ        |              | Section 18.33  | +I  | BIND_CONN_TO_SESSION | REQ        |              | Section 18.34  |     | CLOSE                | REQ        |              | Section 18.2   |     | COMMIT               | REQ        |              | Section 18.3   |     | CREATE               | REQ        |              | Section 18.4   | @@ -108,10 +65,10 @@ NS*| DELEGPURGE           | OPT        | FDELG (REQ)  | Section 18.5   |     | DELEGRETURN          | OPT        | FDELG,       | Section 18.6   |     |                      |            | DDELG, pNFS  |                |     |                      |            | (REQ)        |                | -NS | DESTROY_CLIENTID     | REQ        |              | Section 18.50  | +I  | DESTROY_CLIENTID     | REQ        |              | Section 18.50  |  I  | DESTROY_SESSION      | REQ        |              | Section 18.37  |  I  | EXCHANGE_ID          | REQ        |              | Section 18.35  | -NS | FREE_STATEID         | REQ        |              | Section 18.38  | +I  | FREE_STATEID 
        | REQ        |              | Section 18.38  |     | GETATTR              | REQ        |              | Section 18.7   |  P  | GETDEVICEINFO        | OPT        | pNFS (REQ)   | Section 18.40  |  P  | GETDEVICELIST        | OPT        | pNFS (OPT)   | Section 18.41  | @@ -145,14 +102,14 @@ NS*| OPENATTR             | OPT        |              | Section 18.17  |     | RESTOREFH            | REQ        |              | Section 18.27  |     | SAVEFH               | REQ        |              | Section 18.28  |     | SECINFO              | REQ        |              | Section 18.29  | -NS | SECINFO_NO_NAME      | REC        | pNFS files   | Section 18.45, | +I  | SECINFO_NO_NAME      | REC        | pNFS files   | Section 18.45, |     |                      |            | layout (REQ) | Section 13.12  |  I  | SEQUENCE             | REQ        |              | Section 18.46  |     | SETATTR              | REQ        |              | Section 18.30  |     | SETCLIENTID          | MNI        |              | N/A            |     | SETCLIENTID_CONFIRM  | MNI        |              | N/A            |  NS | SET_SSV              | REQ        |              | Section 18.47  | -NS | TEST_STATEID         | REQ        |              | Section 18.48  | +I  | TEST_STATEID         | REQ        |              | Section 18.48  |     | VERIFY               | REQ        |              | Section 18.31  |  NS*| WANT_DELEGATION      | OPT        | FDELG (OPT)  | Section 18.49  |     | WRITE                | REQ        |              | Section 18.32  | @@ -189,6 +146,16 @@ NS*| CB_WANTS_CANCELLED      | OPT       | FDELG,      | Section 20.10 |  Implementation notes: +SSV: +* The spec claims this is mandatory, but we don't actually know of any +  implementations, so we're ignoring it for now.  The server returns +  NFS4ERR_ENCR_ALG_UNSUPP on EXCHANGE_ID, which should be future-proof. 
+ +GSS on the backchannel: +* Again, theoretically required but not widely implemented (in +  particular, the current Linux client doesn't request it).  We return +  NFS4ERR_ENCR_ALG_UNSUPP on CREATE_SESSION. +  DELEGPURGE:  * mandatory only for servers that support CLAIM_DELEGATE_PREV and/or    CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that @@ -196,26 +163,18 @@ DELEGPURGE:    now.  EXCHANGE_ID: -* only SP4_NONE state protection supported  * implementation ids are ignored  CREATE_SESSION:  * backchannel attributes are ignored -* backchannel security parameters are ignored  SEQUENCE:  * no support for dynamic slot table renegotiation (optional) -nfsv4.1 COMPOUND rules: -The following cases aren't supported yet: -* Enforcing of NFS4ERR_NOT_ONLY_OP for: BIND_CONN_TO_SESSION, CREATE_SESSION, -  DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID. -* DESTROY_SESSION MUST be the final operation in the COMPOUND request. -  Nonstandard compound limitations:  * No support for a sessions fore channel RPC compound that requires both a    ca_maxrequestsize request and a ca_maxresponsesize reply, so we may    fail to live up to the promise we made in CREATE_SESSION fore channel    negotiation. -* No more than one IO operation (read, write, readdir) allowed per -  compound. + +See also http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues. diff --git a/Documentation/filesystems/nfs/nfsd-admin-interfaces.txt b/Documentation/filesystems/nfs/nfsd-admin-interfaces.txt new file mode 100644 index 00000000000..56a96fb08a7 --- /dev/null +++ b/Documentation/filesystems/nfs/nfsd-admin-interfaces.txt @@ -0,0 +1,41 @@ +Administrative interfaces for nfsd +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Note that normally these interfaces are used only by the utilities in +nfs-utils. + +nfsd is controlled mainly by pseudofiles under the "nfsd" filesystem, +which is normally mounted at /proc/fs/nfsd/. 
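Driving these pseudofiles by hand might look like the sketch below. This is only an illustration of the interface (normally rpc.nfsd(8) performs these steps); the commands need root and an nfsd-capable kernel, so they are printed rather than executed:

```shell
# Sketch: starting and stopping nfsd by hand through the nfsd
# pseudofilesystem.  Shown as text only; rpc.nfsd(8) normally does this.
steps='mount -t nfsd nfsd /proc/fs/nfsd
echo "tcp 2049" > /proc/fs/nfsd/portlist   # add a TCP listener before start
echo 8 > /proc/fs/nfsd/threads             # first nonzero write starts nfsd
echo 0 > /proc/fs/nfsd/threads             # shut down; all state is dropped'
printf '%s\n' "$steps"
```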
+ +The server is always started by the first write of a nonzero value to +nfsd/threads. + +Before doing that, NFSD can be told which sockets to listen on by +writing to nfsd/portlist; that write may be: + +	- an ascii-encoded file descriptor, which should refer to a +	  bound (and listening, for tcp) socket, or +	- "transportname port", where transportname is currently either +	  "udp", "tcp", or "rdma". + +If nfsd is started without doing any of these, then it will create one +udp and one tcp listener at port 2049 (see nfsd_init_socks). + +On startup, nfsd and lockd grace periods start. + +nfsd is shut down by a write of 0 to nfsd/threads.  All locks and state +are thrown away at that point. + +Between startup and shutdown, the number of threads may be adjusted up +or down by additional writes to nfsd/threads or by writes to +nfsd/pool_threads. + +For more detail about files under nfsd/ and what they control, see +fs/nfsd/nfsctl.c; most of them have detailed comments. + +Implementation notes +^^^^^^^^^^^^^^^^^^^^ + +Note that the rpc server requires the caller to serialize addition and +removal of listening sockets, and startup and shutdown of the server. +For nfsd this is done using nfsd_mutex. diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt index 90c71c6f0d0..2d66ed68812 100644 --- a/Documentation/filesystems/nfs/nfsroot.txt +++ b/Documentation/filesystems/nfs/nfsroot.txt @@ -78,7 +78,8 @@ nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]  			flags		= hard, nointr, noposix, cto, ac -ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf> +ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>: +   <dns0-ip>:<dns1-ip>    This parameter tells the kernel how to configure IP addresses of devices    and also how to set up the IP routing table. 
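A full ip= specification using the two new nameserver fields might look like the sketch below; all addresses and the export path are invented for illustration:

```shell
# Example NFS-root kernel command line using the ip= syntax above,
# including the new <dns0-ip> and <dns1-ip> fields (invented addresses).
cmdline='root=/dev/nfs nfsroot=192.168.1.1:/export/client1 ip=192.168.1.10:192.168.1.1:192.168.1.254:255.255.255.0:client1:eth0:off:192.168.1.1:8.8.8.8'
printf '%s\n' "$cmdline"
```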
It was originally called @@ -158,6 +159,13 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>                 Default: any +  <dns0-ip>	IP address of first nameserver. +		Value gets exported via /proc/net/pnp, to which /etc/resolv.conf +		is often linked on embedded systems. + +  <dns1-ip>	IP address of second nameserver. +		Same as above. +  nfsrootdebug @@ -226,7 +234,7 @@ They depend on various facilities being available:       	cdrecord.  	e.g. -	  cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso +	  cdrecord dev=ATAPI:1,0,0 arch/x86/boot/image.iso       	For more information on isolinux, including how to create bootdisks       	for prebuilt kernels, see http://syslinux.zytor.com/ diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.txt index bc0b9cfe095..adc81a35fe2 100644 --- a/Documentation/filesystems/nfs/pnfs.txt +++ b/Documentation/filesystems/nfs/pnfs.txt @@ -12,7 +12,7 @@ struct pnfs_layout_hdr  ----------------------  The on-the-wire command LAYOUTGET corresponds to struct  pnfs_layout_segment, usually referred to by the variable name lseg. -Each nfs_inode may hold a pointer to a cache of of these layout +Each nfs_inode may hold a pointer to a cache of these layout  segments in nfsi->layout, of type struct pnfs_layout_hdr.  We reference the header for the inode pointing to it, across each @@ -46,3 +46,64 @@ data server cache  file driver devices refer to data servers, which are kept in a module  level cache.  Its reference is held over the lifetime of the deviceid  pointing to it. + +lseg +---- +lseg maintains an extra reference corresponding to the NFS_LSEG_VALID +bit which holds it in the pnfs_layout_hdr's list.  When the final lseg +is removed from the pnfs_layout_hdr's list, the NFS_LAYOUT_DESTROYED +bit is set, preventing any new lsegs from being added. + +layout drivers +-------------- + +pNFS makes use of layout drivers. 
The standard defines 3 basic +layout types: "files", "objects" and "blocks".  For each of these types +there is a layout driver with a common function-vector table whose +methods are called by the nfs-client pnfs core to implement the +different layout types. + +Files-layout-driver code is in: fs/nfs/nfs4filelayout.c && nfs4filelayoutdev.c +Objects-layout-driver code is in: fs/nfs/objlayout/.. directory +Blocks-layout-driver code is in: fs/nfs/blocklayout/.. directory + +objects-layout setup +-------------------- + +As part of the full standard implementation the objlayoutdriver.ko needs, at +times, to automatically login to yet-undiscovered iscsi/osd devices.  For this +the driver makes upcalls to a user-mode script called *osd_login*. + +The pathname of the script to use is by default: +	/sbin/osd_login. +This name can be overridden by the kernel module parameter: +	objlayoutdriver.osd_login_prog + +If the kernel does not find the osd_login_prog path it will zero it out +and will not attempt further logins.  An admin can then write a new value +to the objlayoutdriver.osd_login_prog kernel parameter to re-enable it. + +The /sbin/osd_login script is part of the nfs-utils package, and should usually +be installed on distributions that support this kernel version. + +The API to the login script is as follows: +	Usage: $0 -u <URI> -o <OSDNAME> -s <SYSTEMID> +	Options: +		-u		target uri e.g. iscsi://<ip>:<port> +				(always exists) +				(More protocols can be defined in the future. 
+				 The client does not interpret this string; it is +				 passed unchanged, as received from the server) +		-o		osdname of the requested target OSD +				(Might be empty) +				(A string which denotes the OSD name; there is a +				 limit of 64 chars on this string) +		-s 		systemid of the requested target OSD +				(Might be empty) +				(This string, if not empty, is always a hex +				 representation of the 20-byte osd_system_id) + +blocks-layout setup +------------------- + +TODO: Document the setup needs of the blocks layout driver diff --git a/Documentation/filesystems/nfs/rpc-server-gss.txt b/Documentation/filesystems/nfs/rpc-server-gss.txt new file mode 100644 index 00000000000..716f4be8e8b --- /dev/null +++ b/Documentation/filesystems/nfs/rpc-server-gss.txt @@ -0,0 +1,91 @@ + +rpcsec_gss support for kernel RPC servers +========================================= + +This document gives references to the standards and protocols used to +implement RPCGSS authentication in kernel RPC servers such as the NFS +server and the NFS client's NFSv4.0 callback server.  (But note that +NFSv4.1 and higher don't require the client to act as a server for the +purposes of authentication.) + +RPCGSS is specified in a few IETF documents: + - RFC2203 v1: http://tools.ietf.org/rfc/rfc2203.txt + - RFC5403 v2: http://tools.ietf.org/rfc/rfc5403.txt +and there is a 3rd version being proposed: + - http://tools.ietf.org/id/draft-williams-rpcsecgssv3.txt +   (At draft n. 02 at the time of writing) + +Background +---------- + +The RPCGSS authentication method describes a way to perform GSSAPI +authentication for NFS.  Although GSSAPI is itself completely mechanism +agnostic, in many cases only the KRB5 mechanism is supported by NFS +implementations. + +The Linux kernel, at the moment, supports only the KRB5 mechanism, and +depends on GSSAPI extensions that are KRB5 specific. + +GSSAPI is a complex library, and implementing it completely in the kernel is +unwarranted. 
However, GSSAPI operations are fundamentally separable into two +parts: +- initial context establishment +- integrity/privacy protection (signing and encrypting of individual +  packets) + +The former is more complex and policy-independent, but less +performance-sensitive.  The latter is simpler and needs to be very fast. + +Therefore, we perform per-packet integrity and privacy protection in the +kernel, but leave the initial context establishment to userspace.  We +need upcalls to request userspace to perform context establishment. + +NFS Server Legacy Upcall Mechanism +---------------------------------- + +The classic upcall mechanism uses a custom text-based protocol to talk +to a custom daemon called rpc.svcgssd that is provided by the +nfs-utils package. + +This upcall mechanism has 2 limitations: + +A) It can handle tokens that are no bigger than 2KiB. + +In some Kerberos deployments GSSAPI tokens can be quite big, up to and +beyond 64KiB in size, due to various authorization extensions attached to +the Kerberos tickets, which need to be sent through the GSS layer in +order to perform context establishment. + +B) It does not properly handle creds where the user is a member of more +than a few thousand groups (the current hard limit in the kernel is 65K +groups) due to a limitation on the size of the buffer that can be sent +back to the kernel (4KiB). + +NFS Server New RPC Upcall Mechanism +----------------------------------- + +The newer upcall mechanism uses RPC over a unix socket to a daemon +called gss-proxy, implemented by a userspace program called Gssproxy. + +The gss_proxy RPC protocol is currently documented here: + +	https://fedorahosted.org/gss-proxy/wiki/ProtocolDocumentation + +This upcall mechanism uses the kernel rpc client and connects to the gssproxy +userspace program over a regular unix socket.  The gssproxy protocol does not +suffer from the size limitations of the legacy protocol. 
+ +Negotiating Upcall Mechanisms +----------------------------- + +To provide backward compatibility, the kernel defaults to using the +legacy mechanism.  To switch to the new mechanism, gss-proxy must bind +to /var/run/gssproxy.sock and then write "1" to +/proc/net/rpc/use-gss-proxy.  If gss-proxy dies, it must repeat both +steps. + +Once the upcall mechanism is chosen, it cannot be changed.  To prevent +locking into the legacy mechanism, the above steps must be performed +before starting nfsd.  Whoever starts nfsd can guarantee this by reading +from /proc/net/rpc/use-gss-proxy and checking that it contains a +"1"--the read will block until gss-proxy has done its write to the file.
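The startup-ordering check described above can be sketched as follows. A stand-in file replaces /proc/net/rpc/use-gss-proxy so the sketch is self-contained (on the real proc file, the read blocks until gss-proxy has written its "1"):

```shell
# Sketch of the check whoever starts nfsd can make.  On a real system
# FILE would be /proc/net/rpc/use-gss-proxy; a stand-in file keeps this
# runnable anywhere.
FILE=$(mktemp)
echo 1 > "$FILE"            # gss-proxy writes this after binding its socket
# Reading a "1" back guarantees the new upcall mechanism is in place:
grep -qx 1 "$FILE" && echo "gss-proxy ready"
```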
