Diffstat (limited to 'Documentation/filesystems/nfs')
-rw-r--r-- Documentation/filesystems/nfs/00-INDEX | 6
-rw-r--r-- Documentation/filesystems/nfs/Exporting | 9
-rw-r--r-- Documentation/filesystems/nfs/fault_injection.txt | 69
-rw-r--r-- Documentation/filesystems/nfs/idmapper.txt | 24
-rw-r--r-- Documentation/filesystems/nfs/nfs.txt | 44
-rw-r--r-- Documentation/filesystems/nfs/nfs41-server.txt | 85
-rw-r--r-- Documentation/filesystems/nfs/nfsd-admin-interfaces.txt | 41
-rw-r--r-- Documentation/filesystems/nfs/nfsroot.txt | 12
-rw-r--r-- Documentation/filesystems/nfs/pnfs.txt | 63
-rw-r--r-- Documentation/filesystems/nfs/rpc-server-gss.txt | 91
10 files changed, 366 insertions, 78 deletions
diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX
index a57e12411d2..53f3b596ac0 100644
--- a/Documentation/filesystems/nfs/00-INDEX
+++ b/Documentation/filesystems/nfs/00-INDEX
@@ -2,6 +2,8 @@
- this file (nfs-related documentation).
Exporting
- explanation of how to make filesystems exportable.
+fault_injection.txt
+ - information for using fault injection on the server
knfsd-stats.txt
- statistics which the NFS server makes available to user space.
nfs.txt
@@ -10,6 +12,8 @@ nfs41-server.txt
- info on the Linux server implementation of NFSv4 minor version 1.
nfs-rdma.txt
- how to install and setup the Linux NFS/RDMA client and server software
+nfsd-admin-interfaces.txt
+ - Administrative interfaces for nfsd.
nfsroot.txt
- short guide on setting up a diskless box with NFS root filesystem.
pnfs.txt
@@ -18,3 +22,5 @@ rpc-cache.txt
- introduction to the caching mechanisms in the sunrpc layer.
idmapper.txt
- information for configuring request-keys to be used by idmapper
+rpc-server-gss.txt
+ - Information on GSS authentication support in the NFS Server
diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting
index 87019d2b598..e543b1a619c 100644
--- a/Documentation/filesystems/nfs/Exporting
+++ b/Documentation/filesystems/nfs/Exporting
@@ -92,7 +92,14 @@ For a filesystem to be exportable it must:
1/ provide the filehandle fragment routines described below.
2/ make sure that d_splice_alias is used rather than d_add
when ->lookup finds an inode for a given parent and name.
- Typically the ->lookup routine will end with a:
+
+ If inode is NULL, d_splice_alias(inode, dentry) is equivalent to
+
+ d_add(dentry, inode), NULL
+
+ Similarly, d_splice_alias(ERR_PTR(err), dentry) = ERR_PTR(err)
+
+ Typically the ->lookup routine will simply end with a:
return d_splice_alias(inode, dentry);
}
diff --git a/Documentation/filesystems/nfs/fault_injection.txt b/Documentation/filesystems/nfs/fault_injection.txt
new file mode 100644
index 00000000000..426d166089a
--- /dev/null
+++ b/Documentation/filesystems/nfs/fault_injection.txt
@@ -0,0 +1,69 @@
+
+Fault Injection
+===============
+Fault injection is a method for forcing errors that may not normally occur, or
+may be difficult to reproduce. Forcing these errors in a controlled environment
+can help the developer find and fix bugs before their code is shipped in a
+production system. Injecting an error on the Linux NFS server will allow us to
+observe how the client reacts and if it manages to recover its state correctly.
+
+NFSD_FAULT_INJECTION must be selected when configuring the kernel to use this
+feature.
+
+
+Using Fault Injection
+=====================
+On the client, mount the fault injection server through NFS v4.0+ and do some
+work over NFS (open files, take locks, ...).
+
+On the server, mount the debugfs filesystem to <debug_dir> and ls
+<debug_dir>/nfsd. This will show a list of files that will be used for
+injecting faults on the NFS server. As root, write a number n to the file
+corresponding to the action you want the server to take. The server will then
+process the first n items it finds. So if you want to forget 5 locks, echo '5'
+to <debug_dir>/nfsd/forget_locks. A value of 0 will tell the server to forget
+all corresponding items. A log message will be created containing the number
+of items forgotten (check dmesg).
+
+Go back to work on the client and check if the client recovered from the error
+correctly.
+
+
+Available Faults
+================
+forget_clients:
+ The NFS server keeps a list of clients that have placed a mount call. If
+ this list is cleared, the server will have no knowledge of who the client
+ is, forcing the client to reauthenticate with the server.
+
+forget_openowners:
+ The NFS server keeps a list of what files are currently opened and who
+ they were opened by. Clearing this list will force the client to reopen
+ its files.
+
+forget_locks:
+ The NFS server keeps a list of what files are currently locked in the VFS.
+ Clearing this list will force the client to reclaim its locks (files are
+ unlocked through the VFS as they are cleared from this list).
+
+forget_delegations:
+ A delegation is used to assure the client that a file, or part of a file,
+ has not changed since the delegation was awarded. Clearing this list will
+ force the client to reacquire its delegation before accessing the file
+ again.
+
+recall_delegations:
+ Delegations can be recalled by the server when another client attempts to
+ access a file. This test will notify the client that its delegation has
+ been revoked, forcing the client to reacquire the delegation before using
+ the file again.
+
+
+tools/nfs/inject_faults.sh script
+=================================
+This script has been created to ease the fault injection process. This script
+will detect the mounted debugfs directory and write to the files located there
+based on the arguments passed by the user. For example, running
+`inject_faults.sh forget_locks 1` as root will instruct the server to forget
+one lock. Running `inject_faults.sh forget_locks` will instruct the server to
+forget all locks.
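
The write step described under "Using Fault Injection" above can be sketched in Python roughly as follows. This is only an illustration, not part of the patch: the helper name `inject_fault` is hypothetical, the caller is assumed to know where debugfs is mounted, and the file names are the ones listed under "Available Faults".

```python
import os

# Fault files named in fault_injection.txt; each lives under <debug_dir>/nfsd/.
KNOWN_FAULTS = (
    "forget_clients",
    "forget_openowners",
    "forget_locks",
    "forget_delegations",
    "recall_delegations",
)


def inject_fault(debug_dir, fault, count=0):
    """Write `count` to <debug_dir>/nfsd/<fault> and return the path written.

    Per the document, the server processes the first `count` items it
    finds, and 0 means "all corresponding items". Hypothetical helper:
    it must run as root on a kernel built with NFSD_FAULT_INJECTION.
    """
    if fault not in KNOWN_FAULTS:
        raise ValueError("unknown fault: %s" % fault)
    if count < 0:
        raise ValueError("count must be >= 0")
    path = os.path.join(debug_dir, "nfsd", fault)
    with open(path, "w") as f:
        f.write("%d\n" % count)
    return path
```

For example, `inject_fault(debug_dir, "forget_locks", 5)` corresponds to echoing '5' to <debug_dir>/nfsd/forget_locks; the number of items actually forgotten still has to be read from dmesg.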
diff --git a/Documentation/filesystems/nfs/idmapper.txt b/Documentation/filesystems/nfs/idmapper.txt
index b9b4192ea8b..fe03d10bb79 100644
--- a/Documentation/filesystems/nfs/idmapper.txt
+++ b/Documentation/filesystems/nfs/idmapper.txt
@@ -4,13 +4,21 @@ ID Mapper
=========
Id mapper is used by NFS to translate user and group ids into names, and to
translate user and group names into ids. Part of this translation involves
-performing an upcall to userspace to request the information. Id mapper will
-user request-key to perform this upcall and cache the result. The program
-/usr/sbin/nfs.idmap should be called by request-key, and will perform the
-translation and initialize a key with the resulting information.
+performing an upcall to userspace to request the information. There are two
+ways NFS can obtain this information: by placing a call to /sbin/request-key
+or by placing a call to the rpc.idmap daemon.
+
+NFS will attempt to call /sbin/request-key first. If this succeeds, the
+result will be cached using the generic request-key cache. This call should
+only fail if /etc/request-key.conf is not configured for the id_resolver key
+type; see the "Configuring" section below if you wish to use the request-key
+method.
+
+If the call to /sbin/request-key fails (if /etc/request-key.conf is not
+configured with the id_resolver key type), then the idmapper will ask the
+legacy rpc.idmap daemon for the id mapping. This result will be stored
+in a custom NFS idmap cache.
- NFS_USE_NEW_IDMAPPER must be selected when configuring the kernel to use this
- feature.
===========
Configuring
@@ -47,8 +55,8 @@ request-key will find the first matching line and corresponding program. In
this case, /some/other/program will handle all uid lookups and
/usr/sbin/nfs.idmap will handle gid, user, and group lookups.
-See <file:Documentation/keys-request-keys.txt> for more information about the
-request-key function.
+See <file:Documentation/security/keys-request-key.txt> for more information
+about the request-key function.
=========
diff --git a/Documentation/filesystems/nfs/nfs.txt b/Documentation/filesystems/nfs/nfs.txt
index f50f26ce6cd..f2571c8bef7 100644
--- a/Documentation/filesystems/nfs/nfs.txt
+++ b/Documentation/filesystems/nfs/nfs.txt
@@ -12,9 +12,47 @@ and work is in progress on adding support for minor version 1 of the NFSv4
protocol.
The purpose of this document is to provide information on some of the
-upcall interfaces that are used in order to provide the NFS client with
-some of the information that it requires in order to fully comply with
-the NFS spec.
+special features of the NFS client that can be configured by system
+administrators.
+
+
+The nfs4_unique_id parameter
+============================
+
+NFSv4 requires clients to identify themselves to servers with a unique
+string. File open and lock state shared between one client and one server
+is associated with this identity. To support robust NFSv4 state recovery
+and transparent state migration, this identity string must not change
+across client reboots.
+
+Without any other intervention, the Linux client uses a string that contains
+the local system's node name. System administrators, however, often do not
+take care to ensure that node names are fully qualified and do not change
+over the lifetime of a client system. Node names can have other
+administrative requirements that require particular behavior that does not
+work well as part of an nfs_client_id4 string.
+
+The nfs.nfs4_unique_id boot parameter specifies a unique string that can be
+used instead of a system's node name when an NFS client identifies itself to
+a server. Thus, if the system's node name is not unique, or it changes, its
+nfs.nfs4_unique_id stays the same, preventing collision with other clients
+or loss of state during NFS reboot recovery or transparent state migration.
+
+The nfs.nfs4_unique_id string is typically a UUID, though it can contain
+anything that is believed to be unique across all NFS clients. An
+nfs4_unique_id string should be chosen when a client system is installed,
+just as a system's root file system gets a fresh UUID in its label at
+install time.
+
+The string should remain fixed for the lifetime of the client. It can be
+changed safely if care is taken that the client shuts down cleanly and all
+outstanding NFSv4 state has expired, to prevent loss of NFSv4 state.
+
+This string can be stored in an NFS client's grub.conf, or it can be provided
+via a net boot facility such as PXE. It may also be specified as an nfs.ko
+module parameter. Specifying a uniquifier string is not supported for NFS
+clients running in containers.
+
The DNS resolver
================
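
Returning to the nfs.nfs4_unique_id parameter added in the nfs.txt hunk above: a minimal sketch of choosing a UUID at install time and formatting it as a kernel command-line fragment. The helper name `make_unique_id_param` is hypothetical, and rejecting whitespace is an assumption made here because command-line parameters are whitespace-separated; the patch itself does not state a character set.

```python
import uuid


def make_unique_id_param(unique_id=None):
    """Return a command-line fragment that sets nfs.nfs4_unique_id.

    If no string is supplied, a random UUID is generated, matching the
    document's remark that the value is typically a UUID. The result is
    meant to be stored once (e.g. in the boot loader configuration) and
    then left unchanged for the lifetime of the client.
    """
    if unique_id is None:
        unique_id = str(uuid.uuid4())
    # Assumption: a value containing whitespace would be split by the
    # command-line parser, so refuse it here.
    if not unique_id or any(c.isspace() for c in unique_id):
        raise ValueError("unique id must be non-empty and contain no whitespace")
    return "nfs.nfs4_unique_id=" + unique_id
```

The important design point is that the value is generated once and persisted, not regenerated on each boot, since the identity must not change across client reboots.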
diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt
index 04884914a1c..c49cd7e796e 100644
--- a/Documentation/filesystems/nfs/nfs41-server.txt
+++ b/Documentation/filesystems/nfs/nfs41-server.txt
@@ -5,11 +5,11 @@ Server support for minorversion 1 can be controlled using the
by reading this file will contain either "+4.1" or "-4.1"
correspondingly.
-Currently, server support for minorversion 1 is disabled by default.
-It can be enabled at run time by writing the string "+4.1" to
+Currently, server support for minorversion 1 is enabled by default.
+It can be disabled at run time by writing the string "-4.1" to
the /proc/fs/nfsd/versions control file. Note that to write this
-control file, the nfsd service must be taken down. Use your user-mode
-nfs-utils to set this up; see rpc.nfsd(8)
+control file, the nfsd service must be taken down. You can use rpc.nfsd
+for this; see rpc.nfsd(8).
(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and
"-4", respectively. Therefore, code meant to work on both new and old
@@ -29,49 +29,6 @@ are still under development out of tree.
See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design
for more information.
-The current implementation is intended for developers only: while it
-does support ordinary file operations on clients we have tested against
-(including the linux client), it is incomplete in ways which may limit
-features unexpectedly, cause known bugs in rare cases, or cause
-interoperability problems with future clients. Known issues:
-
- - gss support is questionable: currently mounts with kerberos
- from a linux client are possible, but we aren't really
- conformant with the spec (for example, we don't use kerberos
- on the backchannel correctly).
- - no trunking support: no clients currently take advantage of
- trunking, but this is a mandatory feature, and its use is
- recommended to clients in a number of places. (E.g. to ensure
- timely renewal in case an existing connection's retry timeouts
- have gotten too long; see section 8.3 of the RFC.)
- Therefore, lack of this feature may cause future clients to
- fail.
- - Incomplete backchannel support: incomplete backchannel gss
- support and no support for BACKCHANNEL_CTL mean that
- callbacks (hence delegations and layouts) may not be
- available and clients confused by the incomplete
- implementation may fail.
- - Server reboot recovery is unsupported; if the server reboots,
- clients may fail.
- - We do not support SSV, which provides security for shared
- client-server state (thus preventing unauthorized tampering
- with locks and opens, for example). It is mandatory for
- servers to support this, though no clients use it yet.
- - Mandatory operations which we do not support, such as
- DESTROY_CLIENTID, FREE_STATEID, SECINFO_NO_NAME, and
- TEST_STATEID, are not currently used by clients, but will be
- (and the spec recommends their uses in common cases), and
- clients should not be expected to know how to recover from the
- case where they are not supported. This will eventually cause
- interoperability failures.
-
-In addition, some limitations are inherited from the current NFSv4
-implementation:
-
- - Incomplete delegation enforcement: if a file is renamed or
- unlinked, a client holding a delegation may continue to
- indefinitely allow opens of the file under the old name.
-
The table below, taken from the NFSv4.1 document, lists
the operations that are mandatory to implement (REQ), optional
(OPT), and NFSv4.0 operations that are required not to implement (MNI)
@@ -98,8 +55,8 @@ Operations
| | MNI | or OPT) | |
+----------------------+------------+--------------+----------------+
| ACCESS | REQ | | Section 18.1 |
-NS | BACKCHANNEL_CTL | REQ | | Section 18.33 |
-NS | BIND_CONN_TO_SESSION | REQ | | Section 18.34 |
+I | BACKCHANNEL_CTL | REQ | | Section 18.33 |
+I | BIND_CONN_TO_SESSION | REQ | | Section 18.34 |
| CLOSE | REQ | | Section 18.2 |
| COMMIT | REQ | | Section 18.3 |
| CREATE | REQ | | Section 18.4 |
@@ -108,10 +65,10 @@ NS*| DELEGPURGE | OPT | FDELG (REQ) | Section 18.5 |
| DELEGRETURN | OPT | FDELG, | Section 18.6 |
| | | DDELG, pNFS | |
| | | (REQ) | |
-NS | DESTROY_CLIENTID | REQ | | Section 18.50 |
+I | DESTROY_CLIENTID | REQ | | Section 18.50 |
I | DESTROY_SESSION | REQ | | Section 18.37 |
I | EXCHANGE_ID | REQ | | Section 18.35 |
-NS | FREE_STATEID | REQ | | Section 18.38 |
+I | FREE_STATEID | REQ | | Section 18.38 |
| GETATTR | REQ | | Section 18.7 |
P | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 |
P | GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 |
@@ -145,14 +102,14 @@ NS*| OPENATTR | OPT | | Section 18.17 |
| RESTOREFH | REQ | | Section 18.27 |
| SAVEFH | REQ | | Section 18.28 |
| SECINFO | REQ | | Section 18.29 |
-NS | SECINFO_NO_NAME | REC | pNFS files | Section 18.45, |
+I | SECINFO_NO_NAME | REC | pNFS files | Section 18.45, |
| | | layout (REQ) | Section 13.12 |
I | SEQUENCE | REQ | | Section 18.46 |
| SETATTR | REQ | | Section 18.30 |
| SETCLIENTID | MNI | | N/A |
| SETCLIENTID_CONFIRM | MNI | | N/A |
NS | SET_SSV | REQ | | Section 18.47 |
-NS | TEST_STATEID | REQ | | Section 18.48 |
+I | TEST_STATEID | REQ | | Section 18.48 |
| VERIFY | REQ | | Section 18.31 |
NS*| WANT_DELEGATION | OPT | FDELG (OPT) | Section 18.49 |
| WRITE | REQ | | Section 18.32 |
@@ -189,6 +146,16 @@ NS*| CB_WANTS_CANCELLED | OPT | FDELG, | Section 20.10 |
Implementation notes:
+SSV:
+* The spec claims this is mandatory, but we don't actually know of any
+ implementations, so we're ignoring it for now. The server returns
+ NFS4ERR_ENCR_ALG_UNSUPP on EXCHANGE_ID, which should be future-proof.
+
+GSS on the backchannel:
+* Again, theoretically required but not widely implemented (in
+ particular, the current Linux client doesn't request it). We return
+ NFS4ERR_ENCR_ALG_UNSUPP on CREATE_SESSION.
+
DELEGPURGE:
* mandatory only for servers that support CLAIM_DELEGATE_PREV and/or
CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that
@@ -196,26 +163,18 @@ DELEGPURGE:
now.
EXCHANGE_ID:
-* only SP4_NONE state protection supported
* implementation ids are ignored
CREATE_SESSION:
* backchannel attributes are ignored
-* backchannel security parameters are ignored
SEQUENCE:
* no support for dynamic slot table renegotiation (optional)
-nfsv4.1 COMPOUND rules:
-The following cases aren't supported yet:
-* Enforcing of NFS4ERR_NOT_ONLY_OP for: BIND_CONN_TO_SESSION, CREATE_SESSION,
- DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID.
-* DESTROY_SESSION MUST be the final operation in the COMPOUND request.
-
Nonstandard compound limitations:
* No support for a sessions fore channel RPC compound that requires both a
ca_maxrequestsize request and a ca_maxresponsesize reply, so we may
fail to live up to the promise we made in CREATE_SESSION fore channel
negotiation.
-* No more than one IO operation (read, write, readdir) allowed per
- compound.
+
+See also http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues.
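
As an illustration of the /proc/fs/nfsd/versions control file discussed in the nfs41-server.txt hunks above, the following sketch checks whether minorversion 1 is reported as enabled. The function name is hypothetical, and it assumes the file's contents are whitespace-separated tokens such as "+4.1" or "-4.1", which is how the document describes them.

```python
def minorversion1_enabled(versions_text):
    """Interpret the text read from the nfsd "versions" control file.

    Returns True if "+4.1" is present, False if "-4.1" is present, and
    None if neither token appears (for instance on an older server that
    does not know about minor versions).
    """
    tokens = versions_text.split()
    if "+4.1" in tokens:
        return True
    if "-4.1" in tokens:
        return False
    return None
```

A caller would pass in the result of reading /proc/fs/nfsd/versions. Changing the setting is a separate step: as the document notes, the nfsd service must be taken down before the file can be written.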
diff --git a/Documentation/filesystems/nfs/nfsd-admin-interfaces.txt b/Documentation/filesystems/nfs/nfsd-admin-interfaces.txt
new file mode 100644
index 00000000000..56a96fb08a7
--- /dev/null
+++ b/Documentation/filesystems/nfs/nfsd-admin-interfaces.txt
@@ -0,0 +1,41 @@
+Administrative interfaces for nfsd
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Note that normally these interfaces are used only by the utilities in
+nfs-utils.
+
+nfsd is controlled mainly by pseudofiles under the "nfsd" filesystem,
+which is normally mounted at /proc/fs/nfsd/.
+
+The server is always started by the first write of a nonzero value to
+nfsd/threads.
+
+Before doing that, NFSD can be told which sockets to listen on by
+writing to nfsd/portlist; that write may be:
+
+ - an ascii-encoded file descriptor, which should refer to a
+ bound (and listening, for tcp) socket, or
+ - "transportname port", where transportname is currently either
+ "udp", "tcp", or "rdma".
+
+If nfsd is started without doing any of these, then it will create one
+udp and one tcp listener at port 2049 (see nfsd_init_socks).
+
+On startup, nfsd and lockd grace periods start.
+
+nfsd is shut down by a write of 0 to nfsd/threads. All locks and state
+are thrown away at that point.
+
+Between startup and shutdown, the number of threads may be adjusted up
+or down by additional writes to nfsd/threads or by writes to
+nfsd/pool_threads.
+
+For more detail about files under nfsd/ and what they control, see
+fs/nfsd/nfsctl.c; most of them have detailed comments.
+
+Implementation notes
+^^^^^^^^^^^^^^^^^^^^
+
+Note that the rpc server requires the caller to serialize addition and
+removal of listening sockets, and startup and shutdown of the server.
+For nfsd this is done using nfsd_mutex.
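
The startup and shutdown sequence described in nfsd-admin-interfaces.txt above can be sketched as a pair of small helpers. These names are hypothetical and normally nfs-utils performs these writes; the sketch assumes the "nfsd" filesystem is mounted at a directory supplied by the caller (normally /proc/fs/nfsd/).

```python
import os

# Transport names accepted in "transportname port" writes, per the document.
VALID_TRANSPORTS = ("udp", "tcp", "rdma")


def _write(path, text):
    # Each command is issued as one write to the pseudofile.
    with open(path, "w") as f:
        f.write(text)


def add_listener(nfsd_dir, transport, port):
    """Ask nfsd to listen on `port` by writing to nfsd/portlist."""
    if transport not in VALID_TRANSPORTS:
        raise ValueError("unsupported transport: %s" % transport)
    _write(os.path.join(nfsd_dir, "portlist"), "%s %d\n" % (transport, port))


def set_threads(nfsd_dir, nthreads):
    """Write the thread count to nfsd/threads.

    The first nonzero write starts the server; a write of 0 shuts it
    down and discards all locks and state.
    """
    if nthreads < 0:
        raise ValueError("thread count must be >= 0")
    _write(os.path.join(nfsd_dir, "threads"), "%d\n" % nthreads)
```

A typical order would be `add_listener(d, "tcp", 2049)` followed by `set_threads(d, 8)`, and later `set_threads(d, 0)` to shut down. If no listener is added first, the document says nfsd creates one udp and one tcp listener at port 2049 on its own.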
diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt
index 90c71c6f0d0..2d66ed68812 100644
--- a/Documentation/filesystems/nfs/nfsroot.txt
+++ b/Documentation/filesystems/nfs/nfsroot.txt
@@ -78,7 +78,8 @@ nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
flags = hard, nointr, noposix, cto, ac
-ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
+ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:
+ <dns0-ip>:<dns1-ip>
This parameter tells the kernel how to configure IP addresses of devices
and also how to set up the IP routing table. It was originally called
@@ -158,6 +159,13 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
Default: any
+ <dns0-ip> IP address of the first nameserver.
+ The value is exported via /proc/net/pnp, which on embedded
+ systems is often the target of a link from /etc/resolv.conf.
+
+ <dns1-ip> IP address of the second nameserver.
+ Same as above.
+
nfsrootdebug
@@ -226,7 +234,7 @@ They depend on various facilities being available:
cdrecord.
e.g.
- cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso
+ cdrecord dev=ATAPI:1,0,0 arch/x86/boot/image.iso
For more information on isolinux, including how to create bootdisks
for prebuilt kernels, see http://syslinux.zytor.com/
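
To make the extended ip= syntax from the nfsroot.txt hunk above concrete, here is a small sketch that assembles the parameter from its nine colon-separated fields. The function name and keyword names are hypothetical; the field order follows the synopsis in the patch, and unspecified fields are left empty.

```python
# Field order as given in the ip= synopsis, including the two new DNS fields.
IP_FIELDS = (
    "client_ip", "server_ip", "gw_ip", "netmask",
    "hostname", "device", "autoconf", "dns0_ip", "dns1_ip",
)


def build_ip_param(**fields):
    """Return an "ip=..." string built from the named fields.

    Missing fields become empty strings; trailing empty fields (and
    their colons) are dropped so that older, shorter forms still result
    when no nameserver is given.
    """
    unknown = set(fields) - set(IP_FIELDS)
    if unknown:
        raise ValueError("unknown field(s): %s" % ", ".join(sorted(unknown)))
    values = [fields.get(name, "") for name in IP_FIELDS]
    return "ip=" + ":".join(values).rstrip(":")
```

Note that the position of each value matters: to set only a nameserver, the seven preceding fields must still be represented by their separating colons.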
diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.txt
index bc0b9cfe095..adc81a35fe2 100644
--- a/Documentation/filesystems/nfs/pnfs.txt
+++ b/Documentation/filesystems/nfs/pnfs.txt
@@ -12,7 +12,7 @@ struct pnfs_layout_hdr
----------------------
The on-the-wire command LAYOUTGET corresponds to struct
pnfs_layout_segment, usually referred to by the variable name lseg.
-Each nfs_inode may hold a pointer to a cache of of these layout
+Each nfs_inode may hold a pointer to a cache of these layout
segments in nfsi->layout, of type struct pnfs_layout_hdr.
We reference the header for the inode pointing to it, across each
@@ -46,3 +46,64 @@ data server cache
file driver devices refer to data servers, which are kept in a module
level cache. Its reference is held over the lifetime of the deviceid
pointing to it.
+
+lseg
+----
+lseg maintains an extra reference corresponding to the NFS_LSEG_VALID
+bit which holds it in the pnfs_layout_hdr's list. When the final lseg
+is removed from the pnfs_layout_hdr's list, the NFS_LAYOUT_DESTROYED
+bit is set, preventing any new lsegs from being added.
+
+layout drivers
+--------------
+
+pNFS uses what are called layout drivers. The STD defines 3 basic
+layout types: "files", "objects", and "blocks". For each of these types
+there is a layout driver with a common function-vector table, whose
+entries are called by the nfs-client pnfs-core to implement the different
+layout types.
+
+Files-layout-driver code is in: fs/nfs/nfs4filelayout.c && nfs4filelayoutdev.c
+Objects-layout-driver code is in: fs/nfs/objlayout/.. directory
+Blocks-layout-driver code is in: fs/nfs/blocklayout/.. directory
+
+objects-layout setup
+--------------------
+
+As part of the full STD implementation the objlayoutdriver.ko needs, at times,
+to automatically log in to as-yet undiscovered iscsi/osd devices. For this the
+driver makes upcalls to a user-mode script called *osd_login*.
+
+The path_name of the script to use is by default:
+ /sbin/osd_login.
+This name can be overridden by the Kernel module parameter:
+ objlayoutdriver.osd_login_prog
+
+If the kernel does not find the osd_login_prog path it will zero it out
+and will not attempt further logins. An admin can then write a new value
+to the objlayoutdriver.osd_login_prog kernel parameter to re-enable it.
+
+The /sbin/osd_login is part of the nfs-utils package, and should usually
+be installed on distributions that support this Kernel version.
+
+The API to the login script is as follows:
+ Usage: $0 -u <URI> -o <OSDNAME> -s <SYSTEMID>
+ Options:
+ -u target uri e.g. iscsi://<ip>:<port>
+ (always present)
+ (More protocols can be defined in the future.
+ The client does not interpret this string; it is
+ passed unchanged as received from the Server)
+ -o osdname of the requested target OSD
+ (Might be empty)
+ (A string which denotes the OSD name, there is a
+ limit of 64 chars on this string)
+ -s systemid of the requested target OSD
+ (Might be empty)
+ (This string, if not empty, is always a hex
+ representation of the 20-byte osd_system_id)
+
+blocks-layout setup
+-------------------
+
+TODO: Document the setup needs of the blocks layout driver
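
The osd_login calling convention documented in the pnfs.txt hunk above can be illustrated by building the argument vector such an upcall would use. This is a sketch, not the kernel's code: the function name is hypothetical, and passing empty -o and -s values as empty strings (instead of omitting the options) is an assumption.

```python
def osd_login_argv(uri, osdname="", systemid=b"", prog="/sbin/osd_login"):
    """Return the argv list for "<prog> -u <URI> -o <OSDNAME> -s <SYSTEMID>".

    `uri` is passed through uninterpreted. `osdname` may be empty and is
    limited to 64 characters. `systemid` may be empty; otherwise it is
    the 20-byte osd_system_id and is rendered as hex.
    """
    if not uri:
        raise ValueError("the target uri is always required")
    if len(osdname) > 64:
        raise ValueError("osdname is limited to 64 characters")
    if systemid and len(systemid) != 20:
        raise ValueError("systemid must be empty or exactly 20 bytes")
    return [prog, "-u", uri, "-o", osdname, "-s", systemid.hex()]
```

A 20-byte systemid therefore always appears as a 40-character hex string, which gives a login script a simple sanity check on its -s argument.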
diff --git a/Documentation/filesystems/nfs/rpc-server-gss.txt b/Documentation/filesystems/nfs/rpc-server-gss.txt
new file mode 100644
index 00000000000..716f4be8e8b
--- /dev/null
+++ b/Documentation/filesystems/nfs/rpc-server-gss.txt
@@ -0,0 +1,91 @@
+
+rpcsec_gss support for kernel RPC servers
+=========================================
+
+This document gives references to the standards and protocols used to
+implement RPCGSS authentication in kernel RPC servers such as the NFS
+server and the NFS client's NFSv4.0 callback server. (But note that
+NFSv4.1 and higher don't require the client to act as a server for the
+purposes of authentication.)
+
+RPCGSS is specified in a few IETF documents:
+ - RFC2203 v1: http://tools.ietf.org/rfc/rfc2203.txt
+ - RFC5403 v2: http://tools.ietf.org/rfc/rfc5403.txt
+and there is a 3rd version being proposed:
+ - http://tools.ietf.org/id/draft-williams-rpcsecgssv3.txt
+ (At draft n. 02 at the time of writing)
+
+Background
+----------
+
+The RPCGSS Authentication method describes a way to perform GSSAPI
+Authentication for NFS. Although GSSAPI is itself completely mechanism
+agnostic, in many cases only the KRB5 mechanism is supported by NFS
+implementations.
+
+The Linux kernel, at the moment, supports only the KRB5 mechanism, and
+depends on GSSAPI extensions that are KRB5 specific.
+
+GSSAPI is a complex library, and implementing it completely in kernel is
+unwarranted. However, GSSAPI operations are fundamentally separable into 2
+parts:
+- initial context establishment
+- integrity/privacy protection (signing and encrypting of individual
+ packets)
+
+The former is more complex and policy-independent, but less
+performance-sensitive. The latter is simpler and needs to be very fast.
+
+Therefore, we perform per-packet integrity and privacy protection in the
+kernel, but leave the initial context establishment to userspace. We
+need upcalls to request userspace to perform context establishment.
+
+NFS Server Legacy Upcall Mechanism
+----------------------------------
+
+The classic upcall mechanism uses a custom text-based upcall mechanism
+to talk to a custom daemon called rpc.svcgssd that is provided by the
+nfs-utils package.
+
+This upcall mechanism has 2 limitations:
+
+A) It can handle tokens that are no bigger than 2KiB
+
+In some Kerberos deployments GSSAPI tokens can be quite big, up to and
+beyond 64KiB in size, due to various authorization extensions attached to
+the Kerberos tickets, which need to be sent through the GSS layer in
+order to perform context establishment.
+
+B) It does not properly handle creds where the user is a member of more
+than a few thousand groups (the current hard limit in the kernel is 65K
+groups) due to a limitation on the size of the buffer that can be sent
+back to the kernel (4KiB).
+
+NFS Server New RPC Upcall Mechanism
+-----------------------------------
+
+The newer upcall mechanism uses RPC over a unix socket to a daemon
+called gss-proxy, implemented by a userspace program called Gssproxy.
+
+The gss_proxy RPC protocol is currently documented here:
+
+ https://fedorahosted.org/gss-proxy/wiki/ProtocolDocumentation
+
+This upcall mechanism uses the kernel rpc client and connects to the gssproxy
+userspace program over a regular unix socket. The gssproxy protocol does not
+suffer from the size limitations of the legacy protocol.
+
+Negotiating Upcall Mechanisms
+-----------------------------
+
+To provide backward compatibility, the kernel defaults to using the
+legacy mechanism. To switch to the new mechanism, gss-proxy must bind
+to /var/run/gssproxy.sock and then write "1" to
+/proc/net/rpc/use-gss-proxy. If gss-proxy dies, it must repeat both
+steps.
+
+Once the upcall mechanism is chosen, it cannot be changed. To prevent
+locking into the legacy mechanisms, the above steps must be performed
+before starting nfsd. Whoever starts nfsd can guarantee this by reading
+from /proc/net/rpc/use-gss-proxy and checking that it contains a
+"1"--the read will block until gss-proxy has done its write to the file.
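
The check that "whoever starts nfsd" is asked to perform in "Negotiating Upcall Mechanisms" above might look roughly like this. The function name is hypothetical; the path is the one given in the document, and it is taken as a parameter so the logic can be exercised against an ordinary file.

```python
def gss_proxy_selected(path="/proc/net/rpc/use-gss-proxy"):
    """Return True if the file at `path` contains "1".

    On a real system the document states that this read blocks until
    gss-proxy has written to the file, so returning at all means the
    choice of upcall mechanism has been made.
    """
    with open(path) as f:
        return f.read().strip() == "1"
```

An init script could call this before starting nfsd and refuse to proceed on a False result, which avoids locking the kernel into the legacy mechanism by accident.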