diff options
author | Linus Torvalds <torvalds@woody.linux-foundation.org> | 2007-04-27 09:26:46 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@woody.linux-foundation.org> | 2007-04-27 09:26:46 -0700 |
commit | 15c54033964a943de7b0763efd3bd0ede7326395 (patch) | |
tree | 840b292612d1b5396d5bab5bde537a9013db3ceb /Documentation | |
parent | ad5da3cf39a5b11a198929be1f2644e17ecd767e (diff) | |
parent | 912a41a4ab935ce8c4308428ec13fc7f8b1f18f4 (diff) |
Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (448 commits)
[IPV4] nl_fib_lookup: Initialise res.r before fib_res_put(&res)
[IPV6]: Fix thinko in ipv6_rthdr_rcv() changes.
[IPV4]: Add multipath cached to feature-removal-schedule.txt
[WIRELESS] cfg80211: Clarify locking comment.
[WIRELESS] cfg80211: Fix locking in wiphy_new.
[WEXT] net_device: Don't include wext bits if not required.
[WEXT]: Misc code cleanups.
[WEXT]: Reduce inline abuse.
[WEXT]: Move EXPORT_SYMBOL statements where they belong.
[WEXT]: Cleanup early ioctl call path.
[WEXT]: Remove options.
[WEXT]: Remove dead debug code.
[WEXT]: Clean up how wext is called.
[WEXT]: Move to net/wireless
[AFS]: Eliminate cmpxchg() usage in vlocation code.
[RXRPC]: Fix pointers passed to bitops.
[RXRPC]: Remove bogus atomic_* overrides.
[AFS]: Fix u64 printing in debug logging.
[AFS]: Add "directory write" support.
[AFS]: Implement the CB.InitCallBackState3 operation.
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/feature-removal-schedule.txt | 40 | ||||
-rw-r--r-- | Documentation/filesystems/afs.txt | 214 | ||||
-rw-r--r-- | Documentation/filesystems/proc.txt | 9 | ||||
-rw-r--r-- | Documentation/keys.txt | 12 | ||||
-rw-r--r-- | Documentation/networking/bonding.txt | 35 | ||||
-rw-r--r-- | Documentation/networking/dccp.txt | 10 | ||||
-rw-r--r-- | Documentation/networking/ip-sysctl.txt | 31 | ||||
-rw-r--r-- | Documentation/networking/rxrpc.txt | 859 | ||||
-rw-r--r-- | Documentation/networking/wan-router.txt | 1 |
9 files changed, 1093 insertions, 118 deletions
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 19b4c96b2a4..6da663607f7 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -211,15 +211,6 @@ Who: Adrian Bunk <bunk@stusta.de> --------------------------- -What: IPv4 only connection tracking/NAT/helpers -When: 2.6.22 -Why: The new layer 3 independant connection tracking replaces the old - IPv4 only version. After some stabilization of the new code the - old one will be removed. -Who: Patrick McHardy <kaber@trash.net> - ---------------------------- - What: ACPI hooks (X86_SPEEDSTEP_CENTRINO_ACPI) in speedstep-centrino driver When: December 2006 Why: Speedstep-centrino driver with ACPI hooks and acpi-cpufreq driver are @@ -294,18 +285,6 @@ Who: Richard Purdie <rpurdie@rpsys.net> --------------------------- -What: Wireless extensions over netlink (CONFIG_NET_WIRELESS_RTNETLINK) -When: with the merge of wireless-dev, 2.6.22 or later -Why: The option/code is - * not enabled on most kernels - * not required by any userspace tools (except an experimental one, - and even there only for some parts, others use ioctl) - * pointless since wext is no longer evolving and the ioctl - interface needs to be kept -Who: Johannes Berg <johannes@sipsolutions.net> - ---------------------------- - What: i8xx_tco watchdog driver When: in 2.6.22 Why: the i8xx_tco watchdog driver has been replaced by the iTCO_wdt @@ -313,3 +292,22 @@ Why: the i8xx_tco watchdog driver has been replaced by the iTCO_wdt Who: Wim Van Sebroeck <wim@iguana.be> --------------------------- + +What: Multipath cached routing support in ipv4 +When: in 2.6.23 +Why: Code was merged, then submitter immediately disappeared leaving + us with no maintainer and lots of bugs. The code should not have + been merged in the first place, and many aspects of it's + implementation are blocking more critical core networking + development. It's marked EXPERIMENTAL and no distribution + enables it because it cause obscure crashes due to unfixable bugs + (interfaces don't return errors so memory allocation can't be + handled, calling contexts of these interfaces make handling + errors impossible too because they get called after we've + totally commited to creating a route object, for example). + This problem has existed for years and no forward progress + has ever been made, and nobody steps up to try and salvage + this code, so we're going to finally just get rid of it. +Who: David S. Miller <davem@davemloft.net> + +--------------------------- diff --git a/Documentation/filesystems/afs.txt b/Documentation/filesystems/afs.txt index 2f4237dfb8c..12ad6c7f4e5 100644 --- a/Documentation/filesystems/afs.txt +++ b/Documentation/filesystems/afs.txt @@ -1,31 +1,82 @@ + ==================== kAFS: AFS FILESYSTEM ==================== -ABOUT -===== +Contents: + + - Overview. + - Usage. + - Mountpoints. + - Proc filesystem. + - The cell database. + - Security. + - Examples. + + +======== +OVERVIEW +======== -This filesystem provides a fairly simple AFS filesystem driver. It is under -development and only provides very basic facilities. It does not yet support -the following AFS features: +This filesystem provides a fairly simple secure AFS filesystem driver. It is +under development and does not yet provide the full feature set. The features +it does support include: - (*) Write support. - (*) Communications security. - (*) Local caching. - (*) pioctl() system call. - (*) Automatic mounting of embedded mountpoints. + (*) Security (currently only AFS kaserver and KerberosIV tickets). + (*) File reading. + (*) Automounting. + +It does not yet support the following AFS features: + + (*) Write support. + + (*) Local caching. + + (*) pioctl() system call. + + +=========== +COMPILATION +=========== + +The filesystem should be enabled by turning on the kernel configuration +options: + + CONFIG_AF_RXRPC - The RxRPC protocol transport + CONFIG_RXKAD - The RxRPC Kerberos security handler + CONFIG_AFS - The AFS filesystem + +Additionally, the following can be turned on to aid debugging: + + CONFIG_AF_RXRPC_DEBUG - Permit AF_RXRPC debugging to be enabled + CONFIG_AFS_DEBUG - Permit AFS debugging to be enabled + +They permit the debugging messages to be turned on dynamically by manipulating +the masks in the following files: + + /sys/module/af_rxrpc/parameters/debug + /sys/module/afs/parameters/debug + + +===== USAGE ===== When inserting the driver modules the root cell must be specified along with a list of volume location server IP addresses: - insmod rxrpc.o + insmod af_rxrpc.o + insmod rxkad.o insmod kafs.o rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91 -The first module is a driver for the RxRPC remote operation protocol, and the -second is the actual filesystem driver for the AFS filesystem. +The first module is the AF_RXRPC network protocol driver. This provides the +RxRPC remote operation protocol and may also be accessed from userspace. See: + + Documentation/networking/rxrpc.txt + +The second module is the kerberos RxRPC security driver, and the third module +is the actual filesystem driver for the AFS filesystem. Once the module has been loaded, more modules can be added by the following procedure: @@ -33,7 +84,7 @@ procedure: echo add grand.central.org 18.7.14.88:128.2.191.224 >/proc/fs/afs/cells Where the parameters to the "add" command are the name of a cell and a list of -volume location servers within that cell. +volume location servers within that cell, with the latter separated by colons. Filesystems can be mounted anywhere by commands similar to the following: @@ -42,11 +93,6 @@ Filesystems can be mounted anywhere by commands similar to the following: mount -t afs "#root.afs." /afs mount -t afs "#root.cell." /afs/cambridge - NB: When using this on Linux 2.4, the mount command has to be different, - since the filesystem doesn't have access to the device name argument: - - mount -t afs none /afs -ovol="#root.afs." - Where the initial character is either a hash or a percent symbol depending on whether you definitely want a R/W volume (hash) or whether you'd prefer a R/O volume, but are willing to use a R/W volume instead (percent). @@ -60,55 +106,66 @@ named volume will be looked up in the cell specified during insmod. Additional cells can be added through /proc (see later section). +=========== MOUNTPOINTS =========== -AFS has a concept of mountpoints. These are specially formatted symbolic links -(of the same form as the "device name" passed to mount). kAFS presents these -to the user as directories that have special properties: +AFS has a concept of mountpoints. In AFS terms, these are specially formatted +symbolic links (of the same form as the "device name" passed to mount). kAFS +presents these to the user as directories that have a follow-link capability +(ie: symbolic link semantics). If anyone attempts to access them, they will +automatically cause the target volume to be mounted (if possible) on that site. - (*) They cannot be listed. Running a program like "ls" on them will incur an - EREMOTE error (Object is remote). +Automatically mounted filesystems will be automatically unmounted approximately +twenty minutes after they were last used. Alternatively they can be unmounted +directly with the umount() system call. - (*) Other objects can't be looked up inside of them. This also incurs an - EREMOTE error. +Manually unmounting an AFS volume will cause any idle submounts upon it to be +culled first. If all are culled, then the requested volume will also be +unmounted, otherwise error EBUSY will be returned. - (*) They can be queried with the readlink() system call, which will return - the name of the mountpoint to which they point. The "readlink" program - will also work. +This can be used by the administrator to attempt to unmount the whole AFS tree +mounted on /afs in one go by doing: - (*) They can be mounted on (which symbolic links can't). + umount /afs +=============== PROC FILESYSTEM =============== -The rxrpc module creates a number of files in various places in the /proc -filesystem: - - (*) Firstly, some information files are made available in a directory called - "/proc/net/rxrpc/". These list the extant transport endpoint, peer, - connection and call records. - - (*) Secondly, some control files are made available in a directory called - "/proc/sys/rxrpc/". Currently, all these files can be used for is to - turn on various levels of tracing. - The AFS modules creates a "/proc/fs/afs/" directory and populates it: - (*) A "cells" file that lists cells currently known to the afs module. + (*) A "cells" file that lists cells currently known to the afs module and + their usage counts: + + [root@andromeda ~]# cat /proc/fs/afs/cells + USE NAME + 3 cambridge.redhat.com (*) A directory per cell that contains files that list volume location servers, volumes, and active servers known within that cell. + [root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/servers + USE ADDR STATE + 4 172.16.18.91 0 + [root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/vlservers + ADDRESS + 172.16.18.91 + [root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/volumes + USE STT VLID[0] VLID[1] VLID[2] NAME + 1 Val 20000000 20000001 20000002 root.afs + +================= THE CELL DATABASE ================= -The filesystem maintains an internal database of all the cells it knows and -the IP addresses of the volume location servers for those cells. The cell to -which the computer belongs is added to the database when insmod is performed -by the "rootcell=" argument. +The filesystem maintains an internal database of all the cells it knows and the +IP addresses of the volume location servers for those cells. The cell to which +the system belongs is added to the database when insmod is performed by the +"rootcell=" argument or, if compiled in, using a "kafs.rootcell=" argument on +the kernel command line. Further cells can be added by commands similar to the following: @@ -118,20 +175,65 @@ Further cells can be added by commands similar to the following: No other cell database operations are available at this time. +======== +SECURITY +======== + +Secure operations are initiated by acquiring a key using the klog program. A +very primitive klog program is available at: + + http://people.redhat.com/~dhowells/rxrpc/klog.c + +This should be compiled by: + + make klog LDLIBS="-lcrypto -lcrypt -lkrb4 -lkeyutils" + +And then run as: + + ./klog + +Assuming it's successful, this adds a key of type RxRPC, named for the service +and cell, eg: "afs@<cellname>". This can be viewed with the keyctl program or +by cat'ing /proc/keys: + + [root@andromeda ~]# keyctl show + Session Keyring + -3 --alswrv 0 0 keyring: _ses.3268 + 2 --alswrv 0 0 \_ keyring: _uid.0 + 111416553 --als--v 0 0 \_ rxrpc: afs@CAMBRIDGE.REDHAT.COM + +Currently the username, realm, password and proposed ticket lifetime are +compiled in to the program. + +It is not required to acquire a key before using AFS facilities, but if one is +not acquired then all operations will be governed by the anonymous user parts +of the ACLs. + +If a key is acquired, then all AFS operations, including mounts and automounts, +made by a possessor of that key will be secured with that key. + +If a file is opened with a particular key and then the file descriptor is +passed to a process that doesn't have that key (perhaps over an AF_UNIX +socket), then the operations on the file will be made with key that was used to +open the file. + + +======== EXAMPLES ======== -Here's what I use to test this. Some of the names and IP addresses are local -to my internal DNS. My "root.afs" partition has a mount point within it for +Here's what I use to test this. Some of the names and IP addresses are local +to my internal DNS. My "root.afs" partition has a mount point within it for some public volumes volumes. -insmod -S /tmp/rxrpc.o -insmod -S /tmp/kafs.o rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91 +insmod /tmp/rxrpc.o +insmod /tmp/rxkad.o +insmod /tmp/kafs.o rootcell=cambridge.redhat.com:172.16.18.91 mount -t afs \%root.afs. /afs mount -t afs \%cambridge.redhat.com:root.cell. /afs/cambridge.redhat.com/ -echo add grand.central.org 18.7.14.88:128.2.191.224 > /proc/fs/afs/cells +echo add grand.central.org 18.7.14.88:128.2.191.224 > /proc/fs/afs/cells mount -t afs "#grand.central.org:root.cell." /afs/grand.central.org/ mount -t afs "#grand.central.org:root.archive." /afs/grand.central.org/archive mount -t afs "#grand.central.org:root.contrib." /afs/grand.central.org/contrib @@ -141,15 +243,7 @@ mount -t afs "#grand.central.org:root.service." /afs/grand.central.org/service mount -t afs "#grand.central.org:root.software." /afs/grand.central.org/software mount -t afs "#grand.central.org:root.user." /afs/grand.central.org/user -umount /afs/grand.central.org/user -umount /afs/grand.central.org/software -umount /afs/grand.central.org/service -umount /afs/grand.central.org/project -umount /afs/grand.central.org/doc -umount /afs/grand.central.org/contrib -umount /afs/grand.central.org/archive -umount /afs/grand.central.org -umount /afs/cambridge.redhat.com umount /afs rmmod kafs +rmmod rxkad rmmod rxrpc diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 5484ab5efd4..7aaf09b86a5 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -1421,6 +1421,15 @@ fewer messages that will be written. Message_burst controls when messages will be dropped. The default settings limit warning messages to one every five seconds. +warnings +-------- + +This controls console messages from the networking stack that can occur because +of problems on the network like duplicate address or bad checksums. Normally, +this should be enabled, but if the problem persists the messages can be +disabled. + + netdev_max_backlog ------------------ diff --git a/Documentation/keys.txt b/Documentation/keys.txt index 60c665d9cfa..81d9aa09729 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt @@ -859,6 +859,18 @@ payload contents" for more information. void unregister_key_type(struct key_type *type); +Under some circumstances, it may be desirable to desirable to deal with a +bundle of keys. The facility provides access to the keyring type for managing +such a bundle: + + struct key_type key_type_keyring; + +This can be used with a function such as request_key() to find a specific +keyring in a process's keyrings. A keyring thus found can then be searched +with keyring_search(). Note that it is not possible to use request_key() to +search a specific keyring, so using keyrings in this way is of limited utility. + + =================================== NOTES ON ACCESSING PAYLOAD CONTENTS =================================== diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index de809e58092..1da56663083 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -920,40 +920,9 @@ options, you may wish to use the "max_bonds" module parameter, documented above. To create multiple bonding devices with differing options, it -is necessary to load the bonding driver multiple times. Note that -current versions of the sysconfig network initialization scripts -handle this automatically; if your distro uses these scripts, no -special action is needed. See the section Configuring Bonding -Devices, above, if you're not sure about your network initialization -scripts. - - To load multiple instances of the module, it is necessary to -specify a different name for each instance (the module loading system -requires that every loaded module, even multiple instances of the same -module, have a unique name). This is accomplished by supplying -multiple sets of bonding options in /etc/modprobe.conf, for example: - -alias bond0 bonding -options bond0 -o bond0 mode=balance-rr miimon=100 - -alias bond1 bonding -options bond1 -o bond1 mode=balance-alb miimon=50 - - will load the bonding module two times. The first instance is -named "bond0" and creates the bond0 device in balance-rr mode with an -miimon of 100. The second instance is named "bond1" and creates the -bond1 device in balance-alb mode with an miimon of 50. - - In some circumstances (typically with older distributions), -the above does not work, and the second bonding instance never sees -its options. In that case, the second options line can be substituted -as follows: - -install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \ - mode=balance-alb miimon=50 +is necessary to use bonding parameters exported by sysfs, documented +in the section below. - This may be repeated any number of times, specifying a new and -unique name in place of bond1 for each subsequent instance. 3.4 Configuring Bonding Manually via Sysfs ------------------------------------------ diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index 387482e46c4..4504cc59e40 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt @@ -57,6 +57,16 @@ DCCP_SOCKOPT_SEND_CSCOV is for the receiver and has a different meaning: it coverage value are also acceptable. The higher the number, the more restrictive this setting (see [RFC 4340, sec. 9.2.1]). +The following two options apply to CCID 3 exclusively and are getsockopt()-only. +In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned. +DCCP_SOCKOPT_CCID_RX_INFO + Returns a `struct tfrc_rx_info' in optval; the buffer for optval and + optlen must be set to at least sizeof(struct tfrc_rx_info). +DCCP_SOCKOPT_CCID_TX_INFO + Returns a `struct tfrc_tx_info' in optval; the buffer for optval and + optlen must be set to at least sizeof(struct tfrc_tx_info). + + Sysctl variables ================ Several DCCP default parameters can be managed by the following sysctls diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 702d1d8dd04..af6a63ab902 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -179,11 +179,31 @@ tcp_fin_timeout - INTEGER because they eat maximum 1.5K of memory, but they tend to live longer. Cf. tcp_max_orphans. -tcp_frto - BOOLEAN +tcp_frto - INTEGER Enables F-RTO, an enhanced recovery algorithm for TCP retransmission timeouts. It is particularly beneficial in wireless environments where packet loss is typically due to random radio interference - rather than intermediate router congestion. + rather than intermediate router congestion. If set to 1, basic + version is enabled. 2 enables SACK enhanced F-RTO, which is + EXPERIMENTAL. The basic version can be used also when SACK is + enabled for a flow through tcp_sack sysctl. + +tcp_frto_response - INTEGER + When F-RTO has detected that a TCP retransmission timeout was + spurious (i.e, the timeout would have been avoided had TCP set a + longer retransmission timeout), TCP has several options what to do + next. Possible values are: + 0 Rate halving based; a smooth and conservative response, + results in halved cwnd and ssthresh after one RTT + 1 Very conservative response; not recommended because even + though being valid, it interacts poorly with the rest of + Linux TCP, halves cwnd and ssthresh immediately + 2 Aggressive response; undoes congestion control measures + that are now known to be unnecessary (ignoring the + possibility of a lost retransmission that would require + TCP to be more cautious), cwnd and ssthresh are restored + to the values prior timeout + Default: 0 (rate halving based) tcp_keepalive_time - INTEGER How often TCP sends out keepalive messages when keepalive is enabled. @@ -995,7 +1015,12 @@ bridge-nf-call-ip6tables - BOOLEAN Default: 1 bridge-nf-filter-vlan-tagged - BOOLEAN - 1 : pass bridged vlan-tagged ARP/IP traffic to arptables/iptables. + 1 : pass bridged vlan-tagged ARP/IP/IPv6 traffic to {arp,ip,ip6}tables. + 0 : disable this. + Default: 1 + +bridge-nf-filter-pppoe-tagged - BOOLEAN + 1 : pass bridged pppoe-tagged IP/IPv6 traffic to {ip,ip6}tables. 0 : disable this. Default: 1 diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt new file mode 100644 index 00000000000..cae231b1c13 --- /dev/null +++ b/Documentation/networking/rxrpc.txt @@ -0,0 +1,859 @@ + ====================== + RxRPC NETWORK PROTOCOL + ====================== + +The RxRPC protocol driver provides a reliable two-phase transport on top of UDP +that can be used to perform RxRPC remote operations. This is done over sockets +of AF_RXRPC family, using sendmsg() and recvmsg() with control data to send and +receive data, aborts and errors. + +Contents of this document: + + (*) Overview. + + (*) RxRPC protocol summary. + + (*) AF_RXRPC driver model. + + (*) Control messages. + + (*) Socket options. + + (*) Security. + + (*) Example client usage. + + (*) Example server usage. + + (*) AF_RXRPC kernel interface. + + +======== +OVERVIEW +======== + +RxRPC is a two-layer protocol. There is a session layer which provides +reliable virtual connections using UDP over IPv4 (or IPv6) as the transport +layer, but implements a real network protocol; and there's the presentation +layer which renders structured data to binary blobs and back again using XDR +(as does SunRPC): + + +-------------+ + | Application | + +-------------+ + | XDR | Presentation + +-------------+ + | RxRPC | Session + +-------------+ + | UDP | Transport + +-------------+ + + +AF_RXRPC provides: + + (1) Part of an RxRPC facility for both kernel and userspace applications by + making the session part of it a Linux network protocol (AF_RXRPC). + + (2) A two-phase protocol. The client transmits a blob (the request) and then + receives a blob (the reply), and the server receives the request and then + transmits the reply. + + (3) Retention of the reusable bits of the transport system set up for one call + to speed up subsequent calls. + + (4) A secure protocol, using the Linux kernel's key retention facility to + manage security on the client end. The server end must of necessity be + more active in security negotiations. + +AF_RXRPC does not provide XDR marshalling/presentation facilities. That is +left to the application. AF_RXRPC only deals in blobs. Even the operation ID +is just the first four bytes of the request blob, and as such is beyond the +kernel's interest. + + +Sockets of AF_RXRPC family are: + + (1) created as type SOCK_DGRAM; + + (2) provided with a protocol of the type of underlying transport they're going + to use - currently only PF_INET is supported. + + +The Andrew File System (AFS) is an example of an application that uses this and +that has both kernel (filesystem) and userspace (utility) components. + + +====================== +RXRPC PROTOCOL SUMMARY +====================== + +An overview of the RxRPC protocol: + + (*) RxRPC sits on top of another networking protocol (UDP is the only option + currently), and uses this to provide network transport. UDP ports, for + example, provide transport endpoints. + + (*) RxRPC supports multiple virtual "connections" from any given transport + endpoint, thus allowing the endpoints to be shared, even to the same + remote endpoint. + + (*) Each connection goes to a particular "service". A connection may not go + to multiple services. A service may be considered the RxRPC equivalent of + a port number. AF_RXRPC permits multiple services to share an endpoint. + + (*) Client-originating packets are marked, thus a transport endpoint can be + shared between client and server connections (connections have a + direction). + + (*) Up to a billion connections may be supported concurrently between one + local transport endpoint and one service on one remote endpoint. An RxRPC + connection is described by seven numbers: + + Local address } + Local port } Transport (UDP) address + Remote address } + Remote port } + Direction + Connection ID + Service ID + + (*) Each RxRPC operation is a "call". A connection may make up to four + billion calls, but only up to four calls may be in progress on a + connection at any one time. + + (*) Calls are two-phase and asymmetric: the client sends its request data, + which the service receives; then the service transmits the reply data + which the client receives. + + (*) The data blobs are of indefinite size, the end of a phase is marked with a + flag in the packet. The number of packets of data making up one blob may + not exceed 4 billion, however, as this would cause the sequence number to + wrap. + + (*) The first four bytes of the request data are the service operation ID. + + (*) Security is negotiated on a per-connection basis. The connection is + initiated by the first data packet on it arriving. If security is + requested, the server then issues a "challenge" and then the client + replies with a "response". If the response is successful, the security is + set for the lifetime of that connection, and all subsequent calls made + upon it use that same security. In the event that the server lets a + connection lapse before the client, the security will be renegotiated if + the client uses the connection again. + + (*) Calls use ACK packets to handle reliability. Data packets are also + explicitly sequenced per call. + + (*) There are two types of positive acknowledgement: hard-ACKs and soft-ACKs. + A hard-ACK indicates to the far side that all the data received to a point + has been received and processed; a soft-ACK indicates that the data has + been received but may yet be discarded and re-requested. The sender may + not discard any transmittable packets until they've been hard-ACK'd. + + (*) Reception of a reply data packet implicitly hard-ACK's all the data + packets that make up the request. + + (*) An call is complete when the request has been sent, the reply has been + received and the final hard-ACK on the last packet of the reply has + reached the server. + + (*) An call may be aborted by either end at any time up to its completion. + + +===================== +AF_RXRPC DRIVER MODEL +===================== + +About the AF_RXRPC driver: + + (*) The AF_RXRPC protocol transparently uses internal sockets of the transport + protocol to represent transport endpoints. + + (*) AF_RXRPC sockets map onto RxRPC connection bundles. Actual RxRPC + connections are handled transparently. One client socket may be used to + make multiple simultaneous calls to the same service. One server socket + may handle calls from many clients. + + (*) Additional parallel client connections will be initiated to support extra + concurrent calls, up to a tunable limit. + + (*) Each connection is retained for a certain amount of time [tunable] after + the last call currently using it has completed in case a new call is made + that could reuse it. + + (*) Each internal UDP socket is retained [tunable] for a certain amount of + time [tunable] after the last connection using it discarded, in case a new + connection is made that could use it. + + (*) A client-side connection is only shared between calls if they have have + the same key struct describing their security (and assuming the calls + would otherwise share the connection). Non-secured calls would also be + able to share connections with each other. + + (*) A server-side connection is shared if the client says it is. + + (*) ACK'ing is handled by the protocol driver automatically, including ping + replying. + + (*) SO_KEEPALIVE automatically pings the other side to keep the connection + alive [TODO]. + + (*) If an ICMP error is received, all calls affected by that error will be + aborted with an appropriate network error passed through recvmsg(). + + +Interaction with the user of the RxRPC socket: + + (*) A socket is made into a server socket by binding an address with a + non-zero service ID. + + (*) In the client, sending a request is achieved with one or more sendmsgs, + followed by the reply being received with one or more recvmsgs. + + (*) The first sendmsg for a request to be sent from a client contains a tag to + be used in all other sendmsgs or recvmsgs associated with that call. The + tag is carried in the control data. + + (*) connect() is used to supply a default destination address for a client + socket. This may be overridden by supplying an alternate address to the + first sendmsg() of a call (struct msghdr::msg_name). + + (*) If connect() is called on an unbound client, a random local port will + bound before the operation takes place. + + (*) A server socket may also be used to make client calls. To do this, the + first sendmsg() of the call must specify the target address. The server's + transport endpoint is used to send the packets. + + (*) Once the application has received the last message associated with a call, + the tag is guaranteed not to be seen again, and so it can be used to pin + client resources. A new call can then be initiated with the same tag + without fear of interference. + + (*) In the server, a request is received with one or more recvmsgs, then the + the reply is transmitted with one or more sendmsgs, and then the final ACK + is received with a last recvmsg. + + (*) When sending data for a call, sendmsg is given MSG_MORE if there's more + data to come on that call. + + (*) When receiving data for a call, recvmsg flags MSG_MORE if there's more + data to come for that call. + + (*) When receiving data or messages for a call, MSG_EOR is flagged by recvmsg + to indicate the terminal message for that call. + + (*) A call may be aborted by adding an abort control message to the control + data. Issuing an abort terminates the kernel's use of that call's tag. + Any messages waiting in the receive queue for that call will be discarded. + + (*) Aborts, busy notifications and challenge packets are delivered by recvmsg, + and control data messages will be set to indicate the context. Receiving + an abort or a busy message terminates the kernel's use of that call's tag. + + (*) The control data part of the msghdr struct is used for a number of things: + + (*) The tag of the intended or affected call. + + (*) Sending or receiving errors, aborts and busy notifications. + + (*) Notifications of incoming calls. + + (*) Sending debug requests and receiving debug replies [TODO]. + + (*) When the kernel has received and set up an incoming call, it sends a + message to server application to let it know there's a new call awaiting + its acceptance [recvmsg reports a special control message]. The server + application then uses sendmsg to assign a tag to the new call. Once that + is done, the first part of the request data will be delivered by recvmsg. + + (*) The server application has to provide the server socket with a keyring of + secret keys corresponding to the security types it permits. When a secure + connection is being set up, the kernel looks up the appropriate secret key + in the keyring and then sends a challenge packet to the client and + receives a response packet. The kernel then checks the authorisation of + the packet and either aborts the connection or sets up the security. + + (*) The name of the key a client will use to secure its communications is + nominated by a socket option. + + +Notes on recvmsg: + + (*) If there's a sequence of data messages belonging to a particular call on + the receive queue, then recvmsg will keep working through them until: + + (a) it meets the end of that call's received data, + + (b) it meets a non-data message, + + (c) it meets a message belonging to a different call, or + + (d) it fills the user buffer. + + If recvmsg is called in blocking mode, it will keep sleeping, awaiting the + reception of further data, until one of the above four conditions is met. + + (2) MSG_PEEK operates similarly, but will return immediately if it has put any + data in the buffer rather than sleeping until it can fill the buffer. + + (3) If a data message is only partially consumed in filling a user buffer, + then the remainder of that message will be left on the front of the queue + for the next taker. MSG_TRUNC will never be flagged. + + (4) If there is more data to be had on a call (it hasn't copied the last byte + of the last data message in that phase yet), then MSG_MORE will be + flagged. + + +================ +CONTROL MESSAGES +================ + +AF_RXRPC makes use of control messages in sendmsg() and recvmsg() to multiplex +calls, to invoke certain actions and to report certain conditions. These are: + + MESSAGE ID SRT DATA MEANING + ======================= === =========== =============================== + RXRPC_USER_CALL_ID sr- User ID App's call specifier + RXRPC_ABORT srt Abort code Abort code to issue/received + RXRPC_ACK -rt n/a Final ACK received + RXRPC_NET_ERROR -rt error num Network error on call + RXRPC_BUSY -rt n/a Call rejected (server busy) + RXRPC_LOCAL_ERROR -rt error num Local error encountered + RXRPC_NEW_CALL -r- n/a New call received + RXRPC_ACCEPT s-- n/a Accept new call< |