10 files changed, 568 insertions, 118 deletions
diff --git a/Documentation/networking/altera_tse.txt b/Documentation/networking/altera_tse.txt
new file mode 100644
index 00000000000..3f24df8c6e6
--- /dev/null
+++ b/Documentation/networking/altera_tse.txt
@@ -0,0 +1,263 @@
+       Altera Triple-Speed Ethernet MAC driver
+
+Copyright (C) 2008-2014 Altera Corporation
+
+This is the driver for the Altera Triple-Speed Ethernet (TSE) controllers
+using the SGDMA and MSGDMA soft DMA IP components. The driver uses the
+platform bus to obtain component resources. The designs used to test this
+driver were built for a Cyclone(R) V SOC FPGA board, a Cyclone(R) V FPGA board,
+and tested with ARM and NIOS processor hosts seperately. The anticipated use
+cases are simple communications between an embedded system and an external peer
+for status and simple configuration of the embedded system.
+
+For more information visit www.altera.com and www.rocketboards.org. Support
+forums for the driver may be found on www.rocketboards.org, and a design used
+to test this driver may be found there as well. Support is also available from
+the maintainer of this driver, found in MAINTAINERS.
+
+The Triple-Speed Ethernet, SGDMA, and MSGDMA components are all soft IP
+components that can be assembled and built into an FPGA using the Altera
+Quartus toolchain. Quartus 13.1 and 14.0 were used to build the design that
+this driver was tested against. The sopc2dts tool is used to create the
+device tree for the driver, and may be found at rocketboards.org.
+
+The driver probe function examines the device tree and determines if the
+Triple-Speed Ethernet instance is using an SGDMA or MSGDMA component. The
+probe function then installs the appropriate set of DMA routines to
+initialize, setup transmits, receives, and interrupt handling primitives for
+the respective configurations.
+
+The SGDMA component is to be deprecated in the near future (over the next 1-2
+years as of this writing in early 2014) in favor of the MSGDMA component.
+SGDMA support is included for existing designs and reference in case a
+developer wishes to support their own soft DMA logic and driver support. Any
+new designs should not use the SGDMA.
+
+The SGDMA supports only a single transmit or receive operation at a time, and
+therefore will not perform as well compared to the MSGDMA soft IP. Please
+visit www.altera.com for known, documented SGDMA errata.
+
+Scatter-gather DMA is not supported by the SGDMA or MSGDMA at this time.
+Scatter-gather DMA will be added to a future maintenance update to this
+driver.
+
+Jumbo frames are not supported at this time.
+
+The driver limits PHY operations to 10/100Mbps, and has not yet been fully
+tested for 1Gbps. This support will be added in a future maintenance update.
+
+1) Kernel Configuration
+The kernel configuration option is ALTERA_TSE:
+ Device Drivers ---> Network device support ---> Ethernet driver support --->
+ Altera Triple-Speed Ethernet MAC support (ALTERA_TSE)
+
+2) Driver parameters list:
+	debug: message level (0: no output, 16: all);
+	dma_rx_num: Number of descriptors in the RX list (default is 64);
+	dma_tx_num: Number of descriptors in the TX list (default is 64).
+
+3) Command line options
+Driver parameters can be also passed in command line by using:
+	altera_tse=dma_rx_num:128,dma_tx_num:512
+
+4) Driver information and notes
+
+4.1) Transmit process
+When the driver's transmit routine is called by the kernel, it sets up a
+transmit descriptor by calling the underlying DMA transmit routine (SGDMA or
+MSGDMA), and initites a transmit operation. Once the transmit is complete, an
+interrupt is driven by the transmit DMA logic. The driver handles the transmit
+completion in the context of the interrupt handling chain by recycling
+resource required to send and track the requested transmit operation.
+
+4.2) Receive process
+The driver will post receive buffers to the receive DMA logic during driver
+intialization. Receive buffers may or may not be queued depending upon the
+underlying DMA logic (MSGDMA is able queue receive buffers, SGDMA is not able
+to queue receive buffers to the SGDMA receive logic). When a packet is
+received, the DMA logic generates an interrupt. The driver handles a receive
+interrupt by obtaining the DMA receive logic status, reaping receive
+completions until no more receive completions are available.
+
+4.3) Interrupt Mitigation
+The driver is able to mitigate the number of its DMA interrupts
+using NAPI for receive operations. Interrupt mitigation is not yet supported
+for transmit operations, but will be added in a future maintenance release.
+
+4.4) Ethtool support
+Ethtool is supported. Driver statistics and internal errors can be taken using:
+ethtool -S ethX command. It is possible to dump registers etc.
+
+4.5) PHY Support
+The driver is compatible with PAL to work with PHY and GPHY devices.
+
+4.7) List of source files:
+ o Kconfig
+ o Makefile
+ o altera_tse_main.c: main network device driver
+ o altera_tse_ethtool.c: ethtool support
+ o altera_tse.h: private driver structure and common definitions
+ o altera_msgdma.h: MSGDMA implementation function definitions
+ o altera_sgdma.h: SGDMA implementation function definitions
+ o altera_msgdma.c: MSGDMA implementation
+ o altera_sgdma.c: SGDMA implementation
+ o altera_sgdmahw.h: SGDMA register and descriptor definitions
+ o altera_msgdmahw.h: MSGDMA register and descriptor definitions
+ o altera_utils.c: Driver utility functions
+ o altera_utils.h: Driver utility function definitions
+
+5) Debug Information
+
+The driver exports debug information such as internal statistics,
+debug information, MAC and DMA registers etc.
+
+A user may use the ethtool support to get statistics:
+e.g. using: ethtool -S ethX (that shows the statistics counters)
+or sees the MAC registers: e.g. using: ethtool -d ethX
+
+The developer can also use the "debug" module parameter to get
+further debug information.
+
+6) Statistics Support
+
+The controller and driver support a mix of IEEE standard defined statistics,
+RFC defined statistics, and driver or Altera defined statistics. The four
+specifications containing the standard definitions for these statistics are
+as follows:
+
+ o IEEE 802.3-2012 - IEEE Standard for Ethernet.
+ o RFC 2863 found at http://www.rfc-editor.org/rfc/rfc2863.txt.
+ o RFC 2819 found at http://www.rfc-editor.org/rfc/rfc2819.txt.
+ o Altera Triple Speed Ethernet User Guide, found at http://www.altera.com
+
+The statistics supported by the TSE and the device driver are as follows:
+
+"tx_packets" is equivalent to aFramesTransmittedOK defined in IEEE 802.3-2012,
+Section 5.2.2.1.2. This statistics is the count of frames that are successfully
+transmitted.
+
+"rx_packets" is equivalent to aFramesReceivedOK defined in IEEE 802.3-2012,
+Section 5.2.2.1.5. This statistic is the count of frames that are successfully
+received. This count does not include any error packets such as CRC errors,
+length errors, or alignment errors.
+
+"rx_crc_errors" is equivalent to aFrameCheckSequenceErrors defined in IEEE
+802.3-2012, Section 5.2.2.1.6. This statistic is the count of frames that are
+an integral number of bytes in length and do not pass the CRC test as the frame
+is received.
+
+"rx_align_errors" is equivalent to aAlignmentErrors defined in IEEE 802.3-2012,
+Section 5.2.2.1.7. This statistic is the count of frames that are not an
+integral number of bytes in length and do not pass the CRC test as the frame is
+received.
+
+"tx_bytes" is equivalent to aOctetsTransmittedOK defined in IEEE 802.3-2012,
+Section 5.2.2.1.8. This statistic is the count of data and pad bytes
+successfully transmitted from the interface.
+
+"rx_bytes" is equivalent to aOctetsReceivedOK defined in IEEE 802.3-2012,
+Section 5.2.2.1.14. This statistic is the count of data and pad bytes
+successfully received by the controller.
+
+"tx_pause" is equivalent to aPAUSEMACCtrlFramesTransmitted defined in IEEE
+802.3-2012, Section 30.3.4.2. This statistic is a count of PAUSE frames
+transmitted from the network controller.
+
+"rx_pause" is equivalent to aPAUSEMACCtrlFramesReceived defined in IEEE
+802.3-2012, Section 30.3.4.3. This statistic is a count of PAUSE frames
+received by the network controller.
+
+"rx_errors" is equivalent to ifInErrors defined in RFC 2863. This statistic is
+a count of the number of packets received containing errors that prevented the
+packet from being delivered to a higher level protocol.
+
+"tx_errors" is equivalent to ifOutErrors defined in RFC 2863. This statistic
+is a count of the number of packets that could not be transmitted due to errors.
+
+"rx_unicast" is equivalent to ifInUcastPkts defined in RFC 2863. This
+statistic is a count of the number of packets received that were not addressed
+to the broadcast address or a multicast group.
+
+"rx_multicast" is equivalent to ifInMulticastPkts defined in RFC 2863. This
+statistic is a count of the number of packets received that were addressed to
+a multicast address group.
+
+"rx_broadcast" is equivalent to ifInBroadcastPkts defined in RFC 2863. This
+statistic is a count of the number of packets received that were addressed to
+the broadcast address.
+
+"tx_discards" is equivalent to ifOutDiscards defined in RFC 2863. This
+statistic is the number of outbound packets not transmitted even though an
+error was not detected. An example of a reason this might occur is to free up
+internal buffer space.
+
+"tx_unicast" is equivalent to ifOutUcastPkts defined in RFC 2863. This
+statistic counts the number of packets transmitted that were not addressed to
+a multicast group or broadcast address.
+
+"tx_multicast" is equivalent to ifOutMulticastPkts defined in RFC 2863. This
+statistic counts the number of packets transmitted that were addressed to a
+multicast group.
+
+"tx_broadcast" is equivalent to ifOutBroadcastPkts defined in RFC 2863. This
+statistic counts the number of packets transmitted that were addressed to a
+broadcast address.
+
+"ether_drops" is equivalent to etherStatsDropEvents defined in RFC 2819.
+This statistic counts the number of packets dropped due to lack of internal
+controller resources.
+
+"rx_total_bytes" is equivalent to etherStatsOctets defined in RFC 2819.
+This statistic counts the total number of bytes received by the controller,
+including error and discarded packets.
+
+"rx_total_packets" is equivalent to etherStatsPkts defined in RFC 2819.
+This statistic counts the total number of packets received by the controller,
+including error, discarded, unicast, multicast, and broadcast packets.
+
+"rx_undersize" is equivalent to etherStatsUndersizePkts defined in RFC 2819.
+This statistic counts the number of correctly formed packets received less
+than 64 bytes long.
+
+"rx_oversize" is equivalent to etherStatsOversizePkts defined in RFC 2819.
+This statistic counts the number of correctly formed packets greater than 1518
+bytes long.
+
+"rx_64_bytes" is equivalent to etherStatsPkts64Octets defined in RFC 2819.
+This statistic counts the total number of packets received that were 64 octets
+in length.
+
+"rx_65_127_bytes" is equivalent to etherStatsPkts65to127Octets defined in RFC
+2819. This statistic counts the total number of packets received that were
+between 65 and 127 octets in length inclusive.
+
+"rx_128_255_bytes" is equivalent to etherStatsPkts128to255Octets defined in
+RFC 2819. This statistic is the total number of packets received that were
+between 128 and 255 octets in length inclusive.
+
+"rx_256_511_bytes" is equivalent to etherStatsPkts256to511Octets defined in
+RFC 2819. This statistic is the total number of packets received that were
+between 256 and 511 octets in length inclusive.
+
+"rx_512_1023_bytes" is equivalent to etherStatsPkts512to1023Octets defined in
+RFC 2819. This statistic is the total number of packets received that were
+between 512 and 1023 octets in length inclusive.
+
+"rx_1024_1518_bytes" is equivalent to etherStatsPkts1024to1518Octets define
+in RFC 2819. This statistic is the total number of packets received that were
+between 1024 and 1518 octets in length inclusive.
+
+"rx_gte_1519_bytes" is a statistic defined specific to the behavior of the
+Altera TSE. This statistics counts the number of received good and errored
+frames between the length of 1519 and the maximum frame length configured
+in the frm_length register. See the Altera TSE User Guide for More details.
+
+"rx_jabbers" is equivalent to etherStatsJabbers defined in RFC 2819. This
+statistic is the total number of packets received that were longer than 1518
+octets, and had either a bad CRC with an integral number of octets (CRC Error)
+or a bad CRC with a non-integral number of octets (Alignment Error).
+
+"rx_runts" is equivalent to etherStatsFragments defined in RFC 2819. This
+statistic is the total number of packets received that were less than 64 octets
+in length and had either a bad CRC with an integral number of octets (CRC
+error) or a bad CRC with a non-integral number of octets (Alignment Error).
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 5cdb22971d1..a383c00392d 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -270,16 +270,15 @@ arp_ip_target
 arp_validate
 
 	Specifies whether or not ARP probes and replies should be
-	validated in the active-backup mode.  This causes the ARP
-	monitor to examine the incoming ARP requests and replies, and
-	only consider a slave to be up if it is receiving the
-	appropriate ARP traffic.
+	validated in any mode that supports arp monitoring, or whether
+	non-ARP traffic should be filtered (disregarded) for link
+	monitoring purposes.
 
 	Possible values are:
 
 	none or 0
 
-		No validation is performed.  This is the default.
+		No validation or filtering is performed.
 
 	active or 1
 
@@ -293,31 +292,68 @@ arp_validate
 
 		Validation is performed for all slaves.
 
-	For the active slave, the validation checks ARP replies to
-	confirm that they were generated by an arp_ip_target.  Since
-	backup slaves do not typically receive these replies, the
-	validation performed for backup slaves is on the ARP request
-	sent out via the active slave.  It is possible that some
-	switch or network configurations may result in situations
-	wherein the backup slaves do not receive the ARP requests; in
-	such a situation, validation of backup slaves must be
-	disabled.
-
-	The validation of ARP requests on backup slaves is mainly
-	helping bonding to decide which slaves are more likely to
-	work in case of the active slave failure, it doesn't really
-	guarantee that the backup slave will work if it's selected
-	as the next active slave.
-
-	This option is useful in network configurations in which
-	multiple bonding hosts are concurrently issuing ARPs to one or
-	more targets beyond a common switch.  Should the link between
-	the switch and target fail (but not the switch itself), the
-	probe traffic generated by the multiple bonding instances will
-	fool the standard ARP monitor into considering the links as
-	still up.  Use of the arp_validate option can resolve this, as
-	the ARP monitor will only consider ARP requests and replies
-	associated with its own instance of bonding.
+	filter or 4
+
+		Filtering is applied to all slaves. No validation is
+		performed.
+
+	filter_active or 5
+
+		Filtering is applied to all slaves, validation is performed
+		only for the active slave.
+
+	filter_backup or 6
+
+		Filtering is applied to all slaves, validation is performed
+		only for backup slaves.
+
+	Validation:
+
+	Enabling validation causes the ARP monitor to examine the incoming
+	ARP requests and replies, and only consider a slave to be up if it
+	is receiving the appropriate ARP traffic.
+
+	For an active slave, the validation checks ARP replies to confirm
+	that they were generated by an arp_ip_target.  Since backup slaves
+	do not typically receive these replies, the validation performed
+	for backup slaves is on the broadcast ARP request sent out via the
+	active slave.  It is possible that some switch or network
+	configurations may result in situations wherein the backup slaves
+	do not receive the ARP requests; in such a situation, validation
+	of backup slaves must be disabled.
+
+	The validation of ARP requests on backup slaves is mainly helping
+	bonding to decide which slaves are more likely to work in case of
+	the active slave failure, it doesn't really guarantee that the
+	backup slave will work if it's selected as the next active slave.
+
+	Validation is useful in network configurations in which multiple
+	bonding hosts are concurrently issuing ARPs to one or more targets
+	beyond a common switch.  Should the link between the switch and
+	target fail (but not the switch itself), the probe traffic
+	generated by the multiple bonding instances will fool the standard
+	ARP monitor into considering the links as still up.  Use of
+	validation can resolve this, as the ARP monitor will only consider
+	ARP requests and replies associated with its own instance of
+	bonding.
+
+	Filtering:
+
+	Enabling filtering causes the ARP monitor to only use incoming ARP
+	packets for link availability purposes.  Arriving packets that are
+	not ARPs are delivered normally, but do not count when determining
+	if a slave is available.
+
+	Filtering operates by only considering the reception of ARP
+	packets (any ARP packet, regardless of source or destination) when
+	determining if a slave has received traffic for link availability
+	purposes.
+
+	Filtering is useful in network configurations in which significant
+	levels of third party broadcast traffic would fool the standard
+	ARP monitor into considering the links as still up.  Use of
+	filtering can resolve this, as only ARP traffic is considered for
+	link availability purposes.
 
 	This option was added in bonding version 3.1.0.
 
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index a06b48d2f5c..81f940f4e88 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -546,6 +546,130 @@ ffffffffa0069c8f + <x>:
 For BPF JIT developers, bpf_jit_disasm, bpf_asm and bpf_dbg provides a useful
 toolchain for developing and testing the kernel's JIT compiler.
 
+BPF kernel internals
+--------------------
+Internally, for the kernel interpreter, a different BPF instruction set
+format with similar underlying principles from BPF described in previous
+paragraphs is being used. However, the instruction set format is modelled
+closer to the underlying architecture to mimic native instruction sets, so
+that a better performance can be achieved (more details later).
+
+It is designed to be JITed with one to one mapping, which can also open up
+the possibility for GCC/LLVM compilers to generate optimized BPF code through
+a BPF backend that performs almost as fast as natively compiled code.
+
+The new instruction set was originally designed with the possible goal in
+mind to write programs in "restricted C" and compile into BPF with a optional
+GCC/LLVM backend, so that it can just-in-time map to modern 64-bit CPUs with
+minimal performance overhead over two steps, that is, C -> BPF -> native code.
+
+Currently, the new format is being used for running user BPF programs, which
+includes seccomp BPF, classic socket filters, cls_bpf traffic classifier,
+team driver's classifier for its load-balancing mode, netfilter's xt_bpf
+extension, PTP dissector/classifier, and much more. They are all internally
+converted by the kernel into the new instruction set representation and run
+in the extended interpreter. For in-kernel handlers, this all works
+transparently by using sk_unattached_filter_create() for setting up the
+filter, resp. sk_unattached_filter_destroy() for destroying it. The macro
+SK_RUN_FILTER(filter, ctx) transparently invokes the right BPF function to
+run the filter. 'filter' is a pointer to struct sk_filter that we got from
+sk_unattached_filter_create(), and 'ctx' the given context (e.g. skb pointer).
+All constraints and restrictions from sk_chk_filter() apply before a
+conversion to the new layout is being done behind the scenes!
+
+Currently, for JITing, the user BPF format is being used and current BPF JIT
+compilers reused whenever possible. In other words, we do not (yet!) perform
+a JIT compilation in the new layout, however, future work will successively
+migrate traditional JIT compilers into the new instruction format as well, so
+that they will profit from the very same benefits. Thus, when speaking about
+JIT in the following, a JIT compiler (TBD) for the new instruction format is
+meant in this context.
+
+Some core changes of the new internal format:
+
+- Number of registers increase from 2 to 10:
+
+  The old format had two registers A and X, and a hidden frame pointer. The
+  new layout extends this to be 10 internal registers and a read-only frame
+  pointer. Since 64-bit CPUs are passing arguments to functions via registers
+  the number of args from BPF program to in-kernel function is restricted
+  to 5 and one register is used to accept return value from an in-kernel
+  function. Natively, x86_64 passes first 6 arguments in registers, aarch64/
+  sparcv9/mips64 have 7 - 8 registers for arguments; x86_64 has 6 callee saved
+  registers, and aarch64/sparcv9/mips64 have 11 or more callee saved registers.
+
+  Therefore, BPF calling convention is defined as:
+
+    * R0	- return value from in-kernel function
+    * R1 - R5	- arguments from BPF program to in-kernel function
+    * R6 - R9	- callee saved registers that in-kernel function will preserve
+    * R10	- read-only frame pointer to access stack
+
+  Thus, all BPF registers map one to one to HW registers on x86_64, aarch64,
+  etc, and BPF calling convention maps directly to ABIs used by the kernel on
+  64-bit architectures.
+
+  On 32-bit architectures JIT may map programs that use only 32-bit arithmetic
+  and may let more complex programs to be interpreted.
+
+  R0 - R5 are scratch registers and BPF program needs spill/fill them if
+  necessary across calls. Note that there is only one BPF program (== one BPF
+  main routine) and it cannot call other BPF functions, it can only call
+  predefined in-kernel functions, though.
+
+- Register width increases from 32-bit to 64-bit:
+
+  Still, the semantics of the original 32-bit ALU operations are preserved
+  via 32-bit subregisters. All BPF registers are 64-bit with 32-bit lower
+  subregisters that zero-extend into 64-bit if they are being written to.
+  That behavior maps directly to x86_64 and arm64 subregister definition, but
+  makes other JITs more difficult.
+
+  32-bit architectures run 64-bit internal BPF programs via interpreter.
+  Their JITs may convert BPF programs that only use 32-bit subregisters into
+  native instruction set and let the rest being interpreted.
+
+  Operation is 64-bit, because on 64-bit architectures, pointers are also
+  64-bit wide, and we want to pass 64-bit values in/out of kernel functions,
+  so 32-bit BPF registers would otherwise require to define register-pair
+  ABI, thus, there won't be able to use a direct BPF register to HW register
+  mapping and JIT would need to do combine/split/move operations for every
+  register in and out of the function, which is complex, bug prone and slow.
+  Another reason is the use of atomic 64-bit counters.
+
+- Conditional jt/jf targets replaced with jt/fall-through:
+
+  While the original design has constructs such as "if (cond) jump_true;
+  else jump_false;", they are being replaced into alternative constructs like
+  "if (cond) jump_true; /* else fall-through */".
+
+- Introduces bpf_call insn and register passing convention for zero overhead
+  calls from/to other kernel functions:
+
+  After a kernel function call, R1 - R5 are reset to unreadable and R0 has a
+  return type of the function. Since R6 - R9 are callee saved, their state is
+  preserved across the call.
+
+Also in the new design, BPF is limited to 4096 insns, which means that any
+program will terminate quickly and will only call a fixed number of kernel
+functions. Original BPF and the new format are two operand instructions,
+which helps to do one-to-one mapping between BPF insn and x86 insn during JIT.
+
+The input context pointer for invoking the interpreter function is generic,
+its content is defined by a specific use case. For seccomp register R1 points
+to seccomp_data, for converted BPF filters R1 points to a skb.
+
+A program, that is translated internally consists of the following elements:
+
+  op:16, jt:8, jf:8, k:32    ==>    op:8, a_reg:4, x_reg:4, off:16, imm:32
+
+Just like the original BPF, the new format runs within a controlled environment,
+is deterministic and the kernel can easily prove that. The safety of the program
+can be determined in two steps: first step does depth-first-search to disallow
+loops and other CFG validation; second step starts from the first insn and
+descends all possible paths. It simulates execution of every insn and observes
+the state change of registers and stack.
+
 Misc
 ----
 
@@ -561,3 +685,4 @@ the underlying architecture.
 
 Jay Schulist <jschlst@samba.org>
 Daniel Borkmann <dborkman@redhat.com>
+Alexei Starovoitov <ast@plumgrid.com>
diff --git a/Documentation/networking/gianfar.txt b/Documentation/networking/gianfar.txt
index ad474ea07d0..ba1daea7f2e 100644
--- a/Documentation/networking/gianfar.txt
+++ b/Documentation/networking/gianfar.txt
@@ -1,38 +1,8 @@
 The Gianfar Ethernet Driver
-Sysfs File description
 
 Author: Andy Fleming <afleming@freescale.com>
 Updated: 2005-07-28
 
-SYSFS
-
-Several of the features of the gianfar driver are controlled
-through sysfs files.  These are:
-
-bd_stash:
-To stash RX Buffer Descriptors in the L2, echo 'on' or '1' to
-bd_stash, echo 'off' or '0' to disable
-
-rx_stash_len:
-To stash the first n bytes of the packet in L2, echo the number
-of bytes to buf_stash_len.  echo 0 to disable.
-
-WARNING: You could really screw these up if you set them too low or high!
-fifo_threshold:
-To change the number of bytes the controller needs in the
-fifo before it starts transmission, echo the number of bytes to 
-fifo_thresh.  Range should be 0-511.
-
-fifo_starve:
-When the FIFO has less than this many bytes during a transmit, it
-enters starve mode, and increases the priority of TX memory
-transactions.  To change, echo the number of bytes to
-fifo_starve.  Range should be 0-511.
-
-fifo_starve_off:
-Once in starve mode, the FIFO remains there until it has this
-many bytes.  To change, echo the number of bytes to
-fifo_starve_off.  Range should be 0-511.
 
 CHECKSUM OFFLOADING
 
diff --git a/Documentation/networking/igb.txt b/Documentation/networking/igb.txt
index 4ebbd659256..43d3549366a 100644
--- a/Documentation/networking/igb.txt
+++ b/Documentation/networking/igb.txt
@@ -36,54 +36,6 @@ Default Value: 0
 This parameter adds support for SR-IOV.  It causes the driver to spawn up to
 max_vfs worth of virtual function.
 
-QueuePairs
-----------
-Valid Range:  0-1
-Default Value:  1 (TX and RX will be paired onto one interrupt vector)
-
-If set to 0, when MSI-X is enabled, the TX and RX will attempt to occupy
-separate vectors.
-
-This option can be overridden to 1 if there are not sufficient interrupts
-available.  This can occur if any combination of RSS, VMDQ, and max_vfs
-results in more than 4 queues being used.
-
-Node
-----
-Valid Range:   0-n
-Default Value: -1 (off)
-
-  0 - n: where n is the number of the NUMA node that should be used to
-         allocate memory for this adapter port.
-  -1: uses the driver default of allocating memory on whichever processor is
-      running insmod/modprobe.
-
-  The Node parameter will allow you to pick which NUMA node you want to have
-  the adapter allocate memory from.  All driver structures, in-memory queues,
-  and receive buffers will be allocated on the node specified.  This parameter
-  is only useful when interrupt affinity is specified, otherwise some portion
-  of the time the interrupt could run on a different core than the memory is
-  allocated on, causing slower memory access and impacting throughput, CPU, or
-  both.
-
-EEE
----
-Valid Range:  0-1
-Default Value: 1 (enabled)
-
-  A link between two EEE-compliant devices will result in periodic bursts of
-  data followed by long periods where in the link is in an idle state. This Low
-  Power Idle (LPI) state is supported in both 1Gbps and 100Mbps link speeds.
-  NOTE: EEE support requires autonegotiation.
-
-DMAC
-----
-Valid Range: 0-1
-Default Value: 1 (enabled)
-  Enables or disables DMA Coalescing feature.
-
-
-
 Additional Configurations
 =========================
 
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index ebf27071940..3544c98401f 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -48,7 +48,7 @@ The MDIO bus
    time, so it is safe for them to block, waiting for an interrupt to signal
    the operation is complete
  
- 2) A reset function is necessary.  This is used to return the bus to an
+ 2) A reset function is optional.  This is used to return the bus to an
    initialized state.
 
  3) A probe function is needed.  This function should set up anything the bus
@@ -253,16 +253,25 @@ Writing a PHY driver
 
  Each driver consists of a number of function pointers:
 
+   soft_reset: perform a PHY software reset
    config_init: configures PHY into a sane state after a reset.
      For instance, a Davicom PHY requires descrambling disabled.
    probe: Allocate phy->priv, optionally refuse to bind.
    PHY may not have been reset or had fixups run yet.
    suspend/resume: power management
    config_aneg: Changes the speed/duplex/negotiation settings
+   aneg_done: Determines the auto-negotiation result
    read_status: Reads the current speed/duplex/negotiation settings
    ack_interrupt: Clear a pending interrupt
+   did_interrupt: Checks if the PHY generated an interrupt
    config_intr: Enable or disable interrupts
    remove: Does any driver take-down
+   ts_info: Queries about the HW timestamping status
+   hwtstamp: Set the PHY HW timestamping configuration
+   rxtstamp: Requests a receive timestamp at the PHY level for a 'skb'
+   txtsamp: Requests a transmit timestamp at the PHY level for a 'skb'
+   set_wol: Enable Wake-on-LAN at the PHY level
+   get_wol: Get the Wake-on-LAN status at the PHY level
 
  Of these, only config_aneg and read_status are required to be
  assigned by the driver code.  The rest are optional.  Also, it is
diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt
index 5a61a240a65..0e30c7845b2 100644
--- a/Documentation/networking/pktgen.txt
+++ b/Documentation/networking/pktgen.txt
@@ -102,13 +102,18 @@ Examples:
                          The 'minimum' MAC is what you set with dstmac.
 
  pgset "flag [name]"     Set a flag to determine behaviour.  Current flags
-                         are: IPSRC_RND #IP Source is random (between min/max),
-                              IPDST_RND, UDPSRC_RND,
-                              UDPDST_RND, MACSRC_RND, MACDST_RND 
+                         are: IPSRC_RND # IP source is random (between min/max)
+                              IPDST_RND # IP destination is random
+                              UDPSRC_RND, UDPDST_RND,
+                              MACSRC_RND, MACDST_RND
+                              TXSIZE_RND, IPV6,
                               MPLS_RND, VID_RND, SVID_RND
+                              FLOW_SEQ,
                               QUEUE_MAP_RND # queue map random
                               QUEUE_MAP_CPU # queue map mirrors smp_processor_id()
-                              IPSEC # Make IPsec encapsulation for packet
+                              UDPCSUM,
+                              IPSEC # IPsec encapsulation (needs CONFIG_XFRM)
+                              NODE_ALLOC # node specific memory allocation
 
  pgset spi SPI_VALUE     Set specific SA used to transform packet.
 
@@ -233,13 +238,22 @@ udp_dst_max
 
 flag
   IPSRC_RND
-  TXSIZE_RND
   IPDST_RND
   UDPSRC_RND
   UDPDST_RND
   MACSRC_RND
   MACDST_RND
+  TXSIZE_RND
+  IPV6
+  MPLS_RND
+  VID_RND
+  SVID_RND
+  FLOW_SEQ
+  QUEUE_MAP_RND
+  QUEUE_MAP_CPU
+  UDPCSUM
   IPSEC
+  NODE_ALLOC
 
 dst_min
 dst_max
diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt
index b89bc82eed4..16a924c486b 100644
--- a/Documentation/networking/rxrpc.txt
+++ b/Documentation/networking/rxrpc.txt
@@ -27,6 +27,8 @@ Contents of this document:
 
  (*) AF_RXRPC kernel interface.
 
+ (*) Configurable parameters.
+
 
 ========
 OVERVIEW
@@ -864,3 +866,82 @@ The kernel interface functions are as follows:
 
      This is used to allocate a null RxRPC key that can be used to indicate
      anonymous security for a particular domain.
+
+
+=======================
+CONFIGURABLE PARAMETERS
+=======================
+
+The RxRPC protocol driver has a number of configurable parameters that can be
+adjusted through sysctls in /proc/net/rxrpc/:
+
+ (*) req_ack_delay
+
+     The amount of time in milliseconds after receiving a packet with the
+     request-ack flag set before we honour the flag and actually send the
+     requested ack.
+
+     Usually the other side won't stop sending packets until the advertised
+     reception window is full (to a maximum of 255 packets), so delaying the
+     ACK permits several packets to be ACK'd in one go.
+
+ (*) soft_ack_delay
+
+     The amount of time in milliseconds after receiving a new packet before we
+     generate a soft-ACK to tell the sender that it doesn't need to resend.
+
+ (*) idle_ack_delay
+
+     The amount of time in milliseconds after all the packets currently in the
+     received queue have been consumed before we generate a hard-ACK to tell
+     the sender it can free its buffers, assuming no other reason occurs that
+     we would send an ACK.
+
+ (*) resend_timeout
+
+     The amount of time in milliseconds after transmitting a packet before we
+     transmit it again, assuming no ACK is received from the receiver telling
+     us they got it.
+
+ (*) max_call_lifetime
+
+     The maximum amount of time in seconds that a call may be in progress
+     before we preemptively kill it.
+
+ (*) dead_call_expiry
+
+     The amount of time in seconds before we remove a dead call from the call
+     list.  Dead calls are kept around for a little while for the purpose of
+     repeating ACK and ABORT packets.
+
+ (*) connection_expiry
+
+     The amount of time in seconds after a connection was last used before we
+     remove it from the connection list.  Whilst a connection is in existence,
+     it serves as a placeholder for negotiated security; when it is deleted,
+     the security must be renegotiated.
+
+ (*) transport_expiry
+
+     The amount of time in seconds after a transport was last used before we
+     remove it from the transport list.  Whilst a transport is in existence, it
+     serves to anchor the peer data and keeps the connection ID counter.
+
+ (*) rxrpc_rx_window_size
+
+     The size of the receive window in packets.  This is the maximum number of
+     unconsumed received packets we're willing to hold in memory for any
+     particular call.
+
+ (*) rxrpc_rx_mtu
+
+     The maximum packet MTU size that we're willing to receive in bytes.  This
+     indicates to the peer whether we're willing to accept jumbo packets.
+
+ (*) rxrpc_rx_jumbo_max
+
+     The maximum number of packets that we're willing to accept in a jumbo
+     packet.  Non-terminal packets in a jumbo packet must contain a four byte
+     header plus exactly 1412 bytes of data.  The terminal packet must contain
+     a four byte header plus any amount of data.  In any event, a jumbo packet
+     may not exceed rxrpc_rx_mtu in size.
diff --git a/Documentation/networking/tcp.txt b/Documentation/networking/tcp.txt
index 7d11bb5dc30..bdc4c0db51e 100644
--- a/Documentation/networking/tcp.txt
+++ b/Documentation/networking/tcp.txt
@@ -30,7 +30,7 @@ A congestion control mechanism can be registered through functions in
 tcp_cong.c. The functions used by the congestion control mechanism are
 registered via passing a tcp_congestion_ops struct to
 tcp_register_congestion_control. As a minimum name, ssthresh,
-cong_avoid, min_cwnd must be valid.
+cong_avoid must be valid.
 
 Private data for a congestion control mechanism is stored in tp->ca_priv.
 tcp_ca(tp) returns a pointer to this space.  This is preallocated space - it
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 048c92b487f..bc355412490 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -202,6 +202,9 @@ Time stamps for outgoing packets are to be generated as follows:
   and not free the skb. A driver not supporting hardware time stamping doesn't
   do that. A driver must never touch sk_buff::tstamp! It is used to store
   software generated time stamps by the network subsystem.
+- Driver should call skb_tx_timestamp() as close to passing sk_buff to hardware
+  as possible. skb_tx_timestamp() provides a software time stamp if requested
+  and hardware timestamping is not possible (SKBTX_IN_PROGRESS not set).
 - As soon as the driver has sent the packet and/or obtained a
   hardware time stamp for it, it passes the time stamp back by
   calling skb_hwtstamp_tx() with the original skb, the raw
@@ -212,6 +215,3 @@ Time stamps for outgoing packets are to be generated as follows:
   this would occur at a later time in the processing pipeline than other
   software time stamping and therefore could lead to unexpected deltas
   between time stamps.
-- If the driver did not set the SKBTX_IN_PROGRESS flag (see above), then
-  dev_hard_start_xmit() checks whether software time stamping
-  is wanted as fallback and potentially generates the time stamp.