linux/net/ipv4, branch v2.6.28.9

tcp: Fix length tcp_splice_data_recv passes to skb_splice_bits.

2009-02-17T17:29:00Z

[ Upstream commit 9fa5fdf291c9b58b1cb8b4bb2a0ee57efa21d635 ] tcp_splice_data_recv has two lengths to consider: the len parameter it gets from tcp_read_sock, which specifies the amount of data in the skb, and rd_desc->count, which is the amount of data the splice caller still wants. Currently it passes just the latter to skb_splice_bits, which then splices min(rd_desc->count, skb->len - offset) bytes. Most of the time this is fine, except when the skb contains urgent data. In that case len goes only up to the urgent byte and is less than skb->len - offset. By ignoring len tcp_splice_data_recv may a) splice data tcp_read_sock told it not to, b) return to tcp_read_sock a value > len. Now, tcp_read_sock doesn't handle used > len and leaves the socket in a bad state (both sk_receive_queue and copied_seq are bad at that point) resulting in duplicated data and corruption. Fix by passing min(rd_desc->count, len) to skb_splice_bits. Signed-off-by: Dimitris Michailidis Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman
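
The length clamp described above can be modelled in user space. This is a minimal sketch, not the kernel code: `model_splice_bits`, `splice_recv_old` and `splice_recv_fixed` are hypothetical names, and plain `size_t` values stand in for `skb->len`, `offset`, `len` and `rd_desc->count`.

```c
#include <stddef.h>

/* Hypothetical stand-in for skb_splice_bits(): splices at most `want`
 * bytes, bounded by what remains in the buffer after `offset`. */
static size_t model_splice_bits(size_t skb_len, size_t offset, size_t want)
{
    size_t avail = skb_len - offset;

    return want < avail ? want : avail;
}

/* Behaviour before the fix: only the caller's remaining count is passed
 * down, so the `len` limit from the reader is ignored. */
static size_t splice_recv_old(size_t skb_len, size_t offset,
                              size_t len, size_t count)
{
    (void)len;
    return model_splice_bits(skb_len, offset, count);
}

/* Behaviour after the fix: pass min(count, len), so the result can
 * never exceed `len`. */
static size_t splice_recv_fixed(size_t skb_len, size_t offset,
                                size_t len, size_t count)
{
    size_t want = count < len ? count : len;

    return model_splice_bits(skb_len, offset, want);
}
```

With a 1000-byte buffer whose urgent byte limits `len` to 400 and a caller asking for 4096 bytes, the old variant reports 1000 bytes used (more than `len`), while the fixed variant reports 400.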

tcp: splice as many packets as possible at once

2009-02-17T17:29:00Z

[ Upstream commit 33966dd0e2f68f26943cd9ee93ec6abbc6547a8e ] As spotted by Willy Tarreau, current splice() from tcp socket to pipe is not optimal. It processes at most one segment per call. This results in low performance and very high overhead due to syscall rate when splicing from interfaces which do not support LRO. Willy provided a patch inside tcp_splice_read(), but a better fix is to let tcp_read_sock() process as many segments as possible, so that tcp_rcv_space_adjust() and tcp_cleanup_rbuf() are called less often. With this change, splice() behaves like tcp_recvmsg(), being able to consume many skbs in one system call. With typical 1460 bytes of payload per frame, that means splice(SPLICE_F_NONBLOCK) can return 16*1460 = 23360 bytes. Signed-off-by: Willy Tarreau Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman
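
The arithmetic in the message can be checked with a small model. This is only a sketch: the factor 16 and the 1460-byte payload are taken from the message, while treating 16 as a per-call slot limit and the name `model_splice_call` are assumptions made for illustration.

```c
#include <stddef.h>

#define MODEL_SLOTS 16   /* factor quoted in the commit message */
#define MODEL_MSS   1460 /* typical payload bytes per frame */

/* Bytes returned by one modelled splice() call when `queued_segments`
 * full-sized segments are waiting on the socket. With
 * one_segment_per_call set, the model mimics the old behaviour of
 * stopping after a single segment. */
static size_t model_splice_call(size_t queued_segments,
                                int one_segment_per_call)
{
    size_t slots = one_segment_per_call ? 1 : MODEL_SLOTS;
    size_t taken = queued_segments < slots ? queued_segments : slots;

    return taken * MODEL_MSS;
}
```

With 100 segments queued, the old behaviour yields 1460 bytes per call and the new one 23360, so the same data needs roughly one sixteenth of the system calls.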

udp: increments sk_drops in __udp_queue_rcv_skb()

2009-02-17T17:28:57Z

[ Upstream commit e408b8dcb5ce42243a902205005208e590f28454 ] Commit 93821778def10ec1e69aa3ac10adee975dad4ff3 (udp: Fix rcv socket locking) accidentally removed sk_drops increments for UDP IPV4 sockets. This field can be used to detect incorrect sizing of socket receive buffers. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman

udp: Fix UDP short packet false positive

2009-02-17T17:28:56Z

[ Upstream commit 7b5e56f9d635643ad54f2f42e69ad16b80a2cff1 ] The UDP header pointer assignment must happen after calling pskb_may_pull(), as pskb_may_pull() can potentially alter the SKB buffer. This was exposed by running multicast traffic through the NIU driver, as it won't prepull the protocol headers into the linear area on receive. Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman
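
The ordering rule can be illustrated with a user-space analogue. This is a sketch under stated assumptions: `struct model_buf`, `model_may_pull` and `model_read_len_field` are hypothetical, and the reallocation merely imitates the way pskb_may_pull() may move the data a header pointer refers to.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer standing in for the linear area of an skb. */
struct model_buf {
    unsigned char *data;
    size_t len;
};

/* Analogue of pskb_may_pull(): make sure at least `need` bytes are
 * addressable. When it has to grow the buffer it moves the data, so
 * any pointer taken earlier becomes stale. Returns 1 on success. */
static int model_may_pull(struct model_buf *b, size_t need)
{
    unsigned char *p;

    if (need <= b->len)
        return 1;
    p = malloc(need);
    if (!p)
        return 0;
    memset(p, 0, need);
    memcpy(p, b->data, b->len);
    free(b->data);
    b->data = p;
    b->len = need;
    return 1;
}

/* Read the 16-bit length field (bytes 4 and 5 of a UDP header). */
static int model_read_len_field(struct model_buf *b, unsigned int *out)
{
    const unsigned char *hdr;

    if (!model_may_pull(b, 8))
        return -1;
    hdr = b->data; /* take the header pointer only after the pull */
    *out = ((unsigned int)hdr[4] << 8) | hdr[5];
    return 0;
}
```

Assigning `hdr` before the call to `model_may_pull` would leave it pointing at freed memory whenever the buffer had to grow, which is the class of bug the commit describes.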

ipv4: fix infinite retry loop in IP-Config

2009-02-17T17:28:54Z

[ Upstream commit 9d8dba6c979fa99c96938c869611b9a23b73efa9 ] Signed-off-by: Benjamin Zores Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman

netfilter: nf_conntrack: fix ICMP/ICMPv6 timeout sysctls on big-endian

2009-01-25T00:41:45Z

Upstream commit 71320af: An old bug crept back into the ICMP/ICMPv6 conntrack protocols: the timeout values are defined as unsigned longs, but the sysctl's maxsize is set to sizeof(unsigned int). Use unsigned int for the timeout values as in the other conntrack protocols. Reported-by: Jean-Mickael Guerin Signed-off-by: Patrick McHardy Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman
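
Why the size mismatch matters on big-endian machines can be shown with a small model. This is a sketch only: `model_sysctl_write` is a hypothetical stand-in for a handler that stores exactly `maxlen` bytes at the start of the variable, not the kernel's sysctl code.

```c
#include <string.h>

/* Hypothetical handler: copies at most `maxlen` bytes of `value` to the
 * start of the storage at `data`, as an integer handler configured with
 * maxlen = sizeof(unsigned int) would. */
static void model_sysctl_write(void *data, size_t maxlen, unsigned int value)
{
    size_t n = maxlen < sizeof(value) ? maxlen : sizeof(value);

    memcpy(data, &value, n);
}

/* Timeout stored as unsigned long. Where long is 64 bits and the machine
 * is big-endian, the four bytes written land in the high half of the
 * variable, so the value read back is wrong. */
static int model_roundtrip_long(unsigned int value)
{
    unsigned long timeout = 0;

    model_sysctl_write(&timeout, sizeof(unsigned int), value);
    return timeout == value;
}

/* Timeout stored as unsigned int, matching maxlen: the value reads back
 * correctly regardless of byte order. */
static int model_roundtrip_int(unsigned int value)
{
    unsigned int timeout = 0;

    model_sysctl_write(&timeout, sizeof(unsigned int), value);
    return timeout == value;
}
```

`model_roundtrip_int` succeeds everywhere; the result of `model_roundtrip_long` depends on the byte order and the width of `long` on the machine running it.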

tcp: don't mask EOF and socket errors on nonblocking splice receive

2009-01-25T00:41:43Z

[ Upstream commit: 4f7d54f59bc470f0aaa932f747a95232d7ebf8b1 ] Currently, setting SPLICE_F_NONBLOCK on splice from a TCP socket results in masking of EOF (RDHUP) and error conditions on the socket by an -EAGAIN return. Move the NONBLOCK check in tcp_splice_read() to be after the EOF and error checks to fix this. Signed-off-by: Lennert Buytenhek Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman
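
The effect of reordering the checks can be sketched as follows. The struct and both functions are hypothetical simplifications of the path taken when no data is queued; they are not the code of tcp_splice_read().

```c
#include <errno.h>

/* Hypothetical socket state, consulted when no data is available. */
struct model_sock {
    int at_eof;   /* peer has shut down its side */
    int err;      /* pending socket error (positive errno), or 0 */
    int nonblock; /* caller passed the nonblocking flag */
};

/* Old order: the nonblocking check comes first and hides EOF and
 * errors behind -EAGAIN. */
static int model_no_data_old(const struct model_sock *s)
{
    if (s->nonblock)
        return -EAGAIN;
    if (s->at_eof)
        return 0;
    if (s->err)
        return -s->err;
    return 1; /* the caller would block and wait for data */
}

/* Fixed order: report EOF and errors first, then -EAGAIN. */
static int model_no_data_fixed(const struct model_sock *s)
{
    if (s->at_eof)
        return 0;
    if (s->err)
        return -s->err;
    if (s->nonblock)
        return -EAGAIN;
    return 1; /* the caller would block and wait for data */
}
```

For a nonblocking caller on a socket at EOF, the old order returns -EAGAIN indefinitely, while the fixed order returns 0 so the caller can detect the end of the stream.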

netfilter: update rwlock initialization for nat_table

2008-12-15T08:19:14Z

The commit e099a173573ce1ba171092aee7bb3c72ea686e59 (netfilter: netns nat: per-netns NAT table) renamed the nat_table from __nat_table to nat_table without updating the __RW_LOCK_UNLOCKED(__nat_table.lock). Signed-off-by: Steven Rostedt Signed-off-by: David S. Miller

tcp: tcp_vegas cong avoid fix

2008-12-09T08:13:04Z

This patch addresses a book-keeping issue in tcp_vegas.c. At present tcp_vegas does separate book-keeping of cwnd based on packet sequence numbers. A mismatch can develop between this book-keeping and tp->snd_cwnd due, for example, to delayed acks acking multiple packets. When vegas transitions to reno operation (e.g. following loss), then this mismatch leads to incorrect behaviour (akin to a cwnd backoff). This seems mostly to affect operation at low cwnds where delayed acking can lead to a significant fraction of cwnd being covered by a single ack, leading to the book-keeping mismatch. This patch modifies the congestion avoidance update to avoid the need for separate book-keeping while leaving vegas congestion avoidance functionally unchanged. A secondary advantage of this modification is that the use of fixed-point (via V_PARAM_SHIFT) and 64 bit arithmetic is no longer necessary, simplifying the code. Some example test measurements with the patched code (confirming no functional change in the congestion avoidance algorithm) can be seen at: http://www.hamilton.ie/doug/vegaspatch/ Signed-off-by: Doug Leith Signed-off-by: David S. Miller

tcp: tcp_vegas ssthresh bug fix

2008-12-05T01:17:18Z

This patch fixes a bug in tcp_vegas.c. At the moment this code leaves ssthresh untouched. However, this means that the vegas congestion control algorithm is effectively unable to reduce cwnd below the ssthresh value (if the vegas update lowers the cwnd below ssthresh, then slow start is activated to raise it back up). One example where this matters is when during slow start cwnd overshoots the link capacity and a flow then exits slow start with ssthresh set to a value above where congestion avoidance would like to adjust it. Signed-off-by: Doug Leith Signed-off-by: David S. Miller
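
The interaction between cwnd and ssthresh described above can be reproduced with a toy model. Everything here is an assumption made for illustration: the one-step-per-update dynamics, the names, and in particular the remedy of lowering ssthresh together with cwnd, which is one plausible approach and not necessarily what the patch does.

```c
/* Hypothetical congestion state. */
struct model_cc {
    unsigned int cwnd;
    unsigned int ssthresh;
};

/* One update in which the delay-based logic wants a smaller window.
 * Below ssthresh, slow start takes over and grows cwnd instead. */
static void model_step(struct model_cc *cc, int track_ssthresh)
{
    if (cc->cwnd < cc->ssthresh) {
        cc->cwnd++; /* slow start raises cwnd back up */
        return;
    }
    if (cc->cwnd > 2)
        cc->cwnd--; /* the requested decrease */
    if (track_ssthresh && cc->ssthresh > cc->cwnd)
        cc->ssthresh = cc->cwnd; /* assumed remedy: follow cwnd down */
}

/* Run `steps` updates starting with cwnd == ssthresh == start and
 * return the final cwnd. */
static unsigned int model_run(unsigned int start, unsigned int steps,
                              int track_ssthresh)
{
    struct model_cc cc = { start, start };

    while (steps--)
        model_step(&cc, track_ssthresh);
    return cc.cwnd;
}
```

With ssthresh left untouched, `model_run(20, 10, 0)` oscillates between 19 and 20 and ends at 20. With ssthresh tracking cwnd, `model_run(20, 10, 1)` ends at 10.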