<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4, branch v3.4</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/net/ipv4?h=v3.4</id>
<link rel='self' href='https://git.amat.us/linux/atom/net/ipv4?h=v3.4'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2012-05-17T22:31:43Z</updated>
<entry>
<title>tcp: do_tcp_sendpages() must try to push data out on oom conditions</title>
<updated>2012-05-17T22:31:43Z</updated>
<author>
<name>Willy Tarreau</name>
<email>w@1wt.eu</email>
</author>
<published>2012-05-17T11:14:14Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=bad115cfe5b509043b684d3a007ab54b80090aa1'/>
<id>urn:sha1:bad115cfe5b509043b684d3a007ab54b80090aa1</id>
<content type='text'>
Since recent changes on TCP splicing (starting with commits 2f533844
"tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp:
tcp_sendpages() should call tcp_push() once"), I started seeing
massive stalls when forwarding traffic between two sockets using
splice() when pipe buffers were larger than socket buffers.

Latest changes (net: netdev_alloc_skb() use build_skb()) made the
problem even more apparent.

The reason seems to be that if do_tcp_sendpages() fails on out of memory
condition without being able to send at least one byte, tcp_push() is not
called and the buffers cannot be flushed.

After applying the attached patch, I cannot reproduce the stalls at all
and the data rate it perfectly stable and steady under any condition
which previously caused the problem to be permanent.

The issue seems to have been there since before the kernel migrated to
git, which makes me think that the stalls I occasionally experienced
with tux during stress-tests years ago were probably related to the
same issue.

This issue was first encountered on 3.0.31 and 3.2.17, so please backport
to -stable.

Signed-off-by: Willy Tarreau &lt;w@1wt.eu&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: Do not use dead fib_info entries.</title>
<updated>2012-05-11T02:16:32Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-05-11T02:16:32Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=dccd9ecc374462e5d6a5b8f8110415a86c2213d8'/>
<id>urn:sha1:dccd9ecc374462e5d6a5b8f8110415a86c2213d8</id>
<content type='text'>
Due to RCU lookups and RCU based release, fib_info objects can
be found during lookup which have fi-&gt;fib_dead set.

We must ignore these entries, otherwise we risk dereferencing
the parts of the entry which are being torn down.

Reported-by: Yevgen Pronenko &lt;yevgen.pronenko@sonymobile.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: change tcp_adv_win_scale and tcp_rmem[2]</title>
<updated>2012-05-03T01:08:58Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-05-02T02:28:41Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b49960a05e32121d29316cfdf653894b88ac9190'/>
<id>urn:sha1:b49960a05e32121d29316cfdf653894b88ac9190</id>
<content type='text'>
tcp_adv_win_scale default value is 2, meaning we expect a good citizen
skb to have skb-&gt;len / skb-&gt;truesize ratio of 75% (3/4)

In 2.6 kernels we (mis)accounted for typical MSS=1460 frame :
1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
So these skbs were considered as not bloated.

With recent truesize fixes, a typical MSS=1460 frame truesize is now the
more precise :
2048 + 256 = 2304. But 2304 * 3/4 = 1728.
So these skb are not good citizen anymore, because 1460 &lt; 1728

(GRO can escape this problem because it build skbs with a too low
truesize.)

This also means tcp advertises a too optimistic window for a given
allocated rcvspace : When receiving frames, sk_rmem_alloc can hit
sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
especially when application is slow to drain its receive queue or in
case of losses (netperf is fast, scp is slow). This is a major latency
source.

We should adjust the len/truesize ratio to 50% instead of 75%

This patch :

1) changes tcp_adv_win_scale default to 1 instead of 2

2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account
better truesize tracking and to allow autotuning tcp receive window to
reach same value than before. Note that same amount of kernel memory is
consumed compared to 2.6 kernels.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: fix infinite cwnd in tcp_complete_cwr()</title>
<updated>2012-04-30T17:44:39Z</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2012-04-30T06:00:18Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1cebce36d660c83bd1353e41f3e66abd4686f215'/>
<id>urn:sha1:1cebce36d660c83bd1353e41f3e66abd4686f215</id>
<content type='text'>
When the cwnd reduction is done, ssthresh may be infinite
if TCP enters CWR via ECN or F-RTO. If cwnd is not undone, i.e.,
undo_marker is set, tcp_complete_cwr() falsely set cwnd to the
infinite ssthresh value. The correct operation is to keep cwnd
intact because it has been updated in ECN or F-RTO.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: clean up use of jiffies in tcp_rcv_rtt_measure()</title>
<updated>2012-04-27T16:34:39Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2012-04-27T15:29:37Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=651913ce9de2bbcedef608c5d6cf39c244248509'/>
<id>urn:sha1:651913ce9de2bbcedef608c5d6cf39c244248509</id>
<content type='text'>
Clean up a reference to jiffies in tcp_rcv_rtt_measure() that should
instead reference tcp_time_stamp. Since the result of the subtraction
is passed into a function taking u32, this should not change any
behavior (and indeed the generated assembly does not change on
x86_64). However, it seems worth cleaning this up for consistency and
clarity (and perhaps to avoid bugs if this is copied and pasted
somewhere else).

Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>udp_diag: implement idiag_get_info for udp/udplite to get queue information</title>
<updated>2012-04-26T00:43:01Z</updated>
<author>
<name>Shan Wei</name>
<email>davidshan@tencent.com</email>
</author>
<published>2012-04-24T18:15:41Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=62ad6fcd743792bf294f2a7ba26ab8f462065150'/>
<id>urn:sha1:62ad6fcd743792bf294f2a7ba26ab8f462065150</id>
<content type='text'>
When we use netlink to monitor queue information for udp socket,
idiag_rqueue and idiag_wqueue of inet_diag_msg are returned with 0.

Keep consistent with netstat, just return back allocated rmem/wmem size.

Signed-off-by: Shan Wei &lt;davidshan@tencent.com&gt;
Acked-by: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: fix retransmit of partially acked frames</title>
<updated>2012-04-18T20:52:45Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-04-18T10:14:23Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=22b4a4f22da4b39c6f7f679fd35f3d35c91bf851'/>
<id>urn:sha1:22b4a4f22da4b39c6f7f679fd35f3d35c91bf851</id>
<content type='text'>
Alexander Beregalov reported skb_over_panic errors and provided stack
trace.

I occurs commit a21d45726aca (tcp: avoid order-1 allocations on wifi and
tx path) added a regression, when a retransmit is done after a partial
ACK.

tcp_retransmit_skb() tries to aggregate several frames if the first one
has enough available room to hold the following ones payload. This is
controlled by /proc/sys/net/ipv4/tcp_retrans_collapse tunable (default :
enabled)

Problem is we must make sure _pskb_trim_head() doesnt fool
skb_availroom() when pulling some bytes from skb (this pull is done when
receiver ACK part of the frame).

Reported-by: Alexander Beregalov &lt;a.beregalov@gmail.com&gt;
Cc: Marc MERLIN &lt;marc@merlins.org&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: fix tcp_grow_window() for large incoming frames</title>
<updated>2012-04-18T02:32:00Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-04-16T23:28:07Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4d846f02392a710f9604892ac3329e628e60a230'/>
<id>urn:sha1:4d846f02392a710f9604892ac3329e628e60a230</id>
<content type='text'>
tcp_grow_window() has to grow rcv_ssthresh up to window_clamp, allowing
sender to increase its window.

tcp_grow_window() still assumes a tcp frame is under MSS, but its no
longer true with LRO/GRO.

This patch fixes one of the performance issue we noticed with GRO on.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2012-04-12T21:04:33Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-04-12T21:04:33Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=174808af90a06ee59ffedd60c00c252f1f887f25'/>
<id>urn:sha1:174808af90a06ee59ffedd60c00c252f1f887f25</id>
<content type='text'>
Pull networking fixes from David Miller:

 1) Fix bluetooth userland regression reported by Keith Packard, from
    Gustavo Padovan.

 2) Revert ath9k PS idle change, from Sujith Manoharan.

 3) Correct default TCP memory limits (again), from Eric Dumazet.

 4) Fix tcp_rcv_rtt_update() accidental use of unscaled RTT, from Neal
    Cardwell.

 5) We made a facility for layers like wireless to say how much tailroom
    they need in the SKB for link layer stuff such as wireless
    encryption etc., but TCP works hard to fill every SKB out to the end
    defeating this specification.

    This leads to every TCP packet getting reallocated by the wireless
    code in order to have the right amount of tailroom available.

    Fix TCP to only fill SKBs out to the real amount of data area it
    asked for during the allocation, this way it won't eat into the
    slack added for the device's tailroom needs.

    Reported by Marc Merlin and fixed by Eric Dumazet.

 6) Leaks, endian bugs, and new device IDs in bluetooth from Santosh
    Nayak, João Paulo Rechi Vita, Cho, Yu-Chen, Andrei Emeltchenko,
    AceLan Kao, and Andrei Emeltchenko.

 7) OOPS on tty_close fix in bluetooth's hci_ldisc from Johan Hovold.

 8) netfilter erroneously scales TCP window twice, fix from Changli Gao.

 9) Memleak fix in wext-core from Julia Lawall.

10) Consistently handle invalid TCP packets in ipv4 vs.  ipv6 conntrack,
    from Jozsef Kadlecsik.

11) Validate IP header length properly in netfilter conntrack's
    ipv4_get_l4proto().

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (39 commits)
  NFC: Fix the LLCP Tx fragmentation loop
  rtlwifi: Add missing DMA buffer unmapping for PCI drivers
  rtlwifi: Preallocate USB read buffers and eliminate kalloc in read routine
  tcp: avoid order-1 allocations on wifi and tx path
  net: allow pskb_expand_head() to get maximum tailroom
  bridge: Do not send queries on multicast group leaves
  MAINTAINERS: Mark NATSEMI driver as orphan'd.
  tcp: fix tcp_rcv_rtt_update() use of an unscaled RTT sample
  tcp: restore correct limit
  Revert "ath9k: fix going to full-sleep on PS idle"
  rt2x00: Fix rfkill_polling register function.
  bcma: fix build error on MIPS; implicit pcibios_enable_device
  netfilter: nf_conntrack: fix incorrect logic in nf_conntrack_init_net
  netfilter: nf_ct_ipv4: packets with wrong ihl are invalid
  netfilter: nf_ct_ipv4: handle invalid IPv4 and IPv6 packets consistently
  net/wireless/wext-core.c: add missing kfree
  rtlwifi: Fix oops on rate-control failure
  mac80211: Convert WARN_ON to WARN_ON_ONCE
  rtlwifi: rtl8192de: Fix firmware initialization
  nl80211: ensure interface is up in various APIs
  ...
</content>
</entry>
<entry>
<title>tcp: avoid order-1 allocations on wifi and tx path</title>
<updated>2012-04-11T14:11:12Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2012-04-10T20:30:48Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a21d45726acacc963d8baddf74607d9b74e2b723'/>
<id>urn:sha1:a21d45726acacc963d8baddf74607d9b74e2b723</id>
<content type='text'>
Marc Merlin reported many order-1 allocations failures in TX path on its
wireless setup, that dont make any sense with MTU=1500 network, and non
SG capable hardware.

After investigation, it turns out TCP uses sk_stream_alloc_skb() and
used as a convention skb_tailroom(skb) to know how many bytes of data
payload could be put in this skb (for non SG capable devices)

Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER +
sizeof(struct skb_shared_info) being above 2048)

Later, mac80211 layer need to add some bytes at the tail of skb
(IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is
available has to call pskb_expand_head() and request order-1
allocations.

This patch changes sk_stream_alloc_skb() so that only
sk-&gt;sk_prot-&gt;max_header bytes of headroom are reserved, and use a new
skb field, avail_size to hold the data payload limit.

This way, order-0 allocations done by TCP stack can leave more than 2 KB
of tailroom and no more allocation is performed in mac80211 layer (or
any layer needing some tailroom)

avail_size is unioned with mark/dropcount, since mark will be set later
in IP stack for output packets. Therefore, skb size is unchanged.

Reported-by: Marc MERLIN &lt;marc@merlins.org&gt;
Tested-by: Marc MERLIN &lt;marc@merlins.org&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
