<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4, branch v3.4-rc3</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/net/ipv4?h=v3.4-rc3</id>
<link rel='self' href='https://git.amat.us/linux/atom/net/ipv4?h=v3.4-rc3'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2012-04-12T21:04:33Z</updated>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2012-04-12T21:04:33Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-04-12T21:04:33Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=174808af90a06ee59ffedd60c00c252f1f887f25'/>
<id>urn:sha1:174808af90a06ee59ffedd60c00c252f1f887f25</id>
<content type='text'>
Pull networking fixes from David Miller:

 1) Fix bluetooth userland regression reported by Keith Packard, from
    Gustavo Padovan.

 2) Revert ath9k PS idle change, from Sujith Manoharan.

 3) Correct default TCP memory limits (again), from Eric Dumazet.

 4) Fix tcp_rcv_rtt_update() accidental use of unscaled RTT, from Neal
    Cardwell.

 5) We made a facility for layers like wireless to say how much tailroom
    they need in the SKB for link layer stuff such as wireless
    encryption etc., but TCP works hard to fill every SKB out to the end
    defeating this specification.

    This leads to every TCP packet getting reallocated by the wireless
    code in order to have the right amount of tailroom available.

    Fix TCP to only fill SKBs out to the real amount of data area it
    asked for during the allocation, this way it won't eat into the
    slack added for the device's tailroom needs.

    Reported by Marc Merlin and fixed by Eric Dumazet.

 6) Leaks, endian bugs, and new device IDs in bluetooth from Santosh
    Nayak, João Paulo Rechi Vita, Cho, Yu-Chen, Andrei Emeltchenko,
    AceLan Kao, and Andrei Emeltchenko.

 7) OOPS on tty_close fix in bluetooth's hci_ldisc from Johan Hovold.

 8) netfilter erroneously scales TCP window twice, fix from Changli Gao.

 9) Memleak fix in wext-core from Julia Lawall.

10) Consistently handle invalid TCP packets in ipv4 vs.  ipv6 conntrack,
    from Jozsef Kadlecsik.

11) Validate IP header length properly in netfilter conntrack's
    ipv4_get_l4proto().

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (39 commits)
  NFC: Fix the LLCP Tx fragmentation loop
  rtlwifi: Add missing DMA buffer unmapping for PCI drivers
  rtlwifi: Preallocate USB read buffers and eliminate kalloc in read routine
  tcp: avoid order-1 allocations on wifi and tx path
  net: allow pskb_expand_head() to get maximum tailroom
  bridge: Do not send queries on multicast group leaves
  MAINTAINERS: Mark NATSEMI driver as orphan'd.
  tcp: fix tcp_rcv_rtt_update() use of an unscaled RTT sample
  tcp: restore correct limit
  Revert "ath9k: fix going to full-sleep on PS idle"
  rt2x00: Fix rfkill_polling register function.
  bcma: fix build error on MIPS; implicit pcibios_enable_device
  netfilter: nf_conntrack: fix incorrect logic in nf_conntrack_init_net
  netfilter: nf_ct_ipv4: packets with wrong ihl are invalid
  netfilter: nf_ct_ipv4: handle invalid IPv4 and IPv6 packets consistently
  net/wireless/wext-core.c: add missing kfree
  rtlwifi: Fix oops on rate-control failure
  mac80211: Convert WARN_ON to WARN_ON_ONCE
  rtlwifi: rtl8192de: Fix firmware initialization
  nl80211: ensure interface is up in various APIs
  ...
</content>
</entry>
<entry>
<title>tcp: avoid order-1 allocations on wifi and tx path</title>
<updated>2012-04-11T14:11:12Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2012-04-10T20:30:48Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a21d45726acacc963d8baddf74607d9b74e2b723'/>
<id>urn:sha1:a21d45726acacc963d8baddf74607d9b74e2b723</id>
<content type='text'>
Marc Merlin reported many order-1 allocations failures in TX path on its
wireless setup, that dont make any sense with MTU=1500 network, and non
SG capable hardware.

After investigation, it turns out TCP uses sk_stream_alloc_skb() and
used as a convention skb_tailroom(skb) to know how many bytes of data
payload could be put in this skb (for non SG capable devices)

Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER +
sizeof(struct skb_shared_info) being above 2048)

Later, mac80211 layer need to add some bytes at the tail of skb
(IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is
available has to call pskb_expand_head() and request order-1
allocations.

This patch changes sk_stream_alloc_skb() so that only
sk-&gt;sk_prot-&gt;max_header bytes of headroom are reserved, and use a new
skb field, avail_size to hold the data payload limit.

This way, order-0 allocations done by TCP stack can leave more than 2 KB
of tailroom and no more allocation is performed in mac80211 layer (or
any layer needing some tailroom)

avail_size is unioned with mark/dropcount, since mark will be set later
in IP stack for output packets. Therefore, skb size is unchanged.

Reported-by: Marc MERLIN &lt;marc@merlins.org&gt;
Tested-by: Marc MERLIN &lt;marc@merlins.org&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge tag 'dmaengine-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine</title>
<updated>2012-04-10T22:30:16Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-04-10T22:30:16Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=94fb175c0414902ad9dbd956addf3a5feafbc85b'/>
<id>urn:sha1:94fb175c0414902ad9dbd956addf3a5feafbc85b</id>
<content type='text'>
Pull dmaengine fixes from Dan Williams:

1/ regression fix for Xen as it now trips over a broken assumption
   about the dma address size on 32-bit builds

2/ new quirk for netdma to ignore dma channels that cannot meet
   netdma alignment requirements

3/ fixes for two long standing issues in ioatdma (ring size overflow)
   and iop-adma (potential stack corruption)

* tag 'dmaengine-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
  netdma: adding alignment check for NETDMA ops
  ioatdma: DMA copy alignment needed to address IOAT DMA silicon errata
  ioat: ring size variables need to be 32bit to avoid overflow
  iop-adma: Corrected array overflow in RAID6 Xscale(R) test.
  ioat: fix size of 'completion' for Xen
</content>
</entry>
<entry>
<title>tcp: fix tcp_rcv_rtt_update() use of an unscaled RTT sample</title>
<updated>2012-04-10T18:47:09Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2012-04-10T07:59:20Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=18a223e0b9ec8979320ba364b47c9772391d6d05'/>
<id>urn:sha1:18a223e0b9ec8979320ba364b47c9772391d6d05</id>
<content type='text'>
Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and
unscaled RTT samples.

The intent in the code was to only use the 'm' measurement if it was a
new minimum.  However, since 'm' had not yet been shifted left 3 bits
but 'new_sample' had, this comparison would nearly always succeed,
leading us to erroneously set our receive-side RTT estimate to the 'm'
sample when that sample could be nearly 8x too high to use.

The overall effect is to often cause the receive-side RTT estimate to
be significantly too large (up to 40% too large for brief periods in
my tests).

Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: restore correct limit</title>
<updated>2012-04-10T18:39:26Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2012-04-10T00:56:42Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=5fb84b1428b271f8767e0eb3fcd7231896edfaa4'/>
<id>urn:sha1:5fb84b1428b271f8767e0eb3fcd7231896edfaa4</id>
<content type='text'>
Commit c43b874d5d714f (tcp: properly initialize tcp memory limits) tried
to fix a regression added in commits 4acb4190 &amp; 3dc43e3,
but still get it wrong.

Result is machines with low amount of memory have too small tcp_rmem[2]
value and slow tcp receives : Per socket limit being 1/1024 of memory
instead of 1/128 in old kernels, so rcv window is capped to small
values.

Fix this to match comment and previous behavior.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Jason Wang &lt;jasowang@redhat.com&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_ct_ipv4: packets with wrong ihl are invalid</title>
<updated>2012-04-10T10:50:49Z</updated>
<author>
<name>Jozsef Kadlecsik</name>
<email>kadlec@blackhole.kfki.hu</email>
</author>
<published>2012-04-03T20:02:01Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=07153c6ec074257ade76a461429b567cff2b3a1e'/>
<id>urn:sha1:07153c6ec074257ade76a461429b567cff2b3a1e</id>
<content type='text'>
It was reported that the Linux kernel sometimes logs:

klogd: [2629147.402413] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 447!
klogd: [1072212.887368] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 392

ipv4_get_l4proto() in nf_conntrack_l3proto_ipv4.c and tcp_error() in
nf_conntrack_proto_tcp.c should catch malformed packets, so the errors
at the indicated lines - TCP options parsing - should not happen.
However, tcp_error() relies on the "dataoff" offset to the TCP header,
calculated by ipv4_get_l4proto().  But ipv4_get_l4proto() does not check
bogus ihl values in IPv4 packets, which then can slip through tcp_error()
and get caught at the TCP options parsing routines.

The patch fixes ipv4_get_l4proto() by invalidating packets with bogus
ihl value.

The patch closes netfilter bugzilla id 771.

Signed-off-by: Jozsef Kadlecsik &lt;kadlec@blackhole.kfki.hu&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_ct_ipv4: handle invalid IPv4 and IPv6 packets consistently</title>
<updated>2012-04-09T22:38:34Z</updated>
<author>
<name>Jozsef Kadlecsik</name>
<email>kadlec@blackhole.kfki.hu</email>
</author>
<published>2012-04-09T14:32:16Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=8430eac2f6a3c2adce22d490e2ab8bb50d59077a'/>
<id>urn:sha1:8430eac2f6a3c2adce22d490e2ab8bb50d59077a</id>
<content type='text'>
IPv6 conntrack marked invalid packets as INVALID and let the user
drop those by an explicit rule, while IPv4 conntrack dropped such
packets itself.

IPv4 conntrack is changed so that it marks INVALID packets and let
the user to drop them.

Signed-off-by: Jozsef Kadlecsik &lt;kadlec@blackhole.kfki.hu&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>tcp: tcp_sendpages() should call tcp_push() once</title>
<updated>2012-04-05T23:04:27Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2012-04-05T03:05:35Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=35f9c09fe9c72eb8ca2b8e89a593e1c151f28fc2'/>
<id>urn:sha1:35f9c09fe9c72eb8ca2b8e89a593e1c151f28fc2</id>
<content type='text'>
commit 2f533844242 (tcp: allow splice() to build full TSO packets) added
a regression for splice() calls using SPLICE_F_MORE.

We need to call tcp_flush() at the end of the last page processed in
tcp_sendpages(), or else transmits can be deferred and future sends
stall.

Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
with different semantic.

For all sendpage() providers, its a transparent change. Only
sock_sendpage() and tcp_sendpages() can differentiate the two different
flags provided by pipe_to_sendpage()

Reported-by: Tom Herbert &lt;therbert@google.com&gt;
Cc: Nandita Dukkipati &lt;nanditad@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: Maciej Żenczykowski &lt;maze@google.com&gt;
Cc: Mahesh Bandewar &lt;maheshb@google.com&gt;
Cc: Ilpo Järvinen &lt;ilpo.jarvinen@helsinki.fi&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail&gt;com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netdma: adding alignment check for NETDMA ops</title>
<updated>2012-04-05T22:27:12Z</updated>
<author>
<name>Dave Jiang</name>
<email>dave.jiang@intel.com</email>
</author>
<published>2012-04-04T23:10:46Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a2bd1140a264b561e38d99e656cd843c2d840e86'/>
<id>urn:sha1:a2bd1140a264b561e38d99e656cd843c2d840e86</id>
<content type='text'>
This is the fallout from adding memcpy alignment workaround for certain
IOATDMA hardware. NetDMA will only use DMA engine that can handle byte align
ops.

Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>tcp: allow splice() to build full TSO packets</title>
<updated>2012-04-03T21:35:43Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2012-04-03T09:37:01Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=2f53384424251c06038ae612e56231b96ab610ee'/>
<id>urn:sha1:2f53384424251c06038ae612e56231b96ab610ee</id>
<content type='text'>
vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)

The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when pipe is not already filled, and tso_fragment() try
to split these skb to MSS multiples.

4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)

This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)

In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.

Call tcp_push() only if MSG_MORE is not set in the flags parameter.

This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()
flag.

In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)

Reported-by: Tom Herbert &lt;therbert@google.com&gt;
Cc: Nandita Dukkipati &lt;nanditad@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: Maciej Żenczykowski &lt;maze@google.com&gt;
Cc: Mahesh Bandewar &lt;maheshb@google.com&gt;
Cc: Ilpo Järvinen &lt;ilpo.jarvinen@helsinki.fi&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail&gt;com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
