<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4/ip_input.c, branch v3.12.10</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/net/ipv4/ip_input.c?h=v3.12.10</id>
<link rel='self' href='https://git.amat.us/linux/atom/net/ipv4/ip_input.c?h=v3.12.10'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2014-02-06T19:22:21Z</updated>
<entry>
<title>net: Fix memory leak if TPROXY used with TCP early demux</title>
<updated>2014-02-06T19:22:21Z</updated>
<author>
<name>Holger Eitzenberger</name>
<email>holger@eitzenberger.org</email>
</author>
<published>2014-01-27T09:33:18Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1b90dd2223bdc2a60e3bb7ed468158b2f867c3aa'/>
<id>urn:sha1:1b90dd2223bdc2a60e3bb7ed468158b2f867c3aa</id>
<content type='text'>
[ Upstream commit a452ce345d63ddf92cd101e4196569f8718ad319 ]

I see a memory leak when using a transparent HTTP proxy using TPROXY
together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable):

unreferenced object 0xffff88008cba4a40 (size 1696):
  comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s)
  hex dump (first 32 bytes):
    0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00  .. j@..7..2.....
    02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;ffffffff810b710a&gt;] kmem_cache_alloc+0xad/0xb9
    [&lt;ffffffff81270185&gt;] sk_prot_alloc+0x29/0xc5
    [&lt;ffffffff812702cf&gt;] sk_clone_lock+0x14/0x283
    [&lt;ffffffff812aaf3a&gt;] inet_csk_clone_lock+0xf/0x7b
    [&lt;ffffffff8129a893&gt;] netlink_broadcast+0x14/0x16
    [&lt;ffffffff812c1573&gt;] tcp_create_openreq_child+0x1b/0x4c3
    [&lt;ffffffff812c033e&gt;] tcp_v4_syn_recv_sock+0x38/0x25d
    [&lt;ffffffff812c13e4&gt;] tcp_check_req+0x25c/0x3d0
    [&lt;ffffffff812bf87a&gt;] tcp_v4_do_rcv+0x287/0x40e
    [&lt;ffffffff812a08a7&gt;] ip_route_input_noref+0x843/0xa55
    [&lt;ffffffff812bfeca&gt;] tcp_v4_rcv+0x4c9/0x725
    [&lt;ffffffff812a26f4&gt;] ip_local_deliver_finish+0xe9/0x154
    [&lt;ffffffff8127a927&gt;] __netif_receive_skb+0x4b2/0x514
    [&lt;ffffffff8127aa77&gt;] process_backlog+0xee/0x1c5
    [&lt;ffffffff8127c949&gt;] net_rx_action+0xa7/0x200
    [&lt;ffffffff81209d86&gt;] add_interrupt_randomness+0x39/0x157

But there are many more, resulting in the machine going OOM after some
days.

From looking at the TPROXY code, and with help from Florian, I see
that the memory leak is introduced in tcp_v4_early_demux():

  void tcp_v4_early_demux(struct sk_buff *skb)
  {
    /* ... */

    iph = ip_hdr(skb);
    th = tcp_hdr(skb);

    if (th-&gt;doff &lt; sizeof(struct tcphdr) / 4)
        return;

    sk = __inet_lookup_established(dev_net(skb-&gt;dev), &amp;tcp_hashinfo,
                       iph-&gt;saddr, th-&gt;source,
                       iph-&gt;daddr, ntohs(th-&gt;dest),
                       skb-&gt;skb_iif);
    if (sk) {
        skb-&gt;sk = sk;

where the socket is assigned unconditionally to skb-&gt;sk, also bumping
the refcnt on it.  This is problematic, because in our case the skb
has already a socket assigned in the TPROXY target.  This then results
in the leak I see.

The very same issue seems to be with IPv6, but haven't tested.

Reviewed-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Holger Eitzenberger &lt;holger@eitzenberger.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: add SNMP counters tracking incoming ECN bits</title>
<updated>2013-08-09T05:24:59Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-08-06T10:32:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1f07d03e2069df2ecd82301936b598a3b257c6d6'/>
<id>urn:sha1:1f07d03e2069df2ecd82301936b598a3b257c6d6</id>
<content type='text'>
With GRO/LRO processing, there is a problem because Ip[6]InReceives SNMP
counters do not count the number of frames, but number of aggregated
segments.

Its probably too late to change this now.

This patch adds four new counters, tracking number of frames, regardless
of LRO/GRO, and on a per ECN status basis, for IPv4 and IPv6.

Ip[6]NoECTPkts : Number of packets received with NOECT
Ip[6]ECT1Pkts  : Number of packets received with ECT(1)
Ip[6]ECT0Pkts  : Number of packets received with ECT(0)
Ip[6]CEPkts    : Number of packets received with Congestion Experienced

lph37:~# nstat | egrep "Pkts|InReceive"
IpInReceives                    1634137            0.0
Ip6InReceives                   3714107            0.0
Ip6InNoECTPkts                  19205              0.0
Ip6InECT0Pkts                   52651828           0.0
IpExtInNoECTPkts                33630              0.0
IpExtInECT0Pkts                 15581379           0.0
IpExtInCEPkts                   6                  0.0

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: set transport header earlier</title>
<updated>2013-07-16T19:59:28Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-07-16T03:03:19Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=21d1196a35f5686c4323e42a62fdb4b23b0ab4a3'/>
<id>urn:sha1:21d1196a35f5686c4323e42a62fdb4b23b0ab4a3</id>
<content type='text'>
commit 45f00f99d6e ("ipv4: tcp: clean up tcp_v4_early_demux()") added a
performance regression for non GRO traffic, basically disabling
IP early demux.

IPv6 stack resets transport header in ip6_rcv() before calling
IP early demux in ip6_rcv_finish(), while IPv4 does this only in
ip_local_deliver_finish(), _after_ IP early demux.

GRO traffic happened to enable IP early demux because transport header
is also set in inet_gro_receive()

Instead of reverting the faulty commit, we can make IPv4/IPv6 behave the
same : transport_header should be set in ip_rcv() instead of
ip_local_deliver_finish()

ip_local_deliver_finish() can also use skb_network_header_len() which is
faster than ip_hdrlen()

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Add MIB counters for checksum errors</title>
<updated>2013-04-29T19:14:03Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-04-29T08:39:56Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=6a5dc9e598fe90160fee7de098fa319665f5253e'/>
<id>urn:sha1:6a5dc9e598fe90160fee7de098fa319665f5253e</id>
<content type='text'>
Add MIB counters for checksum errors in IP layer,
and TCP/UDP/ICMP layers, to help diagnose problems.

$ nstat -a | grep  Csum
IcmpInCsumErrors                72                 0.0
TcpInCsumErrors                 382                0.0
UdpInCsumErrors                 463221             0.0
Icmp6InCsumErrors               75                 0.0
Udp6InCsumErrors                173442             0.0
IpExtInCsumErrors               10884              0.0

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv[4|6]: correct dropwatch false positive in local_deliver_finish</title>
<updated>2013-03-01T20:56:29Z</updated>
<author>
<name>Neil Horman</name>
<email>nhorman@tuxdriver.com</email>
</author>
<published>2013-03-01T07:44:08Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d8c6f4b9b7848bca8babfc0ae43a50c8ab22fbb9'/>
<id>urn:sha1:d8c6f4b9b7848bca8babfc0ae43a50c8ab22fbb9</id>
<content type='text'>
I had a report recently of a user trying to use dropwatch to localise some frame
loss, and they were getting false positives.  Turned out they were using a user
space SCTP stack that used raw sockets to grab frames.  When we don't have a
registered protocol for a given packet, we record it as a drop, even if a raw
socket receieves the frame.  We should only record the drop in the event a raw
socket doesnt exist to receive the frames

Tested by the reported successfully

Signed-off-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Reported-by: William Reich &lt;reich@ulticom.com&gt;
Tested-by: William Reich &lt;reich@ulticom.com&gt;
CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: William Reich &lt;reich@ulticom.com&gt;
CC: eric.dumazet@gmail.com
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Disallow non-namespace aware protocols to register.</title>
<updated>2013-02-05T19:42:23Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2013-02-05T19:42:23Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=547472b8e1da72ae226430c0c4273e36fc8ca768'/>
<id>urn:sha1:547472b8e1da72ae226430c0c4273e36fc8ca768</id>
<content type='text'>
All in-tree ipv4 protocol implementations are now namespace
aware.  Therefore all the run-time checks are superfluous.

Reject registry of any non-namespace aware ipv4 protocol.
Eventually we'll remove prot-&gt;netns_ok and this registry
time check as well.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: TCP early demux cleanup</title>
<updated>2012-07-30T21:53:21Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-07-29T21:06:13Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=cca32e4bf999a34ac08d959f351f2b30bcd02460'/>
<id>urn:sha1:cca32e4bf999a34ac08d959f351f2b30bcd02460</id>
<content type='text'>
early_demux() handlers should be called in RCU context, and as we
use skb_dst_set_noref(skb, dst), caller must not exit from RCU context
before dst use (skb_dst(skb)) or release (skb_drop(dst))

Therefore, rcu_read_lock()/rcu_read_unlock() pairs around
-&gt;early_demux() are confusing and not needed :

Protocol handlers are already in an RCU read lock section.
(__netif_receive_skb() does the rcu_read_lock() )

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: Early TCP socket demux</title>
<updated>2012-07-26T22:50:39Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-07-26T12:18:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c7109986db3c945f50ceed884a30e0fd8af3b89b'/>
<id>urn:sha1:c7109986db3c945f50ceed884a30e0fd8af3b89b</id>
<content type='text'>
This is the IPv6 missing bits for infrastructure added in commit
41063e9dd1195 (ipv4: Early TCP socket demux.)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Fix input route performance regression.</title>
<updated>2012-07-26T22:50:39Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-07-26T11:14:38Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c6cffba4ffa26a8ffacd0bb9f3144e34f20da7de'/>
<id>urn:sha1:c6cffba4ffa26a8ffacd0bb9f3144e34f20da7de</id>
<content type='text'>
With the routing cache removal we lost the "noref" code paths on
input, and this can kill some routing workloads.

Reinstate the noref path when we hit a cached route in the FIB
nexthops.

With help from Eric Dumazet.

Reported-by: Alexander Duyck &lt;alexander.duyck@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: early_demux fixes</title>
<updated>2012-07-24T20:54:15Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-07-24T01:19:31Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9cb429d692b341e972b12e6cd097364050ebbb26'/>
<id>urn:sha1:9cb429d692b341e972b12e6cd097364050ebbb26</id>
<content type='text'>
1) Remove a non needed pskb_may_pull() in tcp_v4_early_demux()
   and fix a potential bug if skb-&gt;head was reallocated
   (iph &amp; th pointers were not reloaded)

TCP stack will pull/check headers anyway.

2) must reload iph in ip_rcv_finish() after early_demux()
 call since skb-&gt;head might have changed.

3) skb-&gt;dev-&gt;ifindex can be now replaced by skb-&gt;skb_iif

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
