<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4, branch v3.12.10</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/net/ipv4?h=v3.12.10</id>
<link rel='self' href='https://git.amat.us/linux/atom/net/ipv4?h=v3.12.10'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2014-02-06T19:22:21Z</updated>
<entry>
<title>inet_diag: fix inet_diag_dump_icsk() timewait socket state logic</title>
<updated>2014-02-06T19:22:21Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2014-02-03T01:40:13Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=18fe5f72e8cd2f549002ada925c1a43daafbf4c1'/>
<id>urn:sha1:18fe5f72e8cd2f549002ada925c1a43daafbf4c1</id>
<content type='text'>
[ Based upon upstream commit 70315d22d3c7383f9a508d0aab21e2eb35b2303a ]

Fix inet_diag_dump_icsk() to reflect the fact that both TIME_WAIT and
FIN_WAIT2 connections are represented by inet_timewait_sock (not just
TIME_WAIT). Thus:

(a) We need to iterate through the time_wait buckets if the user wants
either TIME_WAIT or FIN_WAIT2. (Before fixing this, "ss -nemoi state
fin-wait-2" would not return any sockets, even if there were some in
FIN_WAIT2.)

(b) We need to check tw_substate to see if the user wants to dump
sockets in the particular substate (TIME_WAIT or FIN_WAIT2) that a
given connection is in. (Before fixing this, "ss -nemoi state
time-wait" would actually return sockets in state FIN_WAIT2.)

An analogous fix is in v3.13: 70315d22d3c7383f9a508d0aab21e2eb35b2303a
("inet_diag: fix inet_diag_dump_icsk() to use correct state for
timewait sockets") but that patch is quite different because 3.13 code
is very different in this area due to the unification of TCP hash
tables in 05dbc7b ("tcp/dccp: remove twchain") in v3.13-rc1.

I tested that this applies cleanly between v3.3 and v3.12, and tested
that it works in both 3.3 and 3.12. It does not apply cleanly to 3.2
and earlier (though it makes semantic sense), and semantically is not
the right fix for 3.13 and beyond (as mentioned above).

Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: gre: use icmp_hdr() to get inner ip header</title>
<updated>2014-02-06T19:22:21Z</updated>
<author>
<name>Duan Jiong</name>
<email>duanj.fnst@cn.fujitsu.com</email>
</author>
<published>2014-01-28T03:49:43Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3760d6b277229dbfbbacbab08dbebc71c0a94145'/>
<id>urn:sha1:3760d6b277229dbfbbacbab08dbebc71c0a94145</id>
<content type='text'>
[ Upstream commit c0c0c50ff7c3e331c90bab316d21f724fb9e1994 ]

When dealing with icmp messages, the skb-&gt;data points the
ip header that triggered the sending of the icmp message.

In gre_cisco_err(), the parse_gre_header() is called, and the
iptunnel_pull_header() is called to pull the skb at the end of
the parse_gre_header(), so the skb-&gt;data doesn't point the
inner ip header.

Unfortunately, the ipgre_err still needs those ip addresses in
inner ip header to look up tunnel by ip_tunnel_lookup().

So just use icmp_hdr() to get inner ip header instead of skb-&gt;data.

Signed-off-by: Duan Jiong &lt;duanj.fnst@cn.fujitsu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: Fix memory leak if TPROXY used with TCP early demux</title>
<updated>2014-02-06T19:22:21Z</updated>
<author>
<name>Holger Eitzenberger</name>
<email>holger@eitzenberger.org</email>
</author>
<published>2014-01-27T09:33:18Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1b90dd2223bdc2a60e3bb7ed468158b2f867c3aa'/>
<id>urn:sha1:1b90dd2223bdc2a60e3bb7ed468158b2f867c3aa</id>
<content type='text'>
[ Upstream commit a452ce345d63ddf92cd101e4196569f8718ad319 ]

I see a memory leak when using a transparent HTTP proxy using TPROXY
together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable):

unreferenced object 0xffff88008cba4a40 (size 1696):
  comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s)
  hex dump (first 32 bytes):
    0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00  .. j@..7..2.....
    02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;ffffffff810b710a&gt;] kmem_cache_alloc+0xad/0xb9
    [&lt;ffffffff81270185&gt;] sk_prot_alloc+0x29/0xc5
    [&lt;ffffffff812702cf&gt;] sk_clone_lock+0x14/0x283
    [&lt;ffffffff812aaf3a&gt;] inet_csk_clone_lock+0xf/0x7b
    [&lt;ffffffff8129a893&gt;] netlink_broadcast+0x14/0x16
    [&lt;ffffffff812c1573&gt;] tcp_create_openreq_child+0x1b/0x4c3
    [&lt;ffffffff812c033e&gt;] tcp_v4_syn_recv_sock+0x38/0x25d
    [&lt;ffffffff812c13e4&gt;] tcp_check_req+0x25c/0x3d0
    [&lt;ffffffff812bf87a&gt;] tcp_v4_do_rcv+0x287/0x40e
    [&lt;ffffffff812a08a7&gt;] ip_route_input_noref+0x843/0xa55
    [&lt;ffffffff812bfeca&gt;] tcp_v4_rcv+0x4c9/0x725
    [&lt;ffffffff812a26f4&gt;] ip_local_deliver_finish+0xe9/0x154
    [&lt;ffffffff8127a927&gt;] __netif_receive_skb+0x4b2/0x514
    [&lt;ffffffff8127aa77&gt;] process_backlog+0xee/0x1c5
    [&lt;ffffffff8127c949&gt;] net_rx_action+0xa7/0x200
    [&lt;ffffffff81209d86&gt;] add_interrupt_randomness+0x39/0x157

But there are many more, resulting in the machine going OOM after some
days.

From looking at the TPROXY code, and with help from Florian, I see
that the memory leak is introduced in tcp_v4_early_demux():

  void tcp_v4_early_demux(struct sk_buff *skb)
  {
    /* ... */

    iph = ip_hdr(skb);
    th = tcp_hdr(skb);

    if (th-&gt;doff &lt; sizeof(struct tcphdr) / 4)
        return;

    sk = __inet_lookup_established(dev_net(skb-&gt;dev), &amp;tcp_hashinfo,
                       iph-&gt;saddr, th-&gt;source,
                       iph-&gt;daddr, ntohs(th-&gt;dest),
                       skb-&gt;skb_iif);
    if (sk) {
        skb-&gt;sk = sk;

where the socket is assigned unconditionally to skb-&gt;sk, also bumping
the refcnt on it.  This is problematic, because in our case the skb
has already a socket assigned in the TPROXY target.  This then results
in the leak I see.

The very same issue seems to be with IPv6, but haven't tested.

Reviewed-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Holger Eitzenberger &lt;holger@eitzenberger.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>fib_frontend: fix possible NULL pointer dereference</title>
<updated>2014-02-06T19:22:21Z</updated>
<author>
<name>Oliver Hartkopp</name>
<email>socketcan@hartkopp.net</email>
</author>
<published>2014-01-23T09:19:34Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=2b29a742344c0837b1a548ebcd9eccb19aed83f7'/>
<id>urn:sha1:2b29a742344c0837b1a548ebcd9eccb19aed83f7</id>
<content type='text'>
[ Upstream commit a0065f266a9b5d51575535a25c15ccbeed9a9966 ]

The two commits 0115e8e30d (net: remove delay at device dismantle) and
748e2d9396a (net: reinstate rtnl in call_netdevice_notifiers()) silently
removed a NULL pointer check for in_dev since Linux 3.7.

This patch re-introduces this check as it causes crashing the kernel when
setting small mtu values on non-ip capable netdevices.

Signed-off-by: Oliver Hartkopp &lt;socketcan@hartkopp.net&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called</title>
<updated>2014-02-06T19:22:20Z</updated>
<author>
<name>Duan Jiong</name>
<email>duanj.fnst@cn.fujitsu.com</email>
</author>
<published>2014-01-23T06:00:25Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9cafae949e2126d396b745bf06626a08e803b282'/>
<id>urn:sha1:9cafae949e2126d396b745bf06626a08e803b282</id>
<content type='text'>
[ Upstream commit 11c21a307d79ea5f6b6fc0d3dfdeda271e5e65f6 ]

commit a622260254ee48("ip_tunnel: fix kernel panic with icmp_dest_unreach")
clear IPCB in ip_tunnel_xmit()  , or else skb-&gt;cb[] may contain garbage from
GSO segmentation layer.

But commit 0e6fbc5b6c621("ip_tunnels: extend iptunnel_xmit()") refactor codes,
and it clear IPCB behind the dst_link_failure().

So clear IPCB in ip_tunnel_xmit() just like commti a622260254ee48("ip_tunnel:
fix kernel panic with icmp_dest_unreach").

Signed-off-by: Duan Jiong &lt;duanj.fnst@cn.fujitsu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: metrics: Avoid duplicate entries with the same destination-IP</title>
<updated>2014-02-06T19:22:20Z</updated>
<author>
<name>Christoph Paasch</name>
<email>christoph.paasch@uclouvain.be</email>
</author>
<published>2014-01-16T19:01:21Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=5e3b0f65e71dfa5c1a2e276f1a0df01a79aa07e3'/>
<id>urn:sha1:5e3b0f65e71dfa5c1a2e276f1a0df01a79aa07e3</id>
<content type='text'>
[ Upstream commit 77f99ad16a07aa062c2d30fae57b1fee456f6ef6 ]

Because the tcp-metrics is an RCU-list, it may be that two
soft-interrupts are inside __tcp_get_metrics() for the same
destination-IP at the same time. If this destination-IP is not yet part of
the tcp-metrics, both soft-interrupts will end up in tcpm_new and create
a new entry for this IP.
So, we will have two tcp-metrics with the same destination-IP in the list.

This patch checks twice __tcp_get_metrics(). First without holding the
lock, then while holding the lock. The second one is there to confirm
that the entry has not been added by another soft-irq while waiting for
the spin-lock.

Fixes: 51c5d0c4b169b (tcp: Maintain dynamic metrics in local cache.)
Signed-off-by: Christoph Paasch &lt;christoph.paasch@uclouvain.be&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: avoid reference counter overflows on fib_rules in multicast forwarding</title>
<updated>2014-02-06T19:22:20Z</updated>
<author>
<name>Hannes Frederic Sowa</name>
<email>hannes@stressinduktion.org</email>
</author>
<published>2014-01-13T01:45:22Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d09a96563bed3178012014275aa3ff912a0b38ce'/>
<id>urn:sha1:d09a96563bed3178012014275aa3ff912a0b38ce</id>
<content type='text'>
[ Upstream commit 95f4a45de1a0f172b35451fc52283290adb21f6e ]

Bob Falken reported that after 4G packets, multicast forwarding stopped
working. This was because of a rule reference counter overflow which
freed the rule as soon as the overflow happend.

This patch solves this by adding the FIB_LOOKUP_NOREF flag to
fib_rules_lookup calls. This is safe even from non-rcu locked sections
as in this case the flag only implies not taking a reference to the rule,
which we don't need at all.

Rules only hold references to the namespace, which are guaranteed to be
available during the call of the non-rcu protected function reg_vif_xmit
because of the interface reference which itself holds a reference to
the net namespace.

Fixes: f0ad0860d01e47 ("ipv4: ipmr: support multiple tables")
Fixes: d1db275dd3f6e4 ("ipv6: ip6mr: support multiple tables")
Reported-by: Bob Falken &lt;NetFestivalHaveFun@gmx.com&gt;
Cc: Patrick McHardy &lt;kaber@trash.net&gt;
Cc: Thomas Graf &lt;tgraf@suug.ch&gt;
Cc: Julian Anastasov &lt;ja@ssi.bg&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC</title>
<updated>2014-01-15T23:31:37Z</updated>
<author>
<name>Wei-Chun Chao</name>
<email>weichunc@plumgrid.com</email>
</author>
<published>2013-12-26T21:10:22Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b21d217ce4403082908511d0296ee854f30a8014'/>
<id>urn:sha1:b21d217ce4403082908511d0296ee854f30a8014</id>
<content type='text'>
[ Upstream commit 7a7ffbabf99445704be01bff5d7e360da908cf8e ]

VM to VM GSO traffic is broken if it goes through VXLAN or GRE
tunnel and the physical NIC on the host supports hardware VXLAN/GRE
GSO offload (e.g. bnx2x and next-gen mlx4).

Two issues -
(VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL with
SKB_GSO_TCP/UDP set depending on the inner protocol. GSO header
integrity check fails in udp4_ufo_fragment if inner protocol is
TCP. Also gso_segs is calculated incorrectly using skb-&gt;len that
includes tunnel header. Fix: robust check should only be applied
to the inner packet.

(VXLAN &amp; GRE) Once GSO header integrity check passes, NULL segs
is returned and the original skb is sent to hardware. However the
tunnel header is already pulled. Fix: tunnel header needs to be
restored so that hardware can perform GSO properly on the original
packet.

Signed-off-by: Wei-Chun Chao &lt;weichunc@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: inet_diag: zero out uninitialized idiag_{src,dst} fields</title>
<updated>2014-01-15T23:31:35Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>dborkman@redhat.com</email>
</author>
<published>2013-12-16T23:38:39Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=fbe0048b51d62f0c44e6e2b7267c29b66ba59912'/>
<id>urn:sha1:fbe0048b51d62f0c44e6e2b7267c29b66ba59912</id>
<content type='text'>
[ Upstream commit b1aac815c0891fe4a55a6b0b715910142227700f ]

Jakub reported while working with nlmon netlink sniffer that parts of
the inet_diag_sockid are not initialized when r-&gt;idiag_family != AF_INET6.
That is, fields of r-&gt;id.idiag_src[1 ... 3], r-&gt;id.idiag_dst[1 ... 3].

In fact, it seems that we can leak 6 * sizeof(u32) byte of kernel [slab]
memory through this. At least, in udp_dump_one(), we allocate a skb in ...

  rep = nlmsg_new(sizeof(struct inet_diag_msg) + ..., GFP_KERNEL);

... and then pass that to inet_sk_diag_fill() that puts the whole struct
inet_diag_msg into the skb, where we only fill out r-&gt;id.idiag_src[0],
r-&gt;id.idiag_dst[0] and leave the rest untouched:

  r-&gt;id.idiag_src[0] = inet-&gt;inet_rcv_saddr;
  r-&gt;id.idiag_dst[0] = inet-&gt;inet_daddr;

struct inet_diag_msg embeds struct inet_diag_sockid that is correctly /
fully filled out in IPv6 case, but for IPv4 not.

So just zero them out by using plain memset (for this little amount of
bytes it's probably not worth the extra check for idiag_family == AF_INET).

Similarly, fix also other places where we fill that out.

Reported-by: Jakub Zawadzki &lt;darkjames-ws@darkjames.pl&gt;
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ip_gre: fix msg_name parsing for recvfrom/recvmsg</title>
<updated>2014-01-15T23:31:35Z</updated>
<author>
<name>Timo Teräs</name>
<email>timo.teras@iki.fi</email>
</author>
<published>2013-12-16T09:02:09Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d6e4f9236bf279e92be402319d57060303d32cfa'/>
<id>urn:sha1:d6e4f9236bf279e92be402319d57060303d32cfa</id>
<content type='text'>
[ Upstream commit 0e3da5bb8da45890b1dc413404e0f978ab71173e ]

ipgre_header_parse() needs to parse the tunnel's ip header and it
uses mac_header to locate the iphdr. This got broken when gre tunneling
was refactored as mac_header is no longer updated to point to iphdr.
Introduce skb_pop_mac_header() helper to do the mac_header assignment
and use it in ipgre_rcv() to fix msg_name parsing.

Bug introduced in commit c54419321455 (GRE: Refactor GRE tunneling code.)

Cc: Pravin B Shelar &lt;pshelar@nicira.com&gt;
Signed-off-by: Timo Teräs &lt;timo.teras@iki.fi&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
