linux/net, branch v3.12.10

inet_diag: fix inet_diag_dump_icsk() timewait socket state logic

2014-02-06T19:22:21Z

[ Based upon upstream commit 70315d22d3c7383f9a508d0aab21e2eb35b2303a ] Fix inet_diag_dump_icsk() to reflect the fact that both TIME_WAIT and FIN_WAIT2 connections are represented by inet_timewait_sock (not just TIME_WAIT). Thus: (a) We need to iterate through the time_wait buckets if the user wants either TIME_WAIT or FIN_WAIT2. (Before fixing this, "ss -nemoi state fin-wait-2" would not return any sockets, even if there were some in FIN_WAIT2.) (b) We need to check tw_substate to see if the user wants to dump sockets in the particular substate (TIME_WAIT or FIN_WAIT2) that a given connection is in. (Before fixing this, "ss -nemoi state time-wait" would actually return sockets in state FIN_WAIT2.) An analogous fix is in v3.13: 70315d22d3c7383f9a508d0aab21e2eb35b2303a ("inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets") but that patch is quite different because 3.13 code is very different in this area due to the unification of TCP hash tables in 05dbc7b ("tcp/dccp: remove twchain") in v3.13-rc1. I tested that this applies cleanly between v3.3 and v3.12, and tested that it works in both 3.3 and 3.12. It does not apply cleanly to 3.2 and earlier (though it makes semantic sense), and semantically is not the right fix for 3.13 and beyond (as mentioned above). Signed-off-by: Neal Cardwell Cc: Eric Dumazet Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman

net: gre: use icmp_hdr() to get inner ip header

2014-02-06T19:22:21Z

[ Upstream commit c0c0c50ff7c3e331c90bab316d21f724fb9e1994 ] When dealing with icmp messages, the skb->data points the ip header that triggered the sending of the icmp message. In gre_cisco_err(), the parse_gre_header() is called, and the iptunnel_pull_header() is called to pull the skb at the end of the parse_gre_header(), so the skb->data doesn't point the inner ip header. Unfortunately, the ipgre_err still needs those ip addresses in inner ip header to look up tunnel by ip_tunnel_lookup(). So just use icmp_hdr() to get inner ip header instead of skb->data. Signed-off-by: Duan Jiong Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman

net: Fix memory leak if TPROXY used with TCP early demux

2014-02-06T19:22:21Z

[ Upstream commit a452ce345d63ddf92cd101e4196569f8718ad319 ] I see a memory leak when using a transparent HTTP proxy using TPROXY together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable): unreferenced object 0xffff88008cba4a40 (size 1696): comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s) hex dump (first 32 bytes): 0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00 .. j@..7..2..... 02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [] kmem_cache_alloc+0xad/0xb9 [] sk_prot_alloc+0x29/0xc5 [] sk_clone_lock+0x14/0x283 [] inet_csk_clone_lock+0xf/0x7b [] netlink_broadcast+0x14/0x16 [] tcp_create_openreq_child+0x1b/0x4c3 [] tcp_v4_syn_recv_sock+0x38/0x25d [] tcp_check_req+0x25c/0x3d0 [] tcp_v4_do_rcv+0x287/0x40e [] ip_route_input_noref+0x843/0xa55 [] tcp_v4_rcv+0x4c9/0x725 [] ip_local_deliver_finish+0xe9/0x154 [] __netif_receive_skb+0x4b2/0x514 [] process_backlog+0xee/0x1c5 [] net_rx_action+0xa7/0x200 [] add_interrupt_randomness+0x39/0x157 But there are many more, resulting in the machine going OOM after some days. From looking at the TPROXY code, and with help from Florian, I see that the memory leak is introduced in tcp_v4_early_demux(): void tcp_v4_early_demux(struct sk_buff *skb) { /* ... */ iph = ip_hdr(skb); th = tcp_hdr(skb); if (th->doff < sizeof(struct tcphdr) / 4) return; sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo, iph->saddr, th->source, iph->daddr, ntohs(th->dest), skb->skb_iif); if (sk) { skb->sk = sk; where the socket is assigned unconditionally to skb->sk, also bumping the refcnt on it. This is problematic, because in our case the skb has already a socket assigned in the TPROXY target. This then results in the leak I see. The very same issue seems to be with IPv6, but haven't tested. Reviewed-by: Florian Westphal Signed-off-by: Holger Eitzenberger Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman

fib_frontend: fix possible NULL pointer dereference

2014-02-06T19:22:21Z

[ Upstream commit a0065f266a9b5d51575535a25c15ccbeed9a9966 ] The two commits 0115e8e30d (net: remove delay at device dismantle) and 748e2d9396a (net: reinstate rtnl in call_netdevice_notifiers()) silently removed a NULL pointer check for in_dev since Linux 3.7. This patch re-introduces this check as it causes crashing the kernel when setting small mtu values on non-ip capable netdevices. Signed-off-by: Oliver Hartkopp Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman

ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called

2014-02-06T19:22:20Z

[ Upstream commit 11c21a307d79ea5f6b6fc0d3dfdeda271e5e65f6 ] commit a622260254ee48("ip_tunnel: fix kernel panic with icmp_dest_unreach") clear IPCB in ip_tunnel_xmit() , or else skb->cb[] may contain garbage from GSO segmentation layer. But commit 0e6fbc5b6c621("ip_tunnels: extend iptunnel_xmit()") refactor codes, and it clear IPCB behind the dst_link_failure(). So clear IPCB in ip_tunnel_xmit() just like commti a622260254ee48("ip_tunnel: fix kernel panic with icmp_dest_unreach"). Signed-off-by: Duan Jiong Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman

bpf: do not use reciprocal divide

2014-02-06T19:22:20Z

[ Upstream commit aee636c4809fa54848ff07a899b326eb1f9987a2 ] At first Jakub Zawadzki noticed that some divisions by reciprocal_divide were not correct. (off by one in some cases) http://www.wireshark.org/~darkjames/reciprocal-buggy.c He could also show this with BPF: http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c The reciprocal divide in linux kernel is not generic enough, lets remove its use in BPF, as it is not worth the pain with current cpus. Signed-off-by: Eric Dumazet Reported-by: Jakub Zawadzki Cc: Mircea Gherzan Cc: Daniel Borkmann Cc: Hannes Frederic Sowa Cc: Matt Evans Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: David S. Miller Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman

tcp: metrics: Avoid duplicate entries with the same destination-IP

2014-02-06T19:22:20Z

[ Upstream commit 77f99ad16a07aa062c2d30fae57b1fee456f6ef6 ] Because the tcp-metrics is an RCU-list, it may be that two soft-interrupts are inside __tcp_get_metrics() for the same destination-IP at the same time. If this destination-IP is not yet part of the tcp-metrics, both soft-interrupts will end up in tcpm_new and create a new entry for this IP. So, we will have two tcp-metrics with the same destination-IP in the list. This patch checks twice __tcp_get_metrics(). First without holding the lock, then while holding the lock. The second one is there to confirm that the entry has not been added by another soft-irq while waiting for the spin-lock. Fixes: 51c5d0c4b169b (tcp: Maintain dynamic metrics in local cache.) Signed-off-by: Christoph Paasch Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman

net: rds: fix per-cpu helper usage

2014-02-06T19:22:20Z

[ Upstream commit c196403b79aa241c3fefb3ee5bb328aa7c5cc860 ] commit ae4b46e9d "net: rds: use this_cpu_* per-cpu helper" broke per-cpu handling for rds. chpfirst is the result of __this_cpu_read(), so it is an absolute pointer and not __percpu. Therefore, __this_cpu_write() should not operate on chpfirst, but rather on cache->percpu->first, just like __this_cpu_read() did before. Signed-off-byd Gerald Schaefer Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman

net: avoid reference counter overflows on fib_rules in multicast forwarding

2014-02-06T19:22:20Z

[ Upstream commit 95f4a45de1a0f172b35451fc52283290adb21f6e ] Bob Falken reported that after 4G packets, multicast forwarding stopped working. This was because of a rule reference counter overflow which freed the rule as soon as the overflow happend. This patch solves this by adding the FIB_LOOKUP_NOREF flag to fib_rules_lookup calls. This is safe even from non-rcu locked sections as in this case the flag only implies not taking a reference to the rule, which we don't need at all. Rules only hold references to the namespace, which are guaranteed to be available during the call of the non-rcu protected function reg_vif_xmit because of the interface reference which itself holds a reference to the net namespace. Fixes: f0ad0860d01e47 ("ipv4: ipmr: support multiple tables") Fixes: d1db275dd3f6e4 ("ipv6: ip6mr: support multiple tables") Reported-by: Bob Falken Cc: Patrick McHardy Cc: Thomas Graf Cc: Julian Anastasov Cc: Eric Dumazet Signed-off-by: Hannes Frederic Sowa Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman

ieee802154: Fix memory leak in ieee802154_add_iface()

2014-02-06T19:22:20Z

[ Upstream commit 267d29a69c6af39445f36102a832b25ed483f299 ] Fix a memory leak in the ieee802154_add_iface() error handling path. Detected by Coverity: CID 710490. Signed-off-by: Christian Engelmayer Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman