aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2013-11-28mac80211: drop spoofed packets in ad-hoc modeFelix Fietkau
commit 6329b8d917adc077caa60c2447385554130853a3 upstream. If an Ad-Hoc node receives packets with the Cell ID or its own MAC address as source address, it hits a WARN_ON in sta_info_insert_check() With many packets, this can massively spam the logs. One way that this can easily happen is through having Cisco APs in the area with rouge AP detection and countermeasures enabled. Such Cisco APs will regularly send fake beacons, disassoc and deauth packets that trigger these warnings. To fix this issue, drop such spoofed packets early in the rx path. Reported-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> [bwh: Backported to 3.2: use compare_ether_addr() instead of ether_addr_equal()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28netfilter: nf_ct_sip: don't drop packets with offsets pointing outside the ↵Patrick McHardy
packet commit 3a7b21eaf4fb3c971bdb47a98f570550ddfe4471 upstream. Some Cisco phones create huge messages that are spread over multiple packets. After calculating the offset of the SIP body, it is validated to be within the packet and the packet is dropped otherwise. This breaks operation of these phones. Since connection tracking is supposed to be passive, just let those packets pass unmodified and untracked. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> [bwh: Backported to 3.2: there is no log message to delete] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28inet: fix possible memory corruption with UDP_CORK and UFOHannes Frederic Sowa
[ This is a simplified -stable version of a set of upstream commits. ] This is a replacement patch only for stable which does fix the problems handled by the following two commits in -net: "ip_output: do skb ufo init for peeked non ufo skb as well" (e93b7d748be887cd7639b113ba7d7ef792a7efb9) "ip6_output: do skb ufo init for peeked non ufo skb as well" (c547dbf55d5f8cf615ccc0e7265e98db27d3fb8b) Three frames are written on a corked udp socket for which the output netdevice has UFO enabled. If the first and third frame are smaller than the mtu and the second one is bigger, we enqueue the second frame with skb_append_datato_frags without initializing the gso fields. This leads to the third frame appended regulary and thus constructing an invalid skb. This fixes the problem by always using skb_append_datato_frags as soon as the first frag got enqueued to the skb without marking the packet as SKB_GSO_UDP. The problem with only two frames for ipv6 was fixed by "ipv6: udp packets following an UFO enqueued packet need also be handled by UFO" (2811ebac2521ceac84f2bdae402455baa6a7fb47). Cc: Jiri Pirko <jiri@resnulli.us> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix raceDaniel Borkmann
[ Upstream commit 90c6bd34f884cd9cee21f1d152baf6c18bcac949 ] In the case of credentials passing in unix stream sockets (dgram sockets seem not affected), we get a rather sparse race after commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default"). We have a stream server on receiver side that requests credential passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED on each spawned/accepted socket on server side to 1 first (as it's not inherited), it can happen that in the time between accept() and setsockopt() we get interrupted, the sender is being scheduled and continues with passing data to our receiver. At that time SO_PASSCRED is neither set on sender nor receiver side, hence in cmsg's SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534 (== overflow{u,g}id) instead of what we actually would like to see. On the sender side, here nc -U, the tests in maybe_add_creds() invoked through unix_stream_sendmsg() would fail, as at that exact time, as mentioned, the sender has neither SO_PASSCRED on his side nor sees it on the server side, and we have a valid 'other' socket in place. Thus, sender believes it would just look like a normal connection, not needing/requesting SO_PASSCRED at that time. As reverting 16e5726 would not be an option due to the significant performance regression reported when having creds always passed, one way/trade-off to prevent that would be to set SO_PASSCRED on the listener socket and allow inheriting these flags to the spawned socket on server side in accept(). It seems also logical to do so if we'd tell the listener socket to pass those flags onwards, and would fix the race. Before, strace: recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}}, msg_flags=0}, 0) = 5 After, strace: recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}}, msg_flags=0}, 0) = 5 Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28sctp: Perform software checksum if packet has to be fragmented.Vlad Yasevich
[ Upstream commit d2dbbba77e95dff4b4f901fee236fef6d9552072 ] IP/IPv6 fragmentation knows how to compute only TCP/UDP checksum. This causes problems if SCTP packets has to be fragmented and ipsummed has been set to PARTIAL due to checksum offload support. This condition can happen when retransmitting after MTU discover, or when INIT or other control chunks are larger then MTU. Check for the rare fragmentation condition in SCTP and use software checksum calculation in this case. CC: Fan Du <fan.du@windriver.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28sctp: Use software crc32 checksum when xfrm transform will happen.Fan Du
[ Upstream commit 27127a82561a2a3ed955ce207048e1b066a80a2a ] igb/ixgbe have hardware sctp checksum support, when this feature is enabled and also IPsec is armed to protect sctp traffic, ugly things happened as xfrm_output checks CHECKSUM_PARTIAL to do checksum operation(sum every thing up and pack the 16bits result in the checksum field). The result is fail establishment of sctp communication. Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28l2tp: must disable bh before calling l2tp_xmit_skb()Eric Dumazet
[ Upstream commit 455cc32bf128e114455d11ad919321ab89a2c312 ] François Cachereul made a very nice bug report and suspected the bh_lock_sock() / bh_unlok_sock() pair used in l2tp_xmit_skb() from process context was not good. This problem was added by commit 6af88da14ee284aaad6e4326da09a89191ab6165 ("l2tp: Fix locking in l2tp_core.c"). l2tp_eth_dev_xmit() runs from BH context, so we must disable BH from other l2tp_xmit_skb() users. [ 452.060011] BUG: soft lockup - CPU#1 stuck for 23s! [accel-pppd:6662] [ 452.061757] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppoe pppox ppp_generic slhc ipv6 ext3 mbcache jbd virtio_balloon xfs exportfs dm_mod virtio_blk ata_generic virtio_net floppy ata_piix libata virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan] [ 452.064012] CPU 1 [ 452.080015] BUG: soft lockup - CPU#2 stuck for 23s! [accel-pppd:6643] [ 452.080015] CPU 2 [ 452.080015] [ 452.080015] Pid: 6643, comm: accel-pppd Not tainted 3.2.46.mini #1 Bochs Bochs [ 452.080015] RIP: 0010:[<ffffffff81059f6c>] [<ffffffff81059f6c>] do_raw_spin_lock+0x17/0x1f [ 452.080015] RSP: 0018:ffff88007125fc18 EFLAGS: 00000293 [ 452.080015] RAX: 000000000000aba9 RBX: ffffffff811d0703 RCX: 0000000000000000 [ 452.080015] RDX: 00000000000000ab RSI: ffff8800711f6896 RDI: ffff8800745c8110 [ 452.080015] RBP: ffff88007125fc18 R08: 0000000000000020 R09: 0000000000000000 [ 452.080015] R10: 0000000000000000 R11: 0000000000000280 R12: 0000000000000286 [ 452.080015] R13: 0000000000000020 R14: 0000000000000240 R15: 0000000000000000 [ 452.080015] FS: 00007fdc0cc24700(0000) GS:ffff8800b6f00000(0000) knlGS:0000000000000000 [ 452.080015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 452.080015] CR2: 00007fdb054899b8 CR3: 0000000074404000 CR4: 00000000000006a0 [ 452.080015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 452.080015] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 452.080015] Process accel-pppd (pid: 6643, threadinfo ffff88007125e000, task ffff8800b27e6dd0) [ 452.080015] Stack: [ 452.080015] ffff88007125fc28 ffffffff81256559 ffff88007125fc98 ffffffffa01b2bd1 [ 452.080015] ffff88007125fc58 000000000000000c 00000000029490d0 0000009c71dbe25e [ 452.080015] 000000000000005c 000000080000000e 0000000000000000 ffff880071170600 [ 452.080015] Call Trace: [ 452.080015] [<ffffffff81256559>] _raw_spin_lock+0xe/0x10 [ 452.080015] [<ffffffffa01b2bd1>] l2tp_xmit_skb+0x189/0x4ac [l2tp_core] [ 452.080015] [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp] [ 452.080015] [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24 [ 452.080015] [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6 [ 452.080015] [<ffffffff81254e88>] ? __schedule+0x5c1/0x616 [ 452.080015] [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c [ 452.080015] [<ffffffff810bbd21>] ? fget_light+0x75/0x89 [ 452.080015] [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56 [ 452.080015] [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b [ 452.080015] [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b [ 452.080015] Code: 81 48 89 e5 72 0c 31 c0 48 81 ff 45 66 25 81 0f 92 c0 5d c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 0f b6 d4 38 d0 74 06 f3 90 <8a> 07 eb f6 5d c3 90 90 55 48 89 e5 9c 58 0f 1f 44 00 00 5d c3 [ 452.080015] Call Trace: [ 452.080015] [<ffffffff81256559>] _raw_spin_lock+0xe/0x10 [ 452.080015] [<ffffffffa01b2bd1>] l2tp_xmit_skb+0x189/0x4ac [l2tp_core] [ 452.080015] [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp] [ 452.080015] [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24 [ 452.080015] [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6 [ 452.080015] [<ffffffff81254e88>] ? __schedule+0x5c1/0x616 [ 452.080015] [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c [ 452.080015] [<ffffffff810bbd21>] ? fget_light+0x75/0x89 [ 452.080015] [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56 [ 452.080015] [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b [ 452.080015] [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b [ 452.064012] [ 452.064012] Pid: 6662, comm: accel-pppd Not tainted 3.2.46.mini #1 Bochs Bochs [ 452.064012] RIP: 0010:[<ffffffff81059f6e>] [<ffffffff81059f6e>] do_raw_spin_lock+0x19/0x1f [ 452.064012] RSP: 0018:ffff8800b6e83ba0 EFLAGS: 00000297 [ 452.064012] RAX: 000000000000aaa9 RBX: ffff8800b6e83b40 RCX: 0000000000000002 [ 452.064012] RDX: 00000000000000aa RSI: 000000000000000a RDI: ffff8800745c8110 [ 452.064012] RBP: ffff8800b6e83ba0 R08: 000000000000c802 R09: 000000000000001c [ 452.064012] R10: ffff880071096c4e R11: 0000000000000006 R12: ffff8800b6e83b18 [ 452.064012] R13: ffffffff8125d51e R14: ffff8800b6e83ba0 R15: ffff880072a589c0 [ 452.064012] FS: 00007fdc0b81e700(0000) GS:ffff8800b6e80000(0000) knlGS:0000000000000000 [ 452.064012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 452.064012] CR2: 0000000000625208 CR3: 0000000074404000 CR4: 00000000000006a0 [ 452.064012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 452.064012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 452.064012] Process accel-pppd (pid: 6662, threadinfo ffff88007129a000, task ffff8800744f7410) [ 452.064012] Stack: [ 452.064012] ffff8800b6e83bb0 ffffffff81256559 ffff8800b6e83bc0 ffffffff8121c64a [ 452.064012] ffff8800b6e83bf0 ffffffff8121ec7a ffff880072a589c0 ffff880071096c62 [ 452.064012] 0000000000000011 ffffffff81430024 ffff8800b6e83c80 ffffffff8121f276 [ 452.064012] Call Trace: [ 452.064012] <IRQ> [ 452.064012] [<ffffffff81256559>] _raw_spin_lock+0xe/0x10 [ 452.064012] [<ffffffff8121c64a>] spin_lock+0x9/0xb [ 452.064012] [<ffffffff8121ec7a>] udp_queue_rcv_skb+0x186/0x269 [ 452.064012] [<ffffffff8121f276>] __udp4_lib_rcv+0x297/0x4ae [ 452.064012] [<ffffffff8121c178>] ? raw_rcv+0xe9/0xf0 [ 452.064012] [<ffffffff8121f4a7>] udp_rcv+0x1a/0x1c [ 452.064012] [<ffffffff811fe385>] ip_local_deliver_finish+0x12b/0x1a5 [ 452.064012] [<ffffffff811fe54e>] ip_local_deliver+0x53/0x84 [ 452.064012] [<ffffffff811fe1d0>] ip_rcv_finish+0x2bc/0x2f3 [ 452.064012] [<ffffffff811fe78f>] ip_rcv+0x210/0x269 [ 452.064012] [<ffffffff8101911e>] ? kvm_clock_get_cycles+0x9/0xb [ 452.064012] [<ffffffff811d88cd>] __netif_receive_skb+0x3a5/0x3f7 [ 452.064012] [<ffffffff811d8eba>] netif_receive_skb+0x57/0x5e [ 452.064012] [<ffffffff811cf30f>] ? __netdev_alloc_skb+0x1f/0x3b [ 452.064012] [<ffffffffa0049126>] virtnet_poll+0x4ba/0x5a4 [virtio_net] [ 452.064012] [<ffffffff811d9417>] net_rx_action+0x73/0x184 [ 452.064012] [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core] [ 452.064012] [<ffffffff810343b9>] __do_softirq+0xc3/0x1a8 [ 452.064012] [<ffffffff81013b56>] ? ack_APIC_irq+0x10/0x12 [ 452.064012] [<ffffffff81256559>] ? _raw_spin_lock+0xe/0x10 [ 452.064012] [<ffffffff8125e0ac>] call_softirq+0x1c/0x26 [ 452.064012] [<ffffffff81003587>] do_softirq+0x45/0x82 [ 452.064012] [<ffffffff81034667>] irq_exit+0x42/0x9c [ 452.064012] [<ffffffff8125e146>] do_IRQ+0x8e/0xa5 [ 452.064012] [<ffffffff8125676e>] common_interrupt+0x6e/0x6e [ 452.064012] <EOI> [ 452.064012] [<ffffffff810b82a1>] ? kfree+0x8a/0xa3 [ 452.064012] [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core] [ 452.064012] [<ffffffffa01b2c25>] ? l2tp_xmit_skb+0x1dd/0x4ac [l2tp_core] [ 452.064012] [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp] [ 452.064012] [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24 [ 452.064012] [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6 [ 452.064012] [<ffffffff81254e88>] ? __schedule+0x5c1/0x616 [ 452.064012] [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c [ 452.064012] [<ffffffff810bbd21>] ? fget_light+0x75/0x89 [ 452.064012] [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56 [ 452.064012] [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b [ 452.064012] [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b [ 452.064012] Code: 89 e5 72 0c 31 c0 48 81 ff 45 66 25 81 0f 92 c0 5d c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 0f b6 d4 38 d0 74 06 f3 90 8a 07 <eb> f6 5d c3 90 90 55 48 89 e5 9c 58 0f 1f 44 00 00 5d c3 55 48 [ 452.064012] Call Trace: [ 452.064012] <IRQ> [<ffffffff81256559>] _raw_spin_lock+0xe/0x10 [ 452.064012] [<ffffffff8121c64a>] spin_lock+0x9/0xb [ 452.064012] [<ffffffff8121ec7a>] udp_queue_rcv_skb+0x186/0x269 [ 452.064012] [<ffffffff8121f276>] __udp4_lib_rcv+0x297/0x4ae [ 452.064012] [<ffffffff8121c178>] ? raw_rcv+0xe9/0xf0 [ 452.064012] [<ffffffff8121f4a7>] udp_rcv+0x1a/0x1c [ 452.064012] [<ffffffff811fe385>] ip_local_deliver_finish+0x12b/0x1a5 [ 452.064012] [<ffffffff811fe54e>] ip_local_deliver+0x53/0x84 [ 452.064012] [<ffffffff811fe1d0>] ip_rcv_finish+0x2bc/0x2f3 [ 452.064012] [<ffffffff811fe78f>] ip_rcv+0x210/0x269 [ 452.064012] [<ffffffff8101911e>] ? kvm_clock_get_cycles+0x9/0xb [ 452.064012] [<ffffffff811d88cd>] __netif_receive_skb+0x3a5/0x3f7 [ 452.064012] [<ffffffff811d8eba>] netif_receive_skb+0x57/0x5e [ 452.064012] [<ffffffff811cf30f>] ? __netdev_alloc_skb+0x1f/0x3b [ 452.064012] [<ffffffffa0049126>] virtnet_poll+0x4ba/0x5a4 [virtio_net] [ 452.064012] [<ffffffff811d9417>] net_rx_action+0x73/0x184 [ 452.064012] [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core] [ 452.064012] [<ffffffff810343b9>] __do_softirq+0xc3/0x1a8 [ 452.064012] [<ffffffff81013b56>] ? ack_APIC_irq+0x10/0x12 [ 452.064012] [<ffffffff81256559>] ? _raw_spin_lock+0xe/0x10 [ 452.064012] [<ffffffff8125e0ac>] call_softirq+0x1c/0x26 [ 452.064012] [<ffffffff81003587>] do_softirq+0x45/0x82 [ 452.064012] [<ffffffff81034667>] irq_exit+0x42/0x9c [ 452.064012] [<ffffffff8125e146>] do_IRQ+0x8e/0xa5 [ 452.064012] [<ffffffff8125676e>] common_interrupt+0x6e/0x6e [ 452.064012] <EOI> [<ffffffff810b82a1>] ? kfree+0x8a/0xa3 [ 452.064012] [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core] [ 452.064012] [<ffffffffa01b2c25>] ? l2tp_xmit_skb+0x1dd/0x4ac [l2tp_core] [ 452.064012] [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp] [ 452.064012] [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24 [ 452.064012] [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6 [ 452.064012] [<ffffffff81254e88>] ? __schedule+0x5c1/0x616 [ 452.064012] [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c [ 452.064012] [<ffffffff810bbd21>] ? fget_light+0x75/0x89 [ 452.064012] [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56 [ 452.064012] [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b [ 452.064012] [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b Reported-by: François Cachereul <f.cachereul@alphalink.fr> Tested-by: François Cachereul <f.cachereul@alphalink.fr> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28net: vlan: fix nlmsg size calculation in vlan_get_size()Marc Kleine-Budde
[ Upstream commit c33a39c575068c2ea9bffb22fd6de2df19c74b89 ] This patch fixes the calculation of the nlmsg size, by adding the missing nla_total_size(). Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28ipv6: restrict neighbor entry creation to output flowMarcelo Ricardo Leitner
This patch is based on 3.2.y branch, the one used by reporter. Please let me know if it should be different. Thanks. The patch which introduced the regression was applied on stables: 3.0.64 3.4.31 3.7.8 3.2.39 The patch which introduced the regression was for stable trees only. ---8<--- Commit 0d6a77079c475033cb622c07c5a880b392ef664e "ipv6: do not create neighbor entries for local delivery" introduced a regression on which routes to local delivery would not work anymore. Like this: $ ip -6 route add local 2001::/64 dev lo $ ping6 -c1 2001::9 PING 2001::9(2001::9) 56 data bytes ping: sendmsg: Invalid argument As this is a local delivery, that commit would not allow the creation of a neighbor entry and thus the packet cannot be sent. But as TPROXY scenario actually needs to avoid the neighbor entry creation only for input flow, this patch now limits previous patch to input flow, keeping output as before that patch. Reported-by: Debabrata Banerjee <dbavatar@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28ipv4: fix ineffective source address selectionJiri Benc
[ Upstream commit 0a7e22609067ff524fc7bbd45c6951dd08561667 ] When sending out multicast messages, the source address in inet->mc_addr is ignored and rewritten by an autoselected one. This is caused by a typo in commit 813b3b5db831 ("ipv4: Use caller's on-stack flowi as-is in output route lookups"). Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28net: heap overflow in __audit_sockaddr()Dan Carpenter
[ Upstream commit 1661bf364ae9c506bc8795fef70d1532931be1e8 ] We need to cap ->msg_namelen or it leads to a buffer overflow when we to the memcpy() in __audit_sockaddr(). It requires CAP_AUDIT_CONTROL to exploit this bug. The call tree is: ___sys_recvmsg() move_addr_to_user() audit_sockaddr() __audit_sockaddr() Reported-by: Jüri Aedla <juri.aedla@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28net: do not call sock_put() on TIMEWAIT socketsEric Dumazet
[ Upstream commit 80ad1d61e72d626e30ebe8529a0455e660ca4693 ] commit 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls") incorrectly used sock_put() on TIMEWAIT sockets. We should instead use inet_twsk_put() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28tcp: do not forget FIN in tcp_shifted_skb()Eric Dumazet
[ Upstream commit 5e8a402f831dbe7ee831340a91439e46f0d38acd ] Yuchung found following problem : There are bugs in the SACK processing code, merging part in tcp_shift_skb_data(), that incorrectly resets or ignores the sacked skbs FIN flag. When a receiver first SACK the FIN sequence, and later throw away ofo queue (e.g., sack-reneging), the sender will stop retransmitting the FIN flag, and hangs forever. Following packetdrill test can be used to reproduce the bug. $ cat sack-merge-bug.pkt `sysctl -q net.ipv4.tcp_fack=0` // Establish a connection and send 10 MSS. 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +.000 bind(3, ..., ...) = 0 +.000 listen(3, 1) = 0 +.050 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +.000 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> +.001 < . 1:1(0) ack 1 win 1024 +.000 accept(3, ..., ...) = 4 +.100 write(4, ..., 12000) = 12000 +.000 shutdown(4, SHUT_WR) = 0 +.000 > . 1:10001(10000) ack 1 +.050 < . 1:1(0) ack 2001 win 257 +.000 > FP. 10001:12001(2000) ack 1 +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:11001,nop,nop> +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:12002,nop,nop> // SACK reneg +.050 < . 1:1(0) ack 12001 win 257 +0 %{ print "unacked: ",tcpi_unacked }% +5 %{ print "" }% First, a typo inverted left/right of one OR operation, then code forgot to advance end_seq if the merged skb carried FIN. Bug was added in 2.6.29 by commit 832d11c5cd076ab ("tcp: Try to restore large SKBs while SACK processing") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28tcp: must unclone packets before mangling themEric Dumazet
[ Upstream commit c52e2421f7368fd36cbe330d2cf41b10452e39a9 ] TCP stack should make sure it owns skbs before mangling them. We had various crashes using bnx2x, and it turned out gso_size was cleared right before bnx2x driver was populating TC descriptor of the _previous_ packet send. TCP stack can sometime retransmit packets that are still in Qdisc. Of course we could make bnx2x driver more robust (using ACCESS_ONCE(shinfo->gso_size) for example), but the bug is TCP stack. We have identified two points where skb_unclone() was needed. This patch adds a WARN_ON_ONCE() to warn us if we missed another fix of this kind. Kudos to Neal for finding the root cause of this bug. Its visible using small MSS. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26Revert "sctp: fix call to SCTP_CMD_PROCESS_SACK in sctp_cmd_interpreter()"Ben Hutchings
This reverts commit de77b7955c3985ca95f64af3cb10557eb17eacee, which was commit f6e80abeab928b7c47cc1fbf53df13b4398a2bec upstream. This fix was only appropriate for Linux 3.7 onward, and introduced a regression when applied to earlier versions. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_putSalam Noureddine
[ Upstream commit 9260d3e1013701aa814d10c8fc6a9f92bd17d643 ] It is possible for the timer handlers to run after the call to ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the handler function in order to do proper cleanup when the refcnt reaches 0. Otherwise, the refcnt can reach zero without the inet6_dev being destroyed and we end up leaking a reference to the net_device and see messages like the following, unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Tested on linux-3.4.43. Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_putSalam Noureddine
[ Upstream commit e2401654dd0f5f3fb7a8d80dad9554d73d7ca394 ] It is possible for the timer handlers to run after the call to ip_mc_down so use in_dev_put instead of __in_dev_put in the handler function in order to do proper cleanup when the refcnt reaches 0. Otherwise, the refcnt can reach zero without the in_device being destroyed and we end up leaking a reference to the net_device and see messages like the following, unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Tested on linux-3.4.43. Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ipv6: udp packets following an UFO enqueued packet need also be handled by UFOHannes Frederic Sowa
[ Upstream commit 2811ebac2521ceac84f2bdae402455baa6a7fb47 ] In the following scenario the socket is corked: If the first UDP packet is larger then the mtu we try to append it to the write queue via ip6_ufo_append_data. A following packet, which is smaller than the mtu would be appended to the already queued up gso-skb via plain ip6_append_data. This causes random memory corruptions. In ip6_ufo_append_data we also have to be careful to not queue up the same skb multiple times. So setup the gso frame only when no first skb is available. This also fixes a shortcoming where we add the current packet's length to cork->length but return early because of a packet > mtu with dontfrag set (instead of sutracting it again). Found with trinity. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ip: generate unique IP identificator if local fragmentation is allowedAnsis Atteka
[ Upstream commit 703133de331a7a7df47f31fb9de51dc6f68a9de8 ] If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identificator. If one of these IP fragments would get lost or reordered, then peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to a packet loss or data corruption. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26resubmit bridge: fix message_age_timer calculationChris Healy
[ Upstream commit 9a0620133ccce9dd35c00a96405c8d80938c2cc0 ] This changes the message_age_timer calculation to use the BPDU's max age as opposed to the local bridge's max age. This is in accordance with section 8.6.2.3.2 Step 2 of the 802.1D-1998 sprecification. With the current implementation, when running with very large bridge diameters, convergance will not always occur even if a root bridge is configured to have a longer max age. Tested successfully on bridge diameters of ~200. Signed-off-by: Chris Healy <cphealy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26net: sctp: fix ipv6 ipsec encryption bug in sctp_v6_xmitDaniel Borkmann
[ Upstream commit 95ee62083cb6453e056562d91f597552021e6ae7 ] Alan Chester reported an issue with IPv6 on SCTP that IPsec traffic is not being encrypted, whereas on IPv4 it is. Setting up an AH + ESP transport does not seem to have the desired effect: SCTP + IPv4: 22:14:20.809645 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 116) 192.168.0.2 > 192.168.0.5: AH(spi=0x00000042,sumlen=16,seq=0x1): ESP(spi=0x00000044,seq=0x1), length 72 22:14:20.813270 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 340) 192.168.0.5 > 192.168.0.2: AH(spi=0x00000043,sumlen=16,seq=0x1): SCTP + IPv6: 22:31:19.215029 IP6 (class 0x02, hlim 64, next-header SCTP (132) payload length: 364) fe80::222:15ff:fe87:7fc.3333 > fe80::92e6:baff:fe0d:5a54.36767: sctp 1) [INIT ACK] [init tag: 747759530] [rwnd: 62464] [OS: 10] [MIS: 10] Moreover, Alan says: This problem was seen with both Racoon and Racoon2. Other people have seen this with OpenSwan. When IPsec is configured to encrypt all upper layer protocols the SCTP connection does not initialize. After using Wireshark to follow packets, this is because the SCTP packet leaves Box A unencrypted and Box B believes all upper layer protocols are to be encrypted so it drops this packet, causing the SCTP connection to fail to initialize. When IPsec is configured to encrypt just SCTP, the SCTP packets are observed unencrypted. In fact, using `socat sctp6-listen:3333 -` on one end and transferring "plaintext" string on the other end, results in cleartext on the wire where SCTP eventually does not report any errors, thus in the latter case that Alan reports, the non-paranoid user might think he's communicating over an encrypted transport on SCTP although he's not (tcpdump ... -X): ... 0x0030: 5d70 8e1a 0003 001a 177d eb6c 0000 0000 ]p.......}.l.... 0x0040: 0000 0000 706c 6169 6e74 6578 740a 0000 ....plaintext... Only in /proc/net/xfrm_stat we can see XfrmInTmplMismatch increasing on the receiver side. Initial follow-up analysis from Alan's bug report was done by Alexey Dobriyan. Also thanks to Vlad Yasevich for feedback on this. SCTP has its own implementation of sctp_v6_xmit() not calling inet6_csk_xmit(). This has the implication that it probably never really got updated along with changes in inet6_csk_xmit() and therefore does not seem to invoke xfrm handlers. SCTP's IPv4 xmit however, properly calls ip_queue_xmit() to do the work. Since a call to inet6_csk_xmit() would solve this problem, but result in unecessary route lookups, let us just use the cached flowi6 instead that we got through sctp_v6_get_dst(). Since all SCTP packets are being sent through sctp_packet_transmit(), we do the route lookup / flow caching in sctp_transport_route(), hold it in tp->dst and skb_dst_set() right after that. If we would alter fl6->daddr in sctp_v6_xmit() to np->opt->srcrt, we possibly could run into the same effect of not having xfrm layer pick it up, hence, use fl6_update_dst() in sctp_v6_get_dst() instead to get the correct source routed dst entry, which we assign to the skb. Also source address routing example from 625034113 ("sctp: fix sctp to work with ipv6 source address routing") still works with this patch! Nevertheless, in RFC5095 it is actually 'recommended' to not use that anyway due to traffic amplification [1]. So it seems we're not supposed to do that anyway in sctp_v6_xmit(). Moreover, if we overwrite the flow destination here, the lower IPv6 layer will be unable to put the correct destination address into IP header, as routing header is added in ipv6_push_nfrag_opts() but then probably with wrong final destination. Things aside, result of this patch is that we do not have any XfrmInTmplMismatch increase plus on the wire with this patch it now looks like: SCTP + IPv6: 08:17:47.074080 IP6 2620:52:0:102f:7a2b:cbff:fe27:1b0a > 2620:52:0:102f:213:72ff:fe32:7eba: AH(spi=0x00005fb4,seq=0x1): ESP(spi=0x00005fb5,seq=0x1), length 72 08:17:47.074264 IP6 2620:52:0:102f:213:72ff:fe32:7eba > 2620:52:0:102f:7a2b:cbff:fe27:1b0a: AH(spi=0x00003d54,seq=0x1): ESP(spi=0x00003d55,seq=0x1), length 296 This fixes Kernel Bugzilla 24412. This security issue seems to be present since 2.6.18 kernels. Lets just hope some big passive adversary in the wild didn't have its fun with that. lksctp-tools IPv6 regression test suite passes as well with this patch. [1] http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf Reported-by: Alan Chester <alan.chester@tekelec.com> Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26netpoll: fix NULL pointer dereference in netpoll_cleanupNikolay Aleksandrov
[ Upstream commit d0fe8c888b1fd1a2f84b9962cabcb98a70988aec ] I've been hitting a NULL ptr deref while using netconsole because the np->dev check and the pointer manipulation in netpoll_cleanup are done without rtnl and the following sequence happens when having a netconsole over a vlan and we remove the vlan while disabling the netconsole: CPU 1 CPU2 removes vlan and calls the notifier enters store_enabled(), calls netdev_cleanup which checks np->dev and then waits for rtnl executes the netconsole netdev release notifier making np->dev == NULL and releases rtnl continues to dereference a member of np->dev which at this point is == NULL Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26net: sctp: fix smatch warning in sctp_send_asconf_del_ipDaniel Borkmann
[ Upstream commit 88362ad8f9a6cea787420b57cc27ccacef000dbe ] This was originally reported in [1] and posted by Neil Horman [2], he said: Fix up a missed null pointer check in the asconf code. If we don't find a local address, but we pass in an address length of more than 1, we may dereference a NULL laddr pointer. Currently this can't happen, as the only users of the function pass in the value 1 as the addrcnt parameter, but its not hot path, and it doesn't hurt to check for NULL should that ever be the case. The callpath from sctp_asconf_mgmt() looks okay. But this could be triggered from sctp_setsockopt_bindx() call with SCTP_BINDX_REM_ADDR and addrcnt > 1 while passing all possible addresses from the bind list to SCTP_BINDX_REM_ADDR so that we do *not* find a single address in the association's bind address list that is not in the packed array of addresses. If this happens when we have an established association with ASCONF-capable peers, then we could get a NULL pointer dereference as we only check for laddr == NULL && addrcnt == 1 and call later sctp_make_asconf_update_ip() with NULL laddr. BUT: this actually won't happen as sctp_bindx_rem() will catch such a case and return with an error earlier. As this is incredably unintuitive and error prone, add a check to catch at least future bugs here. As Neil says, its not hot path. Introduced by 8a07eb0a5 ("sctp: Add ASCONF operation on the single-homed host"). [1] http://www.spinics.net/lists/linux-sctp/msg02132.html [2] http://www.spinics.net/lists/linux-sctp/msg02133.html Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Michio Honda <micchie@sfc.wide.ad.jp> Acked-By: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26caif: Add missing braces to multiline if in cfctrl_linkup_requestDave Jones
[ Upstream commit 0c1db731bfcf3a9fd6c58132134f8b0f423552f0 ] The indentation here implies this was meant to be a multi-line if. Introduced several years back in commit c85c2951d4da1236e32f1858db418221e624aba5 ("caif: Handle dev_queue_xmit errors.") Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26libceph: unregister request in __map_request failed and nofail == falsemajianpeng
commit 73d9f7eef3d98c3920e144797cc1894c6b005a1e upstream. For nofail == false request, if __map_request failed, the caller does cleanup work, like releasing the relative pages. It doesn't make any sense to retry this request. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Reviewed-by: Sage Weil <sage@inktank.com> [bwh: Backported to 3.2: adjust indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26inetpeer: fix a race in inetpeer_gc_worker()Eric Dumazet
[ Upstream commit 55432d2b543a4b6dfae54f5c432a566877a85d90 ] commit 5faa5df1fa2024 (inetpeer: Invalidate the inetpeer tree along with the routing cache) added a race : Before freeing an inetpeer, we must respect a RCU grace period, and make sure no user will attempt to increase refcnt. inetpeer_invalidate_tree() waits for a RCU grace period before inserting inetpeer tree into gc_list and waking the worker. At that time, no concurrent lookup can find a inetpeer in this tree. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26inetpeer: Invalidate the inetpeer tree along with the routing cacheSteffen Klassert
[ Upstream commit 5faa5df1fa2024bd750089ff21dcc4191798263d ] We initialize the routing metrics with the values cached on the inetpeer in rt_init_metrics(). So if we have the metrics cached on the inetpeer, we ignore the user configured fib_metrics. To fix this issue, we replace the old tree with a fresh initialized inet_peer_base. The old tree is removed later with a delayed work queue. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26tipc: fix lockdep warning during bearer initializationYing Xue
[ Upstream commit 4225a398c1352a7a5c14dc07277cb5cc4473983b ] When the lockdep validator is enabled, it will report the below warning when we enable a TIPC bearer: [ INFO: possible irq lock inversion dependency detected ] --------------------------------------------------------- Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(ptype_lock); local_irq_disable(); lock(tipc_net_lock); lock(ptype_lock); <Interrupt> lock(tipc_net_lock); *** DEADLOCK *** the shortest dependencies between 2nd lock and 1st lock: -> (ptype_lock){+.+...} ops: 10 { [...] SOFTIRQ-ON-W at: [<c1089418>] __lock_acquire+0x528/0x13e0 [<c108a360>] lock_acquire+0x90/0x100 [<c1553c38>] _raw_spin_lock+0x38/0x50 [<c14651ca>] dev_add_pack+0x3a/0x60 [<c182da75>] arp_init+0x1a/0x48 [<c182dce5>] inet_init+0x181/0x27e [<c1001114>] do_one_initcall+0x34/0x170 [<c17f7329>] kernel_init+0x110/0x1b2 [<c155b6a2>] kernel_thread_helper+0x6/0x10 [...] ... key at: [<c17e4b10>] ptype_lock+0x10/0x20 ... acquired at: [<c108a360>] lock_acquire+0x90/0x100 [<c1553c38>] _raw_spin_lock+0x38/0x50 [<c14651ca>] dev_add_pack+0x3a/0x60 [<c8bc18d2>] enable_bearer+0xf2/0x140 [tipc] [<c8bb283a>] tipc_enable_bearer+0x1ba/0x450 [tipc] [<c8bb3a04>] tipc_cfg_do_cmd+0x5c4/0x830 [tipc] [<c8bbc032>] handle_cmd+0x42/0xd0 [tipc] [<c148e802>] genl_rcv_msg+0x232/0x280 [<c148d3f6>] netlink_rcv_skb+0x86/0xb0 [<c148e5bc>] genl_rcv+0x1c/0x30 [<c148d144>] netlink_unicast+0x174/0x1f0 [<c148ddab>] netlink_sendmsg+0x1eb/0x2d0 [<c1456bc1>] sock_aio_write+0x161/0x170 [<c1135a7c>] do_sync_write+0xac/0xf0 [<c11360f6>] vfs_write+0x156/0x170 [<c11361e2>] sys_write+0x42/0x70 [<c155b0df>] sysenter_do_call+0x12/0x38 [...] } -> (tipc_net_lock){+..-..} ops: 4 { [...] IN-SOFTIRQ-R at: [<c108953a>] __lock_acquire+0x64a/0x13e0 [<c108a360>] lock_acquire+0x90/0x100 [<c15541cd>] _raw_read_lock_bh+0x3d/0x50 [<c8bb874d>] tipc_recv_msg+0x1d/0x830 [tipc] [<c8bc195f>] recv_msg+0x3f/0x50 [tipc] [<c146a5fa>] __netif_receive_skb+0x22a/0x590 [<c146ab0b>] netif_receive_skb+0x2b/0xf0 [<c13c43d2>] pcnet32_poll+0x292/0x780 [<c146b00a>] net_rx_action+0xfa/0x1e0 [<c103a4be>] __do_softirq+0xae/0x1e0 [...] } >From the log, we can see three different call chains between CPU0 and CPU1: Time 0 on CPU0: kernel_init()->inet_init()->dev_add_pack() At time 0, the ptype_lock is held by CPU0 in dev_add_pack(); Time 1 on CPU1: tipc_enable_bearer()->enable_bearer()->dev_add_pack() At time 1, tipc_enable_bearer() first holds tipc_net_lock, and then wants to take ptype_lock to register TIPC protocol handler into the networking stack. But the ptype_lock has been taken by dev_add_pack() on CPU0, so at this time the dev_add_pack() running on CPU1 has to be busy looping. Time 2 on CPU0: netif_receive_skb()->recv_msg()->tipc_recv_msg() At time 2, an incoming TIPC packet arrives at CPU0, hence tipc_recv_msg() will be invoked. In tipc_recv_msg(), it first wants to hold tipc_net_lock. At the moment, below scenario happens: On CPU0, below is our sequence of taking locks: lock(ptype_lock)->lock(tipc_net_lock) On CPU1, our sequence of taking locks looks like: lock(tipc_net_lock)->lock(ptype_lock) Obviously deadlock may happen in this case. But please note the deadlock possibly doesn't occur at all when the first TIPC bearer is enabled. Before enable_bearer() -- running on CPU1 does not hold ptype_lock, so the TIPC receive handler (i.e. recv_msg()) is not registered successfully via dev_add_pack(), so the tipc_recv_msg() cannot be called by recv_msg() even if a TIPC message comes to CPU0. But when the second TIPC bearer is registered, the deadlock can perhaps really happen. To fix it, we will push the work of registering TIPC protocol handler into workqueue context. After the change, both paths taking ptype_lock are always in process contexts, thus, the deadlock should never occur. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTOJiri Bohac
[ Upstream commit 61e76b178dbe7145e8d6afa84bb4ccea71918994 ] RFC 4443 has defined two additional codes for ICMPv6 type 1 (destination unreachable) messages: 5 - Source address failed ingress/egress policy 6 - Reject route to destination Now they are treated as protocol error and icmpv6_err_convert() converts them to EPROTO. RFC 4443 says: "Codes 5 and 6 are more informative subsets of code 1." Treat codes 5 and 6 as code 1 (EACCES) Btw, connect() returning -EPROTO confuses firefox, so that fallback to other/IPv4 addresses does not work: https://bugzilla.mozilla.org/show_bug.cgi?id=910773 Signed-off-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delayDaniel Borkmann
[ Upstream commit 2d98c29b6fb3de44d9eaa73c09f9cf7209346383 ] While looking into MLDv1/v2 code, I noticed that bridging code does not convert it's max delay into jiffies for MLDv2 messages as we do in core IPv6' multicast code. RFC3810, 5.1.3. Maximum Response Code says: The Maximum Response Code field specifies the maximum time allowed before sending a responding Report. The actual time allowed, called the Maximum Response Delay, is represented in units of milliseconds, and is derived from the Maximum Response Code as follows: [...] As we update timers that work with jiffies, we need to convert it. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Linus Lüssing <linus.luessing@web.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ipv6: Don't depend on per socket memory for neighbour discovery messagesThomas Graf
[ Upstream commit 25a6e6b84fba601eff7c28d30da8ad7cfbef0d43 ] Allocating skbs when sending out neighbour discovery messages currently uses sock_alloc_send_skb() based on a per net namespace socket and thus share a socket wmem buffer space. If a netdevice is temporarily unable to transmit due to carrier loss or for other reasons, the queued up ndisc messages will cosnume all of the wmem space and will thus prevent from any more skbs to be allocated even for netdevices that are able to transmit packets. The number of neighbour discovery messages sent is very limited, use of alloc_skb() bypasses the socket wmem buffer size enforcement while the manual call to skb_set_owner_w() maintains the socket reference needed for the IPv6 output path. This patch has orginally been posted by Eric Dumazet in a modified form. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Fabio Estevam <festevam@gmail.com> Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Tested-by: Stephen Warren <swarren@nvidia.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ipv6: drop packets with multiple fragmentation headersHannes Frederic Sowa
[ Upstream commit f46078cfcd77fa5165bf849f5e568a7ac5fa569c ] It is not allowed for an ipv6 packet to contain multiple fragmentation headers. So discard packets which were already reassembled by fragmentation logic and send back a parameter problem icmp. The updates for RFC 6980 will come in later, I have to do a bit more research here. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ipv6: remove max_addresses check from ipv6_create_tempaddrHannes Frederic Sowa
[ Upstream commit 4b08a8f1bd8cb4541c93ec170027b4d0782dab52 ] Because of the max_addresses check attackers were able to disable privacy extensions on an interface by creating enough autoconfigured addresses: <http://seclists.org/oss-sec/2012/q4/292> But the check is not actually needed: max_addresses protects the kernel to install too many ipv6 addresses on an interface and guards addrconf_prefix_rcv to install further addresses as soon as this limit is reached. We only generate temporary addresses in direct response of a new address showing up. As soon as we filled up the maximum number of addresses of an interface, we stop installing more addresses and thus also stop generating more temp addresses. Even if the attacker tries to generate a lot of temporary addresses by announcing a prefix and removing it again (lifetime == 0) we won't install more temp addresses, because the temporary addresses do count to the maximum number of addresses, thus we would stop installing new autoconfigured addresses when the limit is reached. This patch fixes CVE-2013-0343 (but other layer-2 attacks are still possible). Thanks to Ding Tianhong to bring this topic up again. Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: George Kargiotakis <kargig@void.gr> Cc: P J P <ppandit@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not matchHannes Frederic Sowa
[ Upstream commit 3e3be275851bc6fc90bfdcd732cd95563acd982b ] In case a subtree did not match we currently stop backtracking and return NULL (root table from fib_lookup). This could yield in invalid routing table lookups when using subtrees. Instead continue to backtrack until a valid subtree or node is found and return this match. Also remove unneeded NULL check. Reported-by: Teco Boot <teco@inf-net.nl> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David Lamparter <equinox@diac24.net> Cc: <boutier@pps.univ-paris-diderot.fr> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26