aboutsummaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2014-01-03inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu ↵Hannes Frederic Sowa
functions [ Upstream commit 85fbaa75037d0b6b786ff18658ddf0b4014ce2a4 ] Commit bceaa90240b6019ed73b49965eac7d167610be69 ("inet: prevent leakage of uninitialized memory to user in recv syscalls") conditionally updated addr_len if the msg_name is written to. The recv_error and rxpmtu functions relied on the recvmsg functions to set up addr_len before. As this does not happen any more we have to pass addr_len to those functions as well and set it to the size of the corresponding sockaddr length. This broke traceroute and such. Fixes: bceaa90240b6 ("inet: prevent leakage of uninitialized memory to user in recv syscalls") Reported-by: Brad Spengler <spender@grsecurity.net> Reported-by: Tom Labanowski Cc: mpb <mpb.mail@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28net: fix cipso packet validation when !NETLABELSeif Mazareeb
[ Upstream commit f2e5ddcc0d12f9c4c7b254358ad245c9dddce13b ] When CONFIG_NETLABEL is disabled, the cipso_v4_validate() function could loop forever in the main loop if opt[opt_iter +1] == 0, this will causing a kernel crash in an SMP system, since the CPU executing this function will stall /not respond to IPIs. This problem can be reproduced by running the IP Stack Integrity Checker (http://isic.sourceforge.net) using the following command on a Linux machine connected to DUT: "icmpsic -s rand -d <DUT IP address> -r 123456" wait (1-2 min) Signed-off-by: Seif Mazareeb <seif@marvell.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28net: dst: provide accessor function to dst->xfrmVlad Yasevich
[ Upstream commit e87b3998d795123b4139bc3f25490dd236f68212 ] dst->xfrm is conditionally defined. Provide accessor funtion that is always available. Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26ip: generate unique IP identificator if local fragmentation is allowedAnsis Atteka
[ Upstream commit 703133de331a7a7df47f31fb9de51dc6f68a9de8 ] If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identificator. If one of these IP fragments would get lost or reordered, then peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to a packet loss or data corruption. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26inetpeer: fix a race in inetpeer_gc_worker()Eric Dumazet
[ Upstream commit 55432d2b543a4b6dfae54f5c432a566877a85d90 ] commit 5faa5df1fa2024 (inetpeer: Invalidate the inetpeer tree along with the routing cache) added a race : Before freeing an inetpeer, we must respect a RCU grace period, and make sure no user will attempt to increase refcnt. inetpeer_invalidate_tree() waits for a RCU grace period before inserting inetpeer tree into gc_list and waking the worker. At that time, no concurrent lookup can find a inetpeer in this tree. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-10-26inetpeer: Invalidate the inetpeer tree along with the routing cacheSteffen Klassert
[ Upstream commit 5faa5df1fa2024bd750089ff21dcc4191798263d ] We initialize the routing metrics with the values cached on the inetpeer in rt_init_metrics(). So if we have the metrics cached on the inetpeer, we ignore the user configured fib_metrics. To fix this issue, we replace the old tree with a fresh initialized inet_peer_base. The old tree is removed later with a delayed work queue. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-08-02ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET ↵Hannes Frederic Sowa
pending data [ Upstream commit 8822b64a0fa64a5dd1dfcf837c5b0be83f8c05d1 ] We accidentally call down to ip6_push_pending_frames when uncorking pending AF_INET data on a ipv6 socket. This results in the following splat (from Dave Jones): skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:126! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37 task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000 RIP: 0010:[<ffffffff816e759c>] [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP: 0018:ffff8801e6431de8 EFLAGS: 00010282 RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006 RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520 RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800 R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800 FS: 00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Stack: ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0 Call Trace: [<ffffffff8159a9aa>] skb_push+0x3a/0x40 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0 [<ffffffff816f5d54>] tracesys+0xdd/0xe2 Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 RIP [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP <ffff8801e6431de8> This patch adds a check if the pending data is of address family AF_INET and directly calls udp_push_ending_frames from udp_v6_push_pending_frames if that is the case. This bug was found by Dave Jones with trinity. (Also move the initialization of fl6 below the AF_INET check, even if not strictly necessary.) Cc: Dave Jones <davej@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-08-02ipv6,mcast: always hold idev->lock before mca_lockAmerigo Wang
[ Upstream commit 8965779d2c0e6ab246c82a405236b1fb2adae6b2, with some bits from commit b7b1bfce0bb68bd8f6e62a28295922785cc63781 ("ipv6: split duplicate address detection and router solicitation timer") to get the __ipv6_get_lladdr() used by this patch. ] dingtianhong reported the following deadlock detected by lockdep: ====================================================== [ INFO: possible circular locking dependency detected ] 3.4.24.05-0.1-default #1 Not tainted ------------------------------------------------------- ksoftirqd/0/3 is trying to acquire lock: (&ndev->lock){+.+...}, at: [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120 but task is already holding lock: (&mc->mca_lock){+.+...}, at: [<ffffffff8149d130>] mld_send_report+0x40/0x150 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&mc->mca_lock){+.+...}: [<ffffffff810a8027>] validate_chain+0x637/0x730 [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500 [<ffffffff810a8734>] lock_acquire+0x114/0x150 [<ffffffff814f691a>] rt_spin_lock+0x4a/0x60 [<ffffffff8149e4bb>] igmp6_group_added+0x3b/0x120 [<ffffffff8149e5d8>] ipv6_mc_up+0x38/0x60 [<ffffffff81480a4d>] ipv6_find_idev+0x3d/0x80 [<ffffffff81483175>] addrconf_notify+0x3d5/0x4b0 [<ffffffff814fae3f>] notifier_call_chain+0x3f/0x80 [<ffffffff81073471>] raw_notifier_call_chain+0x11/0x20 [<ffffffff813d8722>] call_netdevice_notifiers+0x32/0x60 [<ffffffff813d92d4>] __dev_notify_flags+0x34/0x80 [<ffffffff813d9360>] dev_change_flags+0x40/0x70 [<ffffffff813ea627>] do_setlink+0x237/0x8a0 [<ffffffff813ebb6c>] rtnl_newlink+0x3ec/0x600 [<ffffffff813eb4d0>] rtnetlink_rcv_msg+0x160/0x310 [<ffffffff814040b9>] netlink_rcv_skb+0x89/0xb0 [<ffffffff813eb357>] rtnetlink_rcv+0x27/0x40 [<ffffffff81403e20>] netlink_unicast+0x140/0x180 [<ffffffff81404a9e>] netlink_sendmsg+0x33e/0x380 [<ffffffff813c4252>] sock_sendmsg+0x112/0x130 [<ffffffff813c537e>] __sys_sendmsg+0x44e/0x460 [<ffffffff813c5544>] sys_sendmsg+0x44/0x70 [<ffffffff814feab9>] system_call_fastpath+0x16/0x1b -> #0 (&ndev->lock){+.+...}: [<ffffffff810a798e>] check_prev_add+0x3de/0x440 [<ffffffff810a8027>] validate_chain+0x637/0x730 [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500 [<ffffffff810a8734>] lock_acquire+0x114/0x150 [<ffffffff814f6c82>] rt_read_lock+0x42/0x60 [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120 [<ffffffff8149b036>] mld_newpack+0xb6/0x160 [<ffffffff8149b18b>] add_grhead+0xab/0xc0 [<ffffffff8149d03b>] add_grec+0x3ab/0x460 [<ffffffff8149d14a>] mld_send_report+0x5a/0x150 [<ffffffff8149f99e>] igmp6_timer_handler+0x4e/0xb0 [<ffffffff8105705a>] call_timer_fn+0xca/0x1d0 [<ffffffff81057b9f>] run_timer_softirq+0x1df/0x2e0 [<ffffffff8104e8c7>] handle_pending_softirqs+0xf7/0x1f0 [<ffffffff8104ea3b>] __do_softirq_common+0x7b/0xf0 [<ffffffff8104f07f>] __thread_do_softirq+0x1af/0x210 [<ffffffff8104f1c1>] run_ksoftirqd+0xe1/0x1f0 [<ffffffff8106c7de>] kthread+0xae/0xc0 [<ffffffff814fff74>] kernel_thread_helper+0x4/0x10 actually we can just hold idev->lock before taking pmc->mca_lock, and avoid taking idev->lock again when iterating idev->addr_list, since the upper callers of mld_newpack() already take read_lock_bh(&idev->lock). Reported-by: dingtianhong <dingtianhong@huawei.com> Cc: dingtianhong <dingtianhong@huawei.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David S. Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Ding Tianhong <dingtianhong@huawei.com> Tested-by: Chen Weilong <chenweilong@huawei.com> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-05-30ipv6: do not clear pinet6 fieldEric Dumazet
[ Upstream commit f77d602124d865c38705df7fa25c03de9c284ad2 ] We have seen multiple NULL dereferences in __inet6_lookup_established() After analysis, I found that inet6_sk() could be NULL while the check for sk_family == AF_INET6 was true. Bug was added in linux-2.6.29 when RCU lookups were introduced in UDP and TCP stacks. Once an IPv6 socket, using SLAB_DESTROY_BY_RCU is inserted in a hash table, we no longer can clear pinet6 field. This patch extends logic used in commit fcbdf09d9652c891 ("net: fix nulls list corruptions in sk_prot_alloc") TCP/UDP/UDPLite IPv6 protocols provide their own .clear_sk() method to make sure we do not clear pinet6 field. At socket clone phase, we do not really care, as cloning the parent (non NULL) pinet6 is not adding a fatal race. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-05-30tcp: force a dst refcount when prequeue packetEric Dumazet
[ Upstream commit 093162553c33e9479283e107b4431378271c735d ] Before escaping RCU protected section and adding packet into prequeue, make sure the dst is refcounted. Reported-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-04-25net: fix incorrect credentials passingLinus Torvalds
commit 83f1b4ba917db5dc5a061a44b3403ddb6e783494 upstream. Commit 257b5358b32f ("scm: Capture the full credentials of the scm sender") changed the credentials passing code to pass in the effective uid/gid instead of the real uid/gid. Obviously this doesn't matter most of the time (since normally they are the same), but it results in differences for suid binaries when the wrong uid/gid ends up being used. This just undoes that (presumably unintentional) part of the commit. Reported-by: Andy Lutomirski <luto@amacapital.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: scm_set_cred() does user namespace conversion of euid/egid using cred_to_ucred(). Add and use cred_real_to_ucred() to do the same thing for real uid/gid.] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27inet: limit length of fragment queue hash table bucket listsHannes Frederic Sowa
[ Upstream commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055 ] This patch introduces a constant limit of the fragment queue hash table bucket list lengths. Currently the limit 128 is choosen somewhat arbitrary and just ensures that we can fill up the fragment cache with empty packets up to the default ip_frag_high_thresh limits. It should just protect from list iteration eating considerable amounts of cpu. If we reach the maximum length in one hash bucket a warning is printed. This is implemented on the caller side of inet_frag_find to distinguish between the different users of inet_fragment.c. I dropped the out of memory warning in the ipv4 fragment lookup path, because we already get a warning by the slab allocator. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27ipv4: fix definition of FIB_TABLE_HASHSZDenis V. Lunev
[ Upstream commit 5b9e12dbf92b441b37136ea71dac59f05f2673a9 ] a long time ago by the commit commit 93456b6d7753def8760b423ac6b986eb9d5a4a95 Author: Denis V. Lunev <den@openvz.org> Date: Thu Jan 10 03:23:38 2008 -0800 [IPV4]: Unify access to the routing tables. the defenition of FIB_HASH_TABLE size has obtained wrong dependency: it should depend upon CONFIG_IP_MULTIPLE_TABLES (as was in the original code) but it was depended from CONFIG_IP_ROUTE_MULTIPATH This patch returns the situation to the original state. The problem was spotted by Tingwei Liu. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Tingwei Liu <tingw.liu@gmail.com> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06ipv6: use a stronger hash for tcpEric Dumazet
[ Upstream commit 08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664 ] It looks like its possible to open thousands of TCP IPv6 sessions on a server, all landing in a single slot of TCP hash table. Incoming packets have to lookup sockets in a very long list. We should hash all bits from foreign IPv6 addresses, using a salt and hash mix, not a simple XOR. inet6_ehashfn() can also separately use the ports, instead of xoring them. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-01-16tcp: implement RFC 5961 3.2Eric Dumazet
[ Upstream commit 282f23c6ee343126156dd41218b22ece96d747e3 ] Implement the RFC 5691 mitigation against Blind Reset attack using RST bit. Idea is to validate incoming RST sequence, to match RCV.NXT value, instead of previouly accepted window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND) If sequence is in window but not an exact match, send a "challenge ACK", so that the other part can resend an RST with the appropriate sequence. Add a new sysctl, tcp_challenge_ack_limit, to limit number of challenge ACK sent per second. Add a new SNMP counter to count number of challenge acks sent. (netstat -s | grep TCPChallengeACK) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kiran Kumar Kella <kkiran@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-01-16inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sockChristoph Paasch
[ Upstream commit e337e24d6624e74a558aa69071e112a65f7b5758 ] If in either of the above functions inet_csk_route_child_sock() or __inet_inherit_port() fails, the newsk will not be freed: unreferenced object 0xffff88022e8a92c0 (size 1592): comm "softirq", pid 0, jiffies 4294946244 (age 726.160s) hex dump (first 32 bytes): 0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00 ................ 02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e [<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5 [<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd [<ffffffff8149b784>] sk_clone_lock+0x16/0x21e [<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b [<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481 [<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b [<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416 [<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc [<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701 [<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4 [<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f [<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233 [<ffffffff814cee68>] ip_rcv+0x217/0x267 [<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553 [<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82 This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus a single sock_put() is not enough to free the memory. Additionally, things like xfrm, memcg, cookie_values,... may have been initialized. We have to free them properly. This is fixed by forcing a call to tcp_done(), ending up in inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary, because it ends up doing all the cleanup on xfrm, memcg, cookie_values, xfrm,... Before calling tcp_done, we have to set the socket to SOCK_DEAD, to force it entering inet_csk_destroy_sock. To avoid the warning in inet_csk_destroy_sock, inet_num has to be set to 0. As inet_csk_destroy_sock does a dec on orphan_count, we first have to increase it. Calling tcp_done() allows us to remove the calls to tcp_clear_xmit_timer() and tcp_cleanup_congestion_control(). A similar approach is taken for dccp by calling dccp_done(). This is in the kernel since 093d282321 (tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()), thus since version >= 2.6.37. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-01-16mac80211: introduce IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAILStanislaw Gruszka
commit 5b632fe85ec82e5c43740b52e74c66df50a37db3 upstream. Commit f0425beda4d404a6e751439b562100b902ba9c98 "mac80211: retry sending failed BAR frames later instead of tearing down aggr" caused regression on rt2x00 hardware (connection hangs). This regression was fixed by commit be03d4a45c09ee5100d3aaaedd087f19bc20d01 "rt2x00: Don't let mac80211 send a BAR when an AMPDU subframe fails". But the latter commit caused yet another problem reported in https://bugzilla.kernel.org/show_bug.cgi?id=42828#c22 After long discussion in this thread: http://mid.gmane.org/20121018075615.GA18212@redhat.com and testing various alternative solutions, which failed on one or other setup, we have no other good fix for the issues like just revert both mentioned earlier commits. To do not affect other hardware which benefit from commit f0425beda4d404a6e751439b562100b902ba9c98, instead of reverting it, introduce flag that when used will restore mac80211 behaviour before the commit. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> [replaced link with mid.gmane.org that has message-id] Signed-off-by: Johannes Berg <johannes.berg@intel.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-11-16mac80211: verify that skb data is presentJohannes Berg
commit 9b395bc3be1cebf0144a127c7e67d56dbdac0930 upstream. A number of places in the mesh code don't check that the frame data is present and in the skb header when trying to access. Add those checks and the necessary pskb_may_pull() calls. This prevents accessing data that doesn't actually exist. To do this, export ieee80211_get_mesh_hdrlen() to be able to use it in mac80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-11-16rtnetlink: Fix problem with buffer allocationGreg Rose
commit 115c9b81928360d769a76c632bae62d15206a94a upstream. Implement a new netlink attribute type IFLA_EXT_MASK. The mask is a 32 bit value that can be used to indicate to the kernel that certain extended ifinfo values are requested by the user application. At this time the only mask value defined is RTEXT_FILTER_VF to indicate that the user wants the ifinfo dump to send information about the VFs belonging to the interface. This patch fixes a bug in which certain applications do not have large enough buffers to accommodate the extra information returned by the kernel with large numbers of SR-IOV virtual functions. Those applications will not send the new netlink attribute with the interface info dump request netlink messages so they will not get unexpectedly large request buffers returned by the kernel. Modifies the rtnl_calcit function to traverse the list of net devices and compute the minimum buffer size that can hold the info dumps of all matching devices based upon the filter passed in via the new netlink attribute filter mask. If no filter mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE. With this change it is possible to add yet to be defined netlink attributes to the dump request which should make it fairly extensible in the future. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: drop the change in do_setlink() that reverts commit f18da14565819ba43b8321237e2426a2914cc2ef, which we never applied] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-10-30netfilter: nf_conntrack: fix racy timer handling with reliable eventsPablo Neira Ayuso
commit 5b423f6a40a0327f9d40bc8b97ce9be266f74368 upstream. Existing code assumes that del_timer returns true for alive conntrack entries. However, this is not true if reliable events are enabled. In that case, del_timer may return true for entries that were just inserted in the dying list. Note that packets / ctnetlink may hold references to conntrack entries that were just inserted to such list. This patch fixes the issue by adding an independent timer for event delivery. This increases the size of the ecache extension. Still we can revisit this later and use variable size extensions to allocate this area on demand. Tested-by: Oliver Smith <olipro@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-10-17ipvs: fix oops on NAT reply in br_nf contextLin Ming
commit 9e33ce453f8ac8452649802bee1f410319408f4b upstream. IPVS should not reset skb->nf_bridge in FORWARD hook by calling nf_reset for NAT replies. It triggers oops in br_nf_forward_finish. [ 579.781508] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 [ 579.781669] IP: [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112 [ 579.781792] PGD 218f9067 PUD 0 [ 579.781865] Oops: 0000 [#1] SMP [ 579.781945] CPU 0 [ 579.781983] Modules linked in: [ 579.782047] [ 579.782080] [ 579.782114] Pid: 4644, comm: qemu Tainted: G W 3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard /30E8 [ 579.782300] RIP: 0010:[<ffffffff817b1ca5>] [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112 [ 579.782455] RSP: 0018:ffff88007b003a98 EFLAGS: 00010287 [ 579.782541] RAX: 0000000000000008 RBX: ffff8800762ead00 RCX: 000000000001670a [ 579.782653] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8800762ead00 [ 579.782845] RBP: ffff88007b003ac8 R08: 0000000000016630 R09: ffff88007b003a90 [ 579.782957] R10: ffff88007b0038e8 R11: ffff88002da37540 R12: ffff88002da01a02 [ 579.783066] R13: ffff88002da01a80 R14: ffff88002d83c000 R15: ffff88002d82a000 [ 579.783177] FS: 0000000000000000(0000) GS:ffff88007b000000(0063) knlGS:00000000f62d1b70 [ 579.783306] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b [ 579.783395] CR2: 0000000000000004 CR3: 00000000218fe000 CR4: 00000000000027f0 [ 579.783505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 579.783684] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 579.783795] Process qemu (pid: 4644, threadinfo ffff880021b20000, task ffff880021aba760) [ 579.783919] Stack: [ 579.783959] ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00 [ 579.784110] ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7 [ 579.784260] ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0 [ 579.784477] Call Trace: [ 579.784523] <IRQ> [ 579.784562] [ 579.784603] [<ffffffff817b26c7>] br_nf_forward_ip+0x275/0x2c8 [ 579.784707] [<ffffffff81704b58>] nf_iterate+0x47/0x7d [ 579.784797] [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae [ 579.784906] [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102 [ 579.784995] [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae [ 579.785175] [<ffffffff8187fa95>] ? _raw_write_unlock_bh+0x19/0x1b [ 579.785179] [<ffffffff817ac417>] __br_forward+0x97/0xa2 [ 579.785179] [<ffffffff817ad366>] br_handle_frame_finish+0x1a6/0x257 [ 579.785179] [<ffffffff817b2386>] br_nf_pre_routing_finish+0x26d/0x2cb [ 579.785179] [<ffffffff817b2cf0>] br_nf_pre_routing+0x55d/0x5c1 [ 579.785179] [<ffffffff81704b58>] nf_iterate+0x47/0x7d [ 579.785179] [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44 [ 579.785179] [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102 [ 579.785179] [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44 [ 579.785179] [<ffffffff81551525>] ? sky2_poll+0xb35/0xb54 [ 579.785179] [<ffffffff817ad62a>] br_handle_frame+0x213/0x229 [ 579.785179] [<ffffffff817ad417>] ? br_handle_frame_finish+0x257/0x257 [ 579.785179] [<ffffffff816e3b47>] __netif_receive_skb+0x2b4/0x3f1 [ 579.785179] [<ffffffff816e69fc>] process_backlog+0x99/0x1e2 [ 579.785179] [<ffffffff816e6800>] net_rx_action+0xdf/0x242 [ 579.785179] [<ffffffff8107e8a8>] __do_softirq+0xc1/0x1e0 [ 579.785179] [<ffffffff8135a5ba>] ? trace_hardirqs_off_thunk+0x3a/0x6c [ 579.785179] [<ffffffff8188812c>] call_softirq+0x1c/0x30 The steps to reproduce as follow, 1. On Host1, setup brige br0(192.168.1.106) 2. Boot a kvm guest(192.168.1.105) on Host1 and start httpd 3. Start IPVS service on Host1 ipvsadm -A -t 192.168.1.106:80 -s rr ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m 4. Run apache benchmark on Host2(192.168.1.101) ab -n 1000 http://192.168.1.106/ ip_vs_reply4 ip_vs_out handle_response ip_vs_notrack nf_reset() { skb->nf_bridge = NULL; } Actually, IPVS wants in this case just to replace nfct with untracked version. So replace the nf_reset(skb) call in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call. Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-10-10xfrm: Workaround incompatibility of ESN and async cryptoSteffen Klassert
[ Upstream commit 3b59df46a449ec9975146d71318c4777ad086744 ] ESN for esp is defined in RFC 4303. This RFC assumes that the sequence number counters are always up to date. However, this is not true if an async crypto algorithm is employed. If the sequence number counters are not up to date on sequence number check, we may incorrectly update the upper 32 bit of the sequence number. This leads to a DOS. We workaround this by comparing the upper sequence number, (used for authentication) with the upper sequence number computed after the async processing. We drop the packet if these numbers are different. To do this, we introduce a recheck function that does this check in the ESN case. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-10-10Bluetooth: Change signature of smp_conn_security()Vinicius Costa Gomes
commit cc110922da7e902b62d18641a370fec01a9fa794 upstream. To make it clear that it may be called from contexts that may not have any knowledge of L2CAP, we change the connection parameter, to receive a hci_conn. This also makes it clear that it is checking the security of the link. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-19af_netlink: force credentials passing [CVE-2012-3520]Eric Dumazet
[ Upstream commit e0e3cea46d31d23dc40df0a49a7a2c04fe8edfea ] Pablo Neira Ayuso discovered that avahi and potentially NetworkManager accept spoofed Netlink messages because of a kernel bug. The kernel passes all-zero SCM_CREDENTIALS ancillary data to the receiver if the sender did not provide such data, instead of not including any such data at all or including the correct data from the peer (as it is the case with AF_UNIX). This bug was introduced in commit 16e572626961 (af_unix: dont send SCM_CREDENTIALS by default) This patch forces passing credentials for netlink, as before the regression. Another fix would be to not add SCM_CREDENTIALS in netlink messages if not provided by the sender, but it might break some programs. With help from Florian Weimer & Petr Matousek This issue is designated as CVE-2012-3520 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-09-19tcp: Apply device TSO segment limit earlierBen Hutchings
[ Upstream commit 1485348d2424e1131ea42efc033cbd9366462b01 ] Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to limit the size of TSO skbs. This avoids the need to fall back to software GSO for local TCP senders. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-07-12bonding: Fix corrupted queue_mappingEric Dumazet
[ Upstream commit 5ee31c6898ea5537fcea160999d60dc63bc0c305 ] In the transmit path of the bonding driver, skb->cb is used to stash the skb->queue_mapping so that the bonding device can set its own queue mapping. This value becomes corrupted since the skb->cb is also used in __dev_xmit_skb. When transmitting through bonding driver, bond_select_queue is called from dev_queue_xmit. In bond_select_queue the original skb->queue_mapping is copied into skb->cb (via bond_queue_mapping) and skb->queue_mapping is overwritten with the bond driver queue. Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes the packet length into skb->cb, thereby overwriting the stashed queue mappping. In bond_dev_queue_xmit (called from hard_start_xmit), the queue mapping for the skb is set to the stashed value which is now the skb length and hence is an invalid queue for the slave device. If we want to save skb->queue_mapping into skb->cb[], best place is to add a field in struct qdisc_skb_cb, to make sure it wont conflict with other layers (eg : Qdiscc, Infiniband...) This patchs also makes sure (struct qdisc_skb_cb)->data is aligned on 8 bytes : netem qdisc for example assumes it can store an u64 in it, without misalignment penalty. Note : we only have 20 bytes left in (struct qdisc_skb_cb)->data[]. The largest user is CHOKe and it fills it. Based on a previous patch from Tom Herbert. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Tom Herbert <therbert@google.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Roland Dreier <roland@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-07-12cipso: handle CIPSO options correctly when NetLabel is disabledPaul Moore
[ Upstream commit 20e2a86485967c385d7c7befc1646e4d1d39362e ] When NetLabel is not enabled, e.g. CONFIG_NETLABEL=n, and the system receives a CIPSO tagged packet it is dropped (cipso_v4_validate() returns non-zero). In most cases this is the correct and desired behavior, however, in the case where we are simply forwarding the traffic, e.g. acting as a network bridge, this becomes a problem. This patch fixes the forwarding problem by providing the basic CIPSO validation code directly in ip_options_compile() without the need for the NetLabel or CIPSO code. The new validation code can not perform any of the CIPSO option label/value verification that cipso_v4_validate() does, but it can verify the basic CIPSO option format. The behavior when NetLabel is enabled is unchanged. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-06-10sctp: check cached dst before using itNicolas Dichtel
[ Upstream commit e0268868ba064980488fc8c194db3d8e9fb2959c ] dst_check() will take care of SA (and obsolete field), hence IPsec rekeying scenario is taken into account. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Vlad Yaseivch <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-06-10ipv6: fix incorrect ipsec fragmentGao feng
[ Upstream commit 0c1833797a5a6ec23ea9261d979aa18078720b74 ] Since commit ad0081e43a "ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed" the fragment of packets is incorrect. because tunnel mode needs IPsec headers and trailer for all fragments, while on transport mode it is sufficient to add the headers to the first fragment and the trailer to the last. so modify mtu and maxfraglen base on ipsec mode and if fragment is first or last. with my test,it work well(every fragment's size is the mtu) and does not trigger slow fragment path. Changes from v1: though optimization, mtu_prev and maxfraglen_prev can be delete. replace xfrm mode codes with dst_entry's new frag DST_XFRM_TUNNEL. add fuction ip6_append_data_mtu to make codes clearer. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-06-10set fake_rtable's dst to NULL to avoid kernel OopsPeter Huang (Peng)
[ Upstream commit a881e963c7fe1f226e991ee9bbe8907acda93294 ] bridge: set fake_rtable's dst to NULL to avoid kernel Oops when bridge is deleted before tap/vif device's delete, kernel may encounter an oops because of NULL reference to fake_rtable's dst. Set fake_rtable's dst to NULL before sending packets out can solve this problem. v4 reformat, change br_drop_fake_rtable(skb) to {} v3 enrich commit header v2 introducing new flag DST_FAKE_RTABLE to dst_entry struct. [ Use "do { } while (0)" for nop br_drop_fake_rtable() implementation -DaveM ] Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Peter Huang <peter.huangpeng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-04-22Bluetooth: hci_core: fix NULL-pointer dereference at unregisterJohan Hovold
commit 94324962066231a938564bebad0f941cd2d06bb2 upstream. Make sure hci_dev_open returns immediately if hci_dev_unregister has been called. This fixes a race between hci_dev_open and hci_dev_unregister which can lead to a NULL-pointer dereference. Bug is 100% reproducible using hciattach and a disconnected serial port: 0. # hciattach -n /dev/ttyO1 any noflow 1. hci_dev_open called from hci_power_on grabs req lock 2. hci_init_req executes but device fails to initialise (times out eventually) 3. hci_dev_open is called from hci_sock_ioctl and sleeps on req lock 4. hci_uart_tty_close calls hci_dev_unregister and sleeps on req lock in hci_dev_do_close 5. hci_dev_open (1) releases req lock 6. hci_dev_do_close grabs req lock and returns as device is not up 7. hci_dev_unregister sleeps in destroy_workqueue 8. hci_dev_open (3) grabs req lock, calls hci_init_req and eventually sleeps 9. hci_dev_unregister finishes, while hci_dev_open is still running... [ 79.627136] INFO: trying to register non-static key. [ 79.632354] the code is fine but needs lockdep annotation. [ 79.638122] turning off the locking correctness validator. [ 79.643920] [<c00188bc>] (unwind_backtrace+0x0/0xf8) from [<c00729c4>] (__lock_acquire+0x1590/0x1ab0) [ 79.653594] [<c00729c4>] (__lock_acquire+0x1590/0x1ab0) from [<c00733f8>] (lock_acquire+0x9c/0x128) [ 79.663085] [<c00733f8>] (lock_acquire+0x9c/0x128) from [<c0040a88>] (run_timer_softirq+0x150/0x3ac) [ 79.672668] [<c0040a88>] (run_timer_softirq+0x150/0x3ac) from [<c003a3b8>] (__do_softirq+0xd4/0x22c) [ 79.682281] [<c003a3b8>] (__do_softirq+0xd4/0x22c) from [<c003a924>] (irq_exit+0x8c/0x94) [ 79.690856] [<c003a924>] (irq_exit+0x8c/0x94) from [<c0013a50>] (handle_IRQ+0x34/0x84) [ 79.699157] [<c0013a50>] (handle_IRQ+0x34/0x84) from [<c0008530>] (omap3_intc_handle_irq+0x48/0x4c) [ 79.708648] [<c0008530>] (omap3_intc_handle_irq+0x48/0x4c) from [<c037499c>] (__irq_usr+0x3c/0x60) [ 79.718048] Exception stack(0xcf281fb0 to 0xcf281ff8) [ 79.723358] 1fa0: 0001e6a0 be8dab00 0001e698 00036698 [ 79.731933] 1fc0: 0002df98 0002df38 0000001f 00000000 b6f234d0 00000000 00000004 00000000 [ 79.740509] 1fe0: 0001e6f8 be8d6aa0 be8dac50 0000aab8 80000010 ffffffff [ 79.747497] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 79.756011] pgd = cf3b4000 [ 79.758850] [00000000] *pgd=8f0c7831, *pte=00000000, *ppte=00000000 [ 79.765502] Internal error: Oops: 80000007 [#1] [ 79.770294] Modules linked in: [ 79.773529] CPU: 0 Tainted: G W (3.3.0-rc6-00002-gb5d5c87 #421) [ 79.781066] PC is at 0x0 [ 79.783721] LR is at run_timer_softirq+0x16c/0x3ac [ 79.788787] pc : [<00000000>] lr : [<c0040aa4>] psr: 60000113 [ 79.788787] sp : cf281ee0 ip : 00000000 fp : cf280000 [ 79.800903] r10: 00000004 r9 : 00000100 r8 : b6f234d0 [ 79.806427] r7 : c0519c28 r6 : cf093488 r5 : c0561a00 r4 : 00000000 [ 79.813323] r3 : 00000000 r2 : c054eee0 r1 : 00000001 r0 : 00000000 [ 79.820190] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 79.827728] Control: 10c5387d Table: 8f3b4019 DAC: 00000015 [ 79.833801] Process gpsd (pid: 1265, stack limit = 0xcf2802e8) [ 79.839965] Stack: (0xcf281ee0 to 0xcf282000) [ 79.844573] 1ee0: 00000002 00000000 c0040a24 00000000 00000002 cf281f08 00200200 00000000 [ 79.853210] 1f00: 00000000 cf281f18 cf281f08 00000000 00000000 00000000 cf281f18 cf281f18 [ 79.861816] 1f20: 00000000 00000001 c056184c 00000000 00000001 b6f234d0 c0561848 00000004 [ 79.870452] 1f40: cf280000 c003a3b8 c051e79c 00000001 00000000 00000100 3fa9e7b8 0000000a [ 79.879089] 1f60: 00000025 cf280000 00000025 00000000 00000000 b6f234d0 00000000 00000004 [ 79.887756] 1f80: 00000000 c003a924 c053ad38 c0013a50 fa200000 cf281fb0 ffffffff c0008530 [ 79.896362] 1fa0: 0001e6a0 0000aab8 80000010 c037499c 0001e6a0 be8dab00 0001e698 00036698 [ 79.904998] 1fc0: 0002df98 0002df38 0000001f 00000000 b6f234d0 00000000 00000004 00000000 [ 79.913665] 1fe0: 0001e6f8 be8d6aa0 be8dac50 0000aab8 80000010 ffffffff 00fbf700 04ffff00 [ 79.922302] [<c0040aa4>] (run_timer_softirq+0x16c/0x3ac) from [<c003a3b8>] (__do_softirq+0xd4/0x22c) [ 79.931945] [<c003a3b8>] (__do_softirq+0xd4/0x22c) from [<c003a924>] (irq_exit+0x8c/0x94) [ 79.940582] [<c003a924>] (irq_exit+0x8c/0x94) from [<c0013a50>] (handle_IRQ+0x34/0x84) [ 79.948913] [<c0013a50>] (handle_IRQ+0x34/0x84) from [<c0008530>] (omap3_intc_handle_irq+0x48/0x4c) [ 79.958404] [<c0008530>] (omap3_intc_handle_irq+0x48/0x4c) from [<c037499c>] (__irq_usr+0x3c/0x60) [ 79.967773] Exception stack(0xcf281fb0 to 0xcf281ff8) [ 79.973083] 1fa0: 0001e6a0 be8dab00 0001e698 00036698 [ 79.981658] 1fc0: 0002df98 0002df38 0000001f 00000000 b6f234d0 00000000 00000004 00000000 [ 79.990234] 1fe0: 0001e6f8 be8d6aa0 be8dac50 0000aab8 80000010 ffffffff [ 79.997161] Code: bad PC value [ 80.000396] ---[ end trace 6f6739840475f9ee ]--- [ 80.005279] Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Johan Hovold <jhovold@gmail.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29net: Make qdisc_skb_cb upper size bound explicit.David S. Miller
[ Upstream commit 16bda13d90c8d5da243e2cfa1677e62ecce26860 ] Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside of other data structures. This is intended to be used by IPoIB so that it can remember addressing information stored at hard_header_ops->create() time that it can fetch when the packet gets to the transmit routine. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29ipv4: reset flowi parameters on route connectJulian Anastasov
[ Upstream commit e6b45241c57a83197e5de9166b3b0d32ac562609 ] Eric Dumazet found that commit 813b3b5db83 (ipv4: Use caller's on-stack flowi as-is in output route lookups.) that comes in 3.0 added a regression. The problem appears to be that resulting flowi4_oif is used incorrectly as input parameter to some routing lookups. The result is that when connecting to local port without listener if the IP address that is used is not on a loopback interface we incorrectly assign RTN_UNICAST to the output route because no route is matched by oif=lo. The RST packet can not be sent immediately by tcp_v4_send_reset because it expects RTN_LOCAL. So, change ip_route_connect and ip_route_newports to update the flowi4 fields that are input parameters because we do not want unnecessary binding to oif. To make it clear what are the input parameters that can be modified during lookup and to show which fields of floiw4 are reused add a new function to update the flowi4 structure: flowi4_update_output. Thanks to Yurij M. Plotnikov for providing a bug report including a program to reproduce the problem. Thanks to Eric Dumazet for tracking the problem down to tcp_v4_send_reset and providing initial fix. Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru> Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03netns: Fail conspicously if someone uses net_generic at an inappropriate time.Eric W. Biederman
[ Upstream commit 5ee4433efe99b9f39f6eff5052a177bbcfe72cea ] By definition net_generic should never be called when it can return NULL. Fail conspicously with a BUG_ON to make it clear when people mess up that a NULL return should never happen. Recently there was a bug in the CAIF subsystem where it was registered with register_pernet_device instead of register_pernet_subsys. It was erroneously concluded that net_generic could validly return NULL and that net_assign_generic was buggy (when it was just inefficient). Hopefully this BUG_ON will prevent people to coming to similar erroneous conclusions in the futrue. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2011-12-31ipvs: try also real server with port 0 in backup serverJulian Anastasov
We should not forget to try for real server with port 0 in the backup server when processing the sync message. We should do it in all cases because the backup server can use different forwarding method. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23net: relax rcvbuf limitsEric Dumazet
skb->truesize might be big even for a small packet. Its even bigger after commit 87fb4b7b533 (net: more accurate skb truesize) and big MTU. We should allow queueing at least one packet per receiver, even with a low RCVBUF setting. Reported-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22net: introduce DST_NOPEER dst flagEric Dumazet
Chris Boot reported crashes occurring in ipv6_select_ident(). [ 461.457562] RIP: 0010:[<ffffffff812dde61>] [<ffffffff812dde61>] ipv6_select_ident+0x31/0xa7 [ 461.578229] Call Trace: [ 461.580742] <IRQ> [ 461.582870] [<ffffffff812efa7f>] ? udp6_ufo_fragment+0x124/0x1a2 [ 461.589054] [<ffffffff812dbfe0>] ? ipv6_gso_segment+0xc0/0x155 [ 461.595140] [<ffffffff812700c6>] ? skb_gso_segment+0x208/0x28b [ 461.601198] [<ffffffffa03f236b>] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6] [ 461.608786] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.614227] [<ffffffff81271d64>] ? dev_hard_start_xmit+0x357/0x543 [ 461.620659] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.626440] [<ffffffffa0379745>] ? br_parse_ip_options+0x19a/0x19a [bridge] [ 461.633581] [<ffffffff812722ff>] ? dev_queue_xmit+0x3af/0x459 [ 461.639577] [<ffffffffa03747d2>] ? br_dev_queue_push_xmit+0x72/0x76 [bridge] [ 461.646887] [<ffffffffa03791e3>] ? br_nf_post_routing+0x17d/0x18f [bridge] [ 461.653997] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.659473] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.665485] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.671234] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.677299] [<ffffffffa0379215>] ? nf_bridge_update_protocol+0x20/0x20 [bridge] [ 461.684891] [<ffffffffa03bb0e5>] ? nf_ct_zone+0xa/0x17 [nf_conntrack] [ 461.691520] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.697572] [<ffffffffa0374812>] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge] [ 461.704616] [<ffffffffa0379031>] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge] [ 461.712329] [<ffffffffa037929f>] ? br_nf_forward_finish+0x8a/0x95 [bridge] [ 461.719490] [<ffffffffa037900a>] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge] [ 461.727223] [<ffffffffa0379974>] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge] [ 461.734292] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.739758] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.746203] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.751950] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.758378] [<ffffffffa037533a>] ? NF_HOOK.constprop.4+0x56/0x56 [bridge] This is caused by bridge netfilter special dst_entry (fake_rtable), a special shared entry, where attaching an inetpeer makes no sense. Problem is present since commit 87c48fa3b46 (ipv6: make fragment identifications less predictable) Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and __ip_select_ident() fallback to the 'no peer attached' handling. Reported-by: Chris Boot <bootc@bootc.net> Tested-by: Chris Boot <bootc@bootc.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-21net: Add a flow_cache_flush_deferred functionSteffen Klassert
flow_cach_flush() might sleep but can be called from atomic context via the xfrm garbage collector. So add a flow_cache_flush_deferred() function and use this if the xfrm garbage colector is invoked from within the packet path. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19sctp: fix incorrect overflow check on autocloseXi Wang
Commit 8ffd3208 voids the previous patches f6778aab and 810c0719 for limiting the autoclose value. If userspace passes in -1 on 32-bit platform, the overflow check didn't work and autoclose would be set to 0xffffffff. This patch defines a max_autoclose (in seconds) for limiting the value and exposes it through sysctl, with the following intentions. 1) Avoid overflowing autoclose * HZ. 2) Keep the default autoclose bound consistent across 32- and 64-bit platforms (INT_MAX / HZ in this patch). 3) Keep the autoclose value consistent between setsockopt() and getsockopt() calls. Suggested-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30sch_red: fix red_calc_qavg_from_idle_timeEric Dumazet
Since commit a4a710c4a7490587 (pkt_sched: Change PSCHED_SHIFT from 10 to 6) it seems RED/GRED are broken. red_calc_qavg_from_idle_time() computes a delay in us units, but this delay is now 16 times bigger than real delay, so the final qavg result smaller than expected. Use standard kernel time services since there is no need to obfuscate them. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29Merge branch 'nf' of git://1984.lsi.us.es/netDavid S. Miller
2011-11-26inet: add a redirect generation id in inetpeerEric Dumazet
Now inetpeer is the place where we cache redirect information for ipv4 destinations, we must be able to invalidate informations when a route is added/removed on host. As inetpeer is not yet namespace aware, this patch adds a shared redirect_genid, and a per inetpeer redirect_genid. This might be changed later if inetpeer becomes ns aware. Cache information for one inerpeer is valid as long as its redirect_genid has the same value than global redirect_genid. Reported-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com> Tested-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26route: struct rtable can be const in rt_is_input_route and rt_is_output_routeSteffen Klassert
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26net: Move mtu handling down to the protocol depended handlersSteffen Klassert
We move all mtu handling from dst_mtu() down to the protocol layer. So each protocol can implement the mtu handling in a different manner. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26net: Rename the dst_opt default_mtu method to mtuSteffen Klassert
We plan to invoke the dst_opt->default_mtu() method unconditioally from dst_mtu(). So rename the method to dst_opt->mtu() to match the name with the new meaning. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-23ipv4: Save nexthop address of LSRR/SSRR option to IPCB.Li Wei
We can not update iph->daddr in ip_options_rcv_srr(), It is too early. When some exception ocurred later (eg. in ip_forward() when goto sr_failed) we need the ip header be identical to the original one as ICMP need it. Add a field 'nexthop' in struct ip_options to save nexthop of LSRR or SSRR option. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-22netfilter: nf_conntrack: make event callback registration per-netnsPablo Neira Ayuso
This patch fixes an oops that can be triggered following this recipe: 0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded. 1) container is started. 2) connect to it via lxc-console. 3) generate some traffic with the container to create some conntrack entries in its table. 4) stop the container: you hit one oops because the conntrack table cleanup tries to report the destroy event to user-space but the per-netns nfnetlink socket has already gone (as the nfnetlink socket is per-netns but event callback registration is global). To fix this situation, we make the ctnl_notifier per-netns so the callback is registered/unregistered if the container is created/destroyed. Alex Bligh and Alexey Dobriyan originally proposed one small patch to check if the nfnetlink socket is gone in nfnetlink_has_listeners, but this is a very visited path for events, thus, it may reduce performance and it looks a bit hackish to check for the nfnetlink socket only to workaround this situation. As a result, I decided to follow the bigger path choice, which seems to look nicer to me. Cc: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: Alex Bligh <alex@alex.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-09Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
2011-11-09Merge branch 'master' of ↵John W. Linville
ssh://ra.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2011-11-08cfg80211: fix missing kernel-docJohannes Berg
Two new struct members were not documented, fix that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>