|
In __unregister_linger_request(), the request is being removed
from the osd client's req_linger list only when the request
has a non-null osd pointer. It should be done whether or not
the request currently has an osd.
This is most likely a non-issue because I believe the request
will always have an osd when this function is called.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 61c74035626beb25a39b0273ccf7d75510bc36a1)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In __unregister_request(), there is a call to list_del_init()
referencing a request that was the subject of a call to
ceph_osdc_put_request() on the previous line. This is not
safe, because the request structure could have been freed
by the time we reach the list_del_init().
Fix this by reversing the order of these lines.
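As an illustration, here is a minimal reference-counting sketch of the ordering
problem; the struct and field names (example_request, r_item) are stand-ins for
the real osd request fields, and only the order of the two calls matters:
#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/list.h>
#include <linux/slab.h>

struct example_request {
	struct kref r_kref;
	struct list_head r_item;
};

static void example_release(struct kref *kref)
{
	kfree(container_of(kref, struct example_request, r_kref));
}

/* Broken: the put may drop the last reference and free req, so the
 * following list_del_init() is a potential use-after-free. */
static void unregister_broken(struct example_request *req)
{
	kref_put(&req->r_kref, example_release);
	list_del_init(&req->r_item);
}

/* Fixed: unlink while req is still known to be valid, put last. */
static void unregister_fixed(struct example_request *req)
{
	list_del_init(&req->r_item);
	kref_put(&req->r_kref, example_release);
}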
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 7d5f24812bd182a2471cb69c1c2baf0648332e1f)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This would reset a connection with any OSD that had an outstanding
request that was taking more than N seconds. The idea was that if the
OSD was buggy, the client could compensate by resending the request.
In reality, this only served to hide server bugs, and we haven't
actually seen such a bug in quite a while. Moreover, the userspace
client code never did this.
More importantly, often the request is taking a long time because the
OSD is trying to recover, or overloaded, and killing the connection
and retrying would only make the situation worse by giving the OSD
more work to do.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
(cherry picked from commit 83aff95eb9d60aff5497e9f44a2ae906b86d8e88)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If an osd has no requests and no linger requests, __reset_osd()
will just remove it with a call to __remove_osd(). That drops
a reference to the osd, and therefore the osd may have been freed
by the time __reset_osd() returns. That function offers no
indication this may have occurred, and as a result the osd will
continue to be used even when it's no longer valid.
Change __reset_osd() so it returns an error (ENODEV) when it
deletes the osd being reset. And change __kick_osd_requests() so it
returns immediately (before referencing osd again) if __reset_osd()
returns *any* error.
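A minimal sketch of the resulting calling convention, with the function names
taken from the commit text but the bodies reduced to the essential check (not
the actual osd_client.c code):
static void __kick_osd_requests(struct ceph_osd_client *osdc,
				struct ceph_osd *osd)
{
	if (__reset_osd(osdc, osd) < 0)
		return;		/* osd was removed and may already be freed */

	/* ... only here is it still safe to dereference osd ... */
}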
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 685a7555ca69030739ddb57a47f0ea8ea80196a4)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Ensure that we set the err value correctly so that we do not pass a 0
value to ERR_PTR and confuse the calling code. (In particular,
osd_client.c handle_map() will BUG(!newmap)).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
(cherry picked from commit 0ed7285e0001b960c888e5455ae982025210ed3d)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We should not set con->state to CLOSED here; that happens in
ceph_fault() in the caller, where it first asserts that the state
is not yet CLOSED. Avoids a BUG when the features don't match.
Since fail_protocol() has become a trivial wrapper, replace
calls to it with direct calls to reset_connection().
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
(cherry picked from commit 0fa6ebc600bc8e830551aee47a0e929e818a1868)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
A number of assertions in the ceph messenger are implemented with
BUG_ON(), killing the system if connection's state doesn't match
what's expected. At this point our state model is (evidently) not
well understood enough for these assertions to trigger a BUG().
Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...)
so we learn about these issues without killing the machine.
We now recognize that a connection fault can occur due to a socket
closure at any time, regardless of the state of the connection. So
there is really nothing we can assert about the state of the
connection at that point, so eliminate that assertion.
Reported-by: Ugis <ugis22@gmail.com>
Tested-by: Ugis <ugis22@gmail.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 122070a2ffc91f87fe8e8493eb0ac61986c5557c)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When ceph_osdc_handle_map() is called to process a new osd map,
kick_requests() is called to ensure all affected requests are
updated if necessary to reflect changes in the osd map. This
happens in two cases: whenever an incremental map update is
processed; and when a full map update (or the last one if there is
more than one) gets processed.
In the former case, the kick_requests() call is followed immediately
by a call to reset_changed_osds() to ensure any connections to osds
affected by the map change are reset. But for full map updates
this isn't done.
Both cases should be doing this osd reset.
Rather than duplicating the reset_changed_osds() call, move it into
the end of kick_requests().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e6d50f67a6b1a6252a616e6e629473b5c4277218)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The kick_requests() function is called by ceph_osdc_handle_map()
when an osd map change has been indicated. Its purpose is to
re-queue any request whose target osd is different from what it
was when it was originally sent.
It is structured as two loops, one for incomplete but registered
requests, and a second for handling completed linger requests.
As a special case, in the first loop if a request marked to linger
has not yet completed, it is moved from the request list to the
linger list. This is done as a quick and dirty way to have the second
loop handle sending the request along with all the other linger
requests.
Because of the way it's done now, however, this quick and dirty
solution can result in these incomplete linger requests never
getting re-sent as desired. The problem lies in the fact that
the second loop only arranges for a linger request to be sent
if it appears its target osd has changed. This is the proper
handling for *completed* linger requests (it avoids issuing
the same linger request twice to the same osd).
But although the linger requests added to the list in the first loop
may have been sent, they have not yet completed, so they need to be
re-sent regardless of whether their target osd has changed.
The first required fix is we need to avoid calling __map_request()
on any incomplete linger request. Otherwise the subsequent
__map_request() call in the second loop will find the target osd
has not changed and will therefore not re-send the request.
Second, we need to be sure that a sent but incomplete linger request
gets re-sent. If the target osd is the same with the new osd map as
it was when the request was originally sent, this won't happen.
This can be fixed through careful handling when we move these
requests from the request list to the linger list, by unregistering
the request *before* it is registered as a linger request. This
works because a side-effect of unregistering the request is to make
the request's r_osd pointer be NULL, and *that* will ensure the
second loop actually re-sends the linger request.
Processing of such a request is done at that point, so continue with
the next one once it's been moved.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ab60b16d3c31b9bd9fd5b39f97dc42c52a50b67d)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In kick_requests(), we need to register the request before we
unregister the linger request. Otherwise the unregister will
reset the request's osd pointer to NULL.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c89ce05e0c5a01a256100ac6a6019f276bdd1ca6)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The red-black node in the ceph osd request structure is initialized
in ceph_osdc_alloc_request() using rb_init_node(). We do need to
initialize this, because in __unregister_request() we call
RB_EMPTY_NODE(), which expects the node it's checking to have
been initialized. But rb_init_node() is apparently overkill, and
may in fact be on its way out. So use RB_CLEAR_NODE() instead.
For a little more background, see this commit:
4c199a93 ("rbtree: empty nodes have no color")
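For illustration, a sketch of the init/check pairing with made-up struct and
function names; RB_CLEAR_NODE() and RB_EMPTY_NODE() are the real rbtree
helpers:
#include <linux/rbtree.h>
#include <linux/slab.h>

struct example_req {
	struct rb_node r_node;
};

static struct example_req *example_alloc(gfp_t gfp)
{
	struct example_req *req = kzalloc(sizeof(*req), gfp);

	if (!req)
		return NULL;
	RB_CLEAR_NODE(&req->r_node);	/* RB_EMPTY_NODE(&req->r_node) is now true */
	return req;
}

static void example_unregister(struct example_req *req, struct rb_root *root)
{
	if (!RB_EMPTY_NODE(&req->r_node))	/* only erase if it was inserted */
		rb_erase(&req->r_node, root);
}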
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a978fa20fb657548561dddbfb605fe43654f0825)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The red-black node in the ceph osd event structure is not
initialized in ceph_osdc_create_event(). Because this node can
be the subject of a RB_EMPTY_NODE() call later on, we should ensure
the node is initialized properly for that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 3ee5234df68d253c415ba4f2db72ad250d9c21a9)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The red-black node in the ceph osd structure is not initialized
in create_osd(). Because this node can be the subject of a
RB_EMPTY_NODE() call later on, we should ensure the node is
initialized properly for that. Add a call to RB_CLEAR_NODE() to
initialize it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit f407731d12214e7686819018f3a1e9d7b6f83a02)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When a connection's socket disconnects, or if there's a protocol
error of some kind on the connection, a fault is signaled and
the connection is reset (closed and reopened, basically). We
currently get an error message on the log whenever this occurs.
A ceph connection will attempt to reestablish a socket connection
repeatedly if a fault occurs. This means that these error messages
will get repeatedly added to the log, which is undesirable.
Change the error messages to warnings, so they don't get
logged by default.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 28362986f8743124b3a0fda20a8ed3e80309cce1)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
A connection's socket can close for any reason, independent of the
state of the connection (and irrespective of the connection
mutex). As a result, the connection can be in pretty much any state
at the time its socket is closed.
Handle those other cases at the top of con_work(). Pull this whole
block of code into a separate function to reduce the clutter.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 7bb21d68c535ad8be38e14a715632ae398b37ac1)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If we are creating an osd request and get an invalid layout, return
EINVAL to the caller. Switch the return to an error code instead of
NULL (which implied -ENOMEM).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
(cherry picked from commit 6816282dab3a72efe8c0d182c1bc2960d87f4322)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a56f992cdabc63f56b4b142885deebebf936ff76 upstream.
This is a very old bug, but there's nothing that prevents the
timer from running while the module is being removed when we
only do del_timer() instead of del_timer_sync().
The timer should normally not be running at this point, but
it's not clearly impossible (or we could just remove this.)
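The pattern at issue, as a stand-alone illustrative module (not the mac80211
code itself):
#include <linux/jiffies.h>
#include <linux/module.h>
#include <linux/timer.h>

static struct timer_list example_timer;

static void example_timer_fn(unsigned long data)
{
	/* could still be running on another CPU while the module unloads */
}

static int __init example_init(void)
{
	setup_timer(&example_timer, example_timer_fn, 0);
	mod_timer(&example_timer, jiffies + HZ);
	return 0;
}
module_init(example_init);

static void __exit example_exit(void)
{
	/* del_timer() only dequeues a pending timer; del_timer_sync() also
	 * waits for a handler that has already started to finish. */
	del_timer_sync(&example_timer);
}
module_exit(example_exit);
MODULE_LICENSE("GPL");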
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 34bcf71502413f8903ade93746f2d0f04b937a78 upstream.
Do not scan on no-IBSS and disabled channels in IBSS mode. Doing this
can trigger Microcode errors on iwlwifi and iwlegacy drivers.
Also rename ieee80211_request_internal_scan() function since it is only
used in IBSS mode and simplify calling it from ieee80211_sta_find_ibss().
This patch should address:
https://bugzilla.redhat.com/show_bug.cgi?id=883414
https://bugzilla.kernel.org/show_bug.cgi?id=49411
Reported-by: Jesse Kahtava <jesse_kahtava@f-m.fm>
Reported-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 642fe4d00db56d65060ce2fd4c105884414acb16 upstream.
rpc_kill_sb() must defer calling put_net() until after the notifier
has been called, since most (all?) of the notifier callbacks assume
that sb->s_fs_info points to a valid net namespace. It also must not
call put_net() if the call to rpc_fill_super was unsuccessful.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=48421
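A condensed ordering sketch based on the description above (the fill-super
check is a stand-in helper, not the real code, and the real function does more
in practice):
static void rpc_kill_sb(struct super_block *sb)
{
	struct net *net = sb->s_fs_info;

	if (!sb_was_filled(sb))		/* stand-in: rpc_fill_super() never
					 * succeeded, so no net ref was taken */
		goto out;
	blocking_notifier_call_chain(&rpc_pipefs_notifier_list,
				     RPC_PIPEFS_UMOUNT, sb);
	put_net(net);			/* only after the callbacks are done
					 * looking at sb->s_fs_info */
out:
	kill_litter_super(sb);
}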
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 87ed50036b866db2ec2ba16b2a7aec4a2b0b7c39 upstream.
If the rpc_task exits while holding the socket write lock before it has
allocated an rpc slot, then the usual mechanism for releasing the write
lock in xprt_release() is defeated.
The problem occurs if the call to xprt_lock_write() initially fails, so
that the rpc_task is put on the xprt->sending wait queue. If the task
exits after being assigned the lock by __xprt_lock_write_func, but
before it has retried the call to xprt_lock_and_alloc_slot(), then
it calls xprt_release() while holding the write lock, but will
immediately exit due to the test for task->tk_rqstp != NULL.
Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c6567ed1402c55e19b012e66a8398baec2a726f3 upstream.
This patch ensures that we free the rpc_task after the cleanup callbacks
are done in order to avoid a deadlock problem that can be triggered if
the callback needs to wait for another workqueue item to complete.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cd6c5968582a273561464fe6b1e8cc8214be02df upstream.
There are SUNRPC clients whose program doesn't have a pipe_dir_name. These
clients can be skipped on PipeFS events, because nothing has to be created or
destroyed for them. But instead of breaking out of the loop when such a client
is found, the search over the clients list has to continue. Otherwise some
clients may not be covered by the PipeFS event handler.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b9b5ef188e5a2222cfc16ef62a4703080750b451 upstream.
We need to cancel the hci_power_on work in order to avoid it run when we
try to free the hdev.
[ 1434.201149] ------------[ cut here ]------------
[ 1434.204998] WARNING: at lib/debugobjects.c:261 debug_print_object+0x8e/0xb0()
[ 1434.208324] ODEBUG: free active (active state 0) object type: work_struct hint: hci_power_on+0x0/0x90
[ 1434.210386] Pid: 8564, comm: trinity-child25 Tainted: G W 3.7.0-rc5-next-20121112-sasha-00018-g2f4ce0e #127
[ 1434.210760] Call Trace:
[ 1434.210760] [<ffffffff819f3d6e>] ? debug_print_object+0x8e/0xb0
[ 1434.210760] [<ffffffff8110b887>] warn_slowpath_common+0x87/0xb0
[ 1434.210760] [<ffffffff8110b911>] warn_slowpath_fmt+0x41/0x50
[ 1434.210760] [<ffffffff819f3d6e>] debug_print_object+0x8e/0xb0
[ 1434.210760] [<ffffffff8376b750>] ? hci_dev_open+0x310/0x310
[ 1434.210760] [<ffffffff83bf94e5>] ? _raw_spin_unlock_irqrestore+0x55/0xa0
[ 1434.210760] [<ffffffff819f3ee5>] __debug_check_no_obj_freed+0xa5/0x230
[ 1434.210760] [<ffffffff83785db0>] ? bt_host_release+0x10/0x20
[ 1434.210760] [<ffffffff819f4d15>] debug_check_no_obj_freed+0x15/0x20
[ 1434.210760] [<ffffffff8125eee7>] kfree+0x227/0x330
[ 1434.210760] [<ffffffff83785db0>] bt_host_release+0x10/0x20
[ 1434.210760] [<ffffffff81e539e5>] device_release+0x65/0xc0
[ 1434.210760] [<ffffffff819d3975>] kobject_cleanup+0x145/0x190
[ 1434.210760] [<ffffffff819d39cd>] kobject_release+0xd/0x10
[ 1434.210760] [<ffffffff819d33cc>] kobject_put+0x4c/0x60
[ 1434.210760] [<ffffffff81e548b2>] put_device+0x12/0x20
[ 1434.210760] [<ffffffff8376a334>] hci_free_dev+0x24/0x30
[ 1434.210760] [<ffffffff82fd8fe1>] vhci_release+0x31/0x60
[ 1434.210760] [<ffffffff8127be12>] __fput+0x122/0x250
[ 1434.210760] [<ffffffff811cab0d>] ? rcu_user_exit+0x9d/0xd0
[ 1434.210760] [<ffffffff8127bf49>] ____fput+0x9/0x10
[ 1434.210760] [<ffffffff81133402>] task_work_run+0xb2/0xf0
[ 1434.210760] [<ffffffff8106cfa7>] do_notify_resume+0x77/0xa0
[ 1434.210760] [<ffffffff83bfb0ea>] int_signal+0x12/0x17
[ 1434.210760] ---[ end trace a6d57fefbc8a8cc7 ]---
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dc2a0e20fbc85a71c63aa4330b496fda33f6bf80 upstream.
This patch fixes the following report, it happens when accepting rfcomm
connections:
[ 228.165378] =============================================
[ 228.165378] [ INFO: possible recursive locking detected ]
[ 228.165378] 3.7.0-rc1-00536-gc1d5dc4 #120 Tainted: G W
[ 228.165378] ---------------------------------------------
[ 228.165378] bluetoothd/1341 is trying to acquire lock:
[ 228.165378] (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+...}, at:
[<ffffffffa0000aa0>] bt_accept_dequeue+0xa0/0x180 [bluetooth]
[ 228.165378]
[ 228.165378] but task is already holding lock:
[ 228.165378] (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+...}, at:
[<ffffffffa0205118>] rfcomm_sock_accept+0x58/0x2d0 [rfcomm]
[ 228.165378]
[ 228.165378] other info that might help us debug this:
[ 228.165378] Possible unsafe locking scenario:
[ 228.165378]
[ 228.165378] CPU0
[ 228.165378] ----
[ 228.165378] lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM);
[ 228.165378] lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM);
[ 228.165378]
[ 228.165378] *** DEADLOCK ***
[ 228.165378]
[ 228.165378] May be due to missing lock nesting notation
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 354e4aa391ed50a4d827ff6fc11e0667d0859b25 ]
RFC 5961 5.2 [Blind Data Injection Attack].[Mitigation]
All TCP stacks MAY implement the following mitigation. TCP stacks
that implement this mitigation MUST add an additional input check to
any incoming segment. The ACK value is considered acceptable only if
it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
SND.NXT). All incoming segments whose ACK value doesn't satisfy the
above condition MUST be discarded and an ACK sent back.
Move tcp_send_challenge_ack() before tcp_ack() to avoid a forward
declaration.
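The acceptable-ACK range, as a small self-contained C sketch (the helper names
are made up here; the kernel uses its own before()/after() macros):
#include <stdbool.h>
#include <stdint.h>

/* Sequence arithmetic is modulo 2^32. */
static bool seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static bool seq_after(uint32_t a, uint32_t b)  { return seq_before(b, a); }

/* RFC 5961 5.2: accept only (SND.UNA - MAX.SND.WND) <= SEG.ACK <= SND.NXT;
 * anything else is discarded and answered with an ACK. */
static bool ack_acceptable(uint32_t seg_ack, uint32_t snd_una,
			   uint32_t snd_nxt, uint32_t max_snd_wnd)
{
	return !seq_before(seg_ack, snd_una - max_snd_wnd) &&
	       !seq_after(seg_ack, snd_nxt);
}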
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bd090dfc634ddd711a5fbd0cadc6e0ab4977bcaf ]
We added support for RFC 5961 in recent kernels, but TCP fails
to perform an exhaustive check of the ACK sequence.
We can update our view of peer tsval from a frame that is
later discarded by tcp_ack().
This makes timestamp-enabled sessions vulnerable to the injection of
a high tsval: peers start an ACK storm, since the victim
sends a dupack each time it receives an ACK from the other peer.
As tcp_validate_incoming() is called before tcp_ack(), we should
not perform tcp_replace_ts_recent() from it, and let callers do it
at the right time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Romain Francoise <romain@orebokech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e371589917011efe6ff8c7dfb4e9e81934ac5855 ]
Followup of commit 0c24604b68fc (tcp: implement RFC 5961 4.2)
As reported by Vijay Subramanian, we should send a challenge ACK
instead of a dup ack if a SYN flag is set on a packet received out of
window.
This permits the rate limiting to work as intended, and the correct
SNMP counters to be incremented.
Suggested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0c24604b68fc7810d429d6c3657b6f148270e528 ]
Implement the RFC 5961 mitigation against Blind
Reset attack using SYN bit.
Section 4.2 of RFC 5961 advises to send a Challenge ACK and drop
incoming packet, instead of resetting the session.
Add a new SNMP counter to count the number of challenge ACKs sent
in response to SYN packets.
(netstat -s | grep TCPSYNChallenge)
Remove obsolete TCPAbortOnSyn, since we no longer abort a TCP session
because of a SYN flag.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 282f23c6ee343126156dd41218b22ece96d747e3 ]
Implement the RFC 5961 mitigation against Blind
Reset attack using RST bit.
The idea is to validate the incoming RST sequence number, requiring it
to match the RCV.NXT value, instead of the previously accepted
window: (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND)
If the sequence is in the window but not an exact match, send
a "challenge ACK", so that the other end can resend an
RST with the appropriate sequence number.
Add a new sysctl, tcp_challenge_ack_limit, to limit the
number of challenge ACKs sent per second.
Add a new SNMP counter to count the number of challenge ACKs sent.
(netstat -s | grep TCPChallengeACK)
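The decision the RST path makes can be summarized by this small sketch
(simplified and self-contained; unsigned wraparound handles the modulo-2^32
window check):
#include <stdint.h>

enum rst_action { RST_PROCESS, RST_CHALLENGE_ACK, RST_DROP };

static enum rst_action check_rst(uint32_t seg_seq, uint32_t rcv_nxt,
				 uint32_t rcv_wnd)
{
	if (seg_seq == rcv_nxt)
		return RST_PROCESS;		/* exact match: genuine reset */
	if (seg_seq - rcv_nxt < rcv_wnd)
		return RST_CHALLENGE_ACK;	/* in window: ask the peer to
						 * resend the RST with RCV.NXT */
	return RST_DROP;			/* out of window: discard */
}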
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d2fe85da52e89b8012ffad010ef352a964725d5f ]
Fixed integer overflow in function htb_dequeue
Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e337e24d6624e74a558aa69071e112a65f7b5758 ]
If either inet_csk_route_child_sock() or __inet_inherit_port() fails
in tcp_v4_syn_recv_sock(), the newsk will not be freed:
unreferenced object 0xffff88022e8a92c0 (size 1592):
comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
hex dump (first 32 bytes):
0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00 ................
02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e
[<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5
[<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd
[<ffffffff8149b784>] sk_clone_lock+0x16/0x21e
[<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b
[<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481
[<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b
[<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416
[<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc
[<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701
[<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4
[<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f
[<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233
[<ffffffff814cee68>] ip_rcv+0x217/0x267
[<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553
[<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82
This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus
a single sock_put() is not enough to free the memory. Additionally, things
like xfrm, memcg, cookie_values,... may have been initialized.
We have to free them properly.
This is fixed by forcing a call to tcp_done(), ending up in
inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary,
because it ends up doing all the cleanup on xfrm, memcg,
cookie_values, ...
Before calling tcp_done, we have to set the socket to SOCK_DEAD, to
force it entering inet_csk_destroy_sock. To avoid the warning in
inet_csk_destroy_sock, inet_num has to be set to 0.
As inet_csk_destroy_sock does a dec on orphan_count, we first have to
increase it.
Calling tcp_done() allows us to remove the calls to
tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().
A similar approach is taken for dccp by calling dccp_done().
This is in the kernel since 093d282321 (tproxy: fix hash locking issue
when using port redirection in __inet_inherit_port()), thus since
version >= 2.6.37.
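A sketch of the error-path cleanup assembled from the steps listed above (not
the verbatim patch; the helper name is invented for the example):
#include <net/inet_sock.h>
#include <net/sock.h>
#include <net/tcp.h>

static void force_close_child(struct sock *listener, struct sock *child)
{
	sock_set_flag(child, SOCK_DEAD);	/* take the inet_csk_destroy_sock() path */
	inet_sk(child)->inet_num = 0;		/* avoid the warning there */
	percpu_counter_inc(listener->sk_prot->orphan_count); /* destroy_sock decrements it */
	tcp_done(child);			/* full cleanup, ends in the final sock_put() */
}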
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 143cdd8f33909ff5a153e3f02048738c5964ba26 ]
batadv_iv_ogm_emit_send_time() attempts to calculate a random integer
in the range of 'orig_interval +- BATADV_JITTER' by the below lines.
msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER;
msecs += (random32() % 2 * BATADV_JITTER);
But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER'
because '%' and '*' have same precedence and associativity is
left-to-right.
This adds the parentheses at the appropriate position so that it matches
the original intention.
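A tiny stand-alone demonstration of the precedence pitfall (JITTER stands in
for BATADV_JITTER, rand() for random32()):
#include <stdio.h>
#include <stdlib.h>

#define JITTER 20

int main(void)
{
	int r = rand();

	/* '%' and '*' have equal precedence and associate left-to-right,
	 * so this is (r % 2) * JITTER: only ever 0 or JITTER. */
	int broken = r % 2 * JITTER;

	/* Intended: a value uniformly distributed in [0, 2*JITTER). */
	int fixed = r % (2 * JITTER);

	printf("broken=%d fixed=%d\n", broken, fixed);
	return 0;
}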
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Cc: Marek Lindner <lindner_marek@yahoo.de>
Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Cc: Antonio Quartulli <ordex@autistici.org>
Cc: b.a.t.m.a.n@lists.open-mesh.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5e1f54201cb481f40a04bc47e1bc8c093a189e23 ]
Add logic to verify that a port comparison byte code operation
actually has the second inet_diag_bc_op from which we read the port
for such operations.
Previously the code blindly referenced op[1] without first checking
whether a second inet_diag_bc_op struct could fit there. So a
malicious user could make the kernel read 4 bytes beyond the end of
the bytecode array by claiming to have a whole port comparison byte
code (2 inet_diag_bc_op structs) when in fact the bytecode was not
long enough to hold both.
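The shape of the added check, sketched after the description above (treat it as
illustrative rather than the exact upstream helper):
#include <linux/inet_diag.h>

/* A port comparison is two inet_diag_bc_op structs back to back: op[0] is
 * the opcode, op[1] carries the port.  Both must fit in the remaining
 * bytecode before op[1] may be read. */
static bool valid_port_comparison(const struct inet_diag_bc_op *op,
				  int len, int *min_len)
{
	*min_len += sizeof(struct inet_diag_bc_op);
	return len >= *min_len;
}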
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f67caec9068cee426ec23cf9005a1dee2ecad187 ]
Add logic to check the address family of the user-supplied conditional
and the address family of the connection entry. We now do not do
prefix matching of addresses from different address families (AF_INET
vs AF_INET6), except for the previously existing support for having an
IPv4 prefix match an IPv4-mapped IPv6 address (which this commit
maintains as-is).
This change is needed for two reasons:
(1) The addresses are different lengths, so comparing a 128-bit IPv6
prefix match condition to a 32-bit IPv4 connection address can cause
us to unwittingly walk off the end of the IPv4 address and read
garbage or oops.
(2) The IPv4 and IPv6 address spaces are semantically distinct, so a
simple bit-wise comparison of the prefixes is not meaningful, and
would lead to bogus results (except for the IPv4-mapped IPv6 case,
which this commit maintains).
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 405c005949e47b6e91359159c24753519ded0c67 ]
Add logic to validate INET_DIAG_BC_S_COND and INET_DIAG_BC_D_COND
operations.
Previously we did not validate the inet_diag_hostcond, address family,
address length, and prefix length. So a malicious user could make the
kernel read beyond the end of the bytecode array by claiming to have a
whole inet_diag_hostcond when the bytecode was not long enough to
contain a whole inet_diag_hostcond of the given address family. Or
they could make the kernel read up to about 27 bytes beyond the end of
a connection address by passing a prefix length that exceeded the
length of addresses of the given family.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1c95df85ca49640576de2f0a850925957b547b84 ]
Fix inet_diag to be aware of the fact that AF_INET6 TCP connections
instantiated for IPv4 traffic and in the SYN-RECV state were actually
created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This
means that for such connections inet6_rsk(req) returns a pointer to a
random spot in memory up to roughly 64KB beyond the end of the
request_sock.
With this bug, for a server using AF_INET6 TCP sockets and serving
IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to
inet_diag_fill_req() causing an oops or the export to user space of 16
bytes of kernel memory as a garbage IPv6 address, depending on where
the garbage inet6_rsk(req) pointed.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1bf3751ec90cc3174e01f0d701e8449ce163d113 ]
ip_check_defrag() might be called from af_packet within the
RX path where shared SKBs are used, so it must not modify
the input SKB before it has unshared it for defragmentation.
Use skb_copy_bits() to get the IP header and only pull in
everything later.
The same is true for the other caller in macvlan as it is
called from dev->rx_handler which can also get a shared SKB.
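A minimal sketch of the resulting rule (simplified, not the actual
ip_check_defrag() diff): peek at the IP header with skb_copy_bits() instead of
pulling or writing to a possibly shared skb:
#include <linux/ip.h>
#include <linux/skbuff.h>
#include <net/ip.h>

static bool looks_like_ipv4_fragment(const struct sk_buff *skb, int offset)
{
	struct iphdr iph;

	if (skb_copy_bits(skb, offset, &iph, sizeof(iph)) < 0)
		return false;			/* not even a full IP header */
	if (iph.version != 4 || iph.ihl < 5)
		return false;
	return (iph.frag_off & htons(IP_MF | IP_OFFSET)) != 0;
}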
Reported-by: Eric Leblond <eric@regit.org>
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6e51fe7572590d8d86e93b547fab6693d305fd0d ]
Consider the following program, which sets the second argument of the
sendto() syscall incorrectly:
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
int main(void)
{
int fd;
struct sockaddr_in sa;
fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
if (fd < 0)
return 1;
memset(&sa, 0, sizeof(sa));
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(11111);
sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
return 0;
}
We get -ENOMEM:
$ strace -e sendto ./demo
sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENOMEM (Cannot allocate memory)
Propagate the error code from sctp_user_addto_chunk(), so that we will
tell user space what actually went wrong:
$ strace -e sendto ./demo
sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address)
Noticed while running Trinity (the syscall fuzzer).
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit be364c8c0f17a3dd42707b5a090b318028538eb9 ]
Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
reproducible e.g. with the sendto() syscall by passing an invalid
user space pointer as the second argument:
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
int main(void)
{
int fd;
struct sockaddr_in sa;
fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
if (fd < 0)
return 1;
memset(&sa, 0, sizeof(sa));
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(11111);
sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
return 0;
}
As far as I can tell, the leak has been around since ~2003.
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 16a78e9fed5e8baa8480ae3413f4328c4537c599 upstream.
list_add was called with swapped parameters
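For reference, a generic sketch of the argument order (names invented for the
example): list_add() takes the new entry first and the list head second;
swapping them splices the head into the entry's usually uninitialized list:
#include <linux/list.h>

static LIST_HEAD(target_list);

static void add_target(struct list_head *entry)
{
	/* Broken: list_add(&target_list, entry); */
	list_add(entry, &target_list);	/* correct: new entry, then list head */
}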
Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 81b401100c01d2357031e874689f89bd788d13cd upstream.
Set the rx_ifindex to pass the correct interface index in the case of a
message timeout detection. Usually the rx_ifindex value is set at receive
time. But when no CAN frame has been received the RX_TIMEOUT notification
did not contain a valid value.
Reported-by: Andre Naujoks <nautsch2@googlemail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b78a4932f5fb11fadf41e69c606a33fa6787574c upstream.
The check whether the IBSS is active and can be removed should be
performed before deinitializing the fields used for the check/search.
Otherwise, the configured BSS will not be found and removed properly.
To make it more clear for the future, rename sdata->u.ibss to the
local pointer ifibss which is used within the checks.
This behaviour was introduced by
f3209bea110cade12e2b133da8b8499689cb0e2e
("mac80211: fix IBSS teardown race")
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Cc: Ignacy Gawedzki <i@lri.fr>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8f321f853ea33330c7141977cd34804476e2e07e upstream.
If the remote device sends a bogus RFC option with an invalid length,
undefined option values are used. Fix this by using defaults when the
remote misbehaves.
This also fixes the following warning reported by gcc 4.7.0:
net/bluetooth/l2cap_core.c: In function 'l2cap_config_rsp':
net/bluetooth/l2cap_core.c:3302:13: warning: 'rfc.max_pdu_size' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/bluetooth/l2cap_core.c:3266:24: note: 'rfc.max_pdu_size' was declared here
net/bluetooth/l2cap_core.c:3298:25: warning: 'rfc.monitor_timeout' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/bluetooth/l2cap_core.c:3266:24: note: 'rfc.monitor_timeout' was declared here
net/bluetooth/l2cap_core.c:3297:25: warning: 'rfc.retrans_timeout' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/bluetooth/l2cap_core.c:3266:24: note: 'rfc.retrans_timeout' was declared here
net/bluetooth/l2cap_core.c:3295:2: warning: 'rfc.mode' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/bluetooth/l2cap_core.c:3266:24: note: 'rfc.mode' was declared here
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
(cherry picked from commit d63b77f4c552cc3a20506871046ab0fcbc332609)
If we encounter an invalid (e.g., zeroed) mapping, return an error
and avoid a divide by zero.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
(cherry picked from commit 9bd952615a42d7e2ce3fa2c632e808e804637a1a)
The ceph_con_in_msg_alloc() method drops con->mutex while it allocates a
message. If that races with a timeout that resends a zillion messages and
resets the connection, and the ->alloc_msg() method returns a NULL message,
it will call ceph_msg_put(NULL) and BUG.
Fix by only calling put if msg is non-NULL.
Fixes http://tracker.newdream.net/issues/3142
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
(cherry picked from commit 588377d6199034c36d335e7df5818b731fea072c)
If ceph_fault() is unable to queue work after a delay, it sets the
BACKOFF connection flag so con_work() will attempt to do so.
In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't
result in newly-queued work, it simply ignores this condition and
proceeds as if no backoff delay were desired. There are two
problems with this--one of which is a bug.
The first problem is simply that the intended behavior is to back
off, and if we aren't able to queue the work item to run after a delay
we're not doing that.
The only reason queue_delayed_work() won't queue work is if the
provided work item is already queued. In the messenger, this
means that con_work() is already scheduled to be run again. So
if we simply set the BACKOFF flag again when this occurs, we know
the next con_work() call will again attempt to hold off activity
on the connection until after the delay.
The second problem--the bug--is a leak of a reference count. If
queue_delayed_work() returns 0 in con_work(), con->ops->put() drops
the connection reference held on entry to con_work(). However,
processing is (was) allowed to continue, and at the end of the
function a second con->ops->put() is called.
This patch fixes both problems.
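An outline of the corrected backoff handling, following the text above
(BACKOFF, ceph_msgr_wq, con->delay and con->ops->put are the real messenger
names; the surrounding con_work() logic is elided):
	if (test_and_clear_bit(BACKOFF, &con->flags)) {
		if (!queue_delayed_work(ceph_msgr_wq, &con->work,
					round_jiffies_relative(con->delay))) {
			/* Only fails because con_work() is already queued:
			 * flag that run to back off again, and do not drop
			 * the entry reference here -- the single
			 * con->ops->put() at the end of con_work() must
			 * remain the only put. */
			set_bit(BACKOFF, &con->flags);
		}
	}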
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
(cherry picked from commit 5ce765a540f34d1e2005e1210f49f67fdf11e997)
In write_partial_msg_pages(), pages need to be kmapped in order to
perform a CRC-32c calculation on them. As an artifact of the way
this code used to be structured, the kunmap() call was separated
from the kmap() call and both were done conditionally. But the
conditions under which the kmap() and kunmap() calls were made
differed, so there was a chance a kunmap() call would be done on a
page that had not been mapped.
The symptom of this was tripping a BUG() in kunmap_high() when
pkmap_count[nr] became 0.
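A generic sketch of the invariant being restored (not the ceph write path
itself): each kmap() is paired with exactly one kunmap() of the same page on
the same path:
#include <linux/crc32c.h>
#include <linux/highmem.h>

static u32 crc_of_page(struct page *page, unsigned int off, unsigned int len,
		       u32 crc)
{
	void *kaddr = kmap(page);		/* map ... */

	crc = crc32c(crc, kaddr + off, len);	/* ... use ... */
	kunmap(page);				/* ... unmap the same page */
	return crc;
}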
Reported-by: Bryan K. Wright <bryan@virginia.edu>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
(cherry picked from commit 6d4221b53707486dfad3f5bfe568d2ce7f4c9863)
Because the Ceph client messenger uses a non-blocking connect, it is
possible for the sending of the client banner to race with the
arrival of the banner sent by the peer.
When ceph_sock_state_change() notices the connect has completed, it
schedules work to process the socket via con_work(). During this
time the peer is writing its banner, and arrival of the peer banner
races with con_work().
If con_work() calls try_read() before the peer banner arrives, there
is nothing for it to do, after which con_work() calls try_write() to
send the client's banner. In this case Ceph's protocol negotiation
can complete successfully.
The server-side messenger immediately sends its banner and addresses
after accepting a connect request, *before* actually attempting to
read or verify the banner from the client. As a result, it is
possible for the banner from the server to arrive before con_work()
calls try_read(). If that happens, try_read() will read the banner
and prepare protocol negotiation info via prepare_write_connect().
prepare_write_connect() calls con_out_kvec_reset(), which discards
the as-yet-unsent client banner. Next, con_work() calls
try_write(), which sends the protocol negotiation info rather than
the banner that the peer is expecting.
The result is that the peer sees an invalid banner, and the client
reports "negotiation failed".
Fix this by moving con_out_kvec_reset() out of
prepare_write_connect() to its callers at all locations except the
one where the banner might still need to be sent.
[elder@inktank.com: added note about server-side behavior]
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
(cherry picked from commit d1c338a509cea5378df59629ad47382810c38623)
The debugfs directory includes the cluster fsid and our unique global_id.
We need to delay the initialization of the debug entry until we have
learned both the fsid and our global_id from the monitor or else the
second client can't create its debugfs entry and will fail (and multiple
client instances aren't properly reflected in debugfs).
Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
(cherry picked from commit f0666b1ac875ff32fe290219b150ec62eebbe10e)
Avoid crashing if the crypto key payload was NULL, as when it was not correctly
allocated and initialized. Also, avoid leaking it.
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|