linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2013-01-17	libceph: remove 'osdtimeout' option	Sage Weil
	(cherry picked from commit 83aff95eb9d60aff5497e9f44a2ae906b86d8e88) This would reset a connection with any OSD that had an outstanding request that was taking more than N seconds. The idea was that if the OSD was buggy, the client could compensate by resending the request. In reality, this only served to hide server bugs, and we haven't actually seen such a bug in quite a while. Moreover, the userspace client code never did this. More importantly, often the request is taking a long time because the OSD is trying to recover, or overloaded, and killing the connection and retrying would only make the situation worse by giving the OSD more work to do. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	libceph: avoid using freed osd in __kick_osd_requests()	Alex Elder
	(cherry picked from commit 685a7555ca69030739ddb57a47f0ea8ea80196a4) If an osd has no requests and no linger requests, __reset_osd() will just remove it with a call to __remove_osd(). That drops a reference to the osd, and therefore the osd may have been free by the time __reset_osd() returns. That function offers no indication this may have occurred, and as a result the osd will continue to be used even when it's no longer valid. Change__reset_osd() so it returns an error (ENODEV) when it deletes the osd being reset. And change __kick_osd_requests() so it returns immediately (before referencing osd again) if __reset_osd() returns any error. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	libceph: fix osdmap decode error paths	Sage Weil
	(cherry picked from commit 0ed7285e0001b960c888e5455ae982025210ed3d) Ensure that we set the err value correctly so that we do not pass a 0 value to ERR_PTR and confuse the calling code. (In particular, osd_client.c handle_map() will BUG(!newmap)). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	libceph: fix protocol feature mismatch failure path	Sage Weil
	(cherry picked from commit 0fa6ebc600bc8e830551aee47a0e929e818a1868) We should not set con->state to CLOSED here; that happens in ceph_fault() in the caller, where it first asserts that the state is not yet CLOSED. Avoids a BUG when the features don't match. Since the fail_protocol() has become a trivial wrapper, replace calls to it with direct calls to reset_connection(). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	libceph: WARN, don't BUG on unexpected connection states	Alex Elder
	(cherry picked from commit 122070a2ffc91f87fe8e8493eb0ac61986c5557c) A number of assertions in the ceph messenger are implemented with BUG_ON(), killing the system if connection's state doesn't match what's expected. At this point our state model is (evidently) not well understood enough for these assertions to trigger a BUG(). Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...) so we learn about these issues without killing the machine. We now recognize that a connection fault can occur due to a socket closure at any time, regardless of the state of the connection. So there is really nothing we can assert about the state of the connection at that point so eliminate that assertion. Reported-by: Ugis <ugis22@gmail.com> Tested-by: Ugis <ugis22@gmail.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	libceph: always reset osds when kicking	Alex Elder
	(cherry picked from commit e6d50f67a6b1a6252a616e6e629473b5c4277218) When ceph_osdc_handle_map() is called to process a new osd map, kick_requests() is called to ensure all affected requests are updated if necessary to reflect changes in the osd map. This happens in two cases: whenever an incremental map update is processed; and when a full map update (or the last one if there is more than one) gets processed. In the former case, the kick_requests() call is followed immediately by a call to reset_changed_osds() to ensure any connections to osds affected by the map change are reset. But for full map updates this isn't done. Both cases should be doing this osd reset. Rather than duplicating the reset_changed_osds() call, move it into the end of kick_requests(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	libceph: move linger requests sooner in kick_requests()	Alex Elder
	(cherry picked from commit ab60b16d3c31b9bd9fd5b39f97dc42c52a50b67d) The kick_requests() function is called by ceph_osdc_handle_map() when an osd map change has been indicated. Its purpose is to re-queue any request whose target osd is different from what it was when it was originally sent. It is structured as two loops, one for incomplete but registered requests, and a second for handling completed linger requests. As a special case, in the first loop if a request marked to linger has not yet completed, it is moved from the request list to the linger list. This is as a quick and dirty way to have the second loop handle sending the request along with all the other linger requests. Because of the way it's done now, however, this quick and dirty solution can result in these incomplete linger requests never getting re-sent as desired. The problem lies in the fact that the second loop only arranges for a linger request to be sent if it appears its target osd has changed. This is the proper handling for completed linger requests (it avoids issuing the same linger request twice to the same osd). But although the linger requests added to the list in the first loop may have been sent, they have not yet completed, so they need to be re-sent regardless of whether their target osd has changed. The first required fix is we need to avoid calling __map_request() on any incomplete linger request. Otherwise the subsequent __map_request() call in the second loop will find the target osd has not changed and will therefore not re-send the request. Second, we need to be sure that a sent but incomplete linger request gets re-sent. If the target osd is the same with the new osd map as it was when the request was originally sent, this won't happen. This can be fixed through careful handling when we move these requests from the request list to the linger list, by unregistering the request before it is registered as a linger request. This works because a side-effect of unregistering the request is to make the request's r_osd pointer be NULL, and that will ensure the second loop actually re-sends the linger request. Processing of such a request is done at that point, so continue with the next one once it's been moved. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	libceph: register request before unregister linger	Alex Elder
	(cherry picked from commit c89ce05e0c5a01a256100ac6a6019f276bdd1ca6) In kick_requests(), we need to register the request before we unregister the linger request. Otherwise the unregister will reset the request's osd pointer to NULL. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	libceph: don't use rb_init_node() in ceph_osdc_alloc_request()	Alex Elder
	(cherry picked from commit a978fa20fb657548561dddbfb605fe43654f0825) The red-black node in the ceph osd request structure is initialized in ceph_osdc_alloc_request() using rbd_init_node(). We do need to initialize this, because in __unregister_request() we call RB_EMPTY_NODE(), which expects the node it's checking to have been initialized. But rb_init_node() is apparently overkill, and may in fact be on its way out. So use RB_CLEAR_NODE() instead. For a little more background, see this commit: 4c199a93 rbtree: empty nodes have no color" Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	libceph: init event->node in ceph_osdc_create_event()	Alex Elder
	(cherry picked from commit 3ee5234df68d253c415ba4f2db72ad250d9c21a9) The red-black node node in the ceph osd event structure is not initialized in create_osdc_create_event(). Because this node can be the subject of a RB_EMPTY_NODE() call later on, we should ensure the node is initialized properly for that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	libceph: init osd->o_node in create_osd()	Alex Elder
	(cherry picked from commit f407731d12214e7686819018f3a1e9d7b6f83a02) The red-black node node in the ceph osd structure is not initialized in create_osd(). Because this node can be the subject of a RB_EMPTY_NODE() call later on, we should ensure the node is initialized properly for that. Add a call to RB_CLEAR_NODE() initialize it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	libceph: report connection fault with warning	Alex Elder
	(cherry picked from commit 28362986f8743124b3a0fda20a8ed3e80309cce1) When a connection's socket disconnects, or if there's a protocol error of some kind on the connection, a fault is signaled and the connection is reset (closed and reopened, basically). We currently get an error message on the log whenever this occurs. A ceph connection will attempt to reestablish a socket connection repeatedly if a fault occurs. This means that these error messages will get repeatedly added to the log, which is undesirable. Change the error message to be a warning, so they don't get logged by default. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	libceph: socket can close in any connection state	Alex Elder
	(cherry picked from commit 7bb21d68c535ad8be38e14a715632ae398b37ac1) A connection's socket can close for any reason, independent of the state of the connection (and without irrespective of the connection mutex). As a result, the connectino can be in pretty much any state at the time its socket is closed. Handle those other cases at the top of con_work(). Pull this whole block of code into a separate function to reduce the clutter. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	mac80211: use del_timer_sync for final sta cleanup timer deletion	Johannes Berg
	commit a56f992cdabc63f56b4b142885deebebf936ff76 upstream. This is a very old bug, but there's nothing that prevents the timer from running while the module is being removed when we only do del_timer() instead of del_timer_sync(). The timer should normally not be running at this point, but it's not clearly impossible (or we could just remove this.) Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	mac80211: fix station destruction in AP/mesh modes	Johannes Berg
	commit 97f97b1f5fe0878b35c8e314f98591771696321b upstream. Unfortunately, commit b22cfcfcae5b, intended to speed up roaming by avoiding the synchronize_rcu() broke AP/mesh modes as it moved some code into that work item that will still call into the driver at a time where it's no longer expected to handle this: after the AP or mesh has been stopped. To fix this problem remove the per-station work struct, maintain a station cleanup list instead and flush this list when stations are flushed. To keep this patch smaller for stable, do this when the stations are flushed (sta_info_flush()). This unfortunately brings back the original roaming delay; I'll fix that again in a separate patch. Also, Ben reported that the original commit could sometimes (with many interfaces) cause long delays when an interface is set down, due to blocking on flush_workqueue(). Since we now maintain the cleanup list, this particular change of the original patch can be reverted. Reported-by: Ben Greear <greearb@candelatech.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	mac80211: fix ibss scanning	Stanislaw Gruszka
	commit 34bcf71502413f8903ade93746f2d0f04b937a78 upstream. Do not scan on no-IBSS and disabled channels in IBSS mode. Doing this can trigger Microcode errors on iwlwifi and iwlegacy drivers. Also rename ieee80211_request_internal_scan() function since it is only used in IBSS mode and simplify calling it from ieee80211_sta_find_ibss(). This patch should address: https://bugzilla.redhat.com/show_bug.cgi?id=883414 https://bugzilla.kernel.org/show_bug.cgi?id=49411 Reported-by: Jesse Kahtava <jesse_kahtava@f-m.fm> Reported-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	SUNRPC: Ensure we release the socket write lock if the rpc_task exits early	Trond Myklebust
	commit 87ed50036b866db2ec2ba16b2a7aec4a2b0b7c39 upstream. If the rpc_task exits while holding the socket write lock before it has allocated an rpc slot, then the usual mechanism for releasing the write lock in xprt_release() is defeated. The problem occurs if the call to xprt_lock_write() initially fails, so that the rpc_task is put on the xprt->sending wait queue. If the task exits after being assigned the lock by __xprt_lock_write_func, but before it has retried the call to xprt_lock_and_alloc_slot(), then it calls xprt_release() while holding the write lock, but will immediately exit due to the test for task->tk_rqstp != NULL. Reported-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	SUNRPC: Ensure that we free the rpc_task after cleanups are done	Trond Myklebust
	commit c6567ed1402c55e19b012e66a8398baec2a726f3 upstream. This patch ensures that we free the rpc_task after the cleanup callbacks are done in order to avoid a deadlock problem that can be triggered if the callback needs to wait for another workqueue item to complete. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Weston Andros Adamson <dros@netapp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Bruce Fields <bfields@fieldses.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	SUNRPC: continue run over clients list on PipeFS event instead of break	Stanislav Kinsbursky
	commit cd6c5968582a273561464fe6b1e8cc8214be02df upstream. There are SUNRPC clients, which program doesn't have pipe_dir_name. These clients can be skipped on PipeFS events, because nothing have to be created or destroyed. But instead of breaking in case of such a client was found, search for suitable client over clients list have to be continued. Otherwise some clients could not be covered by PipeFS event handler. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	SUNRPC: Fix validity issues with rpc_pipefs sb->s_fs_info	Trond Myklebust
	commit 642fe4d00db56d65060ce2fd4c105884414acb16 upstream. rpc_kill_sb() must defer calling put_net() until after the notifier has been called, since most (all?) of the notifier callbacks assume that sb->s_fs_info points to a valid net namespace. It also must not call put_net() if the call to rpc_fill_super was unsuccessful. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=48421 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17	mac80211: introduce IEEE80211_HW_TEARDOWN_AGGR_ON_BAR_FAIL	Stanislaw Gruszka
	commit 5b632fe85ec82e5c43740b52e74c66df50a37db3 upstream. Commit f0425beda4d404a6e751439b562100b902ba9c98 "mac80211: retry sending failed BAR frames later instead of tearing down aggr" caused regression on rt2x00 hardware (connection hangs). This regression was fixed by commit be03d4a45c09ee5100d3aaaedd087f19bc20d01 "rt2x00: Don't let mac80211 send a BAR when an AMPDU subframe fails". But the latter commit caused yet another problem reported in https://bugzilla.kernel.org/show_bug.cgi?id=42828#c22 After long discussion in this thread: http://mid.gmane.org/20121018075615.GA18212@redhat.com and testing various alternative solutions, which failed on one or other setup, we have no other good fix for the issues like just revert both mentioned earlier commits. To do not affect other hardware which benefit from commit f0425beda4d404a6e751439b562100b902ba9c98, instead of reverting it, introduce flag that when used will restore mac80211 behaviour before the commit. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> [replaced link with mid.gmane.org that has message-id] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11	Revert "Bluetooth: Fix possible deadlock in SCO code"	Gustavo Padovan
	commit 0b27a4b97cb1874503c78453c0903df53c0c86b2 upstream. This reverts commit 269c4845d5b3627b95b1934107251bacbe99bb68. The commit was causing dead locks and NULL dereferences in the sco code: [28084.104013] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u:0H:7] [28084.104021] Modules linked in: btusb bluetooth <snip [last unloaded: bluetooth] ... [28084.104021] [<c160246d>] _raw_spin_lock+0xd/0x10 [28084.104021] [<f920e708>] sco_conn_del+0x58/0x1b0 [bluetooth] [28084.104021] [<f920f1a9>] sco_connect_cfm+0xb9/0x2b0 [bluetooth] [28084.104021] [<f91ef289>] hci_sync_conn_complete_evt.isra.94+0x1c9/0x260 [bluetooth] [28084.104021] [<f91f1a8d>] hci_event_packet+0x74d/0x2b40 [bluetooth] [28084.104021] [<c1501abd>] ? __kfree_skb+0x3d/0x90 [28084.104021] [<c1501b46>] ? kfree_skb+0x36/0x90 [28084.104021] [<f91fcb4e>] ? hci_send_to_monitor+0x10e/0x190 [bluetooth] [28084.104021] [<f91fcb4e>] ? hci_send_to_monitor+0x10e/0x190 [bluetooth] Reported-by: Chan-yeol Park <chanyeol.park@gmail.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11	Bluetooth: cancel power_on work when unregistering the device	Gustavo Padovan
	commit b9b5ef188e5a2222cfc16ef62a4703080750b451 upstream. We need to cancel the hci_power_on work in order to avoid it run when we try to free the hdev. [ 1434.201149] ------------[ cut here ]------------ [ 1434.204998] WARNING: at lib/debugobjects.c:261 debug_print_object+0x8e/0xb0() [ 1434.208324] ODEBUG: free active (active state 0) object type: work_struct hint: hci _power_on+0x0/0x90 [ 1434.210386] Pid: 8564, comm: trinity-child25 Tainted: G W 3.7.0-rc5-next- 20121112-sasha-00018-g2f4ce0e #127 [ 1434.210760] Call Trace: [ 1434.210760] [<ffffffff819f3d6e>] ? debug_print_object+0x8e/0xb0 [ 1434.210760] [<ffffffff8110b887>] warn_slowpath_common+0x87/0xb0 [ 1434.210760] [<ffffffff8110b911>] warn_slowpath_fmt+0x41/0x50 [ 1434.210760] [<ffffffff819f3d6e>] debug_print_object+0x8e/0xb0 [ 1434.210760] [<ffffffff8376b750>] ? hci_dev_open+0x310/0x310 [ 1434.210760] [<ffffffff83bf94e5>] ? _raw_spin_unlock_irqrestore+0x55/0xa0 [ 1434.210760] [<ffffffff819f3ee5>] __debug_check_no_obj_freed+0xa5/0x230 [ 1434.210760] [<ffffffff83785db0>] ? bt_host_release+0x10/0x20 [ 1434.210760] [<ffffffff819f4d15>] debug_check_no_obj_freed+0x15/0x20 [ 1434.210760] [<ffffffff8125eee7>] kfree+0x227/0x330 [ 1434.210760] [<ffffffff83785db0>] bt_host_release+0x10/0x20 [ 1434.210760] [<ffffffff81e539e5>] device_release+0x65/0xc0 [ 1434.210760] [<ffffffff819d3975>] kobject_cleanup+0x145/0x190 [ 1434.210760] [<ffffffff819d39cd>] kobject_release+0xd/0x10 [ 1434.210760] [<ffffffff819d33cc>] kobject_put+0x4c/0x60 [ 1434.210760] [<ffffffff81e548b2>] put_device+0x12/0x20 [ 1434.210760] [<ffffffff8376a334>] hci_free_dev+0x24/0x30 [ 1434.210760] [<ffffffff82fd8fe1>] vhci_release+0x31/0x60 [ 1434.210760] [<ffffffff8127be12>] __fput+0x122/0x250 [ 1434.210760] [<ffffffff811cab0d>] ? rcu_user_exit+0x9d/0xd0 [ 1434.210760] [<ffffffff8127bf49>] ____fput+0x9/0x10 [ 1434.210760] [<ffffffff81133402>] task_work_run+0xb2/0xf0 [ 1434.210760] [<ffffffff8106cfa7>] do_notify_resume+0x77/0xa0 [ 1434.210760] [<ffffffff83bfb0ea>] int_signal+0x12/0x17 [ 1434.210760] ---[ end trace a6d57fefbc8a8cc7 ]--- Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11	Bluetooth: Add missing lock nesting notation	Gustavo Padovan
	commit dc2a0e20fbc85a71c63aa4330b496fda33f6bf80 upstream. This patch fixes the following report, it happens when accepting rfcomm connections: [ 228.165378] ============================================= [ 228.165378] [ INFO: possible recursive locking detected ] [ 228.165378] 3.7.0-rc1-00536-gc1d5dc4 #120 Tainted: G W [ 228.165378] --------------------------------------------- [ 228.165378] bluetoothd/1341 is trying to acquire lock: [ 228.165378] (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+...}, at: [<ffffffffa0000aa0>] bt_accept_dequeue+0xa0/0x180 [bluetooth] [ 228.165378] [ 228.165378] but task is already holding lock: [ 228.165378] (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+...}, at: [<ffffffffa0205118>] rfcomm_sock_accept+0x58/0x2d0 [rfcomm] [ 228.165378] [ 228.165378] other info that might help us debug this: [ 228.165378] Possible unsafe locking scenario: [ 228.165378] [ 228.165378] CPU0 [ 228.165378] ---- [ 228.165378] lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM); [ 228.165378] lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM); [ 228.165378] [ 228.165378] * DEADLOCK * [ 228.165378] [ 228.165378] May be due to missing lock nesting notation Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11	sctp: jsctp_sf_eat_sack: fix jprobes function signature mismatch	Daniel Borkmann
	[ Upstream commit 4cb9d6eaf85ecdd266a9a5c6d825c56ca9eefc14 ] Commit 24cb81a6a (sctp: Push struct net down into all of the state machine functions) introduced the net structure into all state machine functions, but jsctp_sf_eat_sack was not updated, hence when SCTP association probing is enabled in the kernel, any simple SCTP client/server program from userspace will panic the kernel. Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11	net: sched: integer overflow fix	Stefan Hasko
	[ Upstream commit d2fe85da52e89b8012ffad010ef352a964725d5f ] Fixed integer overflow in function htb_dequeue Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11	mac802154: fix NOHZ local_softirq_pending 08 warning	Alexander Aring
	[ Upstream commit 5ff3fec6d3fc848753c2fa30b18607358f89a202 ] When using nanosleep() in an userspace application we get a ratelimit warning NOHZ: local_softirq_pending 08 for 10 times. This patch replaces netif_rx() with netif_rx_ni() which has to be used from process/softirq context. The process/softirq context will be called from fakelb driver. See linux-kernel commit 481a819 for similar fix. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11	ipv6: Change skb->data before using icmpv6_notify() to propagate redirect	Duan Jiong
	[ Upstream commit 093d04d42fa094f6740bb188f0ad0c215ff61e2c ] In function ndisc_redirect_rcv(), the skb->data points to the transport header, but function icmpv6_notify() need the skb->data points to the inner IP packet. So before using icmpv6_notify() to propagate redirect, change skb->data to point the inner IP packet that triggered the sending of the Redirect, and introduce struct rd_msg to make it easy. Signed-off-by: Duan Jiong <djduanjiong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11	inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock	Christoph Paasch
	[ Upstream commit e337e24d6624e74a558aa69071e112a65f7b5758 ] If in either of the above functions inet_csk_route_child_sock() or __inet_inherit_port() fails, the newsk will not be freed: unreferenced object 0xffff88022e8a92c0 (size 1592): comm "softirq", pid 0, jiffies 4294946244 (age 726.160s) hex dump (first 32 bytes): 0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00 ................ 02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e [<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5 [<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd [<ffffffff8149b784>] sk_clone_lock+0x16/0x21e [<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b [<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481 [<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b [<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416 [<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc [<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701 [<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4 [<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f [<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233 [<ffffffff814cee68>] ip_rcv+0x217/0x267 [<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553 [<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82 This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus a single sock_put() is not enough to free the memory. Additionally, things like xfrm, memcg, cookie_values,... may have been initialized. We have to free them properly. This is fixed by forcing a call to tcp_done(), ending up in inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary, because it ends up doing all the cleanup on xfrm, memcg, cookie_values, xfrm,... Before calling tcp_done, we have to set the socket to SOCK_DEAD, to force it entering inet_csk_destroy_sock. To avoid the warning in inet_csk_destroy_sock, inet_num has to be set to 0. As inet_csk_destroy_sock does a dec on orphan_count, we first have to increase it. Calling tcp_done() allows us to remove the calls to tcp_clear_xmit_timer() and tcp_cleanup_congestion_control(). A similar approach is taken for dccp by calling dccp_done(). This is in the kernel since 093d282321 (tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()), thus since version >= 2.6.37. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11	batman-adv: fix random jitter calculation	Akinobu Mita
	[ Upstream commit 143cdd8f33909ff5a153e3f02048738c5964ba26 ] batadv_iv_ogm_emit_send_time() attempts to calculates a random integer in the range of 'orig_interval +- BATADV_JITTER' by the below lines. msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER; msecs += (random32() % 2 * BATADV_JITTER); But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER' because '%' and '*' have same precedence and associativity is left-to-right. This adds the parentheses at the appropriate position so that it matches original intension. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Antonio Quartulli <ordex@autistici.org> Cc: Marek Lindner <lindner_marek@yahoo.de> Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Cc: Antonio Quartulli <ordex@autistici.org> Cc: b.a.t.m.a.n@lists.open-mesh.org Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11	virtio: 9p: correctly pass physical address to userspace for high pages	Will Deacon
	commit b9cdc88df8e63e81c723b82c286fc97f5d0dc325 upstream. When using a virtio transport, the 9p net device may pass the physical address of a kernel buffer to userspace via a scatterlist inside a virtqueue. If the kernel buffer is mapped outside of the linear mapping (e.g. highmem), then virt_to_page will return a bogus value and we will populate the scatterlist with junk. This patch uses kmap_to_page when populating the page array for a kernel buffer. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-10	ipv4: ip_check_defrag must not modify skb before unsharing	Johannes Berg
	ip_check_defrag() might be called from af_packet within the RX path where shared SKBs are used, so it must not modify the input SKB before it has unshared it for defragmentation. Use skb_copy_bits() to get the IP header and only pull in everything later. The same is true for the other caller in macvlan as it is called from dev->rx_handler which can also get a shared SKB. Reported-by: Eric Leblond <eric@regit.org> Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09	inet_diag: validate port comparison byte code to prevent unsafe reads	Neal Cardwell
	Add logic to verify that a port comparison byte code operation actually has the second inet_diag_bc_op from which we read the port for such operations. Previously the code blindly referenced op[1] without first checking whether a second inet_diag_bc_op struct could fit there. So a malicious user could make the kernel read 4 bytes beyond the end of the bytecode array by claiming to have a whole port comparison byte code (2 inet_diag_bc_op structs) when in fact the bytecode was not long enough to hold both. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09	inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()	Neal Cardwell
	Add logic to check the address family of the user-supplied conditional and the address family of the connection entry. We now do not do prefix matching of addresses from different address families (AF_INET vs AF_INET6), except for the previously existing support for having an IPv4 prefix match an IPv4-mapped IPv6 address (which this commit maintains as-is). This change is needed for two reasons: (1) The addresses are different lengths, so comparing a 128-bit IPv6 prefix match condition to a 32-bit IPv4 connection address can cause us to unwittingly walk off the end of the IPv4 address and read garbage or oops. (2) The IPv4 and IPv6 address spaces are semantically distinct, so a simple bit-wise comparison of the prefixes is not meaningful, and would lead to bogus results (except for the IPv4-mapped IPv6 case, which this commit maintains). Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09	inet_diag: validate byte code to prevent oops in inet_diag_bc_run()	Neal Cardwell
	Add logic to validate INET_DIAG_BC_S_COND and INET_DIAG_BC_D_COND operations. Previously we did not validate the inet_diag_hostcond, address family, address length, and prefix length. So a malicious user could make the kernel read beyond the end of the bytecode array by claiming to have a whole inet_diag_hostcond when the bytecode was not long enough to contain a whole inet_diag_hostcond of the given address family. Or they could make the kernel read up to about 27 bytes beyond the end of a connection address by passing a prefix length that exceeded the length of addresses of the given family. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09	inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state	Neal Cardwell
	Fix inet_diag to be aware of the fact that AF_INET6 TCP connections instantiated for IPv4 traffic and in the SYN-RECV state were actually created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This means that for such connections inet6_rsk(req) returns a pointer to a random spot in memory up to roughly 64KB beyond the end of the request_sock. With this bug, for a server using AF_INET6 TCP sockets and serving IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to inet_diag_fill_req() causing an oops or the export to user space of 16 bytes of kernel memory as a garbage IPv6 address, depending on where the garbage inet6_rsk(req) pointed. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07	net: gro: fix possible panic in skb_gro_receive()	Eric Dumazet
	commit 2e71a6f8084e (net: gro: selective flush of packets) added a bug for skbs using frag_list. This part of the GRO stack is rarely used, as it needs skb not using a page fragment for their skb->head. Most drivers do use a page fragment, but some of them use GFP_KERNEL allocations for the initial fill of their RX ring buffer. napi_gro_flush() overwrite skb->prev that was used for these skb to point to the last skb in frag_list. Fix this using a separate field in struct napi_gro_cb to point to the last fragment. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07	tcp: bug fix Fast Open client retransmission	Yuchung Cheng
	If SYN-ACK partially acks SYN-data, the client retransmits the remaining data by tcp_retransmit_skb(). This increments lost recovery state variables like tp->retrans_out in Open state. If loss recovery happens before the retransmission is acked, it triggers the WARN_ON check in tcp_fastretrans_alert(). For example: the client sends SYN-data, gets SYN-ACK acking only ISN, retransmits data, sends another 4 data packets and get 3 dupacks. Since the retransmission is not caused by network drop it should not update the recovery state variables. Further the server may return a smaller MSS than the cached MSS used for SYN-data, so the retranmission needs a loop. Otherwise some data will not be retransmitted until timeout or other loss recovery events. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-01	tcp: fix crashes in do_tcp_sendpages()	Eric Dumazet
	Recent network changes allowed high order pages being used for skb fragments. This uncovered a bug in do_tcp_sendpages() which was assuming its caller provided an array of order-0 page pointers. We only have to deal with a single page in this function, and its order is irrelevant. Reported-by: Willy Tarreau <w@1wt.eu> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30	Merge branch 'master' of ↵	John W. Linville
	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2012-11-28	Merge branch 'fixes' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Two small openswitch fixes from Jesse Gross. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28	Merge branch 'master' of git://1984.lsi.us.es/nf	David S. Miller
	An interface name overflow fix in netfilter via Pablo Neira Ayuso. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28	irda: irttp: fix memory leak in irttp_open_tsap() error path	Tommi Rantala
	Cleanup the memory we allocated earlier in irttp_open_tsap() when we hit this error path. The leak goes back to at least 1da177e4 ("Linux-2.6.12-rc2"). Discovered with Trinity (the syscall fuzzer). Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28	sctp: Error in calculation of RTTvar	Schoch Christian
	The calculation of RTTVAR involves the subtraction of two unsigned numbers which may causes rollover and results in very high values of RTTVAR when RTT > SRTT. With this patch it is possible to set RTOmin = 1 to get the minimum of RTO at 4 times the clock granularity. Change Notes: v2) *Replaced abs() by abs64() and long by __s64, changed patch description. Signed-off-by: Christian Schoch <e0326715@student.tuwien.ac.at> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: Neil Horman <nhorman@tuxdriver.com> CC: linux-sctp@vger.kernel.org Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28	sctp: fix -ENOMEM result with invalid user space pointer in sendto() syscall	Tommi Rantala
	Consider the following program, that sets the second argument to the sendto() syscall incorrectly: #include <string.h> #include <arpa/inet.h> #include <sys/socket.h> int main(void) { int fd; struct sockaddr_in sa; fd = socket(AF_INET, SOCK_STREAM, 132 /IPPROTO_SCTP/); if (fd < 0) return 1; memset(&sa, 0, sizeof(sa)); sa.sin_family = AF_INET; sa.sin_addr.s_addr = inet_addr("127.0.0.1"); sa.sin_port = htons(11111); sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa)); return 0; } We get -ENOMEM: $ strace -e sendto ./demo sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENOMEM (Cannot allocate memory) Propagate the error code from sctp_user_addto_chunk(), so that we will tell user space what actually went wrong: $ strace -e sendto ./demo sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address) Noticed while running Trinity (the syscall fuzzer). Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28	sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space ↵	Tommi Rantala
	fails Trinity (the syscall fuzzer) discovered a memory leak in SCTP, reproducible e.g. with the sendto() syscall by passing invalid user space pointer in the second argument: #include <string.h> #include <arpa/inet.h> #include <sys/socket.h> int main(void) { int fd; struct sockaddr_in sa; fd = socket(AF_INET, SOCK_STREAM, 132 /IPPROTO_SCTP/); if (fd < 0) return 1; memset(&sa, 0, sizeof(sa)); sa.sin_family = AF_INET; sa.sin_addr.s_addr = inet_addr("127.0.0.1"); sa.sin_port = htons(11111); sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa)); return 0; } As far as I can tell, the leak has been around since ~2003. Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-26	net: ipmr: limit MRT_TABLE identifiers	Eric Dumazet
	Name of pimreg devices are built from following format : char name[IFNAMSIZ]; // IFNAMSIZ == 16 sprintf(name, "pimreg%u", mrt->id); We must therefore limit mrt->id to 9 decimal digits or risk a buffer overflow and a crash. Restrict table identifiers in [0 ... 999999999] interval. Reported-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-26	ipv4: avoid passing NULL to inet_putpeer() in icmpv4_xrlim_allow()	Neal Cardwell
	inet_getpeer_v4() can return NULL under OOM conditions, and while inet_peer_xrlim_allow() is OK with a NULL peer, inet_putpeer() will crash. This code path now uses the same idiom as the others from: 1d861aa4b3fb08822055345f480850205ffe6170 ("inet: Minimize use of cached route inetpeer."). Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-26	can: bcm: initialize ifindex for timeouts without previous frame reception	Oliver Hartkopp
	Set in the rx_ifindex to pass the correct interface index in the case of a message timeout detection. Usually the rx_ifindex value is set at receive time. But when no CAN frame has been received the RX_TIMEOUT notification did not contain a valid value. Cc: linux-stable <stable@vger.kernel.org> Reported-by: Andre Naujoks <nautsch2@googlemail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-11-26	mac80211: fix remain-on-channel (non-)cancelling	Johannes Berg
	Felix Liao reported that when an interface is set DOWN while another interface is executing a ROC, the warning in ieee80211_start_next_roc() (about the first item on the list having started already) triggers. This is because ieee80211_roc_purge() calls it even if it never actually changed the list of ROC items. To fix this, simply remove the function call. If it is needed then it will be done by the ieee80211_sw_roc_work() function when the ROC item that is being removed while active is cleaned up. Cc: stable@vger.kernel.org Reported-by: Felix Liao <Felix.Liao@watchguard.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>