aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2010-09-10block: Range check cpu in blk_cpu_to_groupBrian King
While testing CPU DLPAR, the following problem was discovered. We were DLPAR removing the first CPU, which in this case was logical CPUs 0-3. CPUs 0-2 were already marked offline and we were in the process of offlining CPU 3. After marking the CPU inactive and offline in cpu_disable, but before the cpu was completely idle (cpu_die), we ended up in __make_request on CPU 3. There we looked at the topology map to see which CPU to complete the I/O on and found no CPUs in the cpu_sibling_map. This resulted in the block layer setting the completion cpu to be NR_CPUS, which then caused an oops when we tried to complete the I/O. Fix this by sanity checking the value we return from blk_cpu_to_group to be a valid cpu value. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-30scatterlist: prevent invalid free when alloc failsJeffrey Carlyle
When alloc fails, free_table is being called. Depending on the number of bytes requested, we determine if we are going to call _get_free_page() or kmalloc(). When alloc fails, our math is wrong (due to sg_size - 1), and the last buffer is wrongfully assumed to have been allocated by kmalloc. Hence, kfree gets called and a panic occurs. Signed-off-by: Jeffrey Carlyle <jeff.carlyle@motorola.com> Signed-off-by: Olusanya Soyannwo <c23746@motorola.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-28writeback: Fix lost wake-up shutting down writeback threadJ. Bruce Fields
Setting the task state here may cause us to miss the wake up from kthread_stop(), so we need to recheck kthread_should_stop() or risk sleeping forever in the following schedule(). Symptom was an indefinite hang on an NFSv4 mount. (NFSv4 may create multiple mounts in a temporary namespace while traversing the mount path, and since the temporary namespace is immediately destroyed, it may end up destroying a mount very soon after it was created, possibly making this race more likely.) INFO: task mount.nfs4:4314 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mount.nfs4 D 0000000000000000 2880 4314 4313 0x00000000 ffff88001ed6da28 0000000000000046 ffff88001ed6dfd8 ffff88001ed6dfd8 ffff88001ed6c000 ffff88001ed6c000 ffff88001ed6c000 ffff88001e5003a0 ffff88001ed6dfd8 ffff88001e5003a8 ffff88001ed6c000 ffff88001ed6dfd8 Call Trace: [<ffffffff8196090d>] schedule_timeout+0x1cd/0x2e0 [<ffffffff8106a31c>] ? mark_held_locks+0x6c/0xa0 [<ffffffff819639a0>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff8106a5fd>] ? trace_hardirqs_on_caller+0x14d/0x190 [<ffffffff819671fe>] ? sub_preempt_count+0xe/0xd0 [<ffffffff8195fc80>] wait_for_common+0x120/0x190 [<ffffffff81033c70>] ? default_wake_function+0x0/0x20 [<ffffffff8195fdcd>] wait_for_completion+0x1d/0x20 [<ffffffff810595fa>] kthread_stop+0x4a/0x150 [<ffffffff81061a60>] ? thaw_process+0x70/0x80 [<ffffffff810cc68a>] bdi_unregister+0x10a/0x1a0 [<ffffffff81229dc9>] nfs_put_super+0x19/0x20 [<ffffffff810ee8c4>] generic_shutdown_super+0x54/0xe0 [<ffffffff810ee9b6>] kill_anon_super+0x16/0x60 [<ffffffff8122d3b9>] nfs4_kill_super+0x39/0x90 [<ffffffff810eda45>] deactivate_locked_super+0x45/0x60 [<ffffffff810edfb9>] deactivate_super+0x49/0x70 [<ffffffff81108294>] mntput_no_expire+0x84/0xe0 [<ffffffff811084ef>] release_mounts+0x9f/0xc0 [<ffffffff81108575>] put_mnt_ns+0x65/0x80 [<ffffffff8122cc56>] nfs_follow_remote_path+0x1e6/0x420 [<ffffffff8122cfbf>] nfs4_try_mount+0x6f/0xd0 [<ffffffff8122d0c2>] nfs4_get_sb+0xa2/0x360 [<ffffffff810edcb8>] vfs_kern_mount+0x88/0x1f0 [<ffffffff810ede92>] do_kern_mount+0x52/0x130 [<ffffffff81963d9a>] ? _lock_kernel+0x6a/0x170 [<ffffffff81108e9e>] do_mount+0x26e/0x7f0 [<ffffffff81106b3a>] ? copy_mount_options+0xea/0x190 [<ffffffff811094b8>] sys_mount+0x98/0xf0 [<ffffffff810024d8>] system_call_fastpath+0x16/0x1b 1 lock held by mount.nfs4/4314: #0: (&type->s_umount_key#24){+.+...}, at: [<ffffffff810edfb1>] deactivate_super+0x41/0x70 Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Acked-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2010-08-27writeback: do not lose wakeup events when forking bdi threadsArtem Bityutskiy
This patch fixes the following issue: INFO: task mount.nfs4:1120 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mount.nfs4 D 00000000fffc6a21 0 1120 1119 0x00000000 ffff880235643948 0000000000000046 ffffffff00000000 ffffffff00000000 ffff880235643fd8 ffff880235314760 00000000001d44c0 ffff880235643fd8 00000000001d44c0 00000000001d44c0 00000000001d44c0 00000000001d44c0 Call Trace: [<ffffffff813bc747>] schedule_timeout+0x34/0xf1 [<ffffffff813bc530>] ? wait_for_common+0x3f/0x130 [<ffffffff8106b50b>] ? trace_hardirqs_on+0xd/0xf [<ffffffff813bc5c3>] wait_for_common+0xd2/0x130 [<ffffffff8104159c>] ? default_wake_function+0x0/0xf [<ffffffff813beaa0>] ? _raw_spin_unlock+0x26/0x2a [<ffffffff813bc6bb>] wait_for_completion+0x18/0x1a [<ffffffff81101a03>] sync_inodes_sb+0xca/0x1bc [<ffffffff811056a6>] __sync_filesystem+0x47/0x7e [<ffffffff81105798>] sync_filesystem+0x47/0x4b [<ffffffff810e7ffd>] generic_shutdown_super+0x22/0xd2 [<ffffffff810e80f8>] kill_anon_super+0x11/0x4f [<ffffffffa00d06d7>] nfs4_kill_super+0x3f/0x72 [nfs] [<ffffffff810e7b68>] deactivate_locked_super+0x21/0x41 [<ffffffff810e7fd6>] deactivate_super+0x40/0x45 [<ffffffff810fc66c>] mntput_no_expire+0xb8/0xed [<ffffffff810fc73b>] release_mounts+0x9a/0xb0 [<ffffffff810fc7bb>] put_mnt_ns+0x6a/0x7b [<ffffffffa00d0fb2>] nfs_follow_remote_path+0x19a/0x296 [nfs] [<ffffffffa00d11ca>] nfs4_try_mount+0x75/0xaf [nfs] [<ffffffffa00d1790>] nfs4_get_sb+0x276/0x2ff [nfs] [<ffffffff810e7dba>] vfs_kern_mount+0xb8/0x196 [<ffffffff810e7ef6>] do_kern_mount+0x48/0xe8 [<ffffffff810fdf68>] do_mount+0x771/0x7e8 [<ffffffff810fe062>] sys_mount+0x83/0xbd [<ffffffff810089c2>] system_call_fastpath+0x16/0x1b The reason of this hang was a race condition: when the flusher thread is forking a bdi thread, we use 'kthread_run()', so we run it _before_ we make it visible in 'bdi->wb.task'. The bdi thread runs, does all works, and goes sleep. 'bdi->wb.task' is still NULL. And this is a dangerous time window. If at this time someone queues a work for this bdi, he does not see the bdi thread and wakes up the forker thread instead! But the forker has already forked this bdi thread, but just did not make it visible yet! The result is that we lose the wake up event for this bdi thread and the NFS4 code waits forever. To fix the problem, we should use 'ktrhead_create()' for creating bdi threads, then make them visible in 'bdi->wb.task', and only after this wake them up. This is exactly what this patch does. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-25cciss: fix reporting of max queue depth since initStephen M. Cameron
The ioctl path and the scsi tape path were not accounting for their additions to the queue depth. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23block: switch s390 tape_block and mg_disk to elevator_change()Jens Axboe
Now that we have this API, switch the two in-kernel users to it. Resolves an oops introduced by commit 1abec4fdbb142e3ccb6ce99832fae42129134a96. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23block: add function call to switch the IO scheduler from a driverJens Axboe
Currently drivers must do an elevator_exit() + elevator_init() to switch IO schedulers. There are a few problems with this: - Since commit 1abec4fdbb142e3ccb6ce99832fae42129134a96, elevator_init() requires a zeroed out q->elevator pointer. The two existing in-kernel users don't do that. - It will only work at initialization time, since using the above two-staged construct does not properly quisce the queue. So add elevator_change() which takes care of this, and convert the elv_iosched_store() sysfs interface to use this helper as well. Reported-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Reported-by: Kevin Vigor <kevin@vigor.nu> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23fs/bio-integrity.c: return -ENOMEM on kmalloc failureAndrew Morton
Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23bio-integrity.c: remove dependency on __GFP_NOFAILDavid Rientjes
The kmalloc() in bio_integrity_prep() is failable, so remove __GFP_NOFAIL from its mask. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23BLOCK: fix bio.bi_rw handlingJiri Slaby
Return of the bi_rw tests is no longer bool after commit 74450be1. But results of such tests are stored in bools. This doesn't fit in there for some compilers (gcc 4.5 here), so either use !! magic to get real bools or use ulong where the result is assigned somewhere. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23block: put dev->kobj in blk_register_queue fail pathXiaotian Feng
kernel needs to kobject_put on dev->kobj if elv_register_queue fails. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23cciss: handle allocation failureDan Carpenter
If kmalloc() fails then cleanup and return failure (-1). Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23cfq-iosched: Documentation help for new tunablesVivek Goyal
Some documentation to provide help with tunables. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23cfq-iosched: blktrace print per slice sector statsVivek Goyal
o Divyesh had gotten rid of this code in the past. I want to re-introduce it back as it helps me a lot during debugging. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Divyesh Shah <dpshah@google.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23cfq-iosched: Implement tunable group_idleVivek Goyal
o Implement a new tunable group_idle, which allows idling on the group instead of a cfq queue. Hence one can set slice_idle = 0 and not idle on the individual queues but idle on the group. This way on fast storage we can get fairness between groups at the same time overall throughput improves. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23cfq-iosched: Do group share accounting in IOPS when slice_idle=0Vivek Goyal
o Implement another CFQ mode where we charge group in terms of number of requests dispatched instead of measuring the time. Measuring in terms of time is not possible when we are driving deeper queue depths and there are requests from multiple cfq queues in the request queue. o This mode currently gets activated if one sets slice_idle=0 and associated disk supports NCQ. Again the idea is that on an NCQ disk with idling disabled most of the queues will dispatch 1 or more requests and then cfq queue expiry happens and we don't have a way to measure time. So start providing fairness in terms of IOPS. o Currently IOPS mode works only with cfq group scheduling. CFQ is following different scheduling algorithms for queue and group scheduling. These IOPS stats are used only for group scheduling hence in non-croup mode nothing should change. o For CFQ group scheduling one can disable slice idling so that we don't idle on queue and drive deeper request queue depths (achieving better throughput), at the same time group idle is enabled so one should get service differentiation among groups. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23cfq-iosched: Do not idle if slice_idle=0Vivek Goyal
Do not idle either on cfq queue or service tree if slice_idle=0. User does not want any queue or service tree idling. Currently even if slice_idle=0, we were waiting for request to finish before expiring the queue and that can lead to lower queue depths. Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23cciss: disable doorbell reset on reset_devicesStephen M. Cameron
The doorbell reset initially appears to work correctly, the controller resets, comes up, some i/o can even be done, but on at least some Smart Arrays in some servers, it eventually causes a subsequent controller lockup due to some kind of PCIe error, and kdump can end up leaving the root filesystem in an unbootable state. For this reason, until the problem is fixed, or at least isolated to certain hardware enough to be avoided, the doorbell reset should not be used at all. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-23blkio: Fix return code for mkdir callsCiju Rajan K
If the cgroup hierarchy for blkio control groups is deeper than two levels, kernel should not allow the creation of further levels. mkdir system call does not except EINVAL as a return value. This patch replaces EINVAL with more appropriate EPERM Signed-off-by: Ciju Rajan K <ciju@linux.vnet.ibm.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-22Merge branch 'radix-tree' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev * 'radix-tree' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev: radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags radix-tree: clear all tags in radix_tree_node_rcu_free
2010-08-22Linux 2.6.36-rc2v2.6.36-rc2Linus Torvalds
2010-08-23radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tagsDave Chinner
Commit ebf8aa44beed48cd17893a83d92a4403e5f9d9e2 ("radix-tree: omplement function radix_tree_range_tag_if_tagged") does not safely set tags on on intermediate tree nodes. The code walks down the tree setting tags before it has fully resolved the path to the leaf under the assumption there will be a leaf slot with the tag set in the range it is searching. Unfortunately, this is not a valid assumption - we can abort after setting a tag on an intermediate node if we overrun the number of tags we are allowed to set in a batch, or stop scanning because we we have passed the last scan index before we reach a leaf slot with the tag we are searching for set. As a result, we can leave the function with tags set on intemediate nodes which can be tripped over later by tag-based lookups. The result of these stale tags is that lookup may end prematurely or livelock because the lookup cannot make progress. The fix for the problem involves reocrding the traversal path we take to the leaf nodes, and only propagating the tags back up the tree once the tag is set in the leaf node slot. We are already recording the path for efficient traversal, so there is no additional overhead to do the intermediately node tag setting in this manner. This fixes a radix tree lookup livelock triggered by the new writeback sync livelock avoidance code introduced in commit f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement writeback livelock avoidance using page tagging"). Signed-off-by: Dave Chinner <dchinner@redhat.com> Acked-by: Jan Kara <jack@suse.cz>
2010-08-23radix-tree: clear all tags in radix_tree_node_rcu_freeDave Chinner
Commit f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement writeback livelock avoidance using page tagging") introduced a new radix tree tag, increasing the number of tags in each node from 2 to 3. It did not, however, fix up the code in radix_tree_node_rcu_free() that cleans up after radix_tree_shrink() and hence could leave stray tags set in the new tag array. The result is that the livelock avoidance code added in the the above commit would hit stale tags when doing tag based lookups, resulting in livelocks when trying to traverse the tree. Fix this problem in radix_tree_node_rcu_free() so it doesn't happen again in the future by using a loop to walk all the tags up to RADIX_TREE_MAX_TAGS to clear the stray tags radix_tree_shrink() leaves behind. Signed-off-by: Dave Chinner <dchinner@redhat.com> Acked-by: Nick Piggin <npiggin@kernel.dk> Acked-by: Jan Kara <jack@suse.cz>
2010-08-22Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PIT: free irq source id in handling error path KVM: destroy workqueue on kvm_create_pit() failures KVM: fix poison overwritten caused by using wrong xstate size
2010-08-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (58 commits) drm/i915,intel_agp: Add support for Sandybridge D0 drm/i915: fix render pipe control notify on sandybridge agp/intel: set 40-bit dma mask on Sandybridge drm/i915: Remove the conflicting BUG_ON() drm/i915/suspend: s/IS_IRONLAKE/HAS_PCH_SPLIT/ drm/i915/suspend: Flush register writes before busy-waiting. i915: disable DAC on Ironlake also when doing CRT load detection. drm/i915: wait for actual vblank, not just 20ms drm/i915: make sure eDP PLL is enabled at the right time drm/i915: fix VGA plane disable for Ironlake+ drm/i915: eDP mode set sequence corrections drm/i915: add panel reset workaround drm/i915: Enable RC6 on Ironlake. drm/i915/sdvo: Only set is_lvds if we have a valid fixed mode. drm/i915: Set up a render context on Ironlake drm/i915 invalidate indirect state pointers at end of ring exec drm/i915: Wake-up wait_request() from elapsed hang-check (v2) drm/i915: Apply i830 errata for cursor alignment drm/i915: Only update i845/i865 CURBASE when disabled (v2) drm/i915: FBC is updated within set_base() so remove second call in mode_set() ...
2010-08-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slab: fix object alignment slub: add missing __percpu markup in mm/slub_def.h
2010-08-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: wait for discard to finish
2010-08-21drm/i915,intel_agp: Add support for Sandybridge D0Zhenyu Wang
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Eric Anholt <eric@anholt.net>
2010-08-21drm/i915: fix render pipe control notify on sandybridgeZhenyu Wang
This one is missed in last pipe control fix for sandybridge, that really unmask interrupt bit for notify in render engine IMR. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Eric Anholt <eric@anholt.net>
2010-08-21agp/intel: set 40-bit dma mask on SandybridgeZhenyu Wang
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Eric Anholt <eric@anholt.net>
2010-08-21drm/i915: Remove the conflicting BUG_ON()Chris Wilson
We now attempt to free "active" objects following a GPU hang as either the GPU will be reset or the hang is permenant. In either case, the GPU writes will not be flushed to main memory and it should be safe to return that memory back to the system. The BUG_ON(active) is thus overkill and can erroneously fire after a EIO. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Eric Anholt <eric@anholt.net>
2010-08-21drm/i915/suspend: s/IS_IRONLAKE/HAS_PCH_SPLIT/Chris Wilson
For the shared paths on the next generation chipsets. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Eric Anholt <eric@anholt.net>
2010-08-21drm/i915/suspend: Flush register writes before busy-waiting.Chris Wilson
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Eric Anholt <eric@anholt.net>
2010-08-21i915: disable DAC on Ironlake also when doing CRT load detection.Dave Airlie
Like on Sandybridge, disabling the DAC here when doing CRT load detect avoids forever hangs waiting on the hardware. test procedure on HP 2740p: boot with no VGA plugged in, start X, plug in VGA monitor (1280x1024) chvt 3 machine hangs waiting forever. Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Eric Anholt <eric@anholt.net>
2010-08-21drm/i915: wait for actual vblank, not just 20msJesse Barnes
Waiting for a hard coded 20ms isn't always enough to make sure a vblank period has actually occurred, so add code to make sure we really have passed through a vblank period (or that the pipe is off when disabling). This prevents problems with mode setting and link training, and seems to fix a bug like https://bugs.freedesktop.org/show_bug.cgi?id=29278, but on an HP 8440p instead. Hopefully also fixes https://bugs.freedesktop.org/show_bug.cgi?id=29141. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Eric Anholt <eric@anholt.net>
2010-08-21workqueue: Add basic tracepoints to track workqueue executionArjan van de Ven
With the introduction of the new unified work queue thread pools, we lost one feature: It's no longer possible to know which worker is causing the CPU to wake out of idle. The result is that PowerTOP now reports a lot of "kworker/a:b" instead of more readable results. This patch adds a pair of tracepoints to the new workqueue code, similar in style to the timer/hrtimer tracepoints. With this pair of tracepoints, the next PowerTOP can correctly report which work item caused the wakeup (and how long it took): Interrupt (43) i915 time 3.51ms wakeups 141 Work ieee80211_iface_work time 0.81ms wakeups 29 Work do_dbs_timer time 0.55ms wakeups 24 Process Xorg time 21.36ms wakeups 4 Timer sched_rt_period_timer time 0.01ms wakeups 1 Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-21Merge git://git.infradead.org/mtd-2.6Linus Torvalds
* git://git.infradead.org/mtd-2.6: mtd: nand: Fix probe of Samsung NAND chips mtd: nand: Fix regression in BBM detection pxa3xx: fix ns2cycle equation
2010-08-21Replace Configure with Enable in description of MAXSMPSamuel Thibault
The "Configure" word tends to make user believe they have to say 'yes' to be able to choose the number of procs/nodes. "Enable" should be unambiguous enough. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-21mm: make stack guard page logic use vm_prev pointerLinus Torvalds
Like the mlock() change previously, this makes the stack guard check code use vma->vm_prev to see what the mapping below the current stack is, rather than have to look it up with find_vma(). Also, accept an abutting stack segment, since that happens naturally if you split the stack with mlock or mprotect. Tested-by: Ian Campbell <ijc@hellion.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-21mm: make the mlock() stack guard page checks stricterLinus Torvalds
If we've split the stack vma, only the lowest one has the guard page. Now that we have a doubly linked list of vma's, checking this is trivial. Tested-by: Ian Campbell <ijc@hellion.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-21mm: make the vma list be doubly linkedLinus Torvalds
It's a really simple list, and several of the users want to go backwards in it to find the previous vma. So rather than have to look up the previous entry with 'find_vma_prev()' or something similar, just make it doubly linked instead. Tested-by: Ian Campbell <ijc@hellion.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-20mtd: nand: Fix probe of Samsung NAND chipsTilman Sauerbeck
Apparently, the check for a 6-byte ID string introduced by commit 426c457a3216fac74e3d44dd39729b0689f4c7ab ("mtd: nand: extend NAND flash detection to new MLC chips") is NOT sufficient to determine whether or not a Samsung chip uses their new MLC detection scheme or the old, standard scheme. This adds a condition to check cell type. Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de> Signed-off-by: Brian Norris <norris@broadcom.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@kernel.org
2010-08-20Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, apic: Fix apic=debug boot crash x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues x86-32: Fix dummy trampoline-related inline stubs x86-32: Separate 1:1 pagetables from swapper_pg_dir x86, cpu: Fix regression in AMD errata checking code
2010-08-20Documentation: fix ozlabs.org mailing list addressStephen Rothwell
This list moved to lists.ozlabs.org quite some time ago. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-20MAINTAINERS: Fix ozlabs.org mailing list addressesStephen Rothwell
All these lists moved to lists.ozlabs.org quite a while ago. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-20Documentation: kernel-locking: mutex_trylock cannot be used in interrupt contextStefan Richter
Chapter 6 is right about mutex_trylock, but chapter 10 wasn't. This error was introduced during semaphore-to-mutex conversion of the Unreliable guide. :-) If user context which performs mutex_lock() or mutex_trylock() is preempted by interrupt context which performs mutex_trylock() on the same mutex instance, a deadlock occurs. This is because these functions do not disable local IRQs when they operate on mutex->wait_lock. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-20drivers/scsi/qla4xxx: fix buildAndrew Morton
gcc-4.0.2: drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4_8xxx_error_recovery': drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available drivers/scsi/qla4xxx/ql4_os.c:2377: sorry, unimplemented: called from here drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available drivers/scsi/qla4xxx/ql4_os.c:2393: sorry, unimplemented: called from here Cc: Ravi Anand <ravi.anand@qlogic.com> Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-20uml: fix compile error in dma_get_cache_alignment()Miklos Szeredi
Fix uml compile error: include/linux/dma-mapping.h:145: error: redefinition of 'dma_get_cache_alignment' arch/um/include/asm/dma-mapping.h:99: note: previous definition of 'dma_get_cache_alignment' was here Introduced by commit 4565f0170dfc ("dma-mapping: unify dma_get_cache_alignment implementations") Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Jeff Dike <jdike@addtoit.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-20oom: __task_cred() need rcu_read_lock()KOSAKI Motohiro
dump_tasks() needs to hold the RCU read lock around its access of the target task's UID. To this end it should use task_uid() as it only needs that one thing from the creds. The fact that dump_tasks() holds tasklist_lock is insufficient to prevent the target process replacing its credentials on another CPU. Then, this patch change to call rcu_read_lock() explicitly. =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- mm/oom_kill.c:410 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 4 locks held by kworker/1:2/651: #0: (events){+.+.+.}, at: [<ffffffff8106aae7>] process_one_work+0x137/0x4a0 #1: (moom_work){+.+...}, at: [<ffffffff8106aae7>] process_one_work+0x137/0x4a0 #2: (tasklist_lock){.+.+..}, at: [<ffffffff810fafd4>] out_of_memory+0x164/0x3f0 #3: (&(&p->alloc_lock)->rlock){+.+...}, at: [<ffffffff810fa48e>] find_lock_task_mm+0x2e/0x70 Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-20oom: fix tasklist_lock leakKOSAKI Motohiro
Commit 0aad4b3124 ("oom: fold __out_of_memory into out_of_memory") introduced a tasklist_lock leak. Then it caused following obvious danger warnings and panic. ================================================ [ BUG: lock held when returning to user space! ] ------------------------------------------------ rsyslogd/1422 is leaving the kernel with locks still held! 1 lock held by rsyslogd/1422: #0: (tasklist_lock){.+.+.+}, at: [<ffffffff810faf64>] out_of_memory+0x164/0x3f0 BUG: scheduling while atomic: rsyslogd/1422/0x00000002 INFO: lockdep is turned off. This patch fixes it. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>