aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2012-04-13cred: copy_process() should clear child->replacement_session_keyringOleg Nesterov
commit 79549c6dfda0603dba9a70a53467ce62d9335c33 upstream. keyctl_session_to_parent(task) sets ->replacement_session_keyring, it should be processed and cleared by key_replace_session_keyring(). However, this task can fork before it notices TIF_NOTIFY_RESUME and the new child gets the bogus ->replacement_session_keyring copied by dup_task_struct(). This is obviously wrong and, if nothing else, this leads to put_cred(already_freed_cred). change copy_creds() to clear this member. If copy_process() fails before this point the wrong ->replacement_session_keyring doesn't matter, exit_creds() won't be called. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13sysctl: fix write access to dmesg_restrict/kptr_restrictKees Cook
commit 620f6e8e855d6d447688a5f67a4e176944a084e8 upstream. Commit bfdc0b4 adds code to restrict access to dmesg_restrict, however, it incorrectly alters kptr_restrict rather than dmesg_restrict. The original patch from Richard Weinberger (https://lkml.org/lkml/2011/3/14/362) alters dmesg_restrict as expected, and so the patch seems to have been misapplied. This adds the CAP_SYS_ADMIN check to both dmesg_restrict and kptr_restrict, since both are sensitive. Reported-by: Phillip Lougher <plougher@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Richard Weinberger <richard@nod.at> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13kgdb,debug_core: pass the breakpoint struct instead of address and memoryJason Wessel
commit 98b54aa1a2241b59372468bd1e9c2d207bdba54b upstream. There is extra state information that needs to be exposed in the kgdb_bpt structure for tracking how a breakpoint was installed. The debug_core only uses the the probe_kernel_write() to install breakpoints, but this is not enough for all the archs. Some arch such as x86 need to use text_poke() in order to install a breakpoint into a read only page. Passing the kgdb_bpt structure to kgdb_arch_set_breakpoint() and kgdb_arch_remove_breakpoint() allows other archs to set the type variable which indicates how the breakpoint was installed. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13PM / Sleep: Mitigate race between the freezer and request_firmware()Rafael J. Wysocki
commit 247bc03742545fec2f79939a3b9f738392a0f7b4 upstream. There is a race condition between the freezer and request_firmware() such that if request_firmware() is run on one CPU and freeze_processes() is run on another CPU and usermodehelper_disable() called by it succeeds to grab umhelper_sem for writing before usermodehelper_read_trylock() called from request_firmware() acquires it for reading, the request_firmware() will fail and trigger a WARN_ON() complaining that it was called at a wrong time. However, in fact, it wasn't called at a wrong time and freeze_processes() simply happened to be executed simultaneously. To avoid this race, at least in some cases, modify usermodehelper_read_trylock() so that it doesn't fail if the freezing of tasks has just started and hasn't been completed yet. Instead, during the freezing of tasks, it will try to freeze the task that has called it so that it can wait until user space is thawed without triggering the scary warning. For this purpose, change usermodehelper_disabled so that it can take three different values, UMH_ENABLED (0), UMH_FREEZING and UMH_DISABLED. The first one means that usermode helpers are enabled, the last one means "hard disable" (i.e. the system is not ready for usermode helpers to be used) and the second one is reserved for the freezer. Namely, when freeze_processes() is started, it sets usermodehelper_disabled to UMH_FREEZING which tells usermodehelper_read_trylock() that it shouldn't fail just yet and should call try_to_freeze() if woken up and cannot return immediately. This way all freezable tasks that happen to call request_firmware() right before freeze_processes() is started and lose the race for umhelper_sem with it will be frozen and will sleep until thaw_processes() unsets usermodehelper_disabled. [For the non-freezable callers of request_firmware() the race for umhelper_sem against freeze_processes() is unfortunately unavoidable.] Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13PM / Sleep: Move disabling of usermode helpers to the freezerRafael J. Wysocki
commit 1e73203cd1157a03facc41ffb54050f5b28e55bd upstream. The core suspend/hibernation code calls usermodehelper_disable() to avoid race conditions between the freezer and the starting of usermode helpers and each code path has to do that on its own. However, it is always called right before freeze_processes() and usermodehelper_enable() is always called right after thaw_processes(). For this reason, to avoid code duplication and to make the connection between usermodehelper_disable() and the freezer more visible, make freeze_processes() call it and remove the direct usermodehelper_disable() and usermodehelper_enable() calls from all suspend/hibernation code paths. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13PM / Hibernate: Disable usermode helpers right before freezing tasksRafael J. Wysocki
commit 7b5179ac14dbad945647ac9e76bbbf14ed9e0dbe upstream. There is no reason to call usermodehelper_disable() before creating memory bitmaps in hibernate() and software_resume(), so call it right before freeze_processes(), in accordance with the other suspend and hibernation code. Consequently, call usermodehelper_enable() right after the thawing of tasks rather than after freeing the memory bitmaps. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13firmware_class: Do not warn that system is not ready from async loadsRafael J. Wysocki
commit 9b78c1da60b3c62ccdd1509f0902ad19ceaf776b upstream. If firmware is requested asynchronously, by calling request_firmware_nowait(), there is no reason to fail the request (and warn the user) when the system is (presumably temporarily) unready to handle it (because user space is not available yet or frozen). For this reason, introduce an alternative routine for read-locking umhelper_sem, usermodehelper_read_lock_wait(), that will wait for usermodehelper_disabled to be unset (possibly with a timeout) and make request_firmware_work_func() use it instead of usermodehelper_read_trylock(). Accordingly, modify request_firmware() so that it uses usermodehelper_read_trylock() to acquire umhelper_sem and remove the code related to that lock from _request_firmware(). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13firmware_class: Rework usermodehelper checkRafael J. Wysocki
commit fe2e39d8782d885755139304d8dba0b3e5bfa878 upstream. Instead of two functions, read_lock_usermodehelper() and usermodehelper_is_disabled(), used in combination, introduce usermodehelper_read_trylock() that will only return with umhelper_sem held if usermodehelper_disabled is unset (and will return -EAGAIN otherwise) and make _request_firmware() use it. Rename read_unlock_usermodehelper() to usermodehelper_read_unlock() to follow the naming convention of the new function. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13tracing: Fix ent_size in trace outputSteven Rostedt
commit 12b5da349a8b94c9dbc3430a6bc42eabd9eaf50b upstream. When reading the trace file, the records of each of the per_cpu buffers are examined to find the next event to print out. At the point of looking at the event, the size of the event is recorded. But if the first event is chosen, the other events in the other CPU buffers will reset the event size that is stored in the iterator descriptor, causing the event size passed to the output functions to be incorrect. In most cases this is not a problem, but for the case of stack traces, it is. With the change to the stack tracing to record a dynamic number of back traces, the output depends on the size of the entry instead of the fixed 8 back traces. When the entry size is not correct, the back traces would not be fully printed. Note, reading from the per-cpu trace files were not affected. Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13tracing: Fix ftrace stack trace entriesWolfgang Mauerer
commit 01de982abf8c9e10fc3089e10585cd2cc914bdab upstream. 8 hex characters tell only half the tale for 64 bit CPUs, so use the appropriate length. Link: http://lkml.kernel.org/r/1332411501-8059-2-git-send-email-wolfgang.mauerer@siemens.com Signed-off-by: Wolfgang Mauerer <wolfgang.mauerer@siemens.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13genirq: Adjust irq thread affinity on IRQ_SET_MASK_OK_NOCOPY return valueJiang Liu
commit f5cb92ac82d06cb583c1f66666314c5c0a4d7913 upstream. irq_move_masked_irq() checks the return code of chip->irq_set_affinity() only for 0, but IRQ_SET_MASK_OK_NOCOPY is also a valid return code, which is there to avoid a redundant copy of the cpumask. But in case of IRQ_SET_MASK_OK_NOCOPY we not only avoid the redundant copy, we also fail to adjust the thread affinity of an eventually threaded interrupt handler. Handle IRQ_SET_MASK_OK (==0) and IRQ_SET_MASK_OK_NOCOPY(==1) return values correctly by checking the valid return values seperately. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Keping Chen <chenkeping@huawei.com> Link: http://lkml.kernel.org/r/1333120296-13563-2-git-send-email-jiang.liu@huawei.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02module: Remove module size limitSasha Levin
commit f946eeb9313ff1470758e171a60fe7438a2ded3f upstream. Module size was limited to 64MB, this was legacy limitation due to vmalloc() which was removed a while ago. Limiting module size to 64MB is both pointless and affects real world use cases. Cc: Tim Abbott <tim.abbott@oracle.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02PM / Hibernate: Enable usermodehelpers in hibernate() error pathSrivatsa S. Bhat
commit 05b4877f6a4f1ba4952d1222213d262bf8c132b7 upstream. If create_basic_memory_bitmaps() fails, usermodehelpers are not re-enabled before returning. Fix this. And while at it, reword the goto labels so that they look more meaningful. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02genirq: Fix incorrect check for forced IRQ thread handlerAlexander Gordeev
commit 540b60e24f3f4781d80e47122f0c4486a03375b8 upstream. We do not want a bitwise AND between boolean operands Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20120309135912.GA2114@dhcp-26-207.brq.redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02genirq: Fix long-term regression in genirq irq_set_irq_type() handlingRussell King
commit a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5 upstream. In 2008, commit 0c5d1eb77a8be ("genirq: record trigger type") modified the way set_irq_type() handles the 'no trigger' condition. However, this has an adverse effect on PCMCIA support on Intel StrongARM and probably PXA platforms. PCMCIA has several status signals on the socket which can trigger interrupts; some of these status signals depend on the card's mode (whether it is configured in memory or IO mode). For example, cards have a 'Ready/IRQ' signal: in memory mode, this provides an indication to PCMCIA that the card has finished its power up initialization. In IO mode, it provides the device interrupt signal. Other status signals switch between on-board battery status and loud speaker output. In classical PCMCIA implementations, where you have a specific socket controller, the controller provides a method to mask interrupts from the socket, and importantly ignore any state transitions on the pins which correspond with interrupts once masked. This masking prevents unwanted events caused by the removal and application of socket power being forwarded. However, on platforms where there is no socket controller, the PCMCIA status and interrupt signals are routed to standard edge-triggered GPIOs. These GPIOs can be configured to interrupt on rising edge, falling edge, or never. This is where the problems start. Edge triggered interrupts are required to record events while disabled via the usual methods of {free,request,disable,enable}_irq() to prevent problems with dropped interrupts (eg, the 8390 driver uses disable_irq() to defer the delivery of interrupts). As a result, these interfaces can not be used to implement the desired behaviour. The side effect of this is that if the 'Ready/IRQ' GPIO is disabled via disable_irq() on suspend, and enabled via enable_irq() after resume, we will record the state transitions caused by powering events as valid interrupts, and foward them to the card driver, which may attempt to access a card which is not powered up. This leads delays resume while drivers spin in their interrupt handlers, and complaints from drivers before they realize what's happened. Moreover, in the case of the 'Ready/IRQ' signal, this is requested and freed by the card driver itself; the PCMCIA core has no idea whether the interrupt is requested, and, therefore, whether a call to disable_irq() would be valid. (We tried this around 2.4.17 / 2.5.1 kernel era, and ended up throwing it out because of this problem.) Therefore, it was decided back in around 2002 to disable the edge triggering instead, resulting in all state transitions on the GPIO being ignored. That's what we actually need the hardware to do. The commit above changes this behaviour; it explicitly prevents the 'no trigger' state being selected. The reason that request_irq() does not accept the 'no trigger' state is for compatibility with existing drivers which do not provide their desired triggering configuration. The set_irq_type() function is 'new' and not used by non-trigger aware drivers. Therefore, revert this change, and restore previously working platforms back to their former state. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: linux@arm.linux.org.uk Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02ntp: Fix integer overflow when setting timeSasha Levin
commit a078c6d0e6288fad6d83fb6d5edd91ddb7b6ab33 upstream. 'long secs' is passed as divisor to div_s64, which accepts a 32bit divisor. On 64bit machines that value is trimmed back from 8 bytes back to 4, causing a divide by zero when the number is bigger than (1 << 32) - 1 and all 32 lower bits are 0. Use div64_long() instead. Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Cc: johnstul@us.ibm.com Link: http://lkml.kernel.org/r/1331829374-31543-2-git-send-email-levinsasha928@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02futex: Cover all PI opcodes with cmpxchg enabled checkThomas Gleixner
commit 59263b513c11398cd66a52d4c5b2b118ce1e0359 upstream. Some of the newer futex PI opcodes do not check the cmpxchg enabled variable and call unconditionally into the handling functions. Cover all PI opcodes in a separate check. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-15prctl: use CAP_SYS_RESOURCE for PR_SET_MM optionCyrill Gorcunov
CAP_SYS_ADMIN is already overloaded left and right, so to have more fine-grained access control use CAP_SYS_RESOURCE here. The CAP_SYS_RESOUCE is chosen because this prctl option allows a current process to adjust some fields of memory map descriptor which rather represents what the process owns: pointers to code, data, stack segments, command line, auxiliary vector data and etc. Suggested-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-14Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Been sitting on this for a while, but lets get this out the door. This fixes various important bugs for 3.3 final, along with a few more trivial ones. Please pull!" * 'for-linus' of git://git.kernel.dk/linux-block: block: fix ioc leak in put_io_context block, sx8: fix pointer math issue getting fw version Block: use a freezable workqueue for disk-event polling drivers/block/DAC960: fix -Wuninitialized warning drivers/block/DAC960: fix DAC960_V2_IOCTL_Opcode_T -Wenum-compare warning block: fix __blkdev_get and add_disk race condition block: Fix setting bio flags in drivers (sd_dif/floppy) block: Fix NULL pointer dereference in sd_revalidate_disk block: exit_io_context() should call elevator_exit_icq_fn() block: simplify ioc_release_fn() block: replace icq->changed with icq->flags
2012-03-07Revert "CPU hotplug, cpusets, suspend: Don't touch cpusets during ↵Linus Torvalds
suspend/resume" This reverts commit 8f2f748b0656257153bcf0941df8d6060acc5ca6. It causes some odd regression that we have not figured out, and it's too late in the -rc series to try to figure it out now. As reported by Konstantin Khlebnikov, it causes consistent hangs on his laptop (Thinkpad x220: 2x cores + HT). They can be avoided by adding calls to "rebuild_sched_domains();" in cpuset_cpu_[in]active() for the CPU_{ONLINE/DOWN_FAILED/DOWN_PREPARE}_FROZEN cases, but it's not at all clear why, and it makes no sense. Konstantin's config doesn't even have CONFIG_CPUSETS enabled, just to make things even more interesting. So it's not the cpusets, it's just the scheduling domains. So until this is understood, revert. Bisected-reported-and-tested-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-06genirq: Clear action->thread_mask if IRQ_ONESHOT is not setThomas Gleixner
Xommit ac5637611(genirq: Unmask oneshot irqs when thread was not woken) fails to unmask when a !IRQ_ONESHOT threaded handler is handled by handle_level_irq. This happens because thread_mask is or'ed unconditionally in irq_wake_thread(), but for !IRQ_ONESHOT interrupts never cleared. So the check for !desc->thread_active fails and keeps the interrupt disabled. Keep the thread_mask zero for !IRQ_ONESHOT interrupts. Document the thread_mask magic while at it. Reported-and-tested-by: Sven Joachim <svenjoac@gmx.de> Reported-and-tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05hung_task: fix the broken rcu_lock_break() logicOleg Nesterov
check_hung_uninterruptible_tasks()->rcu_lock_break() introduced by "softlockup: check all tasks in hung_task" commit ce9dbe24 looks absolutely wrong. - rcu_lock_break() does put_task_struct(). If the task has exited it is not safe to even read its ->state, nothing protects this task_struct. - The TASK_DEAD checks are wrong too. Contrary to the comment, we can't use it to check if the task was unhashed. It can be unhashed without TASK_DEAD, or it can be valid with TASK_DEAD. For example, an autoreaping task can do release_task(current) long before it sets TASK_DEAD in do_exit(). Or, a zombie task can have ->state == TASK_DEAD but release_task() was not called, and in this case we must not break the loop. Change this code to check pid_alive() instead, and do this before we drop the reference to the task_struct. Note: while_each_thread() under rcu_read_lock() is not really safe, it can livelock. This will be fixed later, but fortunately in this case the "max_count" logic saves us anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Mandeep Singh Baines <msb@google.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05vfork: kill PF_STARTINGOleg Nesterov
Previously it was (ab)used by utrace. Then it was wrongly used by the scheduler code. Currently it is not used, kill it before it finds the new erroneous user. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05coredump_wait: don't call complete_vfork_done()Oleg Nesterov
Now that CLONE_VFORK is killable, coredump_wait() no longer needs complete_vfork_done(). zap_threads() should find and kill all tasks with the same ->mm, this includes our parent if ->vfork_done is set. mm_release() becomes the only caller, unexport complete_vfork_done(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05vfork: make it killableOleg Nesterov
Make vfork() killable. Change do_fork(CLONE_VFORK) to do wait_for_completion_killable(). If it fails we do not return to the user-mode and never touch the memory shared with our child. However, in this case we should clear child->vfork_done before return, we use task_lock() in do_fork()->wait_for_vfork_done() and complete_vfork_done() to serialize with each other. Note: now that we use task_lock() we don't really need completion, we could turn task->vfork_done into "task_struct *wake_up_me" but this needs some complications. NOTE: this and the next patches do not affect in-kernel users of CLONE_VFORK, kernel threads run with all signals ignored including SIGKILL/SIGSTOP. However this is obviously the user-visible change. Not only a fatal signal can kill the vforking parent, a sub-thread can do execve or exit_group() and kill the thread sleeping in vfork(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05vfork: introduce complete_vfork_done()Oleg Nesterov
No functional changes. Move the clear-and-complete-vfork_done code into the new trivial helper, complete_vfork_done(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05kprobes: return proper error code from register_kprobe()Prashanth Nageshappa
register_kprobe() aborts if the address of the new request falls in a prohibited area (such as ftrace pouch, __kprobes annotated functions, non-kernel text addresses, jump label text). We however don't return the right error on this abort, resulting in a silent failure - incorrect adding/reporting of kprobes ('perf probe do_fork+18' or 'perf probe mcount' for instance). In V2 we are incorporating Masami Hiramatsu's feedback. This patch fixes it by returning -EINVAL upon failure. While we are here, rename the label used for exit to be more appropriate. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Prashanth K Nageshappa <prashanth@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jason Baron <jbaron@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05kmsg_dump: don't run on non-error paths by defaultMatthew Garrett
Since commit 04c6862c055f ("kmsg_dump: add kmsg_dump() calls to the reboot, halt, poweroff and emergency_restart paths"), kmsg_dump() gets run on normal paths including poweroff and reboot. This is less than ideal given pstore implementations that can only represent single backtraces, since a reboot may overwrite a stored oops before it's been picked up by userspace. In addition, some pstore backends may have low performance and provide a significant delay in reboot as a result. This patch adds a printk.always_kmsg_dump kernel parameter (which can also be changed from userspace). Without it, the code will only be run on failure paths rather than on normal paths. The option can be enabled in environments where there's a desire to attempt to audit whether or not a reboot was cleanly requested or not. Signed-off-by: Matthew Garrett <mjg@redhat.com> Acked-by: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Marco Stornelli <marco.stornelli@gmail.com> Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02Merge branches 'core-urgent-for-linus', 'perf-urgent-for-linus' and ↵Linus Torvalds
'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pulling latest branches from Ingo: * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: memblock: Fix size aligning of memblock_alloc_base_nid() * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf probe: Ensure offset provided is not greater than function length without DWARF info too perf tools: Ensure comm string is properly terminated perf probe: Ensure offset provided is not greater than function length perf evlist: Return first evsel for non-sample event on old kernel perf/hwbp: Fix a possible memory leak * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: CPU hotplug, cpusets, suspend: Don't touch cpusets during suspend/resume
2012-03-02Block: use a freezable workqueue for disk-event pollingAlan Stern
This patch (as1519) fixes a bug in the block layer's disk-events polling. The polling is done by a work routine queued on the system_nrt_wq workqueue. Since that workqueue isn't freezable, the polling continues even in the middle of a system sleep transition. Obviously, polling a suspended drive for media changes and such isn't a good thing to do; in the case of USB mass-storage devices it can lead to real problems requiring device resets and even re-enumeration. The patch fixes things by creating a new system-wide, non-reentrant, freezable workqueue and using it for disk-events polling. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: <stable@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-02-28perf/hwbp: Fix a possible memory leakNamhyung Kim
If kzalloc() for TYPE_DATA failed on a given cpu, previous chunk of TYPE_INST will be leaked. Fix it. Thanks to Peter Zijlstra for suggesting this better solution. It should work as long as the initial value of the region is all 0's and that's the case of static (per-cpu) memory allocation. Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/1330391978-28070-1-git-send-email-namhyung.kim@lge.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-27Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/events: Revert trace_sched_stat_sleeptime()
2012-02-27Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Handle pending irqs in irq_startup() genirq: Unmask oneshot irqs when thread was not woken
2012-02-27CPU hotplug, cpusets, suspend: Don't touch cpusets during suspend/resumeSrivatsa S. Bhat
Currently, during CPU hotplug, the cpuset callbacks modify the cpusets to reflect the state of the system, and this handling is asymmetric. That is, upon CPU offline, that CPU is removed from all cpusets. However when it comes back online, it is put back only to the root cpuset. This gives rise to a significant problem during suspend/resume. During suspend, we offline all non-boot cpus and during resume we online them back. Which means, after a resume, all cpusets (except the root cpuset) will be restricted to just one single CPU (the boot cpu). But the whole point of suspend/resume is to restore the system to a state which is as close as possible to how it was before suspend. So to fix this, don't touch cpusets during suspend/resume. That is, modify the cpuset-related CPU hotplug callback to just ignore CPU hotplug when it is initiated as part of the suspend/resume sequence. Reported-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/4F460D7B.1020703@linux.vnet.ibm.com Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-24epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()Oleg Nesterov
This patch is intentionally incomplete to simplify the review. It ignores ep_unregister_pollwait() which plays with the same wqh. See the next change. epoll assumes that the EPOLL_CTL_ADD'ed file controls everything f_op->poll() needs. In particular it assumes that the wait queue can't go away until eventpoll_release(). This is not true in case of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand which is not connected to the file. This patch adds the special event, POLLFREE, currently only for epoll. It expects that init_poll_funcptr()'ed hook should do the necessary cleanup. Perhaps it should be defined as EPOLLFREE in eventpoll. __cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if ->signalfd_wqh is not empty, we add the new signalfd_cleanup() helper. ep_poll_callback(POLLFREE) simply does list_del_init(task_list). This make this poll entry inconsistent, but we don't care. If you share epoll fd which contains our sigfd with another process you should blame yourself. signalfd is "really special". I simply do not know how we can define the "right" semantics if it used with epoll. The main problem is, epoll calls signalfd_poll() once to establish the connection with the wait queue, after that signalfd_poll(NULL) returns the different/inconsistent results depending on who does EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd has nothing to do with the file, it works with the current thread. In short: this patch is the hack which tries to fix the symptoms. It also assumes that nobody can take tasklist_lock under epoll locks, this seems to be true. Note: - we do not have wake_up_all_poll() but wake_up_poll() is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE. - signalfd_cleanup() uses POLLHUP along with POLLFREE, we need a couple of simple changes in eventpoll.c to make sure it can't be "lost". Reported-by: Maxime Bizon <mbizon@freebox.fr> Cc: <stable@kernel.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-22sched/events: Revert trace_sched_stat_sleeptime()Peter Zijlstra
Commit 1ac9bc69 ("sched/tracing: Add a new tracepoint for sleeptime") added a new sched:sched_stat_sleeptime tracepoint. It's broken: the first sample we get on a task might be bad because of a stale sleep_start value that wasn't reset at the last task switch because the tracepoint was not active. It also breaks the existing schedstat samples due to the side effects of: - se->statistics.sleep_start = 0; ... - se->statistics.block_start = 0; Nor do I see means to fix it without adding overhead to the scheduler fast path, which I'm not willing to for the sake of redundant instrumentation. Most importantly, sleep time information can already be constructed by tracing context switches and wakeups, and taking the timestamp difference between the schedule-out, the wakeup and the schedule-in. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Vagin <avagin@openvz.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-pc4c9qhl8q6vg3bs4j6k0rbd@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Assorted fixes, sat in -next for a week or so... * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ocfs2: deal with wraparounds of i_nlink in ocfs2_rename() vfs: fix compat_sys_stat() handling of overflows in st_nlink quota: Fix deadlock with suspend and quotas vfs: Provide function to get superblock and wait for it to thaw vfs: fix panic in __d_lookup() with high dentry hashtable counts autofs4 - fix lockdep splat in autofs vfs: fix d_inode_lookup() dentry ref leak
2012-02-15genirq: Handle pending irqs in irq_startup()Thomas Gleixner
An interrupt might be pending when irq_startup() is called, but the startup code does not invoke the resend logic. In some cases this prevents the device from issuing another interrupt which renders the device non functional. Call the resend function in irq_startup() to keep things going. Reported-and-tested-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15genirq: Unmask oneshot irqs when thread was not wokenThomas Gleixner
When the primary handler of an interrupt which is marked IRQ_ONESHOT returns IRQ_HANDLED or IRQ_NONE, then the interrupt thread is not woken and the unmask logic of the interrupt line is never invoked. This keeps the interrupt masked forever. This was not noticed as most IRQ_ONESHOT users wake the thread unconditionally (usually because they cannot access the underlying device from hard interrupt context). Though this behaviour was nowhere documented and not necessarily intentional. Some drivers can avoid the thread wakeup in certain cases and run into the situation where the interrupt line s kept masked. Handle it gracefully. Reported-and-tested-by: Lothar Wassmann <lw@karo-electronics.de> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-13vfs: fix panic in __d_lookup() with high dentry hashtable countsDimitri Sivanich
When the number of dentry cache hash table entries gets too high (2147483648 entries), as happens by default on a 16TB system, use of a signed integer in the dcache_init() initialization loop prevents the dentry_hashtable from getting initialized, causing a panic in __d_lookup(). Fix this in dcache_init() and similar areas. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-02-13Merge tag 'for-linus' of git://github.com/rustyrussell/linuxLinus Torvalds
* tag 'for-linus' of git://github.com/rustyrussell/linux: module: fix broken isapnp handling in file2alias module: make module param bint handle nul value
2012-02-14module: make module param bint handle nul valueDave Young
Allow bint param accept nul values, just do same as bool param. Signed-off-by: Dave Young <dyoung@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-02-11Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Says Jens: "Time to push off some of the pending items. I really wanted to wait until we had the regression nailed, but alas it's not quite there yet. But I'm very confident that it's "just" a missing expire on exit, so fix from Tejun should be fairly trivial. I'm headed out for a week on the slopes. - Killing the barrier part of mtip32xx. It doesn't really support barriers, and it doesn't need them (writes are fully ordered). - A few fixes from Dan Carpenter, preventing overflows of integer multiplication. - A fixup for loop, fixing a previous commit that didn't quite solve the partial read problem from Dave Young. - A bio integer overflow fix from Kent Overstreet. - Improvement/fix of the door "keep locked" part of the cdrom shared code from Paolo Benzini. - A few cfq fixes from Shaohua Li. - A fix for bsg sysfs warning when removing a file it did not create from Stanislaw Gruszka. - Two fixes for floppy from Vivek, preventing a crash. - A few block core fixes from Tejun. One killing the over-optimized ioc exit path, cleaning that up nicely. Two others fixing an oops on elevator switch, due to calling into the scheduler merge check code without holding the queue lock." * 'for-linus' of git://git.kernel.dk/linux-block: block: fix lockdep warning on io_context release put_io_context() relay: prevent integer overflow in relay_open() loop: zero fill bio instead of return -EIO for partial read bio: don't overflow in bio_get_nr_vecs() floppy: Fix a crash during rmmod floppy: Cleanup disk->queue before caling put_disk() if add_disk() was never called cdrom: move shared static to cdrom_device_info bsg: fix sysfs link remove warning block: don't call elevator callbacks for plug merges block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions mtip32xx: removed the irrelevant argument of mtip_hw_submit_io() and the unused member of struct driver_data block: strip out locking optimization in put_io_context() cdrom: use copy_to_user() without the underscores block: fix ioc locking warning block: fix NULL icq_cache reference block,cfq: change code order
2012-02-10Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix double start/stop in x86_pmu_start() perf evsel: Fix an issue where perf report fails to show the proper percentage perf tools: Fix prefix matching for kernel maps perf tools: Fix perf stack to non executable on x86_64 perf: Remove deprecated WARN_ON_ONCE()
2012-02-10relay: prevent integer overflow in relay_open()Dan Carpenter
"subbuf_size" and "n_subbufs" come from the user and they need to be capped to prevent an integer overflow. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-02-07perf: Fix double start/stop in x86_pmu_start()Stephane Eranian
The following patch fixes a bug introduced by the following commit: e050e3f0a71b ("perf: Fix broken interrupt rate throttling") The patch caused the following warning to pop up depending on the sampling frequency adjustments: ------------[ cut here ]------------ WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4() It was caused by the following call sequence: perf_adjust_freq_unthr_context.part() { stop() if (delta > 0) { perf_adjust_period() { if (period > 8*...) { stop() ... start() } } } start() } Which caused a double start and a double stop, thus triggering the assert in x86_pmu_start(). The patch fixes the problem by avoiding the double calls. We pass a new argument to perf_adjust_period() to indicate whether or not the event is already stopped. We can't just remove the start/stop from that function because it's called from __perf_event_overflow where the event needs to be reloaded via a stop/start back-toback call. The patch reintroduces the assertion in x86_pmu_start() which was removed by commit: 84f2b9b ("perf: Remove deprecated WARN_ON_ONCE()") In this second version, we've added calls to disable/enable PMU during unthrottling or frequency adjustment based on bug report of spurious NMI interrupts from Eric Dumazet. Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: markus@trippelsdorf.de Cc: paulus@samba.org Link: http://lkml.kernel.org/r/20120207133956.GA4932@quad [ Minor edits to the changelog and to the code ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-07block: strip out locking optimization in put_io_context()Tejun Heo
put_io_context() performed a complex trylock dancing to avoid deferring ioc release to workqueue. It was also broken on UP because trylock was always assumed to succeed which resulted in unbalanced preemption count. While there are ways to fix the UP breakage, even the most pathological microbench (forced ioc allocation and tight fork/exit loop) fails to show any appreciable performance benefit of the optimization. Strip it out. If there turns out to be workloads which are affected by this change, simpler optimization from the discussion thread can be applied later. Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <1328514611.21268.66.camel@sli10-conroe> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-02-04Merge tag 'pm-fixes-for-3.3-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Power management fixes for 3.3-rc3 Three power management regression fixes, one for a recent regression introcuded by the freezer changes during the 3.3 merge window and two for regressions in cpuidle (resulting from PM QoS changes) and in the hibernate user space interface, both introduced during the 3.2 development cycle. They include: * Two hibernate (s2disk) regression fixes from Srivatsa S. Bhat (for regressions introduced during the 3.3 merge window and during the 3.2 development cycle). * A cpuidle fix from Venki Pallipadi for a regression resulting from PM QoS changes during the 3.2 development cycle causing cpuidle to work incorrectly for CONFIG_PM unset. * tag 'pm-fixes-for-3.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / QoS: CPU C-state breakage with PM Qos change PM / Freezer: Thaw only kernel threads if freezing of kernel threads fails PM / Hibernate: Thaw kernel threads in SNAPSHOT_CREATE_IMAGE ioctl path
2012-02-04PM / Freezer: Thaw only kernel threads if freezing of kernel threads failsSrivatsa S. Bhat
If freezing of kernel threads fails, we are expected to automatically thaw tasks in the error recovery path. However, at times, we encounter situations in which we would like the automatic error recovery path to thaw only the kernel threads, because we want to be able to do some more cleanup before we thaw userspace. Something like: error = freeze_kernel_threads(); if (error) { /* Do some cleanup */ /* Only then thaw userspace tasks*/ thaw_processes(); } An example of such a situation is where we freeze/thaw filesystems during suspend/hibernation. There, if freezing of kernel threads fails, we would like to thaw the frozen filesystems before thawing the userspace tasks. So, modify freeze_kernel_threads() to thaw only kernel threads in case of freezing failure. And change suspend_freeze_processes() accordingly. (At the same time, let us also get rid of the rather cryptic usage of the conditional operator (:?) in that function.) [rjw: In fact, this patch fixes a regression introduced during the 3.3 merge window, because without it thaw_processes() may be called before swsusp_free() in some situations and that may lead to massive memory allocation failures.] Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Nigel Cunningham <nigel@tuxonice.net> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-02-03kprobes: fix a memory leak in function pre_handler_kretprobe()Jiang Liu
In function pre_handler_kretprobe(), the allocated kretprobe_instance object will get leaked if the entry_handler callback returns non-zero. This may cause all the preallocated kretprobe_instance objects exhausted. This issue can be reproduced by changing samples/kprobes/kretprobe_example.c to probe "mutex_unlock". And the fix is straightforward: just put the allocated kretprobe_instance object back onto the free_instances list. [akpm@linux-foundation.org: use raw_spin_lock/unlock] Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>