linux/fs/exec.c, branch v2.6.35.9

execve: make responsive to SIGKILL with large arguments

2010-10-29T04:51:47Z

commit 9aea5a65aa7a1af9a4236dfaeb0088f1624f9919 upstream. An execve with a very large total of argument/environment strings can take a really long time in the execve system call. It runs uninterruptibly to count and copy all the strings. This change makes it abort the exec quickly if sent a SIGKILL. Note that this is the conservative change, to interrupt only for SIGKILL, by using fatal_signal_pending(). It would be perfectly correct semantics to let any signal interrupt the string-copying in execve, i.e. use signal_pending() instead of fatal_signal_pending(). We'll save that change for later, since it could have user-visible consequences, such as having a timer set too quickly make it so that an execve can never complete, though it always happened to work before. Signed-off-by: Roland McGrath Reviewed-by: KOSAKI Motohiro Cc: Chuck Ebbert Signed-off-by: Linus Torvalds

execve: improve interactivity with large arguments

2010-10-29T04:51:46Z

commit 7993bc1f4663c0db67bb8f0d98e6678145b387cd upstream. This adds a preemption point during the copying of the argument and environment strings for execve, in copy_strings(). There is already a preemption point in the count() loop, so this doesn't add any new points in the abstract sense. When the total argument+environment strings are very large, the time spent copying them can be much more than a normal user time slice. So this change improves the interactivity of the rest of the system when one process is doing an execve with very large arguments. Signed-off-by: Roland McGrath Reviewed-by: KOSAKI Motohiro Signed-off-by: Linus Torvalds Cc: Chuck Ebbert Signed-off-by: Greg Kroah-Hartman

setup_arg_pages: diagnose excessive argument size

2010-10-29T04:51:46Z

commit 1b528181b2ffa14721fb28ad1bd539fe1732c583 upstream. The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not check the size of the argument/environment area on the stack. When it is unworkably large, shift_arg_pages() hits its BUG_ON. This is exploitable with a very large RLIMIT_STACK limit, to create a crash pretty easily. Check that the initial stack is not too large to make it possible to map in any executable. We're not checking that the actual executable (or intepreter, for binfmt_elf) will fit. So those mappings might clobber part of the initial stack mapping. But that is just userland lossage that userland made happen, not a kernel problem. Signed-off-by: Roland McGrath Reviewed-by: KOSAKI Motohiro Signed-off-by: Linus Torvalds Cc: Chuck Ebbert Signed-off-by: Greg Kroah-Hartman

exit: avoid sig->count in de_thread/__exit_signal synchronization

2010-05-27T16:12:46Z

de_thread() and __exit_signal() use signal_struct->count/notify_count for synchronization. We can simplify the code and use ->notify_count only. Instead of comparing these two counters, we can change de_thread() to set ->notify_count = nr_of_sub_threads, then change __exit_signal() to dec-and-test this counter and notify group_exit_task. Note that __exit_signal() checks "notify_count > 0" just for symmetry with exit_notify(), we could just check it is != 0. Signed-off-by: Oleg Nesterov Acked-by: Roland McGrath Cc: Veaceslav Falico Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

coredump: shift down_write(mmap_sem) into coredump_wait()

2010-05-27T16:12:45Z

- move the cprm.mm_flags checks up, before we take mmap_sem - move down_write(mmap_sem) and ->core_state check from do_coredump() to coredump_wait() This simplifies the code and makes the locking symmetrical. Signed-off-by: Oleg Nesterov Cc: David Howells Cc: Neil Horman Cc: Roland McGrath Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

coredump: factor out put_cred() calls

2010-05-27T16:12:45Z

Given that do_coredump() calls put_cred() on exit path, it is a bit ugly to do put_cred() + "goto fail" twice, just add the new "fail_creds" label. Signed-off-by: Oleg Nesterov Cc: David Howells Cc: Neil Horman Cc: Roland McGrath Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

coredump: cleanup "ispipe" code

2010-05-27T16:12:45Z

- kill "int dump_count", argv_split(argcp) accepts argcp == NULL. - move "int dump_count" under " if (ispipe)" branch, fail_dropcount can check ispipe. - move "char **helper_argv" as well, change the code to do argv_free() right after call_usermodehelper_fns(). - If call_usermodehelper_fns() fails goto close_fail label instead of closing the file by hand. Signed-off-by: Oleg Nesterov Cc: David Howells Cc: Neil Horman Cc: Roland McGrath Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

coredump: factor out the not-ispipe file checks

2010-05-27T16:12:45Z

do_coredump() does a lot of file checks after it opens the file or calls usermode helper. But all of these checks are only needed in !ispipe case. Move this code into the "else" branch and kill the ugly repetitive ispipe checks. Signed-off-by: Oleg Nesterov Cc: David Howells Cc: Neil Horman Cc: Roland McGrath Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

exec: replace call_usermodehelper_pipe with use of umh init function and resolve limit

2010-05-27T16:12:44Z

The first patch in this series introduced an init function to the call_usermodehelper api so that processes could be customized by caller. This patch takes advantage of that fact, by customizing the helper in do_coredump to create the pipe and set its core limit to one (for our recusrsion check). This lets us clean up the previous uglyness in the usermodehelper internals and factor call_usermodehelper out entirely. While I'm at it, we can also modify the helper setup to look for a core limit value of 1 rather than zero for our recursion check Signed-off-by: Neil Horman Reviewed-by: Oleg Nesterov Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: migration: avoid race between shift_arg_pages() and rmap_walk() during migration by not migrating temporary stacks

2010-05-25T15:06:59Z

Page migration requires rmap to be able to find all ptes mapping a page at all times, otherwise the migration entry can be instantiated, but it is possible to leave one behind if the second rmap_walk fails to find the page. If this page is later faulted, migration_entry_to_page() will call BUG because the page is locked indicating the page was migrated by the migration PTE not cleaned up. For example kernel BUG at include/linux/swapops.h:105! invalid opcode: 0000 [#1] PREEMPT SMP ... Call Trace: [] handle_mm_fault+0x3f8/0x76a [] do_page_fault+0x44a/0x46e [] page_fault+0x25/0x30 [] load_elf_binary+0x152a/0x192b [] search_binary_handler+0x173/0x313 [] do_execve+0x219/0x30a [] sys_execve+0x43/0x5e [] stub_execve+0x6a/0xc0 RIP [] migration_entry_wait+0xc1/0x129 There is a race between shift_arg_pages and migration that triggers this bug. A temporary stack is setup during exec and later moved. If migration moves a page in the temporary stack and the VMA is then removed before migration completes, the migration PTE may not be found leading to a BUG when the stack is faulted. This patch causes pages within the temporary stack during exec to be skipped by migration. It does this by marking the VMA covering the temporary stack with an otherwise impossible combination of VMA flags. These flags are cleared when the temporary stack is moved to its final location. [kamezawa.hiroyu@jp.fujitsu.com: idea for having migration skip temporary stacks] Signed-off-by: Mel Gorman Reviewed-by: KAMEZAWA Hiroyuki Reviewed-by: Rik van Riel Acked-by: Linus Torvalds Cc: Minchan Kim Cc: Christoph Lameter Cc: Andrea Arcangeli Cc: Rik van Riel Cc: Peter Zijlstra Reviewed-by: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds