aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2010-08-09oom: move sysctl declarations to oom.hDavid Rientjes
The three oom killer sysctl variables (sysctl_oom_dump_tasks, sysctl_oom_kill_allocating_task, and sysctl_panic_on_oom) are better declared in include/linux/oom.h rather than kernel/sysctl.c. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: remove special handling for pagefault oomsDavid Rientjes
It is possible to remove the special pagefault oom handler by simply oom locking all system zones and then calling directly into out_of_memory(). All populated zones must have ZONE_OOM_LOCKED set, otherwise there is a parallel oom killing in progress that will lead to eventual memory freeing so it's not necessary to needlessly kill another task. The context in which the pagefault is allocating memory is unknown to the oom killer, so this is done on a system-wide level. If a task has already been oom killed and hasn't fully exited yet, this will be a no-op since select_bad_process() recognizes tasks across the system with TIF_MEMDIE set. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Nick Piggin <npiggin@suse.de> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: extract panic helper functionDavid Rientjes
There are various points in the oom killer where the kernel must determine whether to panic or not. It's better to extract this to a helper function to remove all the confusion as to its semantics. Also fix a call to dump_header() where tasklist_lock is not read- locked, as required. There's no functional change with this patch. Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: avoid oom killer for lowmem allocationsDavid Rientjes
If memory has been depleted in lowmem zones even with the protection afforded to it by /proc/sys/vm/lowmem_reserve_ratio, it is unlikely that killing current users will help. The memory is either reclaimable (or migratable) already, in which case we should not invoke the oom killer at all, or it is pinned by an application for I/O. Killing such an application may leave the hardware in an unspecified state and there is no guarantee that it will be able to make a timely exit. Lowmem allocations are now failed in oom conditions when __GFP_NOFAIL is not used so that the task can perhaps recover or try again later. Previously, the heuristic provided some protection for those tasks with CAP_SYS_RAWIO, but this is no longer necessary since we will not be killing tasks for the purposes of ISA allocations. high_zoneidx is gfp_zone(gfp_flags), meaning that ZONE_NORMAL will be the default for all allocations that are not __GFP_DMA, __GFP_DMA32, __GFP_HIGHMEM, and __GFP_MOVABLE on kernels configured to support those flags. Testing for high_zoneidx being less than ZONE_NORMAL will only return true for allocations that have either __GFP_DMA or __GFP_DMA32. Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: enable oom tasklist dump by defaultDavid Rientjes
The oom killer tasklist dump, enabled with the oom_dump_tasks sysctl, is very helpful information in diagnosing why a user's task has been killed. It emits useful information such as each eligible thread's memory usage that can determine why the system is oom, so it should be enabled by default. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: select task from tasklist for mempolicy oomsDavid Rientjes
The oom killer presently kills current whenever there is no more memory free or reclaimable on its mempolicy's nodes. There is no guarantee that current is a memory-hogging task or that killing it will free any substantial amount of memory, however. In such situations, it is better to scan the tasklist for nodes that are allowed to allocate on current's set of nodes and kill the task with the highest badness() score. This ensures that the most memory-hogging task, or the one configured by the user with /proc/pid/oom_adj, is always selected in such scenarios. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: sacrifice child with highest badness score for parentDavid Rientjes
When a task is chosen for oom kill, the oom killer first attempts to sacrifice a child not sharing its parent's memory instead. Unfortunately, this often kills in a seemingly random fashion based on the ordering of the selected task's child list. Additionally, it is not guaranteed at all to free a large amount of memory that we need to prevent additional oom killing in the very near future. Instead, we now only attempt to sacrifice the worst child not sharing its parent's memory, if one exists. The worst child is indicated with the highest badness() score. This serves two advantages: we kill a memory-hogging task more often, and we allow the configurable /proc/pid/oom_adj value to be considered as a factor in which child to kill. Reviewers may observe that the previous implementation would iterate through the children and attempt to kill each until one was successful and then the parent if none were found while the new code simply kills the most memory-hogging task or the parent. Note that the only time oom_kill_task() fails, however, is when a child does not have an mm or has a /proc/pid/oom_adj of OOM_DISABLE. badness() returns 0 for both cases, so the final oom_kill_task() will always succeed. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Nick Piggin <npiggin@suse.de> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: filter tasks not sharing the same cpusetDavid Rientjes
Tasks that do not share the same set of allowed nodes with the task that triggered the oom should not be considered as candidates for oom kill. Tasks in other cpusets with a disjoint set of mems would be unfairly penalized otherwise because of oom conditions elsewhere; an extreme example could unfairly kill all other applications on the system if a single task in a user's cpuset sets itself to OOM_DISABLE and then uses more memory than allowed. Killing tasks outside of current's cpuset rarely would free memory for current anyway. To use a sane heuristic, we must ensure that killing a task would likely free memory for current and avoid needlessly killing others at all costs just because their potential memory freeing is unknown. It is better to kill current than another task needlessly. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Nick Piggin <npiggin@suse.de> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: avoid sending exiting tasks a SIGKILLDavid Rientjes
It's unnecessary to SIGKILL a task that is already PF_EXITING and can actually cause a NULL pointer dereference of the sighand if it has already been detached. Instead, simply set TIF_MEMDIE so it has access to memory reserves and can quickly exit as the comment implies. Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: give current access to memory reserves if it has been killedDavid Rientjes
It's possible to livelock the page allocator if a thread has mm->mmap_sem and fails to make forward progress because the oom killer selects another thread sharing the same ->mm to kill that cannot exit until the semaphore is dropped. The oom killer will not kill multiple tasks at the same time; each oom killed task must exit before another task may be killed. Thus, if one thread is holding mm->mmap_sem and cannot allocate memory, all threads sharing the same ->mm are blocked from exiting as well. In the oom kill case, that means the thread holding mm->mmap_sem will never free additional memory since it cannot get access to memory reserves and the thread that depends on it with access to memory reserves cannot exit because it cannot acquire the semaphore. Thus, the page allocators livelocks. When the oom killer is called and current happens to have a pending SIGKILL, this patch automatically gives it access to memory reserves and returns. Upon returning to the page allocator, its allocation will hopefully succeed so it can quickly exit and free its memory. If not, the page allocator will fail the allocation if it is not __GFP_NOFAIL. Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: dump_tasks use find_lock_task_mm too fixDavid Rientjes
When find_lock_task_mm() returns a thread other than p in dump_tasks(), its name should be displayed instead. This is the thread that will be targeted by the oom killer, not its mm-less parent. This also allows us to safely dereference task->comm without needing get_task_comm(). While we're here, remove the cast on task_cpu(task) as Andrew suggested. Signed-off-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: improve commentary in dump_tasks()David Rientjes
The comments in dump_tasks() should be updated to be more clear about why tasks are filtered and how they are filtered by its argument. An unnecessary comment concerning a check for is_global_init() is removed since it isn't of importance. Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: dump_tasks use find_lock_task_mm tooKOSAKI Motohiro
dump_task() should use find_lock_task_mm() too. It is necessary for protecting task-exiting race. dump_tasks() currently filters any task that does not have an attached ->mm since it incorrectly assumes that it must either be in the process of exiting and has detached its memory or that it's a kernel thread; multithreaded tasks may actually have subthreads that have a valid ->mm pointer and thus those threads should actually be displayed. This change finds those threads, if they exist, and emit their information along with the rest of the candidate tasks for kill. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: introduce find_lock_task_mm() to fix !mm false positivesOleg Nesterov
Almost all ->mm == NULL checks in oom_kill.c are wrong. The current code assumes that the task without ->mm has already released its memory and ignores the process. However this is not necessarily true when this process is multithreaded, other live sub-threads can use this ->mm. - Remove the "if (!p->mm)" check in select_bad_process(), it is just wrong. - Add the new helper, find_lock_task_mm(), which finds the live thread which uses the memory and takes task_lock() to pin ->mm - change oom_badness() to use this helper instead of just checking ->mm != NULL. - As David pointed out, select_bad_process() must never choose the task without ->mm, but no matter what oom_badness() returns the task can be chosen if nothing else has been found yet. Change oom_badness() to return int, change it to return -1 if find_lock_task_mm() fails, and change select_bad_process() to check points >= 0. Note! This patch is not enough, we need more changes. - oom_badness() was fixed, but oom_kill_task() still ignores the task without ->mm - oom_forkbomb_penalty() should use find_lock_task_mm() too, and it also needs other changes to actually find the first first-descendant children This will be addressed later. [kosaki.motohiro@jp.fujitsu.com: use in badness(), __oom_kill_task()] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: PF_EXITING check should take mm into accountOleg Nesterov
select_bad_process() checks PF_EXITING to detect the task which is going to release its memory, but the logic is very wrong. - a single process P with the dead group leader disables select_bad_process() completely, it will always return ERR_PTR() while P can live forever - if the PF_EXITING task has already released its ->mm it doesn't make sense to expect it is goiing to free more memory (except task_struct/etc) Change the code to ignore the PF_EXITING tasks without ->mm. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09oom: check PF_KTHREAD instead of !mm to skip kthreadsOleg Nesterov
select_bad_process() thinks a kernel thread can't have ->mm != NULL, this is not true due to use_mm(). Change the code to check PF_KTHREAD. Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09buffer_head: remove redundant test from wait_on_bufferRichard Kennedy
The comment suggests that when b_count equals zero it is calling __wait_no_buffer to trigger some debug, but as there is no debug in __wait_on_buffer the whole thing is redundant. AFAICT from the git log this has been the case for at least 5 years, so it seems safe just to remove this. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09mm: extend KSM refcounts to the anon_vma rootRik van Riel
KSM reference counts can cause an anon_vma to exist after the processe it belongs to have already exited. Because the anon_vma lock now lives in the root anon_vma, we need to ensure that the root anon_vma stays around until after all the "child" anon_vmas have been freed. The obvious way to do this is to have a "child" anon_vma take a reference to the root in anon_vma_fork. When the anon_vma is freed at munmap or process exit, we drop the refcount in anon_vma_unlink and possibly free the root anon_vma. The KSM anon_vma reference count function also needs to be modified to deal with the possibility of freeing 2 levels of anon_vma. The easiest way to do this is to break out the KSM magic and make it generic. When compiling without CONFIG_KSM, this code is compiled out. Signed-off-by: Rik van Riel <riel@redhat.com> Tested-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09mm: always lock the root (oldest) anon_vmaRik van Riel
Always (and only) lock the root (oldest) anon_vma whenever we do something in an anon_vma. The recently introduced anon_vma scalability is due to the rmap code scanning only the VMAs that need to be scanned. Many common operations still took the anon_vma lock on the root anon_vma, so always taking that lock is not expected to introduce any scalability issues. However, always taking the same lock does mean we only need to take one lock, which means rmap_walk on pages from any anon_vma in the vma is excluded from occurring during an munmap, expand_stack or other operation that needs to exclude rmap_walk and similar functions. Also add the proper locking to vma_adjust. Signed-off-by: Rik van Riel <riel@redhat.com> Tested-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09mm: track the root (oldest) anon_vmaRik van Riel
Track the root (oldest) anon_vma in each anon_vma tree. Because we only take the lock on the root anon_vma, we cannot use the lock on higher-up anon_vmas to lock anything. This makes it impossible to do an indirect lookup of the root anon_vma, since the data structures could go away from under us. However, a direct pointer is safe because the root anon_vma is always the last one that gets freed on munmap or exit, by virtue of the same_vma list order and unlink_anon_vmas walking the list forward. [akpm@linux-foundation.org: fix typo] Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09mm: change direct call of spin_lock(anon_vma->lock) to inline functionRik van Riel
Subsitute a direct call of spin_lock(anon_vma->lock) with an inline function doing exactly the same. This makes it easier to do the substitution to the root anon_vma lock in a following patch. We will deal with the handful of special locks (nested, dec_and_lock, etc) separately. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09mm: rename anon_vma_lock to vma_lock_anon_vmaRik van Riel
Rename anon_vma_lock to vma_lock_anon_vma. This matches the naming style used in page_lock_anon_vma and will come in really handy further down in this patch series. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09kmap_atomic: make kunmap_atomic() harder to misuseCesar Eduardo Barros
kunmap_atomic() is currently at level -4 on Rusty's "Hard To Misuse" list[1] ("Follow common convention and you'll get it wrong"), except in some architectures when CONFIG_DEBUG_HIGHMEM is set[2][3]. kunmap() takes a pointer to a struct page; kunmap_atomic(), however, takes takes a pointer to within the page itself. This seems to once in a while trip people up (the convention they are following is the one from kunmap()). Make it much harder to misuse, by moving it to level 9 on Rusty's list[4] ("The compiler/linker won't let you get it wrong"). This is done by refusing to build if the type of its first argument is a pointer to a struct page. The real kunmap_atomic() is renamed to kunmap_atomic_notypecheck() (which is what you would call in case for some strange reason calling it with a pointer to a struct page is not incorrect in your code). The previous version of this patch was compile tested on x86-64. [1] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html [2] In these cases, it is at level 5, "Do it right or it will always break at runtime." [3] At least mips and powerpc look very similar, and sparc also seems to share a common ancestor with both; there seems to be quite some degree of copy-and-paste coding here. The include/asm/highmem.h file for these three archs mention x86 CPUs at its top. [4] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html [5] As an aside, could someone tell me why mn10300 uses unsigned long as the first parameter of kunmap_atomic() instead of void *? Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net> Cc: Russell King <linux@arm.linux.org.uk> (arch/arm) Cc: Ralf Baechle <ralf@linux-mips.org> (arch/mips) Cc: David Howells <dhowells@redhat.com> (arch/frv, arch/mn10300) Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> (arch/mn10300) Cc: Kyle McMartin <kyle@mcmartin.ca> (arch/parisc) Cc: Helge Deller <deller@gmx.de> (arch/parisc) Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> (arch/parisc) Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> (arch/powerpc) Cc: Paul Mackerras <paulus@samba.org> (arch/powerpc) Cc: "David S. Miller" <davem@davemloft.net> (arch/sparc) Cc: Thomas Gleixner <tglx@linutronix.de> (arch/x86) Cc: Ingo Molnar <mingo@redhat.com> (arch/x86) Cc: "H. Peter Anvin" <hpa@zytor.com> (arch/x86) Cc: Arnd Bergmann <arnd@arndb.de> (include/asm-generic) Cc: Rusty Russell <rusty@rustcorp.com.au> ("Hard To Misuse" list) Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09hugetlb: call mmu notifiers on hugepage cowDoug Doan
When a copy-on-write occurs, we take one of two paths in handle_mm_fault: through handle_pte_fault for normal pages, or through hugetlb_fault for huge pages. In the normal page case, we eventually get to do_wp_page and call mmu notifiers via ptep_clear_flush_notify. There is no callout to the mmmu notifiers in the huge page case. This patch fixes that. Signed-off-by: Doug Doan <dougd@cray.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09mm: provide init_mm mm_context initializerHeiko Carstens
Provide an INIT_MM_CONTEXT intializer macro which can be used to statically initialize mm_struct:mm_context of init_mm. This way we can get rid of code which will do the initialization at run time (on s390). In addition the current code can be found at a place where it is not expected. So let's have a common initializer which architectures can use if needed. This is based on a patch from Suzuki Poulose. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Suzuki Poulose <suzuki@in.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09mm: use ERR_CASTJulia Lawall
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T x; identifier f; @@ T f (...) { <+... - ERR_PTR(PTR_ERR(x)) + x ...+> } @@ expression x; @@ - ERR_PTR(PTR_ERR(x)) + ERR_CAST(x) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09mm: use memdup_userJulia Lawall
Use memdup_user when user data is immediately copied into the allocated region. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression from,to,size,flag; position p; identifier l1,l2; @@ - to = \(kmalloc@p\|kzalloc@p\)(size,flag); + to = memdup_user(from,size); if ( - to==NULL + IS_ERR(to) || ...) { <+... when != goto l1; - -ENOMEM + PTR_ERR(to) ...+> } - if (copy_from_user(to, from, size) != 0) { - <+... when != goto l2; - -EFAULT - ...+> - } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09asm-generic: use raw_local_irq_save/restore instead local_irq_save/restoreMichal Simek
The start/stop_critical_timing functions for preemptirqsoff, preemptoff and irqsoff tracers contain atomic_inc() and atomic_dec() operations. Atomic operations use local_irq_save/restore macros to ensure atomic access but they are traced by the same function which is causing recursion problem. The reason is when these tracers are turn ON then the local_irq_save/restore macros are changed in include/linux/irqflags.h to call trace_hardirqs_on/off which call start/stop_critical_timing. Microblaze was affected because it uses generic atomic implementation. Signed-off-by: Michal Simek <monstr@monstr.eu> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09drivers/video/w100fb.c: ignore void return value / fix build failurePeter Huewe
Fix a build failure "error: void value not ignored as it ought to be" by removing an assignment of a void return value. The functionality of the code is not changed. Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Acked-by: Henrik Kretzschmar <henne@nachtwindheim.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09ipmi: fix ACPI detection with regspacingYinghai Lu
After the commit that changed ipmi_si detecting sequence from SMBIOS/ACPI to ACPI/SMBIOS, | commit 754d453185275951d39792865927ec494fa1ebd8 | Author: Matthew Garrett <mjg@redhat.com> | Date: Wed May 26 14:43:47 2010 -0700 | | ipmi: change device discovery order | | The ipmi spec provides an ordering for si discovery. Change the driver to | match, with the exception of preferring smbios to SPMI as HPs (at least) | contain accurate information in the former but not the latter. ipmi_si can not be initialized. [ 138.799739] calling init_ipmi_devintf+0x0/0x109 @ 1 [ 138.805050] ipmi device interface [ 138.818131] initcall init_ipmi_devintf+0x0/0x109 returned 0 after 12797 usecs [ 138.822998] calling init_ipmi_si+0x0/0xa90 @ 1 [ 138.840276] IPMI System Interface driver. [ 138.846137] ipmi_si: probing via ACPI [ 138.849225] ipmi_si 00:09: [io 0x0ca2] regsize 1 spacing 1 irq 0 [ 138.864438] ipmi_si: Adding ACPI-specified kcs state machine [ 138.870893] ipmi_si: probing via SMBIOS [ 138.880945] ipmi_si: Adding SMBIOS-specified kcs state machineipmi_si: duplicate interface [ 138.896511] ipmi_si: probing via SPMI [ 138.899861] ipmi_si: Adding SPMI-specified kcs state machineipmi_si: duplicate interface [ 138.917095] ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0 [ 138.928658] ipmi_si: Interface detection failed [ 138.953411] initcall init_ipmi_si+0x0/0xa90 returned 0 after 110847 usecs in smbios has DMI/SMBIOS Handle 0x00C5, DMI type 38, 18 bytes IPMI Device Information Interface Type: KCS (Keyboard Control Style) Specification Version: 2.0 I2C Slave Address: 0x00 NV Storage Device: Not Present Base Address: 0x0000000000000CA2 (I/O) Register Spacing: 32-bit Boundaries in DSDT has Device (BMC) { Name (_HID, EisaId ("IPI0001")) Method (_STA, 0, NotSerialized) { If (LEqual (OSN, Zero)) { Return (Zero) } Return (0x0F) } Name (_STR, Unicode ("IPMI_KCS")) Name (_UID, Zero) Name (_CRS, ResourceTemplate () { IO (Decode16, 0x0CA2, // Range Minimum 0x0CA2, // Range Maximum 0x00, // Alignment 0x01, // Length ) IO (Decode16, 0x0CA6, // Range Minimum 0x0CA6, // Range Maximum 0x00, // Alignment 0x01, // Length ) }) Method (_IFT, 0, NotSerialized) { Return (One) } Method (_SRV, 0, NotSerialized) { Return (0x0200) } } so the reg spacing should be 4 instead of 1. Try to calculate regspacing for this kind of system. Observed on a Sun Fire X4800. Other OSes work and pass certification. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Acked-by: Matthew Garrett <mjg@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Myron Stowe <myron.stowe@hp.com> Cc: Corey Minyard <minyard@acm.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: drm: fix fallouts from slow-work -> wq conversion workqueue: workqueue_cpu_callback() should be cpu_notifier instead of hotcpu_notifier workqueue: add missing __percpu markup in kernel/workqueue.c
2010-08-09drm: fix fallouts from slow-work -> wq conversionTejun Heo
Commit 991ea75c (drm: use workqueue instead of slow-work), which made drm to use wq instead of slow-work, didn't account for the return value difference between delayed_slow_work_enqueue() and queue_delayed_work(). The former returns 0 on success and -errno on failures while the latter never fails and only uses the return value to indicate whether the work was already pending or not. This misconversion triggered spurious error messages. Remove the now unnecessary return value check and error message. Markus: caught another incorrect conversion in drm_kms_helper_poll_enable() Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: David Airlie <airlied@linux.ie> Cc: dri-devel@lists.freedesktop.org
2010-08-09workqueue: workqueue_cpu_callback() should be cpu_notifier instead of ↵Tejun Heo
hotcpu_notifier Commit 6ee0578b (workqueue: mark init_workqueues as early_initcall) made workqueue SMP initialization depend on workqueue_cpu_callback(), which however was registered as hotcpu_notifier() and didn't get called if CONFIG_HOTPLUG_CPU is not set. This made gcwqs on non-boot CPUs not create their initial workers leading to boot failures. Fix it by making it a cpu_notifier. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-bisected-by: walt <w41ter@gmail.com> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
2010-08-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tileLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: arch/tile: check kmalloc() result arch/tile: catch up on various minor cleanups. arch/tile: avoid erroneous error return for PTRACE_POKEUSR. tile: set ARCH_KMALLOC_MINALIGN tile: remove homegrown L1_CACHE_ALIGN macro arch/tile: Miscellaneous cleanup changes. arch/tile: Split the icache flush code off to a generic <arch> header. arch/tile: Fix bug in support for atomic64_xx() ops. arch/tile: Shrink the tile-opcode files considerably. arch/tile: Add driver to enable access to the user dynamic network. arch/tile: Enable more sophisticated IRQ model for 32-bit chips. Move list types from <linux/list.h> to <linux/types.h>. Add wait4() back to the set of <asm-generic/unistd.h> syscalls. Revert adding some arch-specific signal syscalls to <linux/syscalls.h>. arch/tile: Do not use GFP_KERNEL for dma_alloc_coherent(). Feedback from fujita.tomonori@lab.ntt.co.jp. arch/tile: core support for Tilera 32-bit chips. Fix up the "generic" unistd.h ABI to be more useful.
2010-08-08Merge branch 'for-linus' of git://www.jni.nu/crisLinus Torvalds
* 'for-linus' of git://www.jni.nu/cris: (51 commits) CRIS: Fix alignment problem for older ld CRIS: Always dump registers for segfaulting process. CRIS: Add config for pausing a seg-faulting process CRIS: Don't take faults while in_atomic CRIS: Fixup lookup for delay slot faults CRIS: Discard exit.text and .data at runtime CRIS: Add cache aligned and read mostly data sections CRIS: Return something from profile write CRIS: Add ARTPEC-3 and timestamps for sync-serial CRIS: Better ARTPEC-3 support for gpio CRIS: Add include guard CRIS: Better handling of pinmux settings CRIS: New DMA defines for ARTPEC-3 CRIS: __do_strncpy_from_user: Don't read the byte beyond the nil CRIS: Pagetable for ARTPEC-3 CRIS: Machine dependent memmap.h CRIS: Check if pointer is set before using it CRIS: Machine dependent dma.h CRIS: Define __read_mostly for CRISv32 CRIS: Discard .note.gnu.build-id section ...
2010-08-08Merge branch 'for-linus' of git://android.kernel.org/kernel/tegraLinus Torvalds
* 'for-linus' of git://android.kernel.org/kernel/tegra: [ARM] tegra: add MAINTAINERS entry [ARM] tegra: harmony: Add harmony board file [ARM] tegra: add pinmux support [ARM] tegra: add GPIO support [ARM] tegra: Add timer support [ARM] tegra: SMP support [ARM] tegra: Add clock support [ARM] tegra: Add IRQ support [ARM] tegra: initial tegra support
2010-08-08Merge branch 'for-linus' of git://github.com/schandinat/linux-2.6Linus Torvalds
* 'for-linus' of git://github.com/schandinat/linux-2.6: drivers/video/via/via-gpio.c: fix warning viafb: Depends on X86 fbdev: section cleanup in viafb driver viafb: fix accel_flags check_var bug viafb: probe cleanups viafb: remove ioctls which break the framebuffer interface viafb: update fix before calculating depth viafb: PLL value cleanup viafb: simplify lcd size "detection" viafb: fix PCI table viafb: add lcd scaling support for some IGPs viafb: improve lcd code readability viafb: remove duplicated scaling code MAINTAINERS: update viafb entry
2010-08-08Merge branch 'for-linus' of git://gitorious.org/linux-omap-dss2/linuxLinus Torvalds
* 'for-linus' of git://gitorious.org/linux-omap-dss2/linux: (64 commits) OMAP: DSS2: OMAPFB: add support for FBIO_WAITFORVSYNC OMAP: DSS2: Replace strncmp() with sysfs_streq() in overlay_manager_store() OMAP: DSS2: Fix error path in omap_dsi_update() OMAP: DSS2: TDO35S: fix video signaling OMAP: DSS2: OMAPFB: Fix invalid bpp for PAL and NTSC modes OMAP: DSS2: OMAPFB: Fix probe error path OMAP3EVM: Replace vdvi regulator supply with vdds_dsi OMAP: DSS2: Remove extra return statement OMAP: DSS2: adjust YUV overlay width to be even OMAP: DSS2: OMAPFB: Fix sysfs mirror input check OMAP: DSS2: OMAPFB: Remove redundant color register range check OMAP: DSS2: OMAPFB: Remove redundant rotate range check OMAP: DSS2: OMAPFB: Check fb2display() return value OMAP: DSS2: Taal: Optimize enable_te, rotate, mirror when value unchanged OMAP: DSS2: DSI: detect unsupported update requests OMAP: DSS2: DSI: increase FIFO low threshold OMAP: DSS2: DSI: Add error IRQ mask for DSI complexIO OMAP: DSS2: DSI: Remove BTA after set_max_rx_packet_size OMAP: DSS2: change manual update scaling setup OMAP: DSS2: DSI: use BTA to end the frame transfer ...
2010-08-08Merge branch 'omap-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6 * 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6: (143 commits) omap: mailbox: reorganize headers omap: mailbox: standarize on 'omap-mailbox' omap: mailbox: only compile for configured archs omap: mailbox: simplify omap_mbox_register() omap: mailbox: reorganize registering omap: mailbox: add IRQ names omap: mailbox: remove unecessary fields omap: mailbox: don't export unecessary symbols omap: mailbox: update omap1 probing omap: mailbox: use correct config for omap1 omap: mailbox: 2420 should be detected at run-time omap: mailbox: reorganize structures omap: mailbox: trivial cleanups omap mailbox: Set a device in logical mbox instance for traceability omap: mailbox: convert block api to kfifo omap: mailbox: remove (un)likely macros from cold paths omap: mailbox cleanup: split MODULE_AUTHOR line omap: mailbox: convert rwlocks to spinlock Mailbox: disable mailbox interrupt when request queue Mailbox: new mutext lock for h/w mailbox configuration ...
2010-08-08Merge branch 'davinci-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci * 'davinci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci: davinci: dm646x EVM: Specify reserved EDMA channel/slots davinci: da8xx/omapl EVM: Specify reserved channels/slots davinci: support for EDMA resource sharing davinci: edma: provide ability to detect insufficient CC info data davinci: da8xx: sparse cleanup: remove duplicate entries in irq priorities davinci: DM365: fixed second serial port Davinci: tnetv107x evm board initial support Davinci: tnetv107x initial gpio support Davinci: tnetv107x soc support Davinci: tnetv107x decompresser uart definitions Davinci: generalized debug macros
2010-08-08workqueue: add missing __percpu markup in kernel/workqueue.cNamhyung Kim
works in schecule_on_each_cpu() is a percpu pointer but was missing __percpu markup. Add it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-08-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (82 commits) firewire: core: add forgotten dummy driver methods, remove unused ones firewire: add isochronous multichannel reception firewire: core: small clarifications in core-cdev firewire: core: remove unused code firewire: ohci: release channel in error path firewire: ohci: use memory barriers to order descriptor updates tools/firewire: nosy-dump: increment program version tools/firewire: nosy-dump: remove unused code tools/firewire: nosy-dump: use linux/firewire-constants.h tools/firewire: nosy-dump: break up a deeply nested function tools/firewire: nosy-dump: make some symbols static or const tools/firewire: nosy-dump: change to kernel coding style tools/firewire: nosy-dump: work around segfault in decode_fcp tools/firewire: nosy-dump: fix it on x86-64 tools/firewire: add userspace front-end of nosy firewire: nosy: note ioctls in ioctl-number.txt firewire: nosy: use generic printk macros firewire: nosy: endianess fixes and annotations firewire: nosy: annotate __user pointers and __iomem pointers firewire: nosy: fix device shutdown with active client ...
2010-08-07Merge branch 'acpica' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (27 commits) ACPI / ACPICA: Simplify acpi_ev_initialize_gpe_block() ACPI / ACPICA: Fail acpi_gpe_wakeup() if ACPI_GPE_CAN_WAKE is unset ACPI / ACPICA: Do not execute _PRW methods during initialization ACPI: Fix bogus GPE test in acpi_bus_set_run_wake_flags() ACPICA: Update version to 20100702 ACPICA: Fix for Alias references within Package objects ACPICA: Fix lint warning for 64-bit constant ACPICA: Remove obsolete GPE function ACPICA: Update debug output components ACPICA: Add support for WDDT - Watchdog Descriptor Table ACPICA: Drop acpi_set_gpe ACPICA: Use low-level GPE enable during GPE block initialization ACPI / EC: Do not use acpi_set_gpe ACPI / EC: Drop suspend and resume routines ACPICA: Remove wakeup GPE reference counting which is not used ACPICA: Introduce acpi_gpe_wakeup() ACPICA: Rename acpi_hw_gpe_register_bit ACPICA: Update version to 20100528 ACPICA: Add signatures for undefined tables: ATKG, GSCI, IEIT ACPICA: Optimization: Reduce the number of namespace walks ...
2010-08-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (42 commits) IB/qib: Add missing <linux/slab.h> include IB/ehca: Drop unnecessary NULL test RDMA/nes: Fix confusing if statement indentation IB/ehca: Init irq tasklet before irq can happen RDMA/nes: Fix misindented code RDMA/nes: Fix showing wqm_quanta RDMA/nes: Get rid of "set but not used" variables RDMA/nes: Read firmware version from correct place IB/srp: Export req_lim via sysfs IB/srp: Make receive buffer handling more robust IB/srp: Use print_hex_dump() IB: Rename RAW_ETY to RAW_ETHERTYPE RDMA/nes: Fix two sparse warnings RDMA/cxgb3: Make needlessly global iwch_l2t_send() static IB/iser: Make needlessly global iser_alloc_rx_descriptors() static RDMA/cxgb4: Add timeouts when waiting for FW responses IB/qib: Fix race between qib_error_qp() and receive packet processing IB/qib: Limit the number of packets processed per interrupt IB/qib: Allow writes to the diag_counters to be able to clear them IB/qib: Set cfgctxts to number of CPUs by default ...
2010-08-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (214 commits) ALSA: hda - Add pin-fix for HP dc5750 ALSA: als4000: Fix potentially invalid DMA mode setup ALSA: als4000: enable burst mode ALSA: hda - Fix initial capsrc selection in patch_alc269() ASoC: TWL4030: Capture route runtime DAPM ordering fix ALSA: hda - Add PC-beep whitelist for an Intel board ALSA: hda - More relax for pending period handling ALSA: hda - Define AC_FMT_* constants ALSA: hda - Fix beep frequency on IDT 92HD73xx and 92HD71Bxx codecs ALSA: hda - Add support for HDMI HBR passthrough ALSA: hda - Set Stream Type in Stream Format according to AES0 ALSA: hda - Fix Thinkpad X300 so SPDIF is not exposed ALSA: hda - FIX to not expose SPDIF on Thinkpad X301, since it does not have the ability to use SPDIF ASoC: wm9081: fix resource reclaim in wm9081_register error path ASoC: wm8978: fix a memory leak if a wm8978_register fail ASoC: wm8974: fix a memory leak if another WM8974 is registered ASoC: wm8961: fix resource reclaim in wm8961_register error path ASoC: wm8955: fix resource reclaim in wm8955_register error path ASoC: wm8940: fix a memory leak if wm8940_register return error ASoC: wm8904: fix resource reclaim in wm8904_register error path ...
2010-08-07Merge branch 'bkl/core' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: do_coredump: Do not take BKL init: Remove the BKL from startup code
2010-08-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: SELINUX: Fix build error.
2010-08-07Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux: (34 commits) nfsd4: fix file open accounting for RDWR opens nfsd: don't allow setting maxblksize after svc created nfsd: initialize nfsd versions before creating svc net: sunrpc: removed duplicated #include nfsd41: Fix a crash when a callback is retried nfsd: fix startup/shutdown order bug nfsd: minor nfsd read api cleanup gcc-4.6: nfsd: fix initialized but not read warnings nfsd4: share file descriptors between stateid's nfsd4: fix openmode checking on IO using lock stateid nfsd4: miscellaneous process_open2 cleanup nfsd4: don't pretend to support write delegations nfsd: bypass readahead cache when have struct file nfsd: minor nfsd_svc() cleanup nfsd: move more into nfsd_startup() nfsd: just keep single lockd reference for nfsd nfsd: clean up nfsd_create_serv error handling nfsd: fix error handling in __write_ports_addxprt nfsd: fix error handling when starting nfsd with rpcbind down nfsd4: fix v4 state shutdown error paths ...
2010-08-07AFS: Fix the module init error handlingDavid Howells
Fix the module init error handling. There are a bunch of goto labels for aborting the init procedure at different points and just undoing what needs undoing - they aren't all in the right places, however. This can lead to an oops like the following: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffff81042a31>] destroy_workqueue+0x17/0xc0 ... Modules linked in: kafs(+) dns_resolver rxkad af_rxrpc fscache Pid: 2171, comm: insmod Not tainted 2.6.35-cachefs+ #319 DG965RY/ ... Process insmod (pid: 2171, threadinfo ffff88003ca6a000, task ffff88003dcc3050) ... Call Trace: [<ffffffffa0055994>] afs_callback_update_kill+0x10/0x12 [kafs] [<ffffffffa007d1c5>] afs_init+0x190/0x1ce [kafs] [<ffffffffa007d035>] ? afs_init+0x0/0x1ce [kafs] [<ffffffff810001ef>] do_one_initcall+0x59/0x14e [<ffffffff8105f7ee>] sys_init_module+0x9c/0x1de [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-07Merge branch 'nfs-for-2.6.36' of ↵Linus Torvalds
git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.36' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (42 commits) NFS: NFSv4.1 is no longer a "developer only" feature NFS: NFS_V4 is no longer an EXPERIMENTAL feature NFS: Fix /proc/mount for legacy binary interface NFS: Fix the locking in nfs4_callback_getattr SUNRPC: Defer deleting the security context until gss_do_free_ctx() SUNRPC: prevent task_cleanup running on freed xprt SUNRPC: Reduce asynchronous RPC task stack usage SUNRPC: Move the bound cred to struct rpc_rqst SUNRPC: Clean up of rpc_bindcred() SUNRPC: Move remaining RPC client related task initialisation into clnt.c SUNRPC: Ensure that rpc_exit() always wakes up a sleeping task SUNRPC: Make the credential cache hashtable size configurable SUNRPC: Store the hashtable size in struct rpc_cred_cache NFS: Ensure the AUTH_UNIX credcache is allocated dynamically NFS: Fix the NFS users of rpc_restart_call() SUNRPC: The function rpc_restart_call() should return success/failure NFSv4: Get rid of the bogus RPC_ASSASSINATED(task) checks NFSv4: Clean up the process of renewing the NFSv4 lease NFSv4.1: Handle NFS4ERR_DELAY on SEQUENCE correctly NFS: nfs_rename() should not have to flush out writebacks ...