linux/kernel, branch v2.6.32.12

sched: Use proper type in sched_getaffinity()

2010-04-26T14:41:37Z

commit 8bc037fb89bb3104b9ae290d18c877624cd7d9cc upstream. Using the proper type fixes the following compiler warning: kernel/sched.c:4850: warning: comparison of distinct pointer types lacks a cast Signed-off-by: KOSAKI Motohiro Cc: torvalds@linux-foundation.org Cc: travis@sgi.com Cc: peterz@infradead.org Cc: drepper@redhat.com Cc: rja@sgi.com Cc: sharyath@in.ibm.com Cc: steiner@sgi.com LKML-Reference: <20100317090046.4C79.A69D9226@jp.fujitsu.com> Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman

lockdep: fix incorrect percpu usage

2010-04-26T14:41:35Z

The mainline kernel as of 2.6.34-rc5 is not affected by this problem because commit 10fad5e46f6c7bdfb01b1a012380a38e3c6ab346 fixed it by refactoring. lockdep fix incorrect percpu usage Should use per_cpu_ptr() to obfuscate the per cpu pointers (RELOC_HIDE is needed for per cpu pointers). git blame points to commit: lockdep.c: commit 8e18257d29238311e82085152741f0c3aa18b74d But it's really just moving the code around. But it's enough to say that the problems appeared before Jul 19 01:48:54 2007, which brings us back to 2.6.23. It should be applied to stable 2.6.23.x to 2.6.33.x (or whichever of these stable branches are still maintained). (tested on 2.6.33.1 x86_64) Signed-off-by: Mathieu Desnoyers CC: Randy Dunlap CC: Eric Dumazet CC: Rusty Russell CC: Peter Zijlstra CC: Tejun Heo CC: Ingo Molnar CC: Andrew Morton CC: Linus Torvalds CC: Greg Kroah-Hartman CC: Steven Rostedt Signed-off-by: Greg Kroah-Hartman

modules: fix incorrect percpu usage

2010-04-26T14:41:34Z

Mainline does not need this fix, as commit 259354deaaf03d49a02dbb9975d6ec2a54675672 fixed the problem by refactoring. Should use per_cpu_ptr() to obfuscate the per cpu pointers (RELOC_HIDE is needed for per cpu pointers). Introduced by commit: module.c: commit 6b588c18f8dacfa6d7957c33c5ff832096e752d3 This patch should be queued for the stable branch, for kernels 2.6.29.x to 2.6.33.x. (tested on 2.6.33.1 x86_64) Signed-off-by: Mathieu Desnoyers CC: Randy Dunlap CC: Eric Dumazet CC: Rusty Russell CC: Peter Zijlstra CC: Tejun Heo CC: Ingo Molnar CC: Andrew Morton CC: Linus Torvalds CC: Greg Kroah-Hartman CC: Steven Rostedt Signed-off-by: Greg Kroah-Hartman

sched: Fix a race between ttwu() and migrate_task()

2010-04-26T14:41:33Z

Based on commit e2912009fb7b715728311b0d8fe327a1432b3f79 upstream, but done differently as this issue is not present in .33 or .34 kernels due to rework in this area. If a task is in the TASK_WAITING state, then try_to_wake_up() is working on it, and it will place it on the correct cpu. This commit ensures that neither migrate_task() nor __migrate_task() calls set_task_cpu(p) while p is in the TASK_WAKING state. Otherwise, there could be two concurrent calls to set_task_cpu(p), resulting in the task's cfs_rq being inconsistent with its cpu. Signed-off-by: John Wright Cc: Ingo Molnar Cc: Peter Zijlstra Signed-off-by: Greg Kroah-Hartman

sched: Fix sched_getaffinity()

2010-04-26T14:41:25Z

commit 84fba5ec91f11c0efb27d0ed6098f7447491f0df upstream. taskset on 2.6.34-rc3 fails on one of my ppc64 test boxes with the following error: sched_getaffinity(0, 16, 0x10029650030) = -1 EINVAL (Invalid argument) This box has 128 threads and 16 bytes is enough to cover it. Commit cd3d8031eb4311e516329aee03c79a08333141f1 (sched: sched_getaffinity(): Allow less than NR_CPUS length) is comparing this 16 bytes agains nr_cpu_ids. Fix it by comparing nr_cpu_ids to the number of bits in the cpumask we pass in. Signed-off-by: Anton Blanchard Reviewed-by: KOSAKI Motohiro Cc: Sharyathi Nagesh Cc: Ulrich Drepper Cc: Peter Zijlstra Cc: Linus Torvalds Cc: Jack Steiner Cc: Russ Anderson Cc: Mike Travis LKML-Reference: <20100406070218.GM5594@kryten> Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman

sched: sched_getaffinity(): Allow less than NR_CPUS length

2010-04-26T14:41:25Z

commit cd3d8031eb4311e516329aee03c79a08333141f1 upstream. [ Note, this commit changes the syscall ABI for > 1024 CPUs systems. ] Recently, some distro decided to use NR_CPUS=4096 for mysterious reasons. Unfortunately, glibc sched interface has the following definition: # define __CPU_SETSIZE 1024 # define __NCPUBITS (8 * sizeof (__cpu_mask)) typedef unsigned long int __cpu_mask; typedef struct { __cpu_mask __bits[__CPU_SETSIZE / __NCPUBITS]; } cpu_set_t; It mean, if NR_CPUS is bigger than 1024, cpu_set_t makes an ABI issue ... More recently, Sharyathi Nagesh reported following test program makes misterious syscall failure: ----------------------------------------------------------------------- #define _GNU_SOURCE #include #include #include int main() { cpu_set_t set; if (sched_getaffinity(0, sizeof(cpu_set_t), &set) < 0) printf("\n Call is failing with:%d", errno); } ----------------------------------------------------------------------- Because the kernel assumes len argument of sched_getaffinity() is bigger than NR_CPUS. But now it is not correct. Now we are faced with the following annoying dilemma, due to the limitations of the glibc interface built in years ago: (1) if we change glibc's __CPU_SETSIZE definition, we lost binary compatibility of _all_ application. (2) if we don't change it, we also lost binary compatibility of Sharyathi's use case. Then, I would propse to change the rule of the len argument of sched_getaffinity(). Old: len should be bigger than NR_CPUS New: len should be bigger than maximum possible cpu id This creates the following behavior: (A) In the real 4096 cpus machine, the above test program still return -EINVAL. (B) NR_CPUS=4096 but the machine have less than 1024 cpus (almost all machines in the world), the above can run successfully. Fortunatelly, BIG SGI machine is mainly used for HPC use case. It means they can rebuild their programs. IOW we hope they are not annoyed by this issue ... Reported-by: Sharyathi Nagesh Signed-off-by: KOSAKI Motohiro Acked-by: Ulrich Drepper Acked-by: Peter Zijlstra Cc: Linus Torvalds Cc: Andrew Morton Cc: Jack Steiner Cc: Russ Anderson Cc: Mike Travis LKML-Reference: <20100312161316.9520.A69D9226@jp.fujitsu.com> Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman

genirq: Force MSI irq handlers to run with interrupts disabled

2010-04-26T14:41:18Z

commit 753649dbc49345a73a2454c770a3f2d54d11aec6 upstream. Network folks reported that directing all MSI-X vectors of their multi queue NICs to a single core can cause interrupt stack overflows when enough interrupts fire at the same time. This is caused by the fact that we run interrupt handlers by default with interrupts enabled unless the driver reuqests the interrupt with the IRQF_DISABLED set. The NIC handlers do not set this flag, so simultaneous interrupts can nest unlimited and cause the stack overflow. The only safe counter measure is to run the interrupt handlers with interrupts disabled. We can't switch to this mode in general right now, but it is safe to do so for MSI interrupts. Force IRQF_DISABLED for MSI interrupt handlers. Signed-off-by: Thomas Gleixner Cc: Andi Kleen Cc: Linus Torvalds Cc: Andrew Morton Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Alan Cox Cc: David Miller Cc: Greg Kroah-Hartman Cc: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman

Freezer: Fix buggy resume test for tasks frozen with cgroup freezer

2010-04-26T14:41:17Z

commit 5a7aadfe2fcb0f69e2acc1fbefe22a096e792fc9 upstream. When the cgroup freezer is used to freeze tasks we do not want to thaw those tasks during resume. Currently we test the cgroup freezer state of the resuming tasks to see if the cgroup is FROZEN. If so then we don't thaw the task. However, the FREEZING state also indicates that the task should remain frozen. This also avoids a problem pointed out by Oren Ladaan: the freezer state transition from FREEZING to FROZEN is updated lazily when userspace reads or writes the freezer.state file in the cgroup filesystem. This means that resume will thaw tasks in cgroups which should be in the FROZEN state if there is no read/write of the freezer.state file to trigger this transition before suspend. NOTE: Another "simple" solution would be to always update the cgroup freezer state during resume. However it's a bad choice for several reasons: Updating the cgroup freezer state is somewhat expensive because it requires walking all the tasks in the cgroup and checking if they are each frozen. Worse, this could easily make resume run in N^2 time where N is the number of tasks in the cgroup. Finally, updating the freezer state from this code path requires trickier locking because of the way locks must be ordered. Instead of updating the freezer state we rely on the fact that lazy updates only manage the transition from FREEZING to FROZEN. We know that a cgroup with the FREEZING state may actually be FROZEN so test for that state too. This makes sense in the resume path even for partially-frozen cgroups -- those that really are FREEZING but not FROZEN. Reported-by: Oren Ladaan Signed-off-by: Matt Helsley Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman

softlockup: Stop spurious softlockup messages due to overflow

2010-04-01T22:58:47Z

commit 8c2eb4805d422bdbf60ba00ff233c794d23c3c00 upstream. Ensure additions on touch_ts do not overflow. This can occur when the top 32 bits of the TSC reach 0xffffffff causing additions to touch_ts to overflow and this in turn generates spurious softlockup warnings. Signed-off-by: Colin Ian King Cc: Peter Zijlstra Cc: Eric Dumazet LKML-Reference: <1268994482.1798.6.camel@lenovo> Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman

cpuset: fix the problem that cpuset_mem_spread_node() returns an offline node

2010-04-01T22:58:46Z

commit 5ab116c9349ef52d6fbd2e2917a53f13194b048e upstream. cpuset_mem_spread_node() returns an offline node, and causes an oops. This patch fixes it by initializing task->mems_allowed to node_states[N_HIGH_MEMORY], and updating task->mems_allowed when doing memory hotplug. Signed-off-by: Miao Xie Acked-by: David Rientjes Reported-by: Nick Piggin Tested-by: Nick Piggin Cc: Paul Menage Cc: Li Zefan Cc: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman