nohz: Fix printk_needs_cpu() return value on offline cpus

commit 61ab25447ad6334a74e32f60efb135a3467223f8 upstream.

This patch fixes a hang observed with 2.6.32 kernels where timers got enqueued on offline cpus.

printk_needs_cpu() may return 1 if called on offline cpus. When a cpu gets offlined it schedules the idle process which, before killing its own cpu, will call tick_nohz_stop_sched_tick(). That function in turn will call printk_needs_cpu() in order to check if the local tick can be disabled. On offline cpus this function should naturally return 0, since regardless of whether the tick gets disabled or not the cpu will be dead shortly after. That is besides the fact that __cpu_disable() should already have made sure that no interrupts will be delivered on the offlined cpu anyway.

In this case it prevents tick_nohz_stop_sched_tick() from calling select_nohz_load_balancer(). No idea if that really is a problem. However, what made me debug this is that on 2.6.32 the function get_nohz_load_balancer() is used within __mod_timer() to select a cpu on which a timer gets enqueued. If printk_needs_cpu() returns 1 then the nohz_load_balancer cpu doesn't get updated when a cpu gets offlined. It may contain the cpu number of an offline cpu. In turn, timers get enqueued on an offline cpu and, not very surprisingly, they never expire and cause system hangs.

This has been observed on 2.6.32 kernels. On current kernels __mod_timer() uses get_nohz_timer_target(), which doesn't have that problem. However, there might be other problems because of the too early exit from tick_nohz_stop_sched_tick() in case a cpu goes offline.

The easiest way to fix this is just to test if the current cpu is offline and call printk_tick() directly, which clears the condition.

Alternatively, I tried a cpu hotplug notifier which would clear the condition. However, between calling the notifier function and printk_needs_cpu() something could have called printk() again, and the problem would be back. This seems to be the safest fix.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101126120235.406766476@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
author: Heiko Carstens <heiko.carstens@de.ibm.com> 2010-11-26 13:00:59 +0100
committer: Greg Kroah-Hartman <gregkh@suse.de> 2011-03-21 12:44:21 -0700
commit: 8300619b701e4441950f505c32df651dff26286b (patch)
tree: b3e40aaea360bc485b2c4d54387fe870ab708812 /kernel
parent: bb494f01fec09228682c060841f7f6033b7563f4 (diff)
1 files changed, 2 insertions, 0 deletions
diff --git a/kernel/printk.c b/kernel/printk.c
index 1751c456b71..fd86add1c0a 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1016,6 +1016,8 @@ void printk_tick(void)
 
 int printk_needs_cpu(int cpu)
 {
+	if (unlikely(cpu_is_offline(cpu)))
+		printk_tick();
 	return per_cpu(printk_pending, cpu);
 }
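
The effect of the two added lines can be illustrated with a small user-space model. This is only a sketch, not kernel code: the `model_*` names, the fixed CPU count, and the plain arrays standing in for the per-cpu `printk_pending` variable and the cpu online mask are all assumptions made for illustration. In particular, the real printk_tick() takes no argument and acts on the calling cpu; the model passes the cpu number explicitly.

```c
#include <stdbool.h>

/* Hypothetical model: a fixed number of cpus, tracked in plain arrays. */
#define MODEL_NR_CPUS 4

static bool model_cpu_online[MODEL_NR_CPUS];     /* stand-in for the cpu online mask */
static int model_printk_pending[MODEL_NR_CPUS];  /* stand-in for per-cpu printk_pending */

/* Stand-in for printk_tick(): clears the pending condition for one cpu. */
static void model_printk_tick(int cpu)
{
	model_printk_pending[cpu] = 0;
}

/* Behaviour before the patch: the pending flag is reported as-is. */
static int model_printk_needs_cpu_old(int cpu)
{
	return model_printk_pending[cpu];
}

/* Behaviour after the patch: an offline cpu first clears its flag. */
static int model_printk_needs_cpu_fixed(int cpu)
{
	if (!model_cpu_online[cpu])
		model_printk_tick(cpu);
	return model_printk_pending[cpu];
}
```

With a pending flag set on an offline cpu, the old variant in this model keeps reporting 1, while the fixed variant clears the flag and reports 0. For an online cpu the two variants behave the same.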