author    | Oleg Nesterov <oleg@redhat.com>                 | 2013-08-12 18:14:00 +0200
committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2014-01-09 12:24:23 -0800
commit    | 57f74b6ecebf59991677dd2da0f0433e8be6c945 (patch)
tree      | 4e3b332c34f41796e61bf7bcfb3e6bb06386e0b3 /include
parent    | a29ccdd1b5a61fad7d4883b3ef63da3a313f1e44 (diff)
sched: fix the theoretical signal_wake_up() vs schedule() race
commit e0acd0a68ec7dbf6b7a81a87a867ebd7ac9b76c4 upstream.
This is only theoretical, but after try_to_wake_up(p) was changed
to check p->state under p->pi_lock, code like

	__set_current_state(TASK_INTERRUPTIBLE);
	schedule();

can miss a signal. This is the special case of wait-for-condition:
it relies on the try_to_wake_up()/schedule() interaction and thus
does not need an mb() between __set_current_state() and the
if (signal_pending) check.
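For reference, a fleshed-out sleeper side looks roughly like this (a
minimal sketch; the "condition" flag and the surrounding loop are
illustrative only and are not taken from any particular caller):

	for (;;) {
		__set_current_state(TASK_INTERRUPTIBLE); /* STORE to ->state */
		if (condition)			/* set by the waker */
			break;
		if (signal_pending(current))	/* signal already pending */
			break;
		/*
		 * A signal arriving from here on is caught by the
		 * signal_pending() check that schedule() does under
		 * rq->lock; that is the check this patch is about.
		 */
		schedule();
	}
	__set_current_state(TASK_RUNNING);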
However, this __set_current_state() can move into the critical
section protected by rq->lock. Now that try_to_wake_up() takes
another lock, we need to ensure that the STORE of ->state can't be
reordered with the "if (signal_pending(current))" check inside that
section.
The patch is actually a one-liner: it simply adds smp_wmb() before
spin_lock_irq(rq->lock). This is what try_to_wake_up() already does
for the same reason.
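To make the placement concrete, the scheduler-side part described
above sits roughly as follows (a sketch only; the kernel/sched hunk
is not part of the 'include'-limited diff shown below):

	/* in the scheduler's slow path, before taking the runqueue lock */
	smp_mb__before_spinlock();	/* order the caller's ->state STORE */
	raw_spin_lock_irq(&rq->lock);	/* before the signal_pending() LOAD
					 * done inside this critical section */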
We turn this wmb() into the new helper, smp_mb__before_spinlock(),
for better documentation and to allow the architectures to change
the default implementation.
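Since the generic definition is wrapped in #ifndef (see the hunk
below), an architecture that needs something stronger than smp_wmb()
can simply provide its own definition first; a hypothetical override,
with no specific architecture implied:

	/* hypothetical arch header, e.g. its asm/barrier.h */
	#define smp_mb__before_spinlock()	smp_mb()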
While at it, kill smp_mb__after_lock(); it has no callers.
Perhaps we can also add smp_mb__before/after_spinunlock() for
prepare_to_wait().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'include')
-rw-r--r-- | include/linux/spinlock.h | 14
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 7d537ced949..75f34949d9a 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -117,9 +117,17 @@ do { \
 #endif /*arch_spin_is_contended*/
 #endif
 
-/* The lock does not imply full memory barrier. */
-#ifndef ARCH_HAS_SMP_MB_AFTER_LOCK
-static inline void smp_mb__after_lock(void) { smp_mb(); }
+/*
+ * Despite its name it doesn't necessarily has to be a full barrier.
+ * It should only guarantee that a STORE before the critical section
+ * can not be reordered with a LOAD inside this section.
+ * spin_lock() is the one-way barrier, this LOAD can not escape out
+ * of the region. So the default implementation simply ensures that
+ * a STORE can not move into the critical section, smp_wmb() should
+ * serialize it with another STORE done by spin_lock().
+ */
+#ifndef smp_mb__before_spinlock
+#define smp_mb__before_spinlock()	smp_wmb()
 #endif
 
 /**