tracing: Do not record user stack trace from NMI context

commit b6345879ccbd9b92864fbd7eb8ac48acdb4d6b15 upstream. A bug was found with Li Zefan's ftrace_stress_test that caused applications to segfault during the test. Placing a tracing_off() in the segfault code, and examining several traces, I found that the following was always the case. The lock tracer was enabled (lockdep being required) and userstack was enabled. Testing this out, I just enabled the two, but that was not good enough. I needed to run something else that could trigger it. Running a load like hackbench did not work, but executing a new program would. The following would trigger the segfault within seconds: # echo 1 > /debug/tracing/options/userstacktrace # echo 1 > /debug/tracing/events/lock/enable # while :; do ls > /dev/null ; done Enabling the function graph tracer and looking at what was happening I finally noticed that all cashes happened just after an NMI. 1) | copy_user_handle_tail() { 1) | bad_area_nosemaphore() { 1) | __bad_area_nosemaphore() { 1) | no_context() { 1) | fixup_exception() { 1) 0.319 us | search_exception_tables(); 1) 0.873 us | } [...] 1) 0.314 us | __rcu_read_unlock(); 1) 0.325 us | native_apic_mem_write(); 1) 0.943 us | } 1) 0.304 us | rcu_nmi_exit(); [...] 1) 0.479 us | find_vma(); 1) | bad_area() { 1) | __bad_area() { After capturing several traces of failures, all of them happened after an NMI. Curious about this, I added a trace_printk() to the NMI handler to read the regs->ip to see where the NMI happened. In which I found out it was here: ffffffff8135b660 <page_fault>: ffffffff8135b660: 48 83 ec 78 sub $0x78,%rsp ffffffff8135b664: e8 97 01 00 00 callq ffffffff8135b800 <error_entry> What was happening is that the NMI would happen at the place that a page fault occurred. It would call rcu_read_lock() which was traced by the lock events, and the user_stack_trace would run. This would trigger a page fault inside the NMI. I do not see where the CR2 register is saved or restored in NMI handling. This means that it would corrupt the page fault handling that the NMI interrupted. The reason the while loop of ls helped trigger the bug, was that each execution of ls would cause lots of pages to be faulted in, and increase the chances of the race happening. The simple solution is to not allow user stack traces in NMI context. After this patch, I ran the above "ls" test for a couple of hours without any issues. Without this patch, the bug would trigger in less than a minute. Reported-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
author: Steven Rostedt <srostedt@redhat.com> 2010-03-12 20:03:30 -0500
committer: Greg Kroah-Hartman <gregkh@suse.de> 2010-04-01 15:58:12 -0700
commit: 80736c50f92cc545fe190d5ca47ca04fd3eca860 (patch)
tree: c19b72e53b1040f53c094c51786784103389a105 /kernel/trace
parent: 95b9725fa051fe1a58d89b1c445948881f998907 (diff)
1 files changed, 7 insertions, 0 deletions
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index dff6329b94a..3cfb60b4770 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1171,6 +1171,13 @@ ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags, int pc)
 	if (!(trace_flags & TRACE_ITER_USERSTACKTRACE))
 		return;
 
+	/*
+	 * NMIs can not handle page faults, even with fix ups.
+	 * The save user stack can (and often does) fault.
+	 */
+	if (unlikely(in_nmi()))
+		return;
+
 	event = trace_buffer_lock_reserve(buffer, TRACE_USER_STACK,
 					  sizeof(*entry), flags, pc);
 	if (!event)
author	Steven Rostedt <srostedt@redhat.com>	2010-03-12 20:03:30 -0500
committer	Greg Kroah-Hartman <gregkh@suse.de>	2010-04-01 15:58:12 -0700
commit	80736c50f92cc545fe190d5ca47ca04fd3eca860 (patch)
tree	c19b72e53b1040f53c094c51786784103389a105 /kernel/trace
parent	95b9725fa051fe1a58d89b1c445948881f998907 (diff)