summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2013-04-16ftrace: Move ftrace_filter_lseek out of CONFIG_DYNAMIC_FTRACE sectionSteven Rostedt (Red Hat)
commit 7f49ef69db6bbf756c0abca7e9b65b32e999eec8 upstream. As ftrace_filter_lseek is now used with ftrace_pid_fops, it needs to be moved out of the #ifdef CONFIG_DYNAMIC_FTRACE section as the ftrace_pid_fops is defined when DYNAMIC_FTRACE is not. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-16tracing: Fix possible NULL pointer dereferencesNamhyung Kim
commit 6a76f8c0ab19f215af2a3442870eeb5f0e81998d upstream. Currently set_ftrace_pid and set_graph_function files use seq_lseek for their fops. However seq_open() is called only for FMODE_READ in the fops->open() so that if an user tries to seek one of those file when she open it for writing, it sees NULL seq_file and then panic. It can be easily reproduced with following command: $ cd /sys/kernel/debug/tracing $ echo 1234 | sudo tee -a set_ftrace_pid In this example, GNU coreutils' tee opens the file with fopen(, "a") and then the fopen() internally calls lseek(). Link: http://lkml.kernel.org/r/1365663302-2170-1-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-16tracing: Fix double free when function profile init failedNamhyung Kim
commit 83e03b3fe4daffdebbb42151d5410d730ae50bd1 upstream. On the failure path, stat->start and stat->pages will refer same page. So it'll attempt to free the same page again and get kernel panic. Link: http://lkml.kernel.org/r/1364820385-32027-1-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-12ftrace: Consistently restore trace function on sysctl enablingJan Kiszka
commit 5000c418840b309251c5887f0b56503aae30f84c upstream. If we reenable ftrace via syctl, we currently set ftrace_trace_function based on the previous simplistic algorithm. This is inconsistent with what update_ftrace_function does. So better call that helper instead. Link: http://lkml.kernel.org/r/5151D26F.1070702@siemens.com Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-05tracing: Prevent buffer overwrite disabled for latency tracersSteven Rostedt (Red Hat)
commit 613f04a0f51e6e68ac6fe571ab79da3c0a5eb4da upstream. The latency tracers require the buffers to be in overwrite mode, otherwise they get screwed up. Force the buffers to stay in overwrite mode when latency tracers are enabled. Added a flag_changed() method to the tracer structure to allow the tracers to see what flags are being changed, and also be able to prevent the change from happing. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Lingzhu Xiang <lxiang@redhat.com> Reviewed-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-28tracing: Keep overwrite in sync between regular and snapshot buffersSteven Rostedt (Red Hat)
commit 80902822658aab18330569587cdb69ac1dfdcea8 upstream. Changing the overwrite mode for the ring buffer via the trace option only sets the normal buffer. But the snapshot buffer could swap with it, and then the snapshot would be in non overwrite mode and the normal buffer would be in overwrite mode, even though the option flag states otherwise. Keep the two buffers overwrite modes in sync. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-28tracing: Protect tracer flags with trace_types_lockSteven Rostedt (Red Hat)
commit 69d34da2984c95b33ea21518227e1f9470f11d95 upstream. Seems that the tracer flags have never been protected from synchronous writes. Luckily, admins don't usually modify the tracing flags via two different tasks. But if scripts were to be used to modify them, then they could get corrupted. Move the trace_types_lock that protects against tracers changing to also protect the flags being set. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-28tracing: Fix free of probe entry by calling call_rcu_sched()Steven Rostedt (Red Hat)
commit 740466bc89ad8bd5afcc8de220f715f62b21e365 upstream. Because function tracing is very invasive, and can even trace calls to rcu_read_lock(), RCU access in function tracing is done with preempt_disable_notrace(). This requires a synchronize_sched() for updates and not a synchronize_rcu(). Function probes (traceon, traceoff, etc) must be freed after a synchronize_sched() after its entry has been removed from the hash. But call_rcu() is used. Fix this by using call_rcu_sched(). Also fix the usage to use hlist_del_rcu() instead of hlist_del(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-28tracing: Fix race in snapshot swappingSteven Rostedt (Red Hat)
commit 2721e72dd10f71a3ba90f59781becf02638aa0d9 upstream. Although the swap is wrapped with a spin_lock, the assignment of the temp buffer used to swap is not within that lock. It needs to be moved into that lock, otherwise two swaps happening on two different CPUs, can end up using the wrong temp buffer to assign in the swap. Luckily, all current callers of the swap function appear to have their own locks. But in case something is added that allows two different callers to call the swap, then there's a chance that this race can trigger and corrupt the buffers. New code is coming soon that will allow for this race to trigger. I've Cc'd stable, so this bug will not show up if someone backports one of the changes that can trigger this bug. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-14ftrace: Update the kconfig for DYNAMIC_FTRACESteven Rostedt
commit db05021d49a994ee40a9735d9c3cb0060c9babb8 upstream. The prompt to enable DYNAMIC_FTRACE (the ability to nop and enable function tracing at run time) had a confusing statement: "enable/disable ftrace tracepoints dynamically" This was written before tracepoints were added to the kernel, but now that tracepoints have been added, this is very confusing and has confused people enough to give wrong information during presentations. Not only that, I looked at the help text, and it still references that dreaded daemon that use to wake up once a second to update the nop locations and brick NICs, that hasn't been around for over five years. Time to bring the text up to the current decade. Reported-by: Ezequiel Garcia <elezegarcia@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-04ftrace: Call ftrace cleanup module notifier after all other notifiersSteven Rostedt (Red Hat)
commit 8c189ea64eea01ca20d102ddb74d6936dd16c579 upstream. Commit: c1bf08ac "ftrace: Be first to run code modification on modules" changed ftrace module notifier's priority to INT_MAX in order to process the ftrace nops before anything else could touch them (namely kprobes). This was the correct thing to do. Unfortunately, the ftrace module notifier also contains the ftrace clean up code. As opposed to the set up code, this code should be run *after* all the module notifiers have run in case a module is doing correct clean-up and unregisters its ftrace hooks. Basically, ftrace needs to do clean up on module removal, as it needs to know about code being removed so that it doesn't try to modify that code. But after it removes the module from its records, if a ftrace user tries to remove a probe, that removal will fail due as the record of that code segment no longer exists. Nothing really bad happens if the probe removal is called after ftrace did the clean up, but the ftrace removal function will return an error. Correct code (such as kprobes) will produce a WARN_ON() if it fails to remove the probe. As people get annoyed by frivolous warnings, it's best to do the ftrace clean up after everything else. By splitting the ftrace_module_notifier into two notifiers, one that does the module load setup that is run at high priority, and the other that is called for module clean up that is run at low priority, the problem is solved. Reported-by: Frank Ch. Eigler <fche@redhat.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21ftrace: Be first to run code modification on modulesSteven Rostedt
If some other kernel subsystem has a module notifier, and adds a kprobe to a ftrace mcount point (now that kprobes work on ftrace points), when the ftrace notifier runs it will fail and disable ftrace, as well as kprobes that are attached to ftrace points. Here's the error: WARNING: at kernel/trace/ftrace.c:1618 ftrace_bug+0x239/0x280() Hardware name: Bochs Modules linked in: fat(+) stap_56d28a51b3fe546293ca0700b10bcb29__8059(F) nfsv4 auth_rpcgss nfs dns_resolver fscache xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack lockd sunrpc ppdev parport_pc parport microcode virtio_net i2c_piix4 drm_kms_helper ttm drm i2c_core [last unloaded: bid_shared] Pid: 8068, comm: modprobe Tainted: GF 3.7.0-0.rc8.git0.1.fc19.x86_64 #1 Call Trace: [<ffffffff8105e70f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff81134106>] ? __probe_kernel_read+0x46/0x70 [<ffffffffa0180000>] ? 0xffffffffa017ffff [<ffffffffa0180000>] ? 0xffffffffa017ffff [<ffffffff8105e76a>] warn_slowpath_null+0x1a/0x20 [<ffffffff810fd189>] ftrace_bug+0x239/0x280 [<ffffffff810fd626>] ftrace_process_locs+0x376/0x520 [<ffffffff810fefb7>] ftrace_module_notify+0x47/0x50 [<ffffffff8163912d>] notifier_call_chain+0x4d/0x70 [<ffffffff810882f8>] __blocking_notifier_call_chain+0x58/0x80 [<ffffffff81088336>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff810c2a23>] sys_init_module+0x73/0x220 [<ffffffff8163d719>] system_call_fastpath+0x16/0x1b ---[ end trace 9ef46351e53bbf80 ]--- ftrace failed to modify [<ffffffffa0180000>] init_once+0x0/0x20 [fat] actual: cc:bb:d2:4b:e1 A kprobe was added to the init_once() function in the fat module on load. But this happened before ftrace could have touched the code. As ftrace didn't run yet, the kprobe system had no idea it was a ftrace point and simply added a breakpoint to the code (0xcc in the cc:bb:d2:4b:e1). Then when ftrace went to modify the location from a call to mcount/fentry into a nop, it didn't see a call op, but instead it saw the breakpoint op and not knowing what to do with it, ftrace shut itself down. The solution is to simply give the ftrace module notifier the max priority. This should have been done regardless, as the core code ftrace modification also happens very early on in boot up. This makes the module modification closer to core modification. Link: http://lkml.kernel.org/r/20130107140333.593683061@goodmis.org Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reported-by: Frank Ch. Eigler <fche@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-14tracing: Fix regression of trace_pipeLiu Bo
Commit 0fb9656d "tracing: Make tracing_enabled be equal to tracing_on" changes the behaviour of trace_pipe, ie. it makes trace_pipe return if we've read something and tracing is enabled, and this means that we have to 'cat trace_pipe' again and again while running tests. IMO the right way is if tracing is enabled, we always block and wait for ring buffer, or we may lose what we want since ring buffer's size is limited. Link: http://lkml.kernel.org/r/1358132051-5410-1-git-send-email-bo.li.liu@oracle.com Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-11tracing: Fix regression with irqsoff tracer and tracing_on fileSteven Rostedt
Commit 02404baf1b47 "tracing: Remove deprecated tracing_enabled file" removed the tracing_enabled file as it never worked properly and the tracing_on file should be used instead. But the tracing_on file didn't call into the tracers start/stop routines like the tracing_enabled file did. This caused trace-cmd to break when it enabled the irqsoff tracer. If you just did "echo irqsoff > current_tracer" then it would work properly. But the tool trace-cmd disables tracing first by writing "0" into the tracing_on file. Then it writes "irqsoff" into current_tracer and then writes "1" into tracing_on. Unfortunately, the above commit changed the irqsoff tracer to check the tracing_on status instead of the tracing_enabled status. If it's disabled then it does not start the tracer internals. The problem is that writing "1" into tracing_on does not call the tracers "start" routine like writing "1" into tracing_enabled did. This makes the irqsoff tracer not start when using the trace-cmd tool, and is a regression for userspace. Simple fix is to have the tracing_on file call the tracers start() method when being enabled (and the stop() method when disabled). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-09tracing: Fix regression of trace_options file settingSteven Rostedt
The latest change to allow trace options to be set on the command line also broke the trace_options file. The zeroing of the last byte of the option name that is echoed into the trace_option file was removed with the consolidation of some of the code. The compare between the option and what was written to the trace_options file fails because the string holding the data written doesn't terminate with a null character. A zero needs to be added to the end of the string copied from user space. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-12-18Merge branch 'tip/perf/core-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull minor tracing updates and fixes from Steven Rostedt: "It seems that one of my old pull requests have slipped through. The changes are contained to just the files that I maintain, and are changes from others that I told I would get into this merge window. They have already been in linux-next for several weeks, and should be well tested." * 'tip/perf/core-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Remove unnecessary WARN_ONCE's from tracing_buffers_splice_read tracing: Remove unneeded checks from the stack tracer tracing: Add a resize function to make one buffer equivalent to another buffer
2012-12-17trace: use kbasename()Andy Shevchenko
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17lseek: the "whence" argument is called "whence"Andrew Morton
But the kernel decided to call it "origin" instead. Fix most of the sites. Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial branch from Jiri Kosina: "Usual stuff -- comment/printk typo fixes, documentation updates, dead code elimination." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) HOWTO: fix double words typo x86 mtrr: fix comment typo in mtrr_bp_init propagate name change to comments in kernel source doc: Update the name of profiling based on sysfs treewide: Fix typos in various drivers treewide: Fix typos in various Kconfig wireless: mwifiex: Fix typo in wireless/mwifiex driver messages: i2o: Fix typo in messages/i2o scripts/kernel-doc: check that non-void fcts describe their return value Kernel-doc: Convention: Use a "Return" section to describe return values radeon: Fix typo and copy/paste error in comments doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c various: Fix spelling of "asynchronous" in comments. Fix misspellings of "whether" in comments. eisa: Fix spelling of "asynchronous". various: Fix spelling of "registered" in comments. doc: fix quite a few typos within Documentation target: iscsi: fix comment typos in target/iscsi drivers treewide: fix typo of "suport" in various comments and Kconfig treewide: fix typo of "suppport" in various comments ...
2012-12-11Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "These are late-v3.7 pending fixes for tracing." Fix up trivial conflict in kernel/trace/ring_buffer.c: the NULL pointer fix clashed with the change of type of the 'ret' variable. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ring-buffer: Fix race between integrity check and readers ring-buffer: Fix NULL pointer if rb_set_head_page() fails ftrace: Clear bits properly in reset_iter_read()
2012-12-08Merge branch 'tip/perf/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core Pull ftrace updates from Steve Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08Merge branch 'uprobes/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core Pull uprobes fixes, cleanups and preparation for the ARM port from Oleg Nesterov. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-06propagate name change to comments in kernel sourceNadia Yvette Chambers
I've legally changed my name with New York State, the US Social Security Administration, et al. This patch propagates the name change and change in initials and login to comments in the kernel source as well. Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-30ring-buffer: Fix race between integrity check and readersSteven Rostedt
The function rb_check_pages() was added to make sure the ring buffer's pages were sane. This check is done when the ring buffer size is modified as well as when the iterator is released (closing the "trace" file), as that was considered a non fast path and a good place to do a sanity check. The problem is that the check does not have any locks around it. If one process were to read the trace file, and another were to read the raw binary file, the check could happen while the reader is reading the file. The issues with this is that the check requires to clear the HEAD page before doing the full check and it restores it afterward. But readers require the HEAD page to exist before it can read the buffer, otherwise it gives a nasty warning and disables the buffer. By adding the reader lock around the check, this keeps the race from happening. Cc: stable@vger.kernel.org # 3.6 Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-30ring-buffer: Fix NULL pointer if rb_set_head_page() failsSteven Rostedt
The function rb_set_head_page() searches the list of ring buffer pages for a the page that has the HEAD page flag set. If it does not find it, it will do a WARN_ON(), disable the ring buffer and return NULL, as this should never happen. But if this bug happens to happen, not all callers of this function can handle a NULL pointer being returned from it. That needs to be fixed. Cc: stable@vger.kernel.org # 3.0+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-19tracing: Remove unnecessary WARN_ONCE's from tracing_buffers_splice_readDave Jones
WARN shouldn't be used as a means of communicating failure to a userspace programmer. Link: http://lkml.kernel.org/r/20120725153908.GA25203@redhat.com Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-19tracing: Remove unneeded checks from the stack tracerAnton Vorontsov
It seems that 'ftrace_enabled' flag should not be used inside the tracer functions. The ftrace core is using this flag for internal purposes, and the flag wasn't meant to be used in tracers' runtime checks. stack tracer is the only tracer that abusing the flag. So stop it from serving as a bad example. Also, there is a local 'stack_trace_disabled' flag in the stack tracer, which is never updated; so it can be removed as well. Link: http://lkml.kernel.org/r/1342637761-9655-1-git-send-email-anton.vorontsov@linaro.org Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-15tracing: Add a resize function to make one buffer equivalent to another bufferHiraku Toyooka
Trace buffer size is now per-cpu, so that there are the following two patterns in resizing of buffers. (1) resize per-cpu buffers to same given size (2) resize per-cpu buffers to another trace_array's buffer size for each CPU (such as preparing the max_tr which is equivalent to the global_trace's size) __tracing_resize_ring_buffer() can be used for (1), and had implemented (2) inside it for resetting the global_trace to the original size. (2) was also implemented in another place. So this patch assembles them in a new function - resize_buffer_duplicate_size(). Link: http://lkml.kernel.org/r/20121017025616.2627.91226.stgit@falsita Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-15ftrace: Clear bits properly in reset_iter_read()Dan Carpenter
There is a typo here where '&' is used instead of '|' and it turns the statement into a noop. The original code is equivalent to: iter->flags &= ~((1 << 2) & (1 << 4)); Link: http://lkml.kernel.org/r/20120609161027.GD6488@elgon.mountain Cc: stable@vger.kernel.org # all of them Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-13tracing: Show raw time stamp on stats per cpu using counter or tsc mode for ↵Yoshihiro YUNOMAE
trace_clock Show raw time stamp values for stats per cpu if you choose counter or tsc mode for trace_clock. Although a unit of tracing time stamp is nsec in local or global mode, the units in counter and TSC mode are tracing counter and cycles respectively. Link: http://lkml.kernel.org/r/1352837903-32191-3-git-send-email-dhsharp@google.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: David Sharp <dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-13tracing: Format non-nanosec times from tsc clock without a decimal point.David Sharp
With the addition of the "tsc" clock, formatting timestamps to look like fractional seconds is misleading. Mark clocks as either in nanoseconds or not, and format non-nanosecond timestamps as decimal integers. Tested: $ cd /sys/kernel/debug/tracing/ $ cat trace_clock [local] global tsc $ echo sched_switch > set_event $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on $ cat trace <idle>-0 [000] 6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120 sleep-29964 [000] 6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120 ... $ echo 1 > options/latency-format $ cat trace <idle>-0 0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120 sleep-29964 0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120 ... $ echo tsc > trace_clock $ cat trace $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on $ echo 0 > options/latency-format $ cat trace <idle>-0 [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120 sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120 ... echo 1 > options/latency-format $ cat trace <idle>-0 0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120 sleep-31128 0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120 ... v2: Move arch-specific bits out of generic code. v4: Fix x86_32 build due to 64-bit division. Google-Bug-Id: 6980623 Link: http://lkml.kernel.org/r/1352837903-32191-2-git-send-email-dhsharp@google.com Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: David Sharp <dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-13tracing,x86: Add a TSC trace_clockDavid Sharp
In order to promote interoperability between userspace tracers and ftrace, add a trace_clock that reports raw TSC values which will then be recorded in the ring buffer. Userspace tracers that also record TSCs are then on exactly the same time base as the kernel and events can be unambiguously interlaced. Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large timestamp values. v2: Move arch-specific bits out of generic code. v3: Rename "x86-tsc", cleanups v7: Generic arch bits in Kbuild. Google-Bug-Id: 6980623 Link: http://lkml.kernel.org/r/1352837903-32191-1-git-send-email-dhsharp@google.com Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Signed-off-by: David Sharp <dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02tracing: Add trace_options kernel command line parameterSteven Rostedt
Add trace_options to the kernel command line parameter to be able to set options at early boot. For example, to enable stack dumps of events, add the following: trace_options=stacktrace This along with the trace_event option, you can get not only traces of the events but also the stack dumps with them. Requested-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02tracing: Use irq_work for wake ups and remove *_nowake_*() functionsSteven Rostedt
Have the ring buffer commit function use the irq_work infrastructure to wake up any waiters waiting on the ring buffer for new data. The irq_work was created for such a purpose, where doing the actual wake up at the time of adding data is too dangerous, as an event or function trace may be in the midst of the work queue locks and cause deadlocks. The irq_work will either delay the action to the next timer interrupt, or trigger an IPI to itself forcing an interrupt to do the work (in a safe location). With irq_work, all ring buffer commits can safely do wakeups, removing the need for the ring buffer commit "nowake" variants, which were used by events and function tracing. All commits can now safely use the normal commit, and the "nowake" variants can be removed. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02tracing: Remove deprecated tracing_enabled fileSteven Rostedt
The tracing_enabled file was used as a quick way to stop tracers, and try to bring down overhead for things like the latency tracers (irqsoff, wakeup, etc). But it didn't work that well. The tracing_on file was created as a really fast way to stop recording into the ftrace ring buffer and can interact with the kernel. That is a tracing_off() call in the kernel can disable recording of events, and then from userspace one could echo 1 into the tracing_on file to continue it. The tracing_enabled function did too much to allow for this. The tracing_on has taken over as a way to start and stop tracing and the tracing_enabled file should not be used. But because of its existance, it still confuses people. Over a year ago the following commit was added: commit 6752ab4a9c30d5411b2dfdb251a3f1cb18aae487 Author: Steven Rostedt <srostedt@redhat.com> Date: Tue Feb 8 13:54:06 2011 -0500 tracing: Deprecate tracing_enabled for tracing_on This commit added a WARN_ON() if the tracing_enabled file's variable was changed. After this was added, only LatencyTop complained, and they soon fixed their tool as there was no reason that LatencyTop should touch this file as it was using the perf ring buffers which this file does not interact with. But since that time no one else has complained about this WARN_ON(). Thus it is safe to assume that this file is no longer needed. Time to get rid of it. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02tracing: Make tracing_enabled be equal to tracing_onSteven Rostedt
The tracing_enabled file has been deprecated as it never was able to serve its purpose well. The tracing_on file has taken over. Instead of having code to keep tracing_enabled, have the tracing_enabled file just set tracing_on, and remove the tracing_enabled variable. This allows us to remove the tracing_enabled file. The reason that the remove is in a different change set and not removed here is in case we find some lonely userspace tool that requires the file to exist. Then the removal patch will get reverted, but this one will not. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02tracing: Remove unused function unregister_tracer()Steven Rostedt
The function register_tracer() is only used by kernel core code, that never needs to remove the tracer. As trace_events have become the main way to add new tracing to the kernel, the need to unregister a tracer has diminished. Remove the unused function unregister_tracer(). If a need arises where we need it, then we can always add it back. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02tracing: Separate open function from set_event and available_eventsSteven Rostedt
The open function used by available_events is the same as set_event even though it uses different seq functions. This causes a side effect of writing into available_events clearing all events, even though available_events is suppose to be read only. There's no reason to keep a single function for just the open and have both use different functions for everything else. It is a little confusing and causes strange behavior. Just have each have their own function. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02ring-buffer: Change unsigned long type of ring_buffer_oldest_event_ts() to u64Yoshihiro YUNOMAE
ring_buffer_oldest_event_ts() should return a value of u64 type, because ring_buffer_per_cpu->buffer_page->buffer_data_page->time_stamp is u64 type. Link: http://lkml.kernel.org/r/1349998076-15495-5-git-send-email-dhsharp@google.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: David Sharp <dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02tracing: Reset ring buffer when changing trace_clocksDavid Sharp
Because the "tsc" clock isn't in nanoseconds, the ring buffer must be reset when changing clocks so that incomparable timestamps don't end up in the same trace. Tested: Confirmed switching clocks resets the trace buffer. Google-Bug-Id: 6980623 Link: http://lkml.kernel.org/r/1349998076-15495-3-git-send-email-dhsharp@google.com Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: David Sharp <dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31tracing: Cleanup unnecessary function declarationsVaibhav Nagarnaik
The functions defined in include/trace/syscalls.h are not used directly since struct ftrace_event_class was introduced. Remove them from the header file and rearrange the ftrace_event_class declarations in trace_syscalls.c. Link: http://lkml.kernel.org/r/1339112785-21806-2-git-send-email-vnagarnaik@google.com Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31tracing: Trivial cleanupDavid Sharp
Remove ftrace_format_syscall() declaration; it is neither defined nor used. Also update a comment and formatting. Link: http://lkml.kernel.org/r/1339112785-21806-1-git-send-email-vnagarnaik@google.com Signed-off-by: David Sharp <dhsharp@google.com> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31tracing: Cache comms only after an event occurredSteven Rostedt
Whenever an event is registered, the comm of tasks are saved at every task switch instead of saving them at every event. But if an event isn't executed much, the comm cache will be filled up by tasks that did not record the event and you lose out on the comms that did. Here's an example, if you enable the following events: echo 1 > /debug/tracing/events/kvm/kvm_cr/enable echo 1 > /debug/tracing/events/net/net_dev_xmit/enable Note, there's no kvm running on this machine so the first event will never be triggered, but because it is enabled, the storing of comms will continue. If we now disable the network event: echo 0 > /debug/tracing/events/net/net_dev_xmit/enable and look at the trace: cat /debug/tracing/trace sshd-2672 [001] ..s2 375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s1 375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s2 375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s1 375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s2 375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s1 375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s2 376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s1 376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s2 377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0 sshd-2672 [001] ..s1 377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0 sshd-2672 [001] ..s2 377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0 sshd-2672 [001] ..s1 377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0 We see that process 2672 which triggered the events has the comm "sshd". But if we run hackbench for a bit and look again: cat /debug/tracing/trace <...>-2672 [001] ..s2 375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s1 375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s2 375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s1 375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s2 375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s1 375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s2 376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s1 376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s2 377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0 <...>-2672 [001] ..s1 377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0 <...>-2672 [001] ..s2 377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0 <...>-2672 [001] ..s1 377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0 The stored "sshd" comm has been flushed out and we get a useless "<...>". But by only storing comms after a trace event occurred, we can run hackbench all day and still get the same output. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31tracing: Have tracing_sched_wakeup_trace() use standard unlock_commitSteven Rostedt
The functon tracing_sched_wakeup_trace() does an open coded unlock commit and save stack. This is what the trace_nowake_buffer_unlock_commit() is for. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31tracing: Enable comm recording if trace_printk() is usedSteven Rostedt
If comm recording is not enabled when trace_printk() is used then you just get this type of output: [ adding trace_printk("hello! %d", irq); in do_IRQ ] <...>-2843 [001] d.h. 80.812300: do_IRQ: hello! 14 <...>-2734 [002] d.h2 80.824664: do_IRQ: hello! 14 <...>-2713 [003] d.h. 80.829971: do_IRQ: hello! 14 <...>-2814 [000] d.h. 80.833026: do_IRQ: hello! 14 By enabling the comm recorder when trace_printk is enabled: hackbench-6715 [001] d.h. 193.233776: do_IRQ: hello! 21 sshd-2659 [001] d.h. 193.665862: do_IRQ: hello! 21 <idle>-0 [001] d.h1 193.665996: do_IRQ: hello! 21 Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31tracing: Expand ring buffer when trace_printk() is usedSteven Rostedt
Since tracing is not used by 99% of Linux users, even though tracing may be configured in, it does not make sense to allocate 1.4 Megs per CPU for the ring buffers if they are not used. Thus, on boot up the ring buffers are set to a minimal size until something needs the and they are expanded. This works well for events and tracers (function, etc), but for the asynchronous use of trace_printk() which can write to the ring buffer at any time, does not expand the buffers. On boot up a check is made to see if any trace_printk() is used to see if the trace_printk() temp buffer pages should be allocated. This same code can be used to expand the buffers as well. Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31ring-buffer: Add a 'dropped events' counterSlava Pestov
The existing 'overrun' counter is incremented when the ring buffer wraps around, with overflow on (the default). We wanted a way to count requests lost from the buffer filling up with overflow off, too. I decided to add a new counter instead of retro-fitting the existing one because it seems like a different statistic to count conceptually, and also because of how the code was structured. Link: http://lkml.kernel.org/r/1310765038-26399-1-git-send-email-slavapestov@google.com Signed-off-by: Slava Pestov <slavapestov@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31tracing: Change tracer's integer flags to boolHiraku Toyooka
print_max and use_max_tr in struct tracer are "int" variables and used like flags. This is wasteful, so change the type to "bool". Link: http://lkml.kernel.org/r/20121002082710.9807.86393.stgit@falsita Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31tracing: Allow tracers to start at core initcallSteven Rostedt
There's times during debugging that it is helpful to see traces of early boot functions. But the tracers are initialized at device_initcall() which is quite late during the boot process. Setting the kernel command line parameter ftrace=function will not show anything until the function tracer is initialized. This prevents being able to trace functions before device_initcall(). There's no reason that the tracers need to be initialized so late in the boot process. Move them up to core_initcall() as they still need to come after early_initcall() which initializes the tracing buffers. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31tracing: Replace strict_strto* with kstrto*Daniel Walter
* remove old string conversions with kstrto* Link: http://lkml.kernel.org/r/20120926200838.GC1244@0x90.at Signed-off-by: Daniel Walter <sahne@0x90.at> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>