diff options
Diffstat (limited to 'Documentation/trace')
| -rw-r--r-- | Documentation/trace/events-nmi.txt | 43 | ||||
| -rw-r--r-- | Documentation/trace/events-power.txt | 31 | ||||
| -rw-r--r-- | Documentation/trace/events.txt | 222 | ||||
| -rw-r--r-- | Documentation/trace/ftrace-design.txt | 5 | ||||
| -rw-r--r-- | Documentation/trace/ftrace.txt | 49 | ||||
| -rw-r--r-- | Documentation/trace/postprocess/trace-vmscan-postprocess.pl | 32 | ||||
| -rw-r--r-- | Documentation/trace/ring-buffer-design.txt | 2 | ||||
| -rw-r--r-- | Documentation/trace/tracepoints.txt | 48 | ||||
| -rw-r--r-- | Documentation/trace/uprobetracer.txt | 36 |
9 files changed, 429 insertions, 39 deletions
diff --git a/Documentation/trace/events-nmi.txt b/Documentation/trace/events-nmi.txt new file mode 100644 index 00000000000..c03c8c89f08 --- /dev/null +++ b/Documentation/trace/events-nmi.txt @@ -0,0 +1,43 @@ +NMI Trace Events + +These events normally show up here: + + /sys/kernel/debug/tracing/events/nmi + +-- + +nmi_handler: + +You might want to use this tracepoint if you suspect that your +NMI handlers are hogging large amounts of CPU time. The kernel +will warn if it sees long-running handlers: + + INFO: NMI handler took too long to run: 9.207 msecs + +and this tracepoint will allow you to drill down and get some +more details. + +Let's say you suspect that perf_event_nmi_handler() is causing +you some problems and you only want to trace that handler +specifically. You need to find its address: + + $ grep perf_event_nmi_handler /proc/kallsyms + ffffffff81625600 t perf_event_nmi_handler + +Let's also say you are only interested in when that function is +really hogging a lot of CPU time, like a millisecond at a time. +Note that the kernel's output is in milliseconds, but the input +to the filter is in nanoseconds! You can filter on 'delta_ns': + +cd /sys/kernel/debug/tracing/events/nmi/nmi_handler +echo 'handler==0xffffffff81625600 && delta_ns>1000000' > filter +echo 1 > enable + +Your output would then look like: + +$ cat /sys/kernel/debug/tracing/trace_pipe +<idle>-0 [000] d.h3 505.397558: nmi_handler: perf_event_nmi_handler() delta_ns: 3236765 handled: 1 +<idle>-0 [000] d.h3 505.805893: nmi_handler: perf_event_nmi_handler() delta_ns: 3174234 handled: 1 +<idle>-0 [000] d.h3 506.158206: nmi_handler: perf_event_nmi_handler() delta_ns: 3084642 handled: 1 +<idle>-0 [000] d.h3 506.334346: nmi_handler: perf_event_nmi_handler() delta_ns: 3080351 handled: 1 + diff --git a/Documentation/trace/events-power.txt b/Documentation/trace/events-power.txt index e1498ff8cf9..21d514ced21 100644 --- a/Documentation/trace/events-power.txt +++ b/Documentation/trace/events-power.txt @@ -63,3 +63,34 @@ power_domain_target "%s state=%lu cpu_id=%lu" The first parameter gives the power domain name (e.g. "mpu_pwrdm"). The second parameter is the power domain target state. +4. PM QoS events +================ +The PM QoS events are used for QoS add/update/remove request and for +target/flags update. + +pm_qos_add_request "pm_qos_class=%s value=%d" +pm_qos_update_request "pm_qos_class=%s value=%d" +pm_qos_remove_request "pm_qos_class=%s value=%d" +pm_qos_update_request_timeout "pm_qos_class=%s value=%d, timeout_us=%ld" + +The first parameter gives the QoS class name (e.g. "CPU_DMA_LATENCY"). +The second parameter is value to be added/updated/removed. +The third parameter is timeout value in usec. + +pm_qos_update_target "action=%s prev_value=%d curr_value=%d" +pm_qos_update_flags "action=%s prev_value=0x%x curr_value=0x%x" + +The first parameter gives the QoS action name (e.g. "ADD_REQ"). +The second parameter is the previous QoS value. +The third parameter is the current QoS value to update. + +And, there are also events used for device PM QoS add/update/remove request. + +dev_pm_qos_add_request "device=%s type=%s new_value=%d" +dev_pm_qos_update_request "device=%s type=%s new_value=%d" +dev_pm_qos_remove_request "device=%s type=%s new_value=%d" + +The first parameter gives the device name which tries to add/update/remove +QoS requests. +The second parameter gives the request type (e.g. "DEV_PM_QOS_RESUME_LATENCY"). +The third parameter is value to be added/updated/removed. diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt index bb24c2a0e87..75d25a1d6e4 100644 --- a/Documentation/trace/events.txt +++ b/Documentation/trace/events.txt @@ -183,13 +183,22 @@ The relational-operators depend on the type of the field being tested: The operators available for numeric fields are: -==, !=, <, <=, >, >= +==, !=, <, <=, >, >=, & And for string fields they are: -==, != +==, !=, ~ -Currently, only exact string matches are supported. +The glob (~) only accepts a wild card character (*) at the start and or +end of the string. For example: + + prev_comm ~ "*sh" + prev_comm ~ "sh*" + prev_comm ~ "*sh*" + +But does not allow for it to be within the string: + + prev_comm ~ "ba*sh" <-- is invalid 5.2 Setting filters ------------------- @@ -278,3 +287,210 @@ their old filters): prev_pid == 0 # cat sched_wakeup/filter common_pid == 0 + +6. Event triggers +================= + +Trace events can be made to conditionally invoke trigger 'commands' +which can take various forms and are described in detail below; +examples would be enabling or disabling other trace events or invoking +a stack trace whenever the trace event is hit. Whenever a trace event +with attached triggers is invoked, the set of trigger commands +associated with that event is invoked. Any given trigger can +additionally have an event filter of the same form as described in +section 5 (Event filtering) associated with it - the command will only +be invoked if the event being invoked passes the associated filter. +If no filter is associated with the trigger, it always passes. + +Triggers are added to and removed from a particular event by writing +trigger expressions to the 'trigger' file for the given event. + +A given event can have any number of triggers associated with it, +subject to any restrictions that individual commands may have in that +regard. + +Event triggers are implemented on top of "soft" mode, which means that +whenever a trace event has one or more triggers associated with it, +the event is activated even if it isn't actually enabled, but is +disabled in a "soft" mode. That is, the tracepoint will be called, +but just will not be traced, unless of course it's actually enabled. +This scheme allows triggers to be invoked even for events that aren't +enabled, and also allows the current event filter implementation to be +used for conditionally invoking triggers. + +The syntax for event triggers is roughly based on the syntax for +set_ftrace_filter 'ftrace filter commands' (see the 'Filter commands' +section of Documentation/trace/ftrace.txt), but there are major +differences and the implementation isn't currently tied to it in any +way, so beware about making generalizations between the two. + +6.1 Expression syntax +--------------------- + +Triggers are added by echoing the command to the 'trigger' file: + + # echo 'command[:count] [if filter]' > trigger + +Triggers are removed by echoing the same command but starting with '!' +to the 'trigger' file: + + # echo '!command[:count] [if filter]' > trigger + +The [if filter] part isn't used in matching commands when removing, so +leaving that off in a '!' command will accomplish the same thing as +having it in. + +The filter syntax is the same as that described in the 'Event +filtering' section above. + +For ease of use, writing to the trigger file using '>' currently just +adds or removes a single trigger and there's no explicit '>>' support +('>' actually behaves like '>>') or truncation support to remove all +triggers (you have to use '!' for each one added.) + +6.2 Supported trigger commands +------------------------------ + +The following commands are supported: + +- enable_event/disable_event + + These commands can enable or disable another trace event whenever + the triggering event is hit. When these commands are registered, + the other trace event is activated, but disabled in a "soft" mode. + That is, the tracepoint will be called, but just will not be traced. + The event tracepoint stays in this mode as long as there's a trigger + in effect that can trigger it. + + For example, the following trigger causes kmalloc events to be + traced when a read system call is entered, and the :1 at the end + specifies that this enablement happens only once: + + # echo 'enable_event:kmem:kmalloc:1' > \ + /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger + + The following trigger causes kmalloc events to stop being traced + when a read system call exits. This disablement happens on every + read system call exit: + + # echo 'disable_event:kmem:kmalloc' > \ + /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger + + The format is: + + enable_event:<system>:<event>[:count] + disable_event:<system>:<event>[:count] + + To remove the above commands: + + # echo '!enable_event:kmem:kmalloc:1' > \ + /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger + + # echo '!disable_event:kmem:kmalloc' > \ + /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger + + Note that there can be any number of enable/disable_event triggers + per triggering event, but there can only be one trigger per + triggered event. e.g. sys_enter_read can have triggers enabling both + kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc + versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if + bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they + could be combined into a single filter on kmem:kmalloc though). + +- stacktrace + + This command dumps a stacktrace in the trace buffer whenever the + triggering event occurs. + + For example, the following trigger dumps a stacktrace every time the + kmalloc tracepoint is hit: + + # echo 'stacktrace' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + The following trigger dumps a stacktrace the first 5 times a kmalloc + request happens with a size >= 64K + + # echo 'stacktrace:5 if bytes_req >= 65536' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + The format is: + + stacktrace[:count] + + To remove the above commands: + + # echo '!stacktrace' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # echo '!stacktrace:5 if bytes_req >= 65536' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + The latter can also be removed more simply by the following (without + the filter): + + # echo '!stacktrace:5' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + Note that there can be only one stacktrace trigger per triggering + event. + +- snapshot + + This command causes a snapshot to be triggered whenever the + triggering event occurs. + + The following command creates a snapshot every time a block request + queue is unplugged with a depth > 1. If you were tracing a set of + events or functions at the time, the snapshot trace buffer would + capture those events when the trigger event occurred: + + # echo 'snapshot if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + To only snapshot once: + + # echo 'snapshot:1 if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + To remove the above commands: + + # echo '!snapshot if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + # echo '!snapshot:1 if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + Note that there can be only one snapshot trigger per triggering + event. + +- traceon/traceoff + + These commands turn tracing on and off when the specified events are + hit. The parameter determines how many times the tracing system is + turned on and off. If unspecified, there is no limit. + + The following command turns tracing off the first time a block + request queue is unplugged with a depth > 1. If you were tracing a + set of events or functions at the time, you could then examine the + trace buffer to see the sequence of events that led up to the + trigger event: + + # echo 'traceoff:1 if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + To always disable tracing when nr_rq > 1 : + + # echo 'traceoff if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + To remove the above commands: + + # echo '!traceoff:1 if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + # echo '!traceoff if nr_rq > 1' > \ + /sys/kernel/debug/tracing/events/block/block_unplug/trigger + + Note that there can be only one traceon or traceoff trigger per + triggering event. diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt index 79fcafc7fd6..3f669b9e885 100644 --- a/Documentation/trace/ftrace-design.txt +++ b/Documentation/trace/ftrace-design.txt @@ -358,11 +358,8 @@ Every arch has an init callback function. If you need to do something early on to initialize some state, this is the time to do that. Otherwise, this simple function below should be sufficient for most people: -int __init ftrace_dyn_arch_init(void *data) +int __init ftrace_dyn_arch_init(void) { - /* return value is done indirectly via data */ - *(unsigned long *)data = 0; - return 0; } diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index bfe8c29b1f1..2479b2a0c77 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt @@ -655,7 +655,11 @@ explains which is which. read the irq flags variable, an 'X' will always be printed here. - need-resched: 'N' task need_resched is set, '.' otherwise. + need-resched: + 'N' both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED is set, + 'n' only TIF_NEED_RESCHED is set, + 'p' only PREEMPT_NEED_RESCHED is set, + '.' otherwise. hardirq/softirq: 'H' - hard irq occurred inside a softirq. @@ -735,7 +739,7 @@ Here are the available options: function as well as the function being traced. print-parent: - bash-4000 [01] 1477.606694: simple_strtoul <-strict_strtoul + bash-4000 [01] 1477.606694: simple_strtoul <-kstrtoul noprint-parent: bash-4000 [01] 1477.606694: simple_strtoul @@ -759,7 +763,7 @@ Here are the available options: latency-format option is enabled. bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \ - (+0.000ms): simple_strtoul (strict_strtoul) + (+0.000ms): simple_strtoul (kstrtoul) raw - This will display raw numbers. This option is best for use with user applications that can translate the raw @@ -1999,6 +2003,32 @@ want, depending on your needs. 360.774530 | 1) 0.594 us | __phys_addr(); +The function name is always displayed after the closing bracket +for a function if the start of that function is not in the +trace buffer. + +Display of the function name after the closing bracket may be +enabled for functions whose start is in the trace buffer, +allowing easier searching with grep for function durations. +It is default disabled. + + hide: echo nofuncgraph-tail > trace_options + show: echo funcgraph-tail > trace_options + + Example with nofuncgraph-tail (default): + 0) | putname() { + 0) | kmem_cache_free() { + 0) 0.518 us | __phys_addr(); + 0) 1.757 us | } + 0) 2.861 us | } + + Example with funcgraph-tail: + 0) | putname() { + 0) | kmem_cache_free() { + 0) 0.518 us | __phys_addr(); + 0) 1.757 us | } /* kmem_cache_free() */ + 0) 2.861 us | } /* putname() */ + You can put some comments on specific functions by using trace_printk() For example, if you want to put a comment inside the __might_sleep() function, you just have to include @@ -2430,6 +2460,19 @@ The following commands are supported: echo '!schedule:disable_event:sched:sched_switch' > \ set_ftrace_filter +- dump + When the function is hit, it will dump the contents of the ftrace + ring buffer to the console. This is useful if you need to debug + something, and want to dump the trace when a certain function + is hit. Perhaps its a function that is called before a tripple + fault happens and does not allow you to get a regular dump. + +- cpudump + When the function is hit, it will dump the contents of the ftrace + ring buffer for the current CPU to the console. Unlike the "dump" + command, it only prints out the contents of the ring buffer for the + CPU that executed the function that triggered the dump. + trace_pipe ---------- diff --git a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl index 4a37c4759cd..78c9a7b2b58 100644 --- a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl +++ b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl @@ -47,7 +47,6 @@ use constant HIGH_KSWAPD_REWAKEUP => 21; use constant HIGH_NR_SCANNED => 22; use constant HIGH_NR_TAKEN => 23; use constant HIGH_NR_RECLAIMED => 24; -use constant HIGH_NR_CONTIG_DIRTY => 25; my %perprocesspid; my %perprocess; @@ -105,7 +104,7 @@ my $regex_direct_end_default = 'nr_reclaimed=([0-9]*)'; my $regex_kswapd_wake_default = 'nid=([0-9]*) order=([0-9]*)'; my $regex_kswapd_sleep_default = 'nid=([0-9]*)'; my $regex_wakeup_kswapd_default = 'nid=([0-9]*) zid=([0-9]*) order=([0-9]*)'; -my $regex_lru_isolate_default = 'isolate_mode=([0-9]*) order=([0-9]*) nr_requested=([0-9]*) nr_scanned=([0-9]*) nr_taken=([0-9]*) contig_taken=([0-9]*) contig_dirty=([0-9]*) contig_failed=([0-9]*)'; +my $regex_lru_isolate_default = 'isolate_mode=([0-9]*) order=([0-9]*) nr_requested=([0-9]*) nr_scanned=([0-9]*) nr_taken=([0-9]*) file=([0-9]*)'; my $regex_lru_shrink_inactive_default = 'nid=([0-9]*) zid=([0-9]*) nr_scanned=([0-9]*) nr_reclaimed=([0-9]*) priority=([0-9]*) flags=([A-Z_|]*)'; my $regex_lru_shrink_active_default = 'lru=([A-Z_]*) nr_scanned=([0-9]*) nr_rotated=([0-9]*) priority=([0-9]*)'; my $regex_writepage_default = 'page=([0-9a-f]*) pfn=([0-9]*) flags=([A-Z_|]*)'; @@ -123,7 +122,7 @@ my $regex_writepage; # Static regex used. Specified like this for readability and for use with /o # (process_pid) (cpus ) ( time ) (tpoint ) (details) -my $regex_traceevent = '\s*([a-zA-Z0-9-]*)\s*(\[[0-9]*\])\s*([0-9.]*):\s*([a-zA-Z_]*):\s*(.*)'; +my $regex_traceevent = '\s*([a-zA-Z0-9-]*)\s*(\[[0-9]*\])(\s*[dX.][Nnp.][Hhs.][0-9a-fA-F.]*|)\s*([0-9.]*):\s*([a-zA-Z_]*):\s*(.*)'; my $regex_statname = '[-0-9]*\s\((.*)\).*'; my $regex_statppid = '[-0-9]*\s\(.*\)\s[A-Za-z]\s([0-9]*).*'; @@ -200,7 +199,7 @@ $regex_lru_isolate = generate_traceevent_regex( $regex_lru_isolate_default, "isolate_mode", "order", "nr_requested", "nr_scanned", "nr_taken", - "contig_taken", "contig_dirty", "contig_failed"); + "file"); $regex_lru_shrink_inactive = generate_traceevent_regex( "vmscan/mm_vmscan_lru_shrink_inactive", $regex_lru_shrink_inactive_default, @@ -270,8 +269,8 @@ EVENT_PROCESS: while ($traceevent = <STDIN>) { if ($traceevent =~ /$regex_traceevent/o) { $process_pid = $1; - $timestamp = $3; - $tracepoint = $4; + $timestamp = $4; + $tracepoint = $5; $process_pid =~ /(.*)-([0-9]*)$/; my $process = $1; @@ -299,7 +298,7 @@ EVENT_PROCESS: $perprocesspid{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN}++; $perprocesspid{$process_pid}->{STATE_DIRECT_BEGIN} = $timestamp; - $details = $5; + $details = $6; if ($details !~ /$regex_direct_begin/o) { print "WARNING: Failed to parse mm_vmscan_direct_reclaim_begin as expected\n"; print " $details\n"; @@ -322,7 +321,7 @@ EVENT_PROCESS: $perprocesspid{$process_pid}->{HIGH_DIRECT_RECLAIM_LATENCY}[$index] = "$order-$latency"; } } elsif ($tracepoint eq "mm_vmscan_kswapd_wake") { - $details = $5; + $details = $6; if ($details !~ /$regex_kswapd_wake/o) { print "WARNING: Failed to parse mm_vmscan_kswapd_wake as expected\n"; print " $details\n"; @@ -356,7 +355,7 @@ EVENT_PROCESS: } elsif ($tracepoint eq "mm_vmscan_wakeup_kswapd") { $perprocesspid{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD}++; - $details = $5; + $details = $6; if ($details !~ /$regex_wakeup_kswapd/o) { print "WARNING: Failed to parse mm_vmscan_wakeup_kswapd as expected\n"; print " $details\n"; @@ -366,7 +365,7 @@ EVENT_PROCESS: my $order = $3; $perprocesspid{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD_PERORDER}[$order]++; } elsif ($tracepoint eq "mm_vmscan_lru_isolate") { - $details = $5; + $details = $6; if ($details !~ /$regex_lru_isolate/o) { print "WARNING: Failed to parse mm_vmscan_lru_isolate as expected\n"; print " $details\n"; @@ -375,7 +374,6 @@ EVENT_PROCESS: } my $isolate_mode = $1; my $nr_scanned = $4; - my $nr_contig_dirty = $7; # To closer match vmstat scanning statistics, only count isolate_both # and isolate_inactive as scanning. isolate_active is rotation @@ -385,9 +383,8 @@ EVENT_PROCESS: if ($isolate_mode != 2) { $perprocesspid{$process_pid}->{HIGH_NR_SCANNED} += $nr_scanned; } - $perprocesspid{$process_pid}->{HIGH_NR_CONTIG_DIRTY} += $nr_contig_dirty; } elsif ($tracepoint eq "mm_vmscan_lru_shrink_inactive") { - $details = $5; + $details = $6; if ($details !~ /$regex_lru_shrink_inactive/o) { print "WARNING: Failed to parse mm_vmscan_lru_shrink_inactive as expected\n"; print " $details\n"; @@ -397,7 +394,7 @@ EVENT_PROCESS: my $nr_reclaimed = $4; $perprocesspid{$process_pid}->{HIGH_NR_RECLAIMED} += $nr_reclaimed; } elsif ($tracepoint eq "mm_vmscan_writepage") { - $details = $5; + $details = $6; if ($details !~ /$regex_writepage/o) { print "WARNING: Failed to parse mm_vmscan_writepage as expected\n"; print " $details\n"; @@ -539,13 +536,6 @@ sub dump_stats { } } } - if ($stats{$process_pid}->{HIGH_NR_CONTIG_DIRTY}) { - print " "; - my $count = $stats{$process_pid}->{HIGH_NR_CONTIG_DIRTY}; - if ($count != 0) { - print "contig-dirty=$count "; - } - } print "\n"; } diff --git a/Documentation/trace/ring-buffer-design.txt b/Documentation/trace/ring-buffer-design.txt index 7d350b49658..ff747b6fa39 100644 --- a/Documentation/trace/ring-buffer-design.txt +++ b/Documentation/trace/ring-buffer-design.txt @@ -683,7 +683,7 @@ against nested writers. cmpxchg(tail_page, temp_page, next_page) The above will update the tail page if it is still pointing to the expected -page. If this fails, a nested write pushed it forward, the the current write +page. If this fails, a nested write pushed it forward, the current write does not need to push it. diff --git a/Documentation/trace/tracepoints.txt b/Documentation/trace/tracepoints.txt index da49437d5ae..a3efac621c5 100644 --- a/Documentation/trace/tracepoints.txt +++ b/Documentation/trace/tracepoints.txt @@ -40,7 +40,13 @@ Two elements are required for tracepoints : In order to use tracepoints, you should include linux/tracepoint.h. -In include/trace/subsys.h : +In include/trace/events/subsys.h : + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM subsys + +#if !defined(_TRACE_SUBSYS_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_SUBSYS_H #include <linux/tracepoint.h> @@ -48,10 +54,16 @@ DECLARE_TRACE(subsys_eventname, TP_PROTO(int firstarg, struct task_struct *p), TP_ARGS(firstarg, p)); +#endif /* _TRACE_SUBSYS_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> + In subsys/file.c (where the tracing statement must be added) : -#include <trace/subsys.h> +#include <trace/events/subsys.h> +#define CREATE_TRACE_POINTS DEFINE_TRACE(subsys_eventname); void somefct(void) @@ -72,6 +84,9 @@ Where : - TP_ARGS(firstarg, p) are the parameters names, same as found in the prototype. +- if you use the header in multiple source files, #define CREATE_TRACE_POINTS + should appear only in one source file. + Connecting a function (probe) to a tracepoint is done by providing a probe (function to call) for the specific tracepoint through register_trace_subsys_eventname(). Removing a probe is done through @@ -99,3 +114,32 @@ core kernel image or in modules. If the tracepoint has to be used in kernel modules, an EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be used to export the defined tracepoints. + +If you need to do a bit of work for a tracepoint parameter, and +that work is only used for the tracepoint, that work can be encapsulated +within an if statement with the following: + + if (trace_foo_bar_enabled()) { + int i; + int tot = 0; + + for (i = 0; i < count; i++) + tot += calculate_nuggets(); + + trace_foo_bar(tot); + } + +All trace_<tracepoint>() calls have a matching trace_<tracepoint>_enabled() +function defined that returns true if the tracepoint is enabled and +false otherwise. The trace_<tracepoint>() should always be within the +block of the if (trace_<tracepoint>_enabled()) to prevent races between +the tracepoint being enabled and the check being seen. + +The advantage of using the trace_<tracepoint>_enabled() is that it uses +the static_key of the tracepoint to allow the if statement to be implemented +with jump labels and avoid conditional branches. + +Note: The convenience macro TRACE_EVENT provides an alternative way to + define tracepoints. Check http://lwn.net/Articles/379903, + http://lwn.net/Articles/381064 and http://lwn.net/Articles/383362 + for a series of articles with more details. diff --git a/Documentation/trace/uprobetracer.txt b/Documentation/trace/uprobetracer.txt index d9c3e682312..f1cf9a34ad9 100644 --- a/Documentation/trace/uprobetracer.txt +++ b/Documentation/trace/uprobetracer.txt @@ -19,18 +19,44 @@ user to calculate the offset of the probepoint in the object. Synopsis of uprobe_tracer ------------------------- - p[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS] : Set a uprobe - r[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS] : Set a return uprobe (uretprobe) - -:[GRP/]EVENT : Clear uprobe or uretprobe event + p[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a uprobe + r[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a return uprobe (uretprobe) + -:[GRP/]EVENT : Clear uprobe or uretprobe event GRP : Group name. If omitted, "uprobes" is the default value. EVENT : Event name. If omitted, the event name is generated based - on SYMBOL+offs. + on PATH+OFFSET. PATH : Path to an executable or a library. - SYMBOL[+offs] : Symbol+offset where the probe is inserted. + OFFSET : Offset where the probe is inserted. FETCHARGS : Arguments. Each probe can have up to 128 args. %REG : Fetch register REG + @ADDR : Fetch memory at ADDR (ADDR should be in userspace) + @+OFFSET : Fetch memory at OFFSET (OFFSET from same file as PATH) + $stackN : Fetch Nth entry of stack (N >= 0) + $stack : Fetch stack address. + $retval : Fetch return value.(*) + +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**) + NAME=FETCHARG : Set NAME as the argument name of FETCHARG. + FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types + (u8/u16/u32/u64/s8/s16/s32/s64), "string" and bitfield + are supported. + + (*) only for return probe. + (**) this is useful for fetching a field of data structures. + +Types +----- +Several types are supported for fetch-args. Uprobe tracer will access memory +by given type. Prefix 's' and 'u' means those types are signed and unsigned +respectively. Traced arguments are shown in decimal (signed) or hex (unsigned). +String type is a special type, which fetches a "null-terminated" string from +user space. +Bitfield is another special type, which takes 3 parameters, bit-width, bit- +offset, and container-size (usually 32). The syntax is; + + b<bit-width>@<bit-offset>/<container-size> + Event Profiling --------------- |
