diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2009-04-05 11:04:19 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2009-04-05 11:04:19 -0700 |
commit | 714f83d5d9f7c785f622259dad1f4fad12d64664 (patch) | |
tree | 20563541ae438e11d686b4d629074eb002a481b7 /Documentation | |
parent | 8901e7ffc2fa78ede7ce9826dbad68a3a25dc2dc (diff) | |
parent | 645dae969c3b8651c5bc7c54a1835ec03820f85f (diff) |
Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
tracing, net: fix net tree and tracing tree merge interaction
tracing, powerpc: fix powerpc tree and tracing tree interaction
ring-buffer: do not remove reader page from list on ring buffer free
function-graph: allow unregistering twice
trace: make argument 'mem' of trace_seq_putmem() const
tracing: add missing 'extern' keywords to trace_output.h
tracing: provide trace_seq_reserve()
blktrace: print out BLK_TN_MESSAGE properly
blktrace: extract duplidate code
blktrace: fix memory leak when freeing struct blk_io_trace
blktrace: fix blk_probes_ref chaos
blktrace: make classic output more classic
blktrace: fix off-by-one bug
blktrace: fix the original blktrace
blktrace: fix a race when creating blk_tree_root in debugfs
blktrace: fix timestamp in binary output
tracing, Text Edit Lock: cleanup
tracing: filter fix for TRACE_EVENT_FORMAT events
ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
x86: kretprobe-booster interrupt emulation code fix
...
Fix up trivial conflicts in
arch/parisc/include/asm/ftrace.h
include/linux/memory.h
kernel/extable.c
kernel/module.c
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/testing/debugfs-kmemtrace | 71 | ||||
-rw-r--r-- | Documentation/ftrace.txt | 1134 | ||||
-rw-r--r-- | Documentation/kernel-parameters.txt | 12 | ||||
-rw-r--r-- | Documentation/sysrq.txt | 2 | ||||
-rw-r--r-- | Documentation/tracepoints.txt | 21 | ||||
-rw-r--r-- | Documentation/vm/kmemtrace.txt | 126 |
6 files changed, 991 insertions, 375 deletions
diff --git a/Documentation/ABI/testing/debugfs-kmemtrace b/Documentation/ABI/testing/debugfs-kmemtrace new file mode 100644 index 00000000000..5e6a92a02d8 --- /dev/null +++ b/Documentation/ABI/testing/debugfs-kmemtrace @@ -0,0 +1,71 @@ +What: /sys/kernel/debug/kmemtrace/ +Date: July 2008 +Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> +Description: + +In kmemtrace-enabled kernels, the following files are created: + +/sys/kernel/debug/kmemtrace/ + cpu<n> (0400) Per-CPU tracing data, see below. (binary) + total_overruns (0400) Total number of bytes which were dropped from + cpu<n> files because of full buffer condition, + non-binary. (text) + abi_version (0400) Kernel's kmemtrace ABI version. (text) + +Each per-CPU file should be read according to the relay interface. That is, +the reader should set affinity to that specific CPU and, as currently done by +the userspace application (though there are other methods), use poll() with +an infinite timeout before every read(). Otherwise, erroneous data may be +read. The binary data has the following _core_ format: + + Event ID (1 byte) Unsigned integer, one of: + 0 - represents an allocation (KMEMTRACE_EVENT_ALLOC) + 1 - represents a freeing of previously allocated memory + (KMEMTRACE_EVENT_FREE) + Type ID (1 byte) Unsigned integer, one of: + 0 - this is a kmalloc() / kfree() + 1 - this is a kmem_cache_alloc() / kmem_cache_free() + 2 - this is a __get_free_pages() et al. + Event size (2 bytes) Unsigned integer representing the + size of this event. Used to extend + kmemtrace. Discard the bytes you + don't know about. + Sequence number (4 bytes) Signed integer used to reorder data + logged on SMP machines. Wraparound + must be taken into account, although + it is unlikely. + Caller address (8 bytes) Return address to the caller. + Pointer to mem (8 bytes) Pointer to target memory area. Can be + NULL, but not all such calls might be + recorded. + +In case of KMEMTRACE_EVENT_ALLOC events, the next fields follow: + + Requested bytes (8 bytes) Total number of requested bytes, + unsigned, must not be zero. + Allocated bytes (8 bytes) Total number of actually allocated + bytes, unsigned, must not be lower + than requested bytes. + Requested flags (4 bytes) GFP flags supplied by the caller. + Target CPU (4 bytes) Signed integer, valid for event id 1. + If equal to -1, target CPU is the same + as origin CPU, but the reverse might + not be true. + +The data is made available in the same endianness the machine has. + +Other event ids and type ids may be defined and added. Other fields may be +added by increasing event size, but see below for details. +Every modification to the ABI, including new id definitions, are followed +by bumping the ABI version by one. + +Adding new data to the packet (features) is done at the end of the mandatory +data: + Feature size (2 byte) + Feature ID (1 byte) + Feature data (Feature size - 3 bytes) + + +Users: + kmemtrace-user - git://repo.or.cz/kmemtrace-user.git + diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt index 803b1318b13..fd9a3e69381 100644 --- a/Documentation/ftrace.txt +++ b/Documentation/ftrace.txt @@ -15,31 +15,31 @@ Introduction Ftrace is an internal tracer designed to help out developers and designers of systems to find what is going on inside the kernel. -It can be used for debugging or analyzing latencies and performance -issues that take place outside of user-space. +It can be used for debugging or analyzing latencies and +performance issues that take place outside of user-space. Although ftrace is the function tracer, it also includes an -infrastructure that allows for other types of tracing. Some of the -tracers that are currently in ftrace include a tracer to trace -context switches, the time it takes for a high priority task to -run after it was woken up, the time interrupts are disabled, and -more (ftrace allows for tracer plugins, which means that the list of -tracers can always grow). +infrastructure that allows for other types of tracing. Some of +the tracers that are currently in ftrace include a tracer to +trace context switches, the time it takes for a high priority +task to run after it was woken up, the time interrupts are +disabled, and more (ftrace allows for tracer plugins, which +means that the list of tracers can always grow). The File System --------------- -Ftrace uses the debugfs file system to hold the control files as well -as the files to display output. +Ftrace uses the debugfs file system to hold the control files as +well as the files to display output. To mount the debugfs system: # mkdir /debug # mount -t debugfs nodev /debug -(Note: it is more common to mount at /sys/kernel/debug, but for simplicity - this document will use /debug) +( Note: it is more common to mount at /sys/kernel/debug, but for + simplicity this document will use /debug) That's it! (assuming that you have ftrace configured into your kernel) @@ -50,90 +50,124 @@ of ftrace. Here is a list of some of the key files: Note: all time values are in microseconds. - current_tracer: This is used to set or display the current tracer - that is configured. - - available_tracers: This holds the different types of tracers that - have been compiled into the kernel. The tracers - listed here can be configured by echoing their name - into current_tracer. - - tracing_enabled: This sets or displays whether the current_tracer - is activated and tracing or not. Echo 0 into this - file to disable the tracer or 1 to enable it. - - trace: This file holds the output of the trace in a human readable - format (described below). - - latency_trace: This file shows the same trace but the information - is organized more to display possible latencies - in the system (described below). - - trace_pipe: The output is the same as the "trace" file but this - file is meant to be streamed with live tracing. - Reads from this file will block until new data - is retrieved. Unlike the "trace" and "latency_trace" - files, this file is a consumer. This means reading - from this file causes sequential reads to display - more current data. Once data is read from this - file, it is consumed, and will not be read - again with a sequential read. The "trace" and - "latency_trace" files are static, and if the - tracer is not adding more data, they will display - the same information every time they are read. - - trace_options: This file lets the user control the amount of data - that is displayed in one of the above output - files. - - trace_max_latency: Some of the tracers record the max latency. - For example, the time interrupts are disabled. - This time is saved in this file. The max trace - will also be stored, and displayed by either - "trace" or "latency_trace". A new max trace will - only be recorded if the latency is greater than - the value in this file. (in microseconds) - - buffer_size_kb: This sets or displays the number of kilobytes each CPU - buffer can hold. The tracer buffers are the same size - for each CPU. The displayed number is the size of the - CPU buffer and not total size of all buffers. The - trace buffers are allocated in pages (blocks of memory - that the kernel uses for allocation, usually 4 KB in size). - If the last page allocated has room for more bytes - than requested, the rest of the page will be used, - making the actual allocation bigger than requested. - (Note, the size may not be a multiple of the page size due - to buffer managment overhead.) - - This can only be updated when the current_tracer - is set to "nop". - - tracing_cpumask: This is a mask that lets the user only trace - on specified CPUS. The format is a hex string - representing the CPUS. - - set_ftrace_filter: When dynamic ftrace is configured in (see the - section below "dynamic ftrace"), the code is dynamically - modified (code text rewrite) to disable calling of the - function profiler (mcount). This lets tracing be configured - in with practically no overhead in performance. This also - has a side effect of enabling or disabling specific functions - to be traced. Echoing names of functions into this file - will limit the trace to only those functions. - - set_ftrace_notrace: This has an effect opposite to that of - set_ftrace_filter. Any function that is added here will not - be traced. If a function exists in both set_ftrace_filter - and set_ftrace_notrace, the function will _not_ be traced. - - set_ftrace_pid: Have the function tracer only trace a single thread. - - available_filter_functions: This lists the functions that ftrace - has processed and can trace. These are the function - names that you can pass to "set_ftrace_filter" or - "set_ftrace_notrace". (See the section "dynamic ftrace" - below for more details.) + current_tracer: + + This is used to set or display the current tracer + that is configured. + + available_tracers: + + This holds the different types of tracers that + have been compiled into the kernel. The + tracers listed here can be configured by + echoing their name into current_tracer. + + tracing_enabled: + + This sets or displays whether the current_tracer + is activated and tracing or not. Echo 0 into this + file to disable the tracer or 1 to enable it. + + trace: + + This file holds the output of the trace in a human + readable format (described below). + + latency_trace: + + This file shows the same trace but the information + is organized more to display possible latencies + in the system (described below). + + trace_pipe: + + The output is the same as the "trace" file but this + file is meant to be streamed with live tracing. + Reads from this file will block until new data + is retrieved. Unlike the "trace" and "latency_trace" + files, this file is a consumer. This means reading + from this file causes sequential reads to display + more current data. Once data is read from this + file, it is consumed, and will not be read + again with a sequential read. The "trace" and + "latency_trace" files are static, and if the + tracer is not adding more data, they will display + the same information every time they are read. + + trace_options: + + This file lets the user control the amount of data + that is displayed in one of the above output + files. + + tracing_max_latency: + + Some of the tracers record the max latency. + For example, the time interrupts are disabled. + This time is saved in this file. The max trace + will also be stored, and displayed by either + "trace" or "latency_trace". A new max trace will + only be recorded if the latency is greater than + the value in this file. (in microseconds) + + buffer_size_kb: + + This sets or displays the number of kilobytes each CPU + buffer can hold. The tracer buffers are the same size + for each CPU. The displayed number is the size of the + CPU buffer and not total size of all buffers. The + trace buffers are allocated in pages (blocks of memory + that the kernel uses for allocation, usually 4 KB in size). + If the last page allocated has room for more bytes + than requested, the rest of the page will be used, + making the actual allocation bigger than requested. + ( Note, the size may not be a multiple of the page size + due to buffer managment overhead. ) + + This can only be updated when the current_tracer + is set to "nop". + + tracing_cpumask: + + This is a mask that lets the user only trace + on specified CPUS. The format is a hex string + representing the CPUS. + + set_ftrace_filter: + + When dynamic ftrace is configured in (see the + section below "dynamic ftrace"), the code is dynamically + modified (code text rewrite) to disable calling of the + function profiler (mcount). This lets tracing be configured + in with practically no overhead in performance. This also + has a side effect of enabling or disabling specific functions + to be traced. Echoing names of functions into this file + will limit the trace to only those functions. + + set_ftrace_notrace: + + This has an effect opposite to that of + set_ftrace_filter. Any function that is added here will not + be traced. If a function exists in both set_ftrace_filter + and set_ftrace_notrace, the function will _not_ be traced. + + set_ftrace_pid: + + Have the function tracer only trace a single thread. + + set_graph_function: + + Set a "trigger" function where tracing should start + with the function graph tracer (See the section + "dynamic ftrace" for more details). + + available_filter_functions: + + This lists the functions that ftrace + has processed and can trace. These are the function + names that you can pass to "set_ftrace_filter" or + "set_ftrace_notrace". (See the section "dynamic ftrace" + below for more details.) The Tracers @@ -141,36 +175,66 @@ The Tracers Here is the list of current tracers that may be configured. - function - function tracer that uses mcount to trace all functions. + "function" + + Function call tracer to trace all kernel functions. + + "function_graph_tracer" + + Similar to the function tracer except that the + function tracer probes the functions on their entry + whereas the function graph tracer traces on both entry + and exit of the functions. It then provides the ability + to draw a graph of function calls similar to C code + source. - sched_switch - traces the context switches between tasks. + "sched_switch" - irqsoff - traces the areas that disable interrupts and saves - the trace with the longest max latency. - See tracing_max_latency. When a new max is recorded, - it replaces the old trace. It is best to view this - trace via the latency_trace file. + Traces the context switches and wakeups between tasks. - preemptoff - Similar to irqsoff but traces and records the amount of - time for which preemption is disabled. + "irqsoff" - preemptirqsoff - Similar to irqsoff and preemptoff, but traces and - records the largest time for which irqs and/or preemption - is disabled. + Traces the areas that disable interrupts and saves + the trace with the longest max latency. + See tracing_max_latency. When a new max is recorded, + it replaces the old trace. It is best to view this + trace via the latency_trace file. - wakeup - Traces and records the max latency that it takes for - the highest priority task to get scheduled after - it has been woken up. + "preemptoff" - nop - This is not a tracer. To remove all tracers from tracing - simply echo "nop" into current_tracer. + Similar to irqsoff but traces and records the amount of + time for which preemption is disabled. + + "preemptirqsoff" + + Similar to irqsoff and preemptoff, but traces and + records the largest time for which irqs and/or preemption + is disabled. + + "wakeup" + + Traces and records the max latency that it takes for + the highest priority task to get scheduled after + it has been woken up. + + "hw-branch-tracer" + + Uses the BTS CPU feature on x86 CPUs to traces all + branches executed. + + "nop" + + This is the "trace nothing" tracer. To remove all + tracers from tracing simply echo "nop" into + current_tracer. Examples of using the tracer ---------------------------- -Here are typical examples of using the tracers when controlling them only -with the debugfs interface (without using any user-land utilities). +Here are typical examples of using the tracers when controlling +them only with the debugfs interface (without using any +user-land utilities). Output format: -------------- @@ -187,16 +251,16 @@ Here is an example of the output format of the file "trace" bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput -------- -A header is printed with the tracer name that is represented by the trace. -In this case the tracer is "function". Then a header showing the format. Task -name "bash", the task PID "4251", the CPU that it was running on -"01", the timestamp in <secs>.<usecs> format, the function name that was -traced "path_put" and the parent function that called this function -"path_walk". The timestamp is the time at which the function was -entered. +A header is printed with the tracer name that is represented by +the trace. In this case the tracer is "function". Then a header +showing the format. Task name "bash", the task PID "4251", the +CPU that it was running on "01", the timestamp in <secs>.<usecs> +format, the function name that was traced "path_put" and the +parent function that called this function "path_walk". The +timestamp is the time at which the function was entered. -The sched_switch tracer also includes tracing of task wakeups and -context switches. +The sched_switch tracer also includes tracing of task wakeups +and context switches. ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 10:115:S @@ -205,8 +269,8 @@ context switches. kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R -Wake ups are represented by a "+" and the context switches are shown as -"==>". The format is: +Wake ups are represented by a "+" and the context switches are +shown as "==>". The format is: Context switches: @@ -220,19 +284,20 @@ Wake ups are represented by a "+" and the context switches are shown as <pid>:<prio>:<state> + <pid>:<prio>:<state> -The prio is the internal kernel priority, which is the inverse of the -priority that is usually displayed by user-space tools. Zero represents -the highest priority (99). Prio 100 starts the "nice" priorities with -100 being equal to nice -20 and 139 being nice 19. The prio "140" is -reserved for the idle task which is the lowest priority thread (pid 0). +The prio is the internal kernel priority, which is the inverse +of the priority that is usually displayed by user-space tools. +Zero represents the highest priority (99). Prio 100 starts the +"nice" priorities with 100 being equal to nice -20 and 139 being +nice 19. The prio "140" is reserved for the idle task which is +the lowest priority thread (pid 0). Latency trace format -------------------- -For traces that display latency times, the latency_trace file gives -somewhat more information to see why a latency happened. Here is a typical -trace. +For traces that display latency times, the latency_trace file +gives somewhat more information to see why a latency happened. +Here is a typical trace. # tracer: irqsoff # @@ -259,20 +324,20 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8 <idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq) +This shows that the current tracer is "irqsoff" tracing the time +for which interrupts were disabled. It gives the trace version +and the version of the kernel upon which this was executed on +(2.6.26-rc8). Then it displays the max latency in microsecs (97 +us). The number of trace entries displayed and the total number +recorded (both are three: #3/3). The type of preemption that was +used (PREEMPT). VP, KP, SP, and HP are always zero and are +reserved for later use. #P is the number of online CPUS (#P:2). -This shows that the current tracer is "irqsoff" tracing the time for which -interrupts were disabled. It gives the trace version and the version -of the kernel upon which this was executed on (2.6.26-rc8). Then it displays -the max latency in microsecs (97 us). The number of trace entries displayed -and the total number recorded (both are three: #3/3). The type of -preemption that was used (PREEMPT). VP, KP, SP, and HP are always zero -and are reserved for later use. #P is the number of online CPUS (#P:2). - -The task is the process that was running when the latency occurred. -(swapper pid: 0). +The task is the process that was running when the latency +occurred. (swapper pid: 0). -The start and stop (the functions in which the interrupts were disabled and -enabled respectively) that caused the latencies: +The start and stop (the functions in which the interrupts were +disabled and enabled respectively) that caused the latencies: apic_timer_interrupt is where the interrupts were disabled. do_softirq is where they were enabled again. @@ -308,12 +373,12 @@ The above is mostly meaningful for kernel developers. latency_trace file is relative to the start of the trace. delay: This is just to help catch your eye a bit better. And - needs to be fixed to be only relative to the same CPU. - The marks are determined by the difference between this - current trace and the next trace. - '!' - greater than preempt_mark_thresh (default 100) - '+' - greater than 1 microsecond - ' ' - less than or equal to 1 microsecond. + needs to be fixed to be only relative to the same CPU. + The marks are determined by the difference between this + current trace and the next trace. + '!' - greater than preempt_mark_thresh (default 100) + '+' - greater than 1 microsecond + ' ' - less than or equal to 1 microsecond. The rest is the same as the 'trace' file. @@ -321,14 +386,15 @@ The above is mostly meaningful for kernel developers. trace_options ------------- -The trace_options file is used to control what gets printed in the trace -output. To see what is available, simply cat the file: +The trace_options file is used to control what gets printed in +the trace output. To see what is available, simply cat the file: cat /debug/tracing/trace_options print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \ - noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj + noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj -To disable one of the options, echo in the option prepended with "no". +To disable one of the options, echo in the option prepended with +"no". echo noprint-parent > /debug/tracing/trace_options @@ -338,8 +404,8 @@ To enable an option, leave off the "no". Here are the available options: - print-parent - On function traces, display the calling function - as well as the function being traced. + print-parent - On function traces, display the calling (parent) + function as well as the function being traced. print-parent: bash-4000 [01] 1477.606694: simple_strtoul <-strict_strtoul @@ -348,15 +414,16 @@ Here are the available options: bash-4000 [01] 1477.606694: simple_strtoul - sym-offset - Display not only the function name, but also the offset - in the function. For example, instead of seeing just - "ktime_get", you will see "ktime_get+0xb/0x20". + sym-offset - Display not only the function name, but also the + offset in the function. For example, instead of + seeing just "ktime_get", you will see + "ktime_get+0xb/0x20". sym-offset: bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0 - sym-addr - this will also display the function address as well as - the function name. + sym-addr - this will also display the function address as well + as the function name. sym-addr: bash-4000 [01] 1477.606694: simple_strtoul <c0339346> @@ -366,35 +433,41 @@ Here are the available options: bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \ (+0.000ms): simple_strtoul (strict_strtoul) - raw - This will display raw numbers. This option is best for use with - user applications that can translate the raw numbers better than - having it done in the kernel. + raw - This will display raw numbers. This option is best for + use with user applications that can translate the raw + numbers better than having it done in the kernel. - hex - Similar to raw, but the numbers will be in a hexadecimal format. + hex - Similar to raw, but the numbers will be in a hexadecimal + format. bin - This will print out the formats in raw binary. block - TBD (needs update) - stacktrace - This is one of the options that changes the trace itself. - When a trace is recorded, so is the stack of functions. - This allows for back traces of trace sites. + stacktrace - This is one of the options that changes the trace + itself. When a trace is recorded, so is the stack + of functions. This allows for back traces of + trace sites. - userstacktrace - This option changes the trace. - It records a stacktrace of the current userspace thread. + userstacktrace - This option changes the trace. It records a + stacktrace of the current userspace thread. - sym-userobj - when user stacktrace are enabled, look up which object the - address belongs to, and print a relative address - This is especially useful when ASLR is on, otherwise you don't - get a chance to resolve the address to object/file/line after the app is no - longer running + sym-userobj - when user stacktrace are enabled, look up which + object the address belongs to, and print a + relative address. This is especially useful when + ASLR is on, otherwise you don't get a chance to + resolve the address to object/file/line after + the app is no longer running - The lookup is performed when you read trace,trace_pipe,latency_trace. Example: + The lookup is performed when you read + trace,trace_pipe,latency_trace. Example: a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0 x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] - sched-tree - TBD (any users??) + sched-tree - trace all tasks that are on the runqueue, at + every scheduling event. Will add overhead if + there's a lot of tasks running at once. sched_switch @@ -431,18 +504,19 @@ of how to use it. [...] -As we have discussed previously about this format, the header shows -the name of the trace and points to the options. The "FUNCTION" -is a misnomer since here it represents the wake ups and context -switches. +As we have discussed previously about this format, the header +shows the name of the trace and points to the options. The +"FUNCTION" is a misnomer since here it represents the wake ups +and context switches. -The sched_switch file only lists the wake ups (represented with '+') -and context switches ('==>') with the previous task or current task -first followed by the next task or task waking up. The format for both -of these is PID:KERNEL-PRIO:TASK-STATE. Remember that the KERNEL-PRIO -is the inverse of the actual priority with zero (0) being the highest -priority and the nice values starting at 100 (nice -20). Below is -a quick chart to map the kernel priority to user land priorities. +The sched_switch file only lists the wake ups (represented with +'+') and context switches ('==>') with the previous task or +current task first followed by the next task or task waking up. +The format for both of these is PID:KERNEL-PRIO:TASK-STATE. +Remember that the KERNEL-PRIO is the inverse of the actual +priority with zero (0) being the highest priority and the nice +values starting at 100 (nice -20). Below is a quick chart to map +the kernel priority to user land priorities. Kernel priority: 0 to 99 ==> user RT priority 99 to 0 Kernel priority: 100 to 139 ==> user nice -20 to 19 @@ -463,10 +537,10 @@ The task states are: ftrace_enabled -------------- -The following tracers (listed below) give different output depending -on whether or not the sysctl ftrace_enabled is set. To set ftrace_enabled, -one can either use the sysctl function or set it via the proc -file system interface. +The following tracers (listed below) give different output +depending on whether or not the sysctl ftrace_enabled is set. To +set ftrace_enabled, one can either use the sysctl function or +set it via the proc file system interface. sysctl kernel.ftrace_enabled=1 @@ -474,12 +548,12 @@ file system interface. echo 1 > /proc/sys/kernel/ftrace_enabled -To disable ftrace_enabled simply replace the '1' with '0' in -the above commands. +To disable ftrace_enabled simply replace the '1' with '0' in the +above commands. -When ftrace_enabled is set the tracers will also record the functions -that are within the trace. The descriptions of the tracers -will also show an example with ftrace enabled. +When ftrace_enabled is set the tracers will also record the +functions that are within the trace. The descriptions of the +tracers will also show an example with ftrace enabled. irqsoff @@ -487,17 +561,18 @@ irqsoff When interrupts are disabled, the CPU can not react to any other external event (besides NMIs and SMIs). This prevents the timer -interrupt from triggering or the mouse interrupt from letting the -kernel know of a new mouse event. The result is a latency with the -reaction time. +interrupt from triggering or the mouse interrupt from letting +the kernel know of a new mouse event. The result is a latency +with the reaction time. -The irqsoff tracer tracks the time for which interrupts are disabled. -When a new maximum latency is hit, the tracer saves the trace leading up -to that latency point so that every time a new maximum is reached, the old -saved trace is discarded and the new trace is saved. +The irqsoff tracer tracks the time for which interrupts are +disabled. When a new maximum latency is hit, the tracer saves +the trace leading up to that latency point so that every time a +new maximum is reached, the old saved trace is discarded and the +new trace is saved. -To reset the maximum, echo 0 into tracing_max_latency. Here is an -example: +To reset the maximum, echo 0 into tracing_max_latency. Here is +an example: # echo irqsoff > /debug/tracing/current_tracer # echo 0 > /debug/tracing/tracing_max_latency @@ -532,10 +607,11 @@ irqsoff latency trace v1.1.5 on 2.6.26 Here we see that that we had a latency of 12 microsecs (which is -very good). The _write_lock_irq in sys_setpgid disabled interrupts. -The difference between the 12 and the displayed timestamp 14us occurred -because the clock was incremented between the time of recording the max -latency and the time of recording the function that had that latency. +very good). The _write_lock_irq in sys_setpgid disabled +interrupts. The difference between the 12 and the displayed +timestamp 14us occurred because the clock was incremented +between the time of recording the max latency and the time of +recording the function that had that latency. Note the above example had ftrace_enabled not set. If we set the ftrace_enabled, we get a much larger output: @@ -586,24 +662,24 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8 Here we traced a 50 microsecond latency. But we also see all the -functions that were called during that time. Note that by enabling -function tracing, we incur an added overhead. This overhead may -extend the latency times. But nevertheless, this trace has provided -some very helpful debugging information. +functions that were called during that time. Note that by +enabling function tracing, we incur an added overhead. This +overhead may extend the latency times. But nevertheless, this +trace has provided some very helpful debugging information. preemptoff ---------- -When preemption is disabled, we may be able to receive interrupts but -the task cannot be preempted and a higher priority task must wait -for preemption to be enabled again before it can preempt a lower -priority task. +When preemption is disabled, we may be able to receive +interrupts but the task cannot be preempted and a higher +priority task must wait for preemption to be enabled again +before it can preempt a lower priority task. The preemptoff tracer traces the places that disable preemption. -Like the irqsoff tracer, it records the maximum latency for which preemption -was disabled. The control of preemptoff tracer is much like the irqsoff -tracer. +Like the irqsoff tracer, it records the maximum latency for +which preemption was disabled. The control of preemptoff tracer +is much like the irqsoff tracer. # echo preemptoff > /debug/tracing/current_tracer # echo 0 > /debug/tracing/tracing_max_latency @@ -637,11 +713,12 @@ preemptoff latency trace v1.1.5 on 2.6.26-rc8 sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq) -This has some more changes. Preemption was disabled when an interrupt -came in (notice the 'h'), and was enabled while doing a softirq. -(notice the 's'). But we also see that interrupts have been disabled -when entering the preempt off section and leaving it (the 'd'). -We do not know if interrupts were enabled in the mean time. +This has some more changes. Preemption was disabled when an +interrupt came in (notice the 'h'), and was enabled while doing +a softirq. (notice the 's'). But we also see that interrupts +have been disabled when entering the preempt off section and +leaving it (the 'd'). We do not know if interrupts were enabled +in the mean time. # tracer: preemptoff # @@ -700,28 +777,30 @@ preemptoff latency trace v1.1.5 on 2.6.26-rc8 sshd-4261 0d.s1 64us : trace_preempt_on (__do_softirq) -The above is an example of the preemptoff trace with ftrace_enabled -set. Here we see that interrupts were disabled the entire time. -The irq_enter code lets us know that we entered an interrupt 'h'. -Before that, the functions being traced still show that it is not -in an interrupt, but we can see from the functions themselves that -this is not the case. +The above is an example of the preemptoff trace with +ftrace_enabled set. Here we see that interrupts were disabled +the entire time. The irq_enter code lets us know that we entered +an interrupt 'h'. Before that, the functions being traced still +show that it is not in an interrupt, but we can see from the +functions themselves that this is not the case. -Notice that __do_softirq when called does not have a preempt_count. -It may seem that we missed a preempt enabling. What really happened -is that the preempt count is held on the thread's stack and we -switched to the softirq stack (4K stacks in effect). The code -does not copy the preempt count, but because interrupts are disabled, -we do not need to worry about it. Having a tracer like this is good -for letting people know what really happens inside the kernel. +Notice that __do_softirq when called does not have a +preempt_count. It may seem that we missed a preempt enabling. +What really happened is that the preempt count is held on the +thread's stack and we switched to the softirq stack (4K stacks +in effect). The code does not copy the preempt count, but +because interrupts are disabled, we do not need to worry about +it. Having a tracer like this is good for letting people know +what really happens inside the kernel. preemptirqsoff -------------- -Knowing the locations that have interrupts disabled or preemption -disabled for the longest times is helpful. But sometimes we would -like to know when either preemption and/or interrupts are disabled. +Knowing the locations that have interrupts disabled or +preemption disabled for the longest times is helpful. But +sometimes we would like to know when either preemption and/or +interrupts are disabled. Consider the following code: @@ -741,11 +820,13 @@ The preemptoff tracer will record the total length of call_function_with_irqs_and_preemption_off() and call_function_with_preemption_off(). -But neither will trace the time that interrupts and/or preemption -is disabled. This total time is the time that we can not schedule. -To record this time, use the preemptirqsoff tracer. +But neither will trace the time that interrupts and/or +preemption is disabled. This total time is the time that we can +not schedule. To record this time, use the preemptirqsoff +tracer. -Again, using this trace is much like the irqs |