aboutsummaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2011-01-06 10:06:26 -0800
committerLinus Torvalds <torvalds@linux-foundation.org>2011-01-06 10:06:26 -0800
commit2af49b6058d857fa5b476db642d4452bf5833ecd (patch)
treedbce19fe5db5c34294a911baedd2e91bb897b9ce /Documentation
parentb08b27213384d1bd6eda04a2b6f788b4cdee0f34 (diff)
parent394f4528c523d88daabd50f883a8d6b164075555 (diff)
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: remove unused __list_for_each_rcu() macro rculist: fix borked __list_for_each_rcu() macro rcu: reduce __call_rcu()-induced contention on rcu_node structures rcu: limit rcu_node leaf-level fanout rcu: fine-tune grace-period begin/end checks rcu: Keep gpnum and completed fields synchronized rcu: Stop chasing QS if another CPU did it for us rcu: increase synchronize_sched_expedited() batching rcu: Make synchronize_srcu_expedited() fast if running readers rcu: fix race condition in synchronize_sched_expedited() rcu: update documentation/comments for Lai's adoption patch rcu,cleanup: simplify the code when cpu is dying rcu,cleanup: move synchronize_sched_expedited() out of sched.c rcu: get rid of obsolete "classic" names in TREE_RCU tracing rcu: Distinguish between boosting and boosted rcu: document TINY_RCU and TINY_PREEMPT_RCU tracing. rcu: add tracing for TINY_RCU and TINY_PREEMPT_RCU rcu: priority boosting for TINY_PREEMPT_RCU rcu: move TINY_RCU from softirq to kthread rcu: add priority-inversion testing to rcutorture
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/RCU/trace.txt144
1 files changed, 128 insertions, 16 deletions
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index a851118775d..6a8c73f55b8 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -1,18 +1,22 @@
CONFIG_RCU_TRACE debugfs Files and Formats
-The rcutree implementation of RCU provides debugfs trace output that
-summarizes counters and state. This information is useful for debugging
-RCU itself, and can sometimes also help to debug abuses of RCU.
-The following sections describe the debugfs files and formats.
+The rcutree and rcutiny implementations of RCU provide debugfs trace
+output that summarizes counters and state. This information is useful for
+debugging RCU itself, and can sometimes also help to debug abuses of RCU.
+The following sections describe the debugfs files and formats, first
+for rcutree and next for rcutiny.
-Hierarchical RCU debugfs Files and Formats
+CONFIG_TREE_RCU and CONFIG_TREE_PREEMPT_RCU debugfs Files and Formats
-This implementation of RCU provides three debugfs files under the
+These implementations of RCU provides five debugfs files under the
top-level directory RCU: rcu/rcudata (which displays fields in struct
-rcu_data), rcu/rcugp (which displays grace-period counters), and
-rcu/rcuhier (which displays the struct rcu_node hierarchy).
+rcu_data), rcu/rcudata.csv (which is a .csv spreadsheet version of
+rcu/rcudata), rcu/rcugp (which displays grace-period counters),
+rcu/rcuhier (which displays the struct rcu_node hierarchy), and
+rcu/rcu_pending (which displays counts of the reasons that the
+rcu_pending() function decided that there was core RCU work to do).
The output of "cat rcu/rcudata" looks as follows:
@@ -130,7 +134,8 @@ o "ci" is the number of RCU callbacks that have been invoked for
been registered in absence of CPU-hotplug activity.
o "co" is the number of RCU callbacks that have been orphaned due to
- this CPU going offline.
+ this CPU going offline. These orphaned callbacks have been moved
+ to an arbitrarily chosen online CPU.
o "ca" is the number of RCU callbacks that have been adopted due to
other CPUs going offline. Note that ci+co-ca+ql is the number of
@@ -168,12 +173,12 @@ o "gpnum" is the number of grace periods that have started. It is
The output of "cat rcu/rcuhier" looks as follows, with very long lines:
-c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 oqlen=0
+c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6
1/1 .>. 0:127 ^0
3/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3
3/3f .>. 0:5 ^0 2/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3
rcu_bh:
-c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 oqlen=0
+c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0
0/1 .>. 0:127 ^0
0/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3
0/3f .>. 0:5 ^0 0/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3
@@ -212,11 +217,6 @@ o "fqlh" is the number of calls to force_quiescent_state() that
exited immediately (without even being counted in nfqs above)
due to contention on ->fqslock.
-o "oqlen" is the number of callbacks on the "orphan" callback
- list. RCU callbacks are placed on this list by CPUs going
- offline, and are "adopted" either by the CPU helping the outgoing
- CPU or by the next rcu_barrier*() call, whichever comes first.
-
o Each element of the form "1/1 0:127 ^0" represents one struct
rcu_node. Each line represents one level of the hierarchy, from
root to leaves. It is best to think of the rcu_data structures
@@ -326,3 +326,115 @@ o "nn" is the number of times that this CPU needed nothing. Alert
readers will note that the rcu "nn" number for a given CPU very
closely matches the rcu_bh "np" number for that same CPU. This
is due to short-circuit evaluation in rcu_pending().
+
+
+CONFIG_TINY_RCU and CONFIG_TINY_PREEMPT_RCU debugfs Files and Formats
+
+These implementations of RCU provides a single debugfs file under the
+top-level directory RCU, namely rcu/rcudata, which displays fields in
+rcu_bh_ctrlblk, rcu_sched_ctrlblk and, for CONFIG_TINY_PREEMPT_RCU,
+rcu_preempt_ctrlblk.
+
+The output of "cat rcu/rcudata" is as follows:
+
+rcu_preempt: qlen=24 gp=1097669 g197/p197/c197 tasks=...
+ ttb=. btg=no ntb=184 neb=0 nnb=183 j=01f7 bt=0274
+ normal balk: nt=1097669 gt=0 bt=371 b=0 ny=25073378 nos=0
+ exp balk: bt=0 nos=0
+rcu_sched: qlen: 0
+rcu_bh: qlen: 0
+
+This is split into rcu_preempt, rcu_sched, and rcu_bh sections, with the
+rcu_preempt section appearing only in CONFIG_TINY_PREEMPT_RCU builds.
+The last three lines of the rcu_preempt section appear only in
+CONFIG_RCU_BOOST kernel builds. The fields are as follows:
+
+o "qlen" is the number of RCU callbacks currently waiting either
+ for an RCU grace period or waiting to be invoked. This is the
+ only field present for rcu_sched and rcu_bh, due to the
+ short-circuiting of grace period in those two cases.
+
+o "gp" is the number of grace periods that have completed.
+
+o "g197/p197/c197" displays the grace-period state, with the
+ "g" number being the number of grace periods that have started
+ (mod 256), the "p" number being the number of grace periods
+ that the CPU has responded to (also mod 256), and the "c"
+ number being the number of grace periods that have completed
+ (once again mode 256).
+
+ Why have both "gp" and "g"? Because the data flowing into
+ "gp" is only present in a CONFIG_RCU_TRACE kernel.
+
+o "tasks" is a set of bits. The first bit is "T" if there are
+ currently tasks that have recently blocked within an RCU
+ read-side critical section, the second bit is "N" if any of the
+ aforementioned tasks are blocking the current RCU grace period,
+ and the third bit is "E" if any of the aforementioned tasks are
+ blocking the current expedited grace period. Each bit is "."
+ if the corresponding condition does not hold.
+
+o "ttb" is a single bit. It is "B" if any of the blocked tasks
+ need to be priority boosted and "." otherwise.
+
+o "btg" indicates whether boosting has been carried out during
+ the current grace period, with "exp" indicating that boosting
+ is in progress for an expedited grace period, "no" indicating
+ that boosting has not yet started for a normal grace period,
+ "begun" indicating that boosting has bebug for a normal grace
+ period, and "done" indicating that boosting has completed for
+ a normal grace period.
+
+o "ntb" is the total number of tasks subjected to RCU priority boosting
+ periods since boot.
+
+o "neb" is the number of expedited grace periods that have had
+ to resort to RCU priority boosting since boot.
+
+o "nnb" is the number of normal grace periods that have had
+ to resort to RCU priority boosting since boot.
+
+o "j" is the low-order 12 bits of the jiffies counter in hexadecimal.
+
+o "bt" is the low-order 12 bits of the value that the jiffies counter
+ will have at the next time that boosting is scheduled to begin.
+
+o In the line beginning with "normal balk", the fields are as follows:
+
+ o "nt" is the number of times that the system balked from
+ boosting because there were no blocked tasks to boost.
+ Note that the system will balk from boosting even if the
+ grace period is overdue when the currently running task
+ is looping within an RCU read-side critical section.
+ There is no point in boosting in this case, because
+ boosting a running task won't make it run any faster.
+
+ o "gt" is the number of times that the system balked
+ from boosting because, although there were blocked tasks,
+ none of them were preventing the current grace period
+ from completing.
+
+ o "bt" is the number of times that the system balked
+ from boosting because boosting was already in progress.
+
+ o "b" is the number of times that the system balked from
+ boosting because boosting had already completed for
+ the grace period in question.
+
+ o "ny" is the number of times that the system balked from
+ boosting because it was not yet time to start boosting
+ the grace period in question.
+
+ o "nos" is the number of times that the system balked from
+ boosting for inexplicable ("not otherwise specified")
+ reasons. This can actually happen due to races involving
+ increments of the jiffies counter.
+
+o In the line beginning with "exp balk", the fields are as follows:
+
+ o "bt" is the number of times that the system balked from
+ boosting because there were no blocked tasks to boost.
+
+ o "nos" is the number of times that the system balked from
+ boosting for inexplicable ("not otherwise specified")
+ reasons.