<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/events, branch v3.10.10</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/kernel/events?h=v3.10.10</id>
<link rel='self' href='https://git.amat.us/linux/atom/kernel/events?h=v3.10.10'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2013-07-25T21:07:43Z</updated>
<entry>
<title>perf: Fix perf_lock_task_context() vs RCU</title>
<updated>2013-07-25T21:07:43Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-07-12T09:08:33Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=65e303d786e20460c3d67d362f989f59944fb744'/>
<id>urn:sha1:65e303d786e20460c3d67d362f989f59944fb744</id>
<content type='text'>
commit 058ebd0eba3aff16b144eabf4510ed9510e1416e upstream.

Jiri managed to trigger this warning:

 [] ======================================================
 [] [ INFO: possible circular locking dependency detected ]
 [] 3.10.0+ #228 Tainted: G        W
 [] -------------------------------------------------------
 [] p/6613 is trying to acquire lock:
 []  (rcu_node_0){..-...}, at: [&lt;ffffffff810ca797&gt;] rcu_read_unlock_special+0xa7/0x250
 []
 [] but task is already holding lock:
 []  (&amp;ctx-&gt;lock){-.-...}, at: [&lt;ffffffff810f2879&gt;] perf_lock_task_context+0xd9/0x2c0
 []
 [] which lock already depends on the new lock.
 []
 [] the existing dependency chain (in reverse order) is:
 []
 [] -&gt; #4 (&amp;ctx-&gt;lock){-.-...}:
 [] -&gt; #3 (&amp;rq-&gt;lock){-.-.-.}:
 [] -&gt; #2 (&amp;p-&gt;pi_lock){-.-.-.}:
 [] -&gt; #1 (&amp;rnp-&gt;nocb_gp_wq[1]){......}:
 [] -&gt; #0 (rcu_node_0){..-...}:

Paul was quick to explain that due to preemptible RCU we cannot call
rcu_read_unlock() while holding scheduler (or nested) locks when part
of the read side critical section was preemptible.

Therefore solve it by making the entire RCU read side non-preemptible.

Also pull out the retry from under the non-preempt to play nice with RT.

Reported-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Helped-out-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>perf: Remove WARN_ON_ONCE() check in __perf_event_enable() for valid scenario</title>
<updated>2013-07-25T21:07:42Z</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@redhat.com</email>
</author>
<published>2013-07-09T15:44:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b2412679ab3e923437e2ee109560c151c9b0cedc'/>
<id>urn:sha1:b2412679ab3e923437e2ee109560c151c9b0cedc</id>
<content type='text'>
commit 06f417968beac6e6b614e17b37d347aa6a6b1d30 upstream.

The '!ctx-&gt;is_active' check has a valid scenario, so
there's no need for the warning.

The reason is that there's a time window between the
'ctx-&gt;is_active' check in the perf_event_enable() function
and the __perf_event_enable() function having:

  - IRQs on
  - ctx-&gt;lock unlocked

where the task could be killed and 'ctx' deactivated by
perf_event_exit_task(), ending up with the warning below.

So remove the WARN_ON_ONCE() check and add comments to
explain it all.

This addresses the following warning reported by Vince Weaver:

[  324.983534] ------------[ cut here ]------------
[  324.984420] WARNING: at kernel/events/core.c:1953 __perf_event_enable+0x187/0x190()
[  324.984420] Modules linked in:
[  324.984420] CPU: 19 PID: 2715 Comm: nmi_bug_snb Not tainted 3.10.0+ #246
[  324.984420] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
[  324.984420]  0000000000000009 ffff88043fce3ec8 ffffffff8160ea0b ffff88043fce3f00
[  324.984420]  ffffffff81080ff0 ffff8802314fdc00 ffff880231a8f800 ffff88043fcf7860
[  324.984420]  0000000000000286 ffff880231a8f800 ffff88043fce3f10 ffffffff8108103a
[  324.984420] Call Trace:
[  324.984420]  &lt;IRQ&gt;  [&lt;ffffffff8160ea0b&gt;] dump_stack+0x19/0x1b
[  324.984420]  [&lt;ffffffff81080ff0&gt;] warn_slowpath_common+0x70/0xa0
[  324.984420]  [&lt;ffffffff8108103a&gt;] warn_slowpath_null+0x1a/0x20
[  324.984420]  [&lt;ffffffff81134437&gt;] __perf_event_enable+0x187/0x190
[  324.984420]  [&lt;ffffffff81130030&gt;] remote_function+0x40/0x50
[  324.984420]  [&lt;ffffffff810e51de&gt;] generic_smp_call_function_single_interrupt+0xbe/0x130
[  324.984420]  [&lt;ffffffff81066a47&gt;] smp_call_function_single_interrupt+0x27/0x40
[  324.984420]  [&lt;ffffffff8161fd2f&gt;] call_function_single_interrupt+0x6f/0x80
[  324.984420]  &lt;EOI&gt;  [&lt;ffffffff816161a1&gt;] ? _raw_spin_unlock_irqrestore+0x41/0x70
[  324.984420]  [&lt;ffffffff8113799d&gt;] perf_event_exit_task+0x14d/0x210
[  324.984420]  [&lt;ffffffff810acd04&gt;] ? switch_task_namespaces+0x24/0x60
[  324.984420]  [&lt;ffffffff81086946&gt;] do_exit+0x2b6/0xa40
[  324.984420]  [&lt;ffffffff8161615c&gt;] ? _raw_spin_unlock_irq+0x2c/0x30
[  324.984420]  [&lt;ffffffff81087279&gt;] do_group_exit+0x49/0xc0
[  324.984420]  [&lt;ffffffff81096854&gt;] get_signal_to_deliver+0x254/0x620
[  324.984420]  [&lt;ffffffff81043057&gt;] do_signal+0x57/0x5a0
[  324.984420]  [&lt;ffffffff8161a164&gt;] ? __do_page_fault+0x2a4/0x4e0
[  324.984420]  [&lt;ffffffff8161665c&gt;] ? retint_restore_args+0xe/0xe
[  324.984420]  [&lt;ffffffff816166cd&gt;] ? retint_signal+0x11/0x84
[  324.984420]  [&lt;ffffffff81043605&gt;] do_notify_resume+0x65/0x80
[  324.984420]  [&lt;ffffffff81616702&gt;] retint_signal+0x46/0x84
[  324.984420] ---[ end trace 442ec2f04db3771a ]---

Reported-by: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Signed-off-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Suggested-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Corey Ashford &lt;cjashfor@linux.vnet.ibm.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1373384651-6109-2-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>perf: Clone child context from parent context pmu</title>
<updated>2013-07-25T21:07:42Z</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@redhat.com</email>
</author>
<published>2013-07-09T15:44:10Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f38bac3d6d1fc1a726e4381f0423dcd85885b3d4'/>
<id>urn:sha1:f38bac3d6d1fc1a726e4381f0423dcd85885b3d4</id>
<content type='text'>
commit 734df5ab549ca44f40de0f07af1c8803856dfb18 upstream.

Currently when the child context for inherited events is
created, it's based on the pmu object of the first event
of the parent context.

This is wrong for the following scenario:

  - HW context having HW and SW event
  - HW event got removed (closed)
  - SW event stays in HW context as the only event
    and its pmu is used to clone the child context

The issue starts when the cpu context object is touched
based on the pmu context object (__get_cpu_context). In
this case the HW context will work with the SW cpu context,
ending up with the WARN below.

Fix this by using the parent context's pmu object to clone
the child context.

Addresses the following warning reported by Vince Weaver:

[ 2716.472065] ------------[ cut here ]------------
[ 2716.476035] WARNING: at kernel/events/core.c:2122 task_ctx_sched_out+0x3c/0x)
[ 2716.476035] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs locn
[ 2716.476035] CPU: 0 PID: 3164 Comm: perf_fuzzer Not tainted 3.10.0-rc4 #2
[ 2716.476035] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 2716.476035]  0000000000000000 ffffffff8102e215 0000000000000000 ffff88011fc18
[ 2716.476035]  ffff8801175557f0 0000000000000000 ffff880119fda88c ffffffff810ad
[ 2716.476035]  ffff880119fda880 ffffffff810af02a 0000000000000009 ffff880117550
[ 2716.476035] Call Trace:
[ 2716.476035]  [&lt;ffffffff8102e215&gt;] ? warn_slowpath_common+0x5b/0x70
[ 2716.476035]  [&lt;ffffffff810ab2bd&gt;] ? task_ctx_sched_out+0x3c/0x5f
[ 2716.476035]  [&lt;ffffffff810af02a&gt;] ? perf_event_exit_task+0xbf/0x194
[ 2716.476035]  [&lt;ffffffff81032a37&gt;] ? do_exit+0x3e7/0x90c
[ 2716.476035]  [&lt;ffffffff810cd5ab&gt;] ? __do_fault+0x359/0x394
[ 2716.476035]  [&lt;ffffffff81032fe6&gt;] ? do_group_exit+0x66/0x98
[ 2716.476035]  [&lt;ffffffff8103dbcd&gt;] ? get_signal_to_deliver+0x479/0x4ad
[ 2716.476035]  [&lt;ffffffff810ac05c&gt;] ? __perf_event_task_sched_out+0x230/0x2d1
[ 2716.476035]  [&lt;ffffffff8100205d&gt;] ? do_signal+0x3c/0x432
[ 2716.476035]  [&lt;ffffffff810abbf9&gt;] ? ctx_sched_in+0x43/0x141
[ 2716.476035]  [&lt;ffffffff810ac2ca&gt;] ? perf_event_context_sched_in+0x7a/0x90
[ 2716.476035]  [&lt;ffffffff810ac311&gt;] ? __perf_event_task_sched_in+0x31/0x118
[ 2716.476035]  [&lt;ffffffff81050dd9&gt;] ? mmdrop+0xd/0x1c
[ 2716.476035]  [&lt;ffffffff81051a39&gt;] ? finish_task_switch+0x7d/0xa6
[ 2716.476035]  [&lt;ffffffff81002473&gt;] ? do_notify_resume+0x20/0x5d
[ 2716.476035]  [&lt;ffffffff813654f5&gt;] ? retint_signal+0x3d/0x78
[ 2716.476035] ---[ end trace 827178d8a5966c3d ]---

Reported-by: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Signed-off-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Corey Ashford &lt;cjashfor@linux.vnet.ibm.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1373384651-6109-1-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()</title>
<updated>2013-06-20T15:57:01Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-06-20T15:50:09Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c790b0ad23f427c7522ffed264706238c57c007e'/>
<id>urn:sha1:c790b0ad23f427c7522ffed264706238c57c007e</id>
<content type='text'>
fetch_bp_busy_slots() and toggle_bp_slot() use
for_each_online_cpu(). This is obviously wrong wrt cpu_up() or
cpu_down(): we can over/under account the per-cpu numbers.

For example:

	# echo 0 &gt;&gt; /sys/devices/system/cpu/cpu1/online
	# perf record -e mem:0x10 -p 1 &amp;
	# echo 1 &gt;&gt; /sys/devices/system/cpu/cpu1/online
	# perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10 -C1 -a &amp;
	# taskset -p 0x2 1

triggers the same WARN_ONCE("Can't find any breakpoint slot") in
arch_install_hw_breakpoint().
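
The over/under accounting can be illustrated with a toy model. This is
only a sketch in Python, not the kernel code: the account() helper, the
two-cpu setup and the dictionaries are hypothetical stand-ins for the
per-cpu counters.

```python
possible_cpus = [0, 1]  # assumed two-cpu machine, for illustration only


def account(counts, cpus, delta):
    # Add delta to the per-cpu breakpoint count of every cpu in cpus.
    for cpu in cpus:
        counts[cpu] += delta


# Iterating online cpus only: cpu1 is offline when the slot is reserved...
counts = {cpu: 0 for cpu in possible_cpus}
account(counts, [0], +1)
# ...and online again when the slot is released.
account(counts, [0, 1], -1)
print(counts)  # {0: 0, 1: -1}

# Iterating possible cpus: reserve and release see the same set.
fixed = {cpu: 0 for cpu in possible_cpus}
account(fixed, possible_cpus, +1)
account(fixed, possible_cpus, -1)
print(fixed)  # {0: 0, 1: 0}
```

In this model the set of possible cpus does not change across hotplug,
so reserve and release stay balanced.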

Reported-by: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/20130620155009.GA6327@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)</title>
<updated>2013-06-20T15:57:00Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-06-20T15:50:06Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=8b4d801b2b123b6c09742f861fe44a8527b84d47'/>
<id>urn:sha1:8b4d801b2b123b6c09742f861fe44a8527b84d47</id>
<content type='text'>
trinity fuzzer triggered WARN_ONCE("Can't find any breakpoint
slot") in arch_install_hw_breakpoint() but the problem is not
arch-specific.

The problem is, task_bp_pinned(cpu) checks "cpu == iter-&gt;cpu"
but this doesn't account for the "all cpus" events with iter-&gt;cpu &lt;
0.

This means that, say, register_user_hw_breakpoint(tsk) can
happily create an arbitrary number &gt; HBP_NUM of breakpoints
which cannot be activated. toggle_bp_task_slot() is equally
wrong for the same reason and nr_task_bp_pinned[] can have
negative entries.

Simple test:

	# perl -e 'sleep 1 while 1' &amp;
	# perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10,mem:0x10 -p `pidof perl`

Before this patch this triggers the same problem/WARN_ON(),
after the patch it correctly fails with -ENOSPC.
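
The difference between the two checks can be sketched as a simplified
Python model. This is not the kernel implementation: pinned_old() and
pinned_new() are hypothetical helpers, "all cpus" events are represented
by cpu == -1, and HBP_NUM = 4 is an assumption for the example.

```python
HBP_NUM = 4  # assumed number of hardware breakpoint slots


def pinned_old(cpu, events):
    # Old check: only events bound to exactly this cpu are counted.
    return sum(1 for ev_cpu in events if ev_cpu == cpu)


def pinned_new(cpu, events):
    # Fixed check: "all cpus" events (cpu == -1) count on every cpu.
    return sum(1 for ev_cpu in events if ev_cpu == -1 or ev_cpu == cpu)


# Five task breakpoints created with cpu == -1, as in the test above.
events = [-1] * 5
print(pinned_old(0, events))  # 0
print(pinned_new(0, events))  # 5
```

With the old check the five breakpoints are invisible on cpu 0, so the
reservation looks fine. With the fixed check the count exceeds HBP_NUM
and the request can be rejected.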

Reported-by: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/20130620155006.GA6324@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf: Fix mmap() accounting hole</title>
<updated>2013-06-19T10:44:13Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-06-04T08:44:21Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9bb5d40cd93c9dd4be74834b1dcb1ba03629716b'/>
<id>urn:sha1:9bb5d40cd93c9dd4be74834b1dcb1ba03629716b</id>
<content type='text'>
Vince's fuzzer once again found holes. This time it spotted a leak in
the locked page accounting.

When an event had redirected output and its close() was the last
reference to the buffer we didn't have a vm context to undo accounting.

Change the code to destroy the buffer on the last munmap() and detach
all redirected events at that time. This provides us the right context
to undo the vm accounting.

Reported-and-tested-by: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20130604084421.GI8923@twins.programming.kicks-ass.net
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf: Fix perf mmap bugs</title>
<updated>2013-05-28T09:05:08Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-05-28T08:55:48Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=26cb63ad11e04047a64309362674bcbbd6a6f246'/>
<id>urn:sha1:26cb63ad11e04047a64309362674bcbbd6a6f246</id>
<content type='text'>
Vince reported a problem found by his perf specific trinity
fuzzer.

Al noticed 2 problems with perf's mmap():

 - it has issues against fork() since we use vma-&gt;vm_mm for accounting.
 - it has an rb refcount leak on double mmap().

We fix the issues against fork() by using VM_DONTCOPY; I don't
think there's code out there that uses this; we didn't hear
about weird accounting problems/crashes. If we do need this to
work, the previously proposed VM_PINNED could make this work.

Aside from the rb reference leak spotted by Al, Vince's example
prog was indeed doing a double mmap() through the use of
perf_event_set_output().

This exposes another problem, since we now have 2 events with
one buffer, the accounting gets screwy because we account per
event. Fix this by making the buffer responsible for its own
accounting.

Reported-by: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Link: http://lkml.kernel.org/r/20130528085548.GA12193@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf: Factor out auxiliary events notification</title>
<updated>2013-05-07T11:17:29Z</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@redhat.com</email>
</author>
<published>2013-05-06T16:27:18Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=52d857a8784a09576215c71cebf368d61c12a754'/>
<id>urn:sha1:52d857a8784a09576215c71cebf368d61c12a754</id>
<content type='text'>
Add perf_event_aux() function to send out all types of
auxiliary events - mmap, task, comm events. For each type
there's match and output functions defined and used as
callbacks during perf_event_aux processing.

This way we can centralize the pmu/context iterating and
event matching logic. Also since a lot of the code was
duplicated, this patch reduces the .text size by about 2kB
on my setup:

  snipped output from 'objdump -x kernel/events/core.o'

  before:
  Idx Name          Size
    0 .text         0000d313

  after:
  Idx Name          Size
    0 .text         0000cad3
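
  As a quick check of the "about 2kB" figure, the two sizes above can
  be subtracted directly. A minimal Python sketch, with illustrative
  variable names:

```python
# Sizes of the .text section as reported by objdump, in hex.
text_before = 0xd313
text_after = 0xcad3

# Bytes saved by factoring out the duplicated notification code.
saved_bytes = text_before - text_after
print(saved_bytes)         # 2112
print(saved_bytes / 1024)  # 2.0625
```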

Signed-off-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Corey Ashford &lt;cjashfor@linux.vnet.ibm.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Link: http://lkml.kernel.org/r/1367857638-27631-3-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf: Fix EXIT event notification</title>
<updated>2013-05-07T11:17:28Z</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@redhat.com</email>
</author>
<published>2013-05-06T16:27:17Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=524eff183f51d080a83b348d0ea97c08b3607b9a'/>
<id>urn:sha1:524eff183f51d080a83b348d0ea97c08b3607b9a</id>
<content type='text'>
The perf_event_task_ctx() function needs to be called with
preemption disabled, since it checks the currently
scheduled cpu against the event cpu.

We disable preemption for the task related perf event context
if there's one defined, leaving it up to chance which cpu
it gets scheduled in.

Signed-off-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Corey Ashford &lt;cjashfor@linux.vnet.ibm.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Link: http://lkml.kernel.org/r/1367857638-27631-2-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2013-05-05T20:23:27Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-05-05T20:23:27Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=534c97b0950b1967bca1c753aeaed32f5db40264'/>
<id>urn:sha1:534c97b0950b1967bca1c753aeaed32f5db40264</id>
<content type='text'>
Pull 'full dynticks' support from Ingo Molnar:
 "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
  kernel feature to the timer and scheduler subsystems: 'full dynticks',
  or CONFIG_NO_HZ_FULL=y.

  This feature extends the nohz variable-size timer tick feature from
  idle to busy CPUs (running at most one task) as well, potentially
  reducing the number of timer interrupts significantly.

  This feature got motivated by real-time folks and the -rt tree, but
  the general utility and motivation of full-dynticks runs wider than
  that:

   - HPC workloads get faster: CPUs running a single task should be able
     to utilize a maximum amount of CPU power.  A periodic timer tick at
     HZ=1000 can cause a constant overhead of up to 1.0%.  This feature
     removes that overhead - and speeds up the system by 0.5%-1.0% on
     typical distro configs even on modern systems.

   - Real-time workload latency reduction: CPUs running critical tasks
     should experience as little jitter as possible.  The last remaining
     source of kernel-related jitter was the periodic timer tick.

   - A single task executing on a CPU is a pretty common situation,
     especially with an increasing number of cores/CPUs, so this feature
     helps desktop and mobile workloads as well.

  The cost of the feature is mainly related to increased timer
  reprogramming overhead when a CPU switches its tick period, and thus
  slightly longer to-idle and from-idle latency.

  Configuration-wise a third mode of operation is added to the existing
  two NOHZ kconfig modes:

   - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
     as a config option.  This is the traditional Linux periodic tick
     design: there's a HZ tick going on all the time, regardless of
     whether a CPU is idle or not.

   - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
     periodic tick when a CPU enters idle mode.

   - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
     tick when a CPU is idle, also slows the tick down to 1 Hz (one
     timer interrupt per second) when only a single task is running on a
     CPU.

  The .config behavior is compatible: existing !CONFIG_NO_HZ and
  CONFIG_NO_HZ=y settings get translated to the new values, without the
  user having to configure anything.  CONFIG_NO_HZ_FULL is turned off by
  default.

  This feature is based on a lot of infrastructure work that has been
  steadily going upstream in the last 2-3 cycles: related RCU support
  and non-periodic cputime support in particular is upstream already.

  This tree adds the final pieces and activates the feature.  The pull
  request is marked RFC because:

   - it's marked 64-bit only at the moment - the 32-bit support patch is
     small but did not get ready in time.

   - it has a number of fresh commits that came in after the merge
     window.  The overwhelming majority of commits are from before the
     merge window, but still some aspects of the tree are fresh and so I
     marked it RFC.

   - it's a pretty wide-reaching feature with lots of effects - and
     while the components have been in testing for some time, the full
     combination is still not very widely used.  That it's default-off
     should reduce its regression abilities and obviously there are no
     known regressions with CONFIG_NO_HZ_FULL=y enabled either.

   - the feature is not completely idempotent: there is no 100%
     equivalent replacement for a periodic scheduler/timer tick.  In
     particular there's ongoing work to map out and reduce its effects
     on scheduler load-balancing and statistics.  This should not impact
     correctness though, there are no known regressions related to this
     feature at this point.

   - it's a pretty ambitious feature that with time will likely be
     enabled by most Linux distros, and we'd like your input on
     its design/implementation, if you dislike some aspect we missed.
     Without flaming us to a crisp! :-)

  Future plans:

   - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
     the periodic tick altogether when there's a single busy task on a
     CPU.  We'd first like 1 Hz to be exposed more widely before we go
     for the 0 Hz target though.

   - once we reach 0 Hz we can remove the periodic tick assumption from
     nr_running&gt;=2 as well, by essentially interrupting busy tasks only
     as frequently as the sched_latency constraints require us to do -
     once every 4-40 msecs, depending on nr_running.

  I am personally leaning towards biting the bullet and doing this in
  v3.10, like the -rt tree this effort has been going on for too long -
  but the final word is up to you as usual.

  More technical details can be found in Documentation/timers/NO_HZ.txt"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
  sched: Keep at least 1 tick per second for active dynticks tasks
  rcu: Fix full dynticks' dependency on wide RCU nocb mode
  nohz: Protect smp_processor_id() in tick_nohz_task_switch()
  nohz_full: Add documentation.
  cputime_nsecs: use math64.h for nsec resolution conversion helpers
  nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
  nohz: Reduce overhead under high-freq idling patterns
  nohz: Remove full dynticks' superfluous dependency on RCU tree
  nohz: Fix unavailable tick_stop tracepoint in dynticks idle
  nohz: Add basic tracing
  nohz: Select wide RCU nocb for full dynticks
  nohz: Disable the tick when irq resume in full dynticks CPU
  nohz: Re-evaluate the tick for the new task after a context switch
  nohz: Prepare to stop the tick on irq exit
  nohz: Implement full dynticks kick
  nohz: Re-evaluate the tick from the scheduler IPI
  sched: New helper to prevent from stopping the tick in full dynticks
  sched: Kick full dynticks CPU that have more than one task enqueued.
  perf: New helper to prevent full dynticks CPUs from stopping tick
  perf: Kick full dynticks CPU if events rotation is needed
  ...
</content>
</entry>
</feed>
