<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched, branch v3.12.20</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/kernel/sched?h=v3.12.20</id>
<link rel='self' href='https://git.amat.us/linux/atom/kernel/sched?h=v3.12.20'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2014-03-22T21:01:48Z</updated>
<entry>
<title>sched: Fix double normalization of vruntime</title>
<updated>2014-03-22T21:01:48Z</updated>
<author>
<name>George McCollister</name>
<email>george.mccollister@gmail.com</email>
</author>
<published>2014-02-18T23:56:51Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=560877ed20d96a6223dbffe6bc618e6fba38311c'/>
<id>urn:sha1:560877ed20d96a6223dbffe6bc618e6fba38311c</id>
<content type='text'>
commit 791c9e0292671a3bfa95286bb5c08129d8605618 upstream.

dequeue_entity() is called when p-&gt;on_rq and sets se-&gt;on_rq = 0
which appears to guarentee that the !se-&gt;on_rq condition is met.
If the task has done set_current_state(TASK_INTERRUPTIBLE) without
schedule() the second condition will be met and vruntime will be
incorrectly adjusted twice.

In certain cases this can result in the task's vruntime never increasing
past the vruntime of other tasks on the CFS' run queue, starving them of
CPU time.

This patch changes switched_from_fair() to use !p-&gt;on_rq instead of
!se-&gt;on_rq.

I'm able to cause a task with a priority of 120 to starve all other
tasks with the same priority on an ARM platform running 3.2.51-rt72
PREEMPT RT by writing one character at time to a serial tty (16550 UART)
in a tight loop. I'm also able to verify making this change corrects the
problem on that platform and kernel version.

Signed-off-by: George McCollister &lt;george.mccollister@gmail.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1392767811-28916-1-git-send-email-george.mccollister@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;

</content>
</entry>
<entry>
<title>sched/rt: Remove redundant nr_cpus_allowed test</title>
<updated>2014-03-12T12:25:40Z</updated>
<author>
<name>Shawn Bohrer</name>
<email>sbohrer@rgmadvisors.com</email>
</author>
<published>2013-10-04T19:24:53Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4aef0b11ba4fa4c2cd115c312dd9f0b64cad1bf1'/>
<id>urn:sha1:4aef0b11ba4fa4c2cd115c312dd9f0b64cad1bf1</id>
<content type='text'>
commit 6bfa687c19b7ab8adee03f0d43c197c2945dd869 upstream.

In 76854c7e8f3f4172fef091e78d88b3b751463ac6 ("sched: Use
rt.nr_cpus_allowed to recover select_task_rq() cycles") an
optimization was added to select_task_rq_rt() that immediately
returns when p-&gt;nr_cpus_allowed == 1 at the beginning of the
function.

This makes the latter p-&gt;nr_cpus_allowed &gt; 1 check redundant,
which can now be removed.

Signed-off-by: Shawn Bohrer &lt;sbohrer@rgmadvisors.com&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Mike Galbraith &lt;mgalbraith@suse.de&gt;
Cc: tomk@rgmadvisors.com
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1380914693-24634-1-git-send-email-shawn.bohrer@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>sched/rt: Add missing rmb()</title>
<updated>2014-03-12T12:25:39Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-10-15T10:35:07Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d171cbfc349b897d89ae4a0fe5fb75c4e1e65097'/>
<id>urn:sha1:d171cbfc349b897d89ae4a0fe5fb75c4e1e65097</id>
<content type='text'>
commit 7c3f2ab7b844f1a859afbc3d41925e8a0faba5fa upstream.

While discussing the proposed SCHED_DEADLINE patches which in parts
mimic the existing FIFO code it was noticed that the wmb in
rt_set_overloaded() didn't have a matching barrier.

The only site using rt_overloaded() to test the rto_count is
pull_rt_task() and we should issue a matching rmb before then assuming
there's an rto_mask bit set.

Without that smp_rmb() in there we could actually miss seeing the
rto_mask bit.

Also, change to using smp_[wr]mb(), even though this is SMP only code;
memory barriers without smp_ always make me think they're against
hardware of some sort.

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: vincent.guittot@linaro.org
Cc: luca.abeni@unitn.it
Cc: bruce.ashfield@windriver.com
Cc: dhaval.giani@gmail.com
Cc: rostedt@goodmis.org
Cc: hgu1972@gmail.com
Cc: oleg@redhat.com
Cc: fweisbec@gmail.com
Cc: darren@dvhart.com
Cc: johan.eker@ericsson.com
Cc: p.faure@akatech.ch
Cc: paulmck@linux.vnet.ibm.com
Cc: raistlin@linux.it
Cc: claudio@evidence.eu.com
Cc: insop.song@gmail.com
Cc: michael@amarulasolutions.com
Cc: liming.wang@windriver.com
Cc: fchecconi@gmail.com
Cc: jkacur@redhat.com
Cc: tommaso.cucinotta@sssup.it
Cc: Juri Lelli &lt;juri.lelli@gmail.com&gt;
Cc: harald.gustafsson@ericsson.com
Cc: nicola.manica@disi.unitn.it
Cc: tglx@linutronix.de
Link: http://lkml.kernel.org/r/20131015103507.GF10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>sched: Assign correct scheduling domain to 'sd_llc'</title>
<updated>2014-03-12T12:25:39Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2013-12-17T09:21:25Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4edad9c1129ac6b3ad319009f005d20c6bfda3b9'/>
<id>urn:sha1:4edad9c1129ac6b3ad319009f005d20c6bfda3b9</id>
<content type='text'>
commit 5d4cf996cf134e8ddb4f906b8197feb9267c2b77 upstream.

Commit 42eb088e (sched: Avoid NULL dereference on sd_busy) corrected a NULL
dereference on sd_busy but the fix also altered what scheduling domain it
used for the 'sd_llc' percpu variable.

One impact of this is that a task selecting a runqueue may consider
idle CPUs that are not cache siblings as candidates for running.
Tasks are then running on CPUs that are not cache hot.

This was found through bisection where ebizzy threads were not seeing equal
performance and it looked like a scheduling fairness issue. This patch
mitigates but does not completely fix the problem on all machines tested
implying there may be an additional bug or a common root cause. Here are
the average range of performance seen by individual ebizzy threads. It
was tested on top of candidate patches related to x86 TLB range flushing.

	4-core machine
			    3.13.0-rc3            3.13.0-rc3
			       vanilla            fixsd-v3r3
	Mean   1        0.00 (  0.00%)        0.00 (  0.00%)
	Mean   2        0.34 (  0.00%)        0.10 ( 70.59%)
	Mean   3        1.29 (  0.00%)        0.93 ( 27.91%)
	Mean   4        7.08 (  0.00%)        0.77 ( 89.12%)
	Mean   5      193.54 (  0.00%)        2.14 ( 98.89%)
	Mean   6      151.12 (  0.00%)        2.06 ( 98.64%)
	Mean   7      115.38 (  0.00%)        2.04 ( 98.23%)
	Mean   8      108.65 (  0.00%)        1.92 ( 98.23%)

	8-core machine
	Mean   1         0.00 (  0.00%)        0.00 (  0.00%)
	Mean   2         0.40 (  0.00%)        0.21 ( 47.50%)
	Mean   3        23.73 (  0.00%)        0.89 ( 96.25%)
	Mean   4        12.79 (  0.00%)        1.04 ( 91.87%)
	Mean   5        13.08 (  0.00%)        2.42 ( 81.50%)
	Mean   6        23.21 (  0.00%)       69.46 (-199.27%)
	Mean   7        15.85 (  0.00%)      101.72 (-541.77%)
	Mean   8       109.37 (  0.00%)       19.13 ( 82.51%)
	Mean   12      124.84 (  0.00%)       28.62 ( 77.07%)
	Mean   16      113.50 (  0.00%)       24.16 ( 78.71%)

It's eliminated for one machine and reduced for another.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Alex Shi &lt;alex.shi@linaro.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Cc: H Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20131217092124.GV11295@suse.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>sched: Initialize power_orig for overlapping groups</title>
<updated>2014-03-12T12:25:38Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-12-11T10:09:53Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=8dc051a7e50b0bc7eae9a640334258e04a6d2fdd'/>
<id>urn:sha1:8dc051a7e50b0bc7eae9a640334258e04a6d2fdd</id>
<content type='text'>
commit 8e8339a3a1069141985daaa2521ba304509ddecd upstream.

Yinghai reported that he saw a /0 in sg_capacity on his EX parts.
Make sure to always initialize power_orig now that we actually use it.

Ideally build_sched_domains() -&gt; init_sched_groups_power() would also
initialize this; but for some yet unexplained reason some setups seem
to miss updates there.

Reported-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Tested-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/n/tip-l8ng2m9uml6fhibln8wqpom7@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>sched: Avoid NULL dereference on sd_busy</title>
<updated>2014-03-12T12:25:38Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-11-19T15:41:49Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a2198407845fa9d85b590eafafacf3b6a82d3528'/>
<id>urn:sha1:a2198407845fa9d85b590eafafacf3b6a82d3528</id>
<content type='text'>
commit 42eb088ed246a5a817bb45a8b32fe234cf1c0f8b upstream.

Commit 37dc6b50cee9 ("sched: Remove unnecessary iteration over sched
domains to update nr_busy_cpus") forgot to clear 'sd_busy' under some
conditions leading to a possible NULL deref in set_cpu_sd_state_idle().

Reported-by: Anton Blanchard &lt;anton@samba.org&gt;
Cc: Preeti U Murthy &lt;preeti@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20131118113701.GF3866@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus</title>
<updated>2014-03-12T12:25:37Z</updated>
<author>
<name>Preeti U Murthy</name>
<email>preeti@linux.vnet.ibm.com</email>
</author>
<published>2013-10-30T03:12:52Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ec6317cc7e5e333acdfa2426c96d372bfc920077'/>
<id>urn:sha1:ec6317cc7e5e333acdfa2426c96d372bfc920077</id>
<content type='text'>
commit 37dc6b50cee97954c4e6edcd5b1fa614b76038ee upstream.

nr_busy_cpus parameter is used by nohz_kick_needed() to find out the
number of busy cpus in a sched domain which has SD_SHARE_PKG_RESOURCES
flag set.  Therefore instead of updating nr_busy_cpus at every level
of sched domain, since it is irrelevant, we can update this parameter
only at the parent domain of the sd which has this flag set. Introduce
a per-cpu parameter sd_busy which represents this parent domain.

In nohz_kick_needed() we directly query the nr_busy_cpus parameter
associated with the groups of sd_busy.

By associating sd_busy with the highest domain which has
SD_SHARE_PKG_RESOURCES flag set, we cover all lower level domains
which could have this flag set and trigger nohz_idle_balancing if any
of the levels have more than one busy cpu.

sd_busy is irrelevant for asymmetric load balancing. However sd_asym
has been introduced to represent the highest sched domain which has
SD_ASYM_PACKING flag set so that it can be queried directly when
required.

While we are at it, we might as well change the nohz_idle parameter to
be updated at the sd_busy domain level alone and not the base domain
level of a CPU.  This will unify the concept of busy cpus at just one
level of sched domain where it is currently used.

Signed-off-by: Preeti U Murthy&lt;preeti@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: svaidy@linux.vnet.ibm.com
Cc: vincent.guittot@linaro.org
Cc: bitbucket@online.de
Cc: benh@kernel.crashing.org
Cc: anton@samba.org
Cc: Morten.Rasmussen@arm.com
Cc: pjt@google.com
Cc: peterz@infradead.org
Cc: mikey@neuling.org
Link: http://lkml.kernel.org/r/20131030031252.23426.4417.stgit@preeti.in.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>sched: Fix asymmetric scheduling for POWER7</title>
<updated>2014-03-12T12:25:36Z</updated>
<author>
<name>Vaidyanathan Srinivasan</name>
<email>svaidy@linux.vnet.ibm.com</email>
</author>
<published>2013-10-30T03:12:42Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=82a6be9ef57b19f710dbbeac568484ce701b2e26'/>
<id>urn:sha1:82a6be9ef57b19f710dbbeac568484ce701b2e26</id>
<content type='text'>
commit 2042abe7977222ef606306faa2dce8fd51e98e65 upstream.

Asymmetric scheduling within a core is a scheduler loadbalancing
feature that is triggered when SD_ASYM_PACKING flag is set.  The goal
for the load balancer is to move tasks to lower order idle SMT threads
within a core on a POWER7 system.

In nohz_kick_needed(), we intend to check if our sched domain (core)
is completely busy or we have idle cpu.

The following check for SD_ASYM_PACKING:

    (cpumask_first_and(nohz.idle_cpus_mask, sched_domain_span(sd)) &lt; cpu)

already covers the case of checking if the domain has an idle cpu,
because cpumask_first_and() will not yield any set bits if this domain
has no idle cpu.

Hence, nr_busy check against group weight can be removed.

Reported-by: Michael Neuling &lt;michael.neuling@au1.ibm.com&gt;
Signed-off-by: Vaidyanathan Srinivasan &lt;svaidy@linux.vnet.ibm.com&gt;
Signed-off-by: Preeti U Murthy &lt;preeti@linux.vnet.ibm.com&gt;
Tested-by: Michael Neuling &lt;mikey@neuling.org&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: vincent.guittot@linaro.org
Cc: bitbucket@online.de
Cc: benh@kernel.crashing.org
Cc: anton@samba.org
Cc: Morten.Rasmussen@arm.com
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131030031242.23426.13019.stgit@preeti.in.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>sched: Guarantee new group-entities always have weight</title>
<updated>2014-01-15T23:31:44Z</updated>
<author>
<name>Paul Turner</name>
<email>pjt@google.com</email>
</author>
<published>2013-10-16T18:16:27Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a6b79813f2005fa5e6e4de8f80b130c26502c781'/>
<id>urn:sha1:a6b79813f2005fa5e6e4de8f80b130c26502c781</id>
<content type='text'>
commit 0ac9b1c21874d2490331233b3242085f8151e166 upstream.

Currently, group entity load-weights are initialized to zero. This
admits some races with respect to the first time they are re-weighted in
earlty use. ( Let g[x] denote the se for "g" on cpu "x". )

Suppose that we have root-&gt;a and that a enters a throttled state,
immediately followed by a[0]-&gt;t1 (the only task running on cpu[0])
blocking:

  put_prev_task(group_cfs_rq(a[0]), t1)
  put_prev_entity(..., t1)
  check_cfs_rq_runtime(group_cfs_rq(a[0]))
  throttle_cfs_rq(group_cfs_rq(a[0]))

Then, before unthrottling occurs, let a[0]-&gt;b[0]-&gt;t2 wake for the first
time:

  enqueue_task_fair(rq[0], t2)
  enqueue_entity(group_cfs_rq(b[0]), t2)
  enqueue_entity_load_avg(group_cfs_rq(b[0]), t2)
  account_entity_enqueue(group_cfs_ra(b[0]), t2)
  update_cfs_shares(group_cfs_rq(b[0]))
  &lt; skipped because b is part of a throttled hierarchy &gt;
  enqueue_entity(group_cfs_rq(a[0]), b[0])
  ...

We now have b[0] enqueued, yet group_cfs_rq(a[0])-&gt;load.weight == 0
which violates invariants in several code-paths. Eliminate the
possibility of this by initializing group entity weight.

Signed-off-by: Paul Turner &lt;pjt@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20131016181627.22647.47543.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Chris J Arges &lt;chris.j.arges@canonical.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched: Fix hrtimer_cancel()/rq-&gt;lock deadlock</title>
<updated>2014-01-15T23:31:44Z</updated>
<author>
<name>Ben Segall</name>
<email>bsegall@google.com</email>
</author>
<published>2013-10-16T18:16:22Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4748ed5584fd8538b0e04baf9090f25c369291dc'/>
<id>urn:sha1:4748ed5584fd8538b0e04baf9090f25c369291dc</id>
<content type='text'>
commit 927b54fccbf04207ec92f669dce6806848cbec7d upstream.

__start_cfs_bandwidth calls hrtimer_cancel while holding rq-&gt;lock,
waiting for the hrtimer to finish. However, if sched_cfs_period_timer
runs for another loop iteration, the hrtimer can attempt to take
rq-&gt;lock, resulting in deadlock.

Fix this by ensuring that cfs_b-&gt;timer_active is cleared only if the
_latest_ call to do_sched_cfs_period_timer is returning as idle. Then
__start_cfs_bandwidth can just call hrtimer_try_to_cancel and wait for
that to succeed or timer_active == 1.

Signed-off-by: Ben Segall &lt;bsegall@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181622.22647.16643.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Chris J Arges &lt;chris.j.arges@canonical.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
