<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched, branch v3.4.84</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/kernel/sched?h=v3.4.84</id>
<link rel='self' href='https://git.amat.us/linux/atom/kernel/sched?h=v3.4.84'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2014-03-24T04:37:03Z</updated>
<entry>
<title>sched: Fix double normalization of vruntime</title>
<updated>2014-03-24T04:37:03Z</updated>
<author>
<name>George McCollister</name>
<email>george.mccollister@gmail.com</email>
</author>
<published>2014-02-18T23:56:51Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=6ba4d1d9112b3f6bf0f634870bb950b5b59fa86f'/>
<id>urn:sha1:6ba4d1d9112b3f6bf0f634870bb950b5b59fa86f</id>
<content type='text'>
commit 791c9e0292671a3bfa95286bb5c08129d8605618 upstream.

dequeue_entity() is called when p-&gt;on_rq and sets se-&gt;on_rq = 0
which appears to guarentee that the !se-&gt;on_rq condition is met.
If the task has done set_current_state(TASK_INTERRUPTIBLE) without
schedule() the second condition will be met and vruntime will be
incorrectly adjusted twice.

In certain cases this can result in the task's vruntime never increasing
past the vruntime of other tasks on the CFS' run queue, starving them of
CPU time.

This patch changes switched_from_fair() to use !p-&gt;on_rq instead of
!se-&gt;on_rq.

I'm able to cause a task with a priority of 120 to starve all other
tasks with the same priority on an ARM platform running 3.2.51-rt72
PREEMPT RT by writing one character at time to a serial tty (16550 UART)
in a tight loop. I'm also able to verify making this change corrects the
problem on that platform and kernel version.

Signed-off-by: George McCollister &lt;george.mccollister@gmail.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1392767811-28916-1-git-send-email-george.mccollister@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/nohz: Fix rq-&gt;cpu_load calculations some more</title>
<updated>2014-02-20T18:45:32Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2012-05-17T15:15:29Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f5a4c4b79e57f875b6788f6f8352ca246bfd8450'/>
<id>urn:sha1:f5a4c4b79e57f875b6788f6f8352ca246bfd8450</id>
<content type='text'>
commit 5aaa0b7a2ed5b12692c9ffb5222182bd558d3146 upstream.

Follow up on commit 556061b00 ("sched/nohz: Fix rq-&gt;cpu_load[]
calculations") since while that fixed the busy case it regressed the
mostly idle case.

Add a callback from the nohz exit to also age the rq-&gt;cpu_load[]
array. This closes the hole where either there was no nohz load
balance pass during the nohz, or there was a 'significant' amount of
idle time between the last nohz balance and the nohz exit.

So we'll update unconditionally from the tick to not insert any
accidental 0 load periods while busy, and we try and catch up from
nohz idle balance and nohz exit. Both these are still prone to missing
a jiffy, but that has always been the case.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: pjt@google.com
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/nohz: Fix rq-&gt;cpu_load[] calculations</title>
<updated>2014-02-20T18:45:32Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2012-05-11T15:31:26Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=e2d51f27e382be7b70a755f3ea2fbbeacdb50834'/>
<id>urn:sha1:e2d51f27e382be7b70a755f3ea2fbbeacdb50834</id>
<content type='text'>
commit 556061b00c9f2fd6a5524b6bde823ef12f299ecf upstream.

While investigating why the load-balancer did funny I found that the
rq-&gt;cpu_load[] tables were completely screwy.. a bit more digging
revealed that the updates that got through were missing ticks followed
by a catchup of 2 ticks.

The catchup assumes the cpu was idle during that time (since only nohz
can cause missed ticks and the machine is idle etc..) this means that
esp. the higher indices were significantly lower than they ought to
be.

The reason for this is that its not correct to compare against jiffies
on every jiffy on any other cpu than the cpu that updates jiffies.

This patch cludges around it by only doing the catch-up stuff from
nohz_idle_balance() and doing the regular stuff unconditionally from
the tick.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: pjt@google.com
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Link: http://lkml.kernel.org/n/tip-tp4kj18xdd5aj4vvj0qg55s2@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/rt: Avoid updating RT entry timeout twice within one tick period</title>
<updated>2014-02-13T19:51:19Z</updated>
<author>
<name>Ying Xue</name>
<email>ying.xue@windriver.com</email>
</author>
<published>2012-07-17T07:03:43Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=dbf3239455b155c3e72deacda93ef3a041e190c9'/>
<id>urn:sha1:dbf3239455b155c3e72deacda93ef3a041e190c9</id>
<content type='text'>
commit 57d2aa00dcec67afa52478730f2b524521af14fb upstream.

The issue below was found in 2.6.34-rt rather than mainline rt
kernel, but the issue still exists upstream as well.

So please let me describe how it was noticed on 2.6.34-rt:

On this version, each softirq has its own thread, it means there
is at least one RT FIFO task per cpu. The priority of these
tasks is set to 49 by default. If user launches an RT FIFO task
with priority lower than 49 of softirq RT tasks, it's possible
there are two RT FIFO tasks enqueued one cpu runqueue at one
moment. By current strategy of balancing RT tasks, when it comes
to RT tasks, we really need to put them off to a CPU that they
can run on as soon as possible. Even if it means a bit of cache
line flushing, we want RT tasks to be run with the least latency.

When the user RT FIFO task which just launched before is
running, the sched timer tick of the current cpu happens. In this
tick period, the timeout value of the user RT task will be
updated once. Subsequently, we try to wake up one softirq RT
task on its local cpu. As the priority of current user RT task
is lower than the softirq RT task, the current task will be
preempted by the higher priority softirq RT task. Before
preemption, we check to see if current can readily move to a
different cpu. If so, we will reschedule to allow the RT push logic
to try to move current somewhere else. Whenever the woken
softirq RT task runs, it first tries to migrate the user FIFO RT
task over to a cpu that is running a task of lesser priority. If
migration is done, it will send a reschedule request to the found
cpu by IPI interrupt. Once the target cpu responds the IPI
interrupt, it will pick the migrated user RT task to preempt its
current task. When the user RT task is running on the new cpu,
the sched timer tick of the cpu fires. So it will tick the user
RT task again. This also means the RT task timeout value will be
updated again. As the migration may be done in one tick period,
it means the user RT task timeout value will be updated twice
within one tick.

If we set a limit on the amount of cpu time for the user RT task
by setrlimit(RLIMIT_RTTIME), the SIGXCPU signal should be posted
upon reaching the soft limit.

But exactly when the SIGXCPU signal should be sent depends on the
RT task timeout value. In fact the timeout mechanism of sending
the SIGXCPU signal assumes the RT task timeout is increased once
every tick.

However, currently the timeout value may be added twice per
tick. So it results in the SIGXCPU signal being sent earlier
than expected.

To solve this issue, we prevent the timeout value from increasing
twice within one tick time by remembering the jiffies value of
last updating the timeout. As long as the RT task's jiffies is
different with the global jiffies value, we allow its timeout to
be updated.

Signed-off-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Fan Du &lt;fan.du@windriver.com&gt;
Reviewed-by: Yong Zhang &lt;yong.zhang0@gmail.com&gt;
Acked-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1342508623-2887-1-git-send-email-ying.xue@windriver.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[ lizf: backported to 3.4: adjust context ]
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched: Unthrottle rt runqueues in __disable_runtime()</title>
<updated>2014-02-13T19:51:19Z</updated>
<author>
<name>Peter Boonstoppel</name>
<email>pboonstoppel@nvidia.com</email>
</author>
<published>2012-08-09T22:34:47Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f61eb9ceb26cee3fdbb8c7a4920f171f7661fb4f'/>
<id>urn:sha1:f61eb9ceb26cee3fdbb8c7a4920f171f7661fb4f</id>
<content type='text'>
commit a4c96ae319b8047f62dedbe1eac79e321c185749 upstream.

migrate_tasks() uses _pick_next_task_rt() to get tasks from the
real-time runqueues to be migrated. When rt_rq is throttled
_pick_next_task_rt() won't return anything, in which case
migrate_tasks() can't move all threads over and gets stuck in an
infinite loop.

Instead unthrottle rt runqueues before migrating tasks.

Additionally: move unthrottle_offline_cfs_rqs() to rq_offline_fair()

Signed-off-by: Peter Boonstoppel &lt;pboonstoppel@nvidia.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Link: http://lkml.kernel.org/r/5FBF8E85CA34454794F0F7ECBA79798F379D3648B7@HQMAIL04.nvidia.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[ lizf: backported to 3.4: adjust context ]
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched,rt: fix isolated CPUs leaving root_task_group indefinitely throttled</title>
<updated>2014-02-13T19:51:18Z</updated>
<author>
<name>Mike Galbraith</name>
<email>efault@gmx.de</email>
</author>
<published>2012-08-07T08:02:38Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1e5c13ec422f665432bfc9f7c5fc1f9fd614afd3'/>
<id>urn:sha1:1e5c13ec422f665432bfc9f7c5fc1f9fd614afd3</id>
<content type='text'>
commit e221d028bb08b47e624c5f0a31732c642db9d19a upstream.

Root task group bandwidth replenishment must service all CPUs, regardless of
where the timer was last started, and regardless of the isolation mechanism,
lest 'Quoth the Raven, "Nevermore"' become rt scheduling policy.

Signed-off-by: Mike Galbraith &lt;efault@gmx.de&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1344326558.6968.25.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/rt: Fix SCHED_RR across cgroups</title>
<updated>2014-02-13T19:51:18Z</updated>
<author>
<name>Colin Cross</name>
<email>ccross@android.com</email>
</author>
<published>2012-05-17T04:34:23Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=21b53baf40aecb134593ec74eb787f16c569cfc5'/>
<id>urn:sha1:21b53baf40aecb134593ec74eb787f16c569cfc5</id>
<content type='text'>
commit 454c79999f7eaedcdf4c15c449e43902980cbdf5 upstream.

task_tick_rt() has an optimization to only reschedule SCHED_RR tasks
if they were the only element on their rq.  However, with cgroups
a SCHED_RR task could be the only element on its per-cgroup rq but
still be competing with other SCHED_RR tasks in its parent's
cgroup.  In this case, the SCHED_RR task in the child cgroup would
never yield at the end of its timeslice.  If the child cgroup
rt_runtime_us was the same as the parent cgroup rt_runtime_us,
the task in the parent cgroup would starve completely.

Modify task_tick_rt() to check that the task is the only task on its
rq, and that the each of the scheduling entities of its ancestors
is also the only entity on its rq.

Signed-off-by: Colin Cross &lt;ccross@android.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1337229266-15798-1-git-send-email-ccross@android.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched: Guarantee new group-entities always have weight</title>
<updated>2014-01-15T23:27:12Z</updated>
<author>
<name>Paul Turner</name>
<email>pjt@google.com</email>
</author>
<published>2013-10-16T18:16:27Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9af6b695ac764184cdbdd42b77da3e600d337d14'/>
<id>urn:sha1:9af6b695ac764184cdbdd42b77da3e600d337d14</id>
<content type='text'>
commit 0ac9b1c21874d2490331233b3242085f8151e166 upstream.

Currently, group entity load-weights are initialized to zero. This
admits some races with respect to the first time they are re-weighted in
earlty use. ( Let g[x] denote the se for "g" on cpu "x". )

Suppose that we have root-&gt;a and that a enters a throttled state,
immediately followed by a[0]-&gt;t1 (the only task running on cpu[0])
blocking:

  put_prev_task(group_cfs_rq(a[0]), t1)
  put_prev_entity(..., t1)
  check_cfs_rq_runtime(group_cfs_rq(a[0]))
  throttle_cfs_rq(group_cfs_rq(a[0]))

Then, before unthrottling occurs, let a[0]-&gt;b[0]-&gt;t2 wake for the first
time:

  enqueue_task_fair(rq[0], t2)
  enqueue_entity(group_cfs_rq(b[0]), t2)
  enqueue_entity_load_avg(group_cfs_rq(b[0]), t2)
  account_entity_enqueue(group_cfs_ra(b[0]), t2)
  update_cfs_shares(group_cfs_rq(b[0]))
  &lt; skipped because b is part of a throttled hierarchy &gt;
  enqueue_entity(group_cfs_rq(a[0]), b[0])
  ...

We now have b[0] enqueued, yet group_cfs_rq(a[0])-&gt;load.weight == 0
which violates invariants in several code-paths. Eliminate the
possibility of this by initializing group entity weight.

Signed-off-by: Paul Turner &lt;pjt@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20131016181627.22647.47543.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Chris J Arges &lt;chris.j.arges@canonical.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched: Fix hrtimer_cancel()/rq-&gt;lock deadlock</title>
<updated>2014-01-15T23:27:11Z</updated>
<author>
<name>Ben Segall</name>
<email>bsegall@google.com</email>
</author>
<published>2013-10-16T18:16:22Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=03d35a39f4de9f2e9d7fbc5c2d03dbcc5b882df7'/>
<id>urn:sha1:03d35a39f4de9f2e9d7fbc5c2d03dbcc5b882df7</id>
<content type='text'>
commit 927b54fccbf04207ec92f669dce6806848cbec7d upstream.

__start_cfs_bandwidth calls hrtimer_cancel while holding rq-&gt;lock,
waiting for the hrtimer to finish. However, if sched_cfs_period_timer
runs for another loop iteration, the hrtimer can attempt to take
rq-&gt;lock, resulting in deadlock.

Fix this by ensuring that cfs_b-&gt;timer_active is cleared only if the
_latest_ call to do_sched_cfs_period_timer is returning as idle. Then
__start_cfs_bandwidth can just call hrtimer_try_to_cancel and wait for
that to succeed or timer_active == 1.

Signed-off-by: Ben Segall &lt;bsegall@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181622.22647.16643.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Chris J Arges &lt;chris.j.arges@canonical.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining</title>
<updated>2014-01-15T23:27:11Z</updated>
<author>
<name>Ben Segall</name>
<email>bsegall@google.com</email>
</author>
<published>2013-10-16T18:16:17Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9b318052acd24780f4dba1349ef3a30a7aef52ad'/>
<id>urn:sha1:9b318052acd24780f4dba1349ef3a30a7aef52ad</id>
<content type='text'>
commit db06e78cc13d70f10877e0557becc88ab3ad2be8 upstream.

hrtimer_expires_remaining does not take internal hrtimer locks and thus
must be guarded against concurrent __hrtimer_start_range_ns (but
returning HRTIMER_RESTART is safe). Use cfs_b-&gt;lock to make it safe.

Signed-off-by: Ben Segall &lt;bsegall@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181617.22647.73829.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Chris J Arges &lt;chris.j.arges@canonical.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
