<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel, branch v3.4.19</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/kernel?h=v3.4.19</id>
<link rel='self' href='https://git.amat.us/linux/atom/kernel?h=v3.4.19'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2012-11-17T21:16:22Z</updated>
<entry>
<title>futex: Handle futex_pi OWNER_DIED take over correctly</title>
<updated>2012-11-17T21:16:22Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2012-10-23T20:29:38Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c4cbedfda2227df82126c9dd5e7593565bf45d21'/>
<id>urn:sha1:c4cbedfda2227df82126c9dd5e7593565bf45d21</id>
<content type='text'>
commit 59fa6245192159ab5e1e17b8e31f15afa9cff4bf upstream.

Siddhesh analyzed a failure in the take over of pi futexes in case the
owner died and provided a workaround.
See: http://sourceware.org/bugzilla/show_bug.cgi?id=14076

The detailed problem analysis shows:

Futex F is initialized with PTHREAD_PRIO_INHERIT and
PTHREAD_MUTEX_ROBUST_NP attributes.

T1 lock_futex_pi(F);

T2 lock_futex_pi(F);
   --&gt; T2 blocks on the futex and creates pi_state which is associated
       to T1.

T1 exits
   --&gt; exit_robust_list() runs
       --&gt; Futex F userspace value TID field is set to 0 and
           FUTEX_OWNER_DIED bit is set.

T3 lock_futex_pi(F);
   --&gt; Succeeds due to the check for F's userspace TID field == 0
   --&gt; Claims ownership of the futex and sets its own TID into the
       userspace TID field of futex F
   --&gt; returns to user space

T1 --&gt; exit_pi_state_list()
       --&gt; Transfers pi_state to waiter T2 and wakes T2 via
       	   rt_mutex_unlock(&amp;pi_state-&gt;mutex)

T2 --&gt; acquires pi_state-&gt;mutex and gains real ownership of the
       pi_state
   --&gt; Claims ownership of the futex and sets its own TID into the
       userspace TID field of futex F
   --&gt; returns to user space

T3 --&gt; observes inconsistent state

This problem is independent of UP/SMP, preemptible/non preemptible
kernels, or process shared vs. private. The only difference is that
certain configurations are more likely to expose it.

So as Siddhesh correctly analyzed the following check in
futex_lock_pi_atomic() is the culprit:

	if (unlikely(ownerdied || !(curval &amp; FUTEX_TID_MASK))) {

We check the userspace value for a TID value of 0 and take over the
futex unconditionally if that's true.

AFAICT this check is there as it is correct for a different corner
case of futexes: the WAITERS bit became stale.

Now the proposed change

-	if (unlikely(ownerdied || !(curval &amp; FUTEX_TID_MASK))) {
+       if (unlikely(ownerdied ||
+                       !(curval &amp; (FUTEX_TID_MASK | FUTEX_WAITERS)))) {

solves the problem, but it's not obvious why and it wreckages the
"stale WAITERS bit" case.

What happens is, that due to the WAITERS bit being set (T2 is blocked
on that futex) it enforces T3 to go through lookup_pi_state(), which
in the above case returns an existing pi_state and therefor forces T3
to legitimately fight with T2 over the ownership of the pi_state (via
pi_state-&gt;mutex). Probelm solved!

Though that does not work for the "WAITERS bit is stale" problem
because if lookup_pi_state() does not find existing pi_state it
returns -ERSCH (due to TID == 0) which causes futex_lock_pi() to
return -ESRCH to user space because the OWNER_DIED bit is not set.

Now there is a different solution to that problem. Do not look at the
user space value at all and enforce a lookup of possibly available
pi_state. If pi_state can be found, then the new incoming locker T3
blocks on that pi_state and legitimately races with T2 to acquire the
rt_mutex and the pi_state and therefor the proper ownership of the
user space futex.

lookup_pi_state() has the correct order of checks. It first tries to
find a pi_state associated with the user space futex and only if that
fails it checks for futex TID value = 0. If no pi_state is available
nothing can create new state at that point because this happens with
the hash bucket lock held.

So the above scenario changes to:

T1 lock_futex_pi(F);

T2 lock_futex_pi(F);
   --&gt; T2 blocks on the futex and creates pi_state which is associated
       to T1.

T1 exits
   --&gt; exit_robust_list() runs
       --&gt; Futex F userspace value TID field is set to 0 and
           FUTEX_OWNER_DIED bit is set.

T3 lock_futex_pi(F);
   --&gt; Finds pi_state and blocks on pi_state-&gt;rt_mutex

T1 --&gt; exit_pi_state_list()
       --&gt; Transfers pi_state to waiter T2 and wakes it via
       	   rt_mutex_unlock(&amp;pi_state-&gt;mutex)

T2 --&gt; acquires pi_state-&gt;mutex and gains ownership of the pi_state
   --&gt; Claims ownership of the futex and sets its own TID into the
       userspace TID field of futex F
   --&gt; returns to user space

This covers all gazillion points on which T3 might come in between
T1's exit_robust_list() clearing the TID field and T2 fixing it up. It
also solves the "WAITERS bit stale" problem by forcing the take over.

Another benefit of changing the code this way is that it makes it less
dependent on untrusted user space values and therefor minimizes the
possible wreckage which might be inflicted.

As usual after staring for too long at the futex code my brain hurts
so much that I really want to ditch that whole optimization of
avoiding the syscall for the non contended case for PI futexes and rip
out the maze of corner case handling code. Unfortunately we can't as
user space relies on that existing behaviour, but at least thinking
about it helps me to preserve my mental sanity. Maybe we should
nevertheless :)

Reported-and-tested-by: Siddhesh Poyarekar &lt;siddhesh.poyarekar@gmail.com&gt;
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1210232138540.2756@ionos
Acked-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Revert "cgroup: Drop task_lock(parent) on cgroup_fork()"</title>
<updated>2012-10-28T17:14:14Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-10-19T00:52:07Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=727fa37e405a71dc1c7208ea7352be55e0310eca'/>
<id>urn:sha1:727fa37e405a71dc1c7208ea7352be55e0310eca</id>
<content type='text'>
commit 9bb71308b8133d643648776243e4d5599b1c193d upstream.

This reverts commit 7e381b0eb1e1a9805c37335562e8dc02e7d7848c.

The commit incorrectly assumed that fork path always performed
threadgroup_change_begin/end() and depended on that for
synchronization against task exit and cgroup migration paths instead
of explicitly grabbing task_lock().

threadgroup_change is not locked when forking a new process (as
opposed to a new thread in the same process) and even if it were it
wouldn't be effective as different processes use different threadgroup
locks.

Revert the incorrect optimization.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
LKML-Reference: &lt;20121008020000.GB2575@localhost&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Bitterly-Acked-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Revert "cgroup: Remove task_lock() from cgroup_post_fork()"</title>
<updated>2012-10-28T17:14:14Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-10-19T00:40:30Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=5993bba302a1a6df9c73690a08346411ff7cd56e'/>
<id>urn:sha1:5993bba302a1a6df9c73690a08346411ff7cd56e</id>
<content type='text'>
commit d87838321124061f6c935069d97f37010fa417e6 upstream.

This reverts commit 7e3aa30ac8c904a706518b725c451bb486daaae9.

The commit incorrectly assumed that fork path always performed
threadgroup_change_begin/end() and depended on that for
synchronization against task exit and cgroup migration paths instead
of explicitly grabbing task_lock().

threadgroup_change is not locked when forking a new process (as
opposed to a new thread in the same process) and even if it were it
wouldn't be effective as different processes use different threadgroup
locks.

Revert the incorrect optimization.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
LKML-Reference: &lt;20121008020000.GB2575@localhost&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>cgroup: notify_on_release may not be triggered in some cases</title>
<updated>2012-10-28T17:14:14Z</updated>
<author>
<name>Daisuke Nishimura</name>
<email>nishimura@mxp.nes.nec.co.jp</email>
</author>
<published>2012-10-04T07:37:16Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9a55fef20afb3d7362b0c9d19ba1e78dfe482d3d'/>
<id>urn:sha1:9a55fef20afb3d7362b0c9d19ba1e78dfe482d3d</id>
<content type='text'>
commit 1f5320d5972aa50d3e8d2b227b636b370e608359 upstream.

notify_on_release must be triggered when the last process in a cgroup is
move to another. But if the first(and only) process in a cgroup is moved to
another, notify_on_release is not triggered.

	# mkdir /cgroup/cpu/SRC
	# mkdir /cgroup/cpu/DST
	#
	# echo 1 &gt;/cgroup/cpu/SRC/notify_on_release
	# echo 1 &gt;/cgroup/cpu/DST/notify_on_release
	#
	# sleep 300 &amp;
	[1] 8629
	#
	# echo 8629 &gt;/cgroup/cpu/SRC/tasks
	# echo 8629 &gt;/cgroup/cpu/DST/tasks
	-&gt; notify_on_release for /SRC must be triggered at this point,
	   but it isn't.

This is because put_css_set() is called before setting CGRP_RELEASABLE
in cgroup_task_migrate(), and is a regression introduce by the
commit:74a1166d(cgroups: make procs file writable), which was merged
into v3.0.

Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Ben Blum &lt;bblum@andrew.cmu.edu&gt;
Signed-off-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>use clamp_t in UNAME26 fix</title>
<updated>2012-10-28T17:14:13Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2012-10-20T01:45:53Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3680030e9747dfa51fe9c1850361c29ae0ad20fe'/>
<id>urn:sha1:3680030e9747dfa51fe9c1850361c29ae0ad20fe</id>
<content type='text'>
commit 31fd84b95eb211d5db460a1dda85e004800a7b52 upstream.

The min/max call needed to have explicit types on some architectures
(e.g. mn10300). Use clamp_t instead to avoid the warning:

  kernel/sys.c: In function 'override_release':
  kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a cast [enabled by default]

Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>kernel/sys.c: fix stack memory content leak via UNAME26</title>
<updated>2012-10-28T17:14:13Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2012-10-19T20:56:51Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=643fde8ed4ea75f5d113803084806eff14d97454'/>
<id>urn:sha1:643fde8ed4ea75f5d113803084806eff14d97454</id>
<content type='text'>
commit 2702b1526c7278c4d65d78de209a465d4de2885e upstream.

Calling uname() with the UNAME26 personality set allows a leak of kernel
stack contents.  This fixes it by defensively calculating the length of
copy_to_user() call, making the len argument unsigned, and initializing
the stack buffer to zero (now technically unneeded, but hey, overkill).

CVE-2012-0957

Reported-by: PaX Team &lt;pageexec@freemail.hu&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: PaX Team &lt;pageexec@freemail.hu&gt;
Cc: Brad Spengler &lt;spender@grsecurity.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>kdb,vt_console: Fix missed data due to pager overruns</title>
<updated>2012-10-21T16:27:59Z</updated>
<author>
<name>Jason Wessel</name>
<email>jason.wessel@windriver.com</email>
</author>
<published>2012-08-27T03:37:03Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c4c493a4adcee75e1e44af044d0b7fc1b5192b61'/>
<id>urn:sha1:c4c493a4adcee75e1e44af044d0b7fc1b5192b61</id>
<content type='text'>
commit 17b572e82032bc246324ce136696656b66d4e3f1 upstream.

It is possible to miss data when using the kdb pager.  The kdb pager
does not pay attention to the maximum column constraint of the screen
or serial terminal.  This result is not incrementing the shown lines
correctly and the pager will print more lines that fit on the screen.
Obviously that is less than useful when using a VGA console where you
cannot scroll back.

The pager will now look at the kdb_buffer string to see how many
characters are printed.  It might not be perfect considering you can
output ASCII that might move the cursor position, but it is a
substantially better approximation for viewing dmesg and trace logs.

This also means that the vt screen needs to set the kdb COLUMNS
variable.

Signed-off-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>timers: Fix endless looping between cascade() and internal_add_timer()</title>
<updated>2012-10-21T16:27:59Z</updated>
<author>
<name>Hildner, Christian</name>
<email>christian.hildner@siemens.com</email>
</author>
<published>2012-10-08T13:49:03Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=00dff266efe889d4fc6569f3e39ae7195cb83da7'/>
<id>urn:sha1:00dff266efe889d4fc6569f3e39ae7195cb83da7</id>
<content type='text'>
commit 26cff4e2aa4d666dc6a120ea34336b5057e3e187 upstream.

Adding two (or more) timers with large values for "expires" (they have
to reside within tv5 in the same list) leads to endless looping
between cascade() and internal_add_timer() in case CONFIG_BASE_SMALL
is one and jiffies are crossing the value 1 &lt;&lt; 18. The bug was
introduced between 2.6.11 and 2.6.12 (and survived for quite some
time).

This patch ensures that when cascade() is called timers within tv5 are
not added endlessly to their own list again, instead they are added to
the next lower tv level tv4 (as expected).

Signed-off-by: Christian Hildner &lt;christian.hildner@siemens.com&gt;
Reviewed-by: Jan Kiszka &lt;jan.kiszka@siemens.com&gt;
Link: http://lkml.kernel.org/r/98673C87CB31274881CFFE0B65ECC87B0F5FC1963E@DEFTHW99EA4MSX.ww902.siemens.net
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>module: taint kernel when lve module is loaded</title>
<updated>2012-10-21T16:27:59Z</updated>
<author>
<name>Matthew Garrett</name>
<email>mjg59@srcf.ucam.org</email>
</author>
<published>2012-06-22T17:49:31Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=08de372d6c39473773fc303b9253a2be938badbd'/>
<id>urn:sha1:08de372d6c39473773fc303b9253a2be938badbd</id>
<content type='text'>
commit c99af3752bb52ba3aece5315279a57a477edfaf1 upstream.

Cloudlinux have a product called lve that includes a kernel module. This
was previously GPLed but is now under a proprietary license, but the
module continues to declare MODULE_LICENSE("GPL") and makes use of some
EXPORT_SYMBOL_GPL symbols. Forcibly taint it in order to avoid this.

Signed-off-by: Matthew Garrett &lt;mjg59@srcf.ucam.org&gt;
Cc: Alex Lyashkov &lt;umka@cloudlinux.com&gt;
Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched: Fix migration thread runtime bogosity</title>
<updated>2012-10-12T20:39:01Z</updated>
<author>
<name>Mike Galbraith</name>
<email>mgalbraith@suse.de</email>
</author>
<published>2012-08-04T03:44:14Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=93875bc33f71bf7a4851b44d387743866b13b130'/>
<id>urn:sha1:93875bc33f71bf7a4851b44d387743866b13b130</id>
<content type='text'>
commit 8f6189684eb4e85e6c593cd710693f09c944450a upstream.

Make stop scheduler class do the same accounting as other classes,

Migration threads can be caught in the act while doing exec balancing,
leading to the below due to use of unmaintained -&gt;se.exec_start.  The
load that triggered this particular instance was an apparently out of
control heavily threaded application that does system monitoring in
what equated to an exec bomb, with one of the VERY frequently migrated
tasks being ps.

%CPU   PID USER     CMD
99.3    45 root     [migration/10]
97.7    53 root     [migration/12]
97.0    57 root     [migration/13]
90.1    49 root     [migration/11]
89.6    65 root     [migration/15]
88.7    17 root     [migration/3]
80.4    37 root     [migration/8]
78.1    41 root     [migration/9]
44.2    13 root     [migration/2]

Signed-off-by: Mike Galbraith &lt;mgalbraith@suse.de&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1344051854.6739.19.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
