<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel, branch v2.6.38.4</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/kernel?h=v2.6.38.4</id>
<link rel='self' href='https://git.amat.us/linux/atom/kernel?h=v2.6.38.4'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2011-04-21T21:32:52Z</updated>
<entry>
<title>next_pidmap: fix overflow condition</title>
<updated>2011-04-21T21:32:52Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-04-18T17:35:30Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c8b265db1e63fdb296ec2721f5c1473c44bc4dc5'/>
<id>urn:sha1:c8b265db1e63fdb296ec2721f5c1473c44bc4dc5</id>
<content type='text'>
commit c78193e9c7bcbf25b8237ad0dec82f805c4ea69b upstream.

next_pidmap() just quietly accepted whatever 'last' pid that was passed
in, which is not all that safe when one of the users is /proc.

Admittedly the proc code should do some sanity checking on the range
(and that will be the next commit), but that doesn't mean that the
helper functions should just do that pidmap pointer arithmetic without
checking the range of its arguments.

So clamp 'last' to PID_MAX_LIMIT.  The fact that we then do "last+1"
doesn't really matter, the for-loop does check against the end of the
pidmap array properly (it's only the actual pointer arithmetic overflow
case we need to worry about, and going one bit beyond isn't going to
overflow).

[ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]

Reported-by: Tavis Ormandy &lt;taviso@cmpxchg8b.com&gt;
Analyzed-by: Robert Święcki &lt;robert@swiecki.net&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>sched: Fix erroneous all_pinned logic</title>
<updated>2011-04-21T21:32:48Z</updated>
<author>
<name>Ken Chen</name>
<email>kenchen@google.com</email>
</author>
<published>2011-04-08T19:20:16Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=97c5a1c2d96e5dffe7e0c93cd0f5a50038c8e9e3'/>
<id>urn:sha1:97c5a1c2d96e5dffe7e0c93cd0f5a50038c8e9e3</id>
<content type='text'>
commit b30aef17f71cf9e24b10c11cbb5e5f0ebe8a85ab upstream.

The scheduler load balancer has specific code to deal with cases of
unbalanced system due to lots of unmovable tasks (for example because of
hard CPU affinity). In those situation, it excludes the busiest CPU that
has pinned tasks for load balance consideration such that it can perform
second 2nd load balance pass on the rest of the system.

This all works as designed if there is only one cgroup in the system.

However, when we have multiple cgroups, this logic has false positives and
triggers multiple load balance passes despite there are actually no pinned
tasks at all.

The reason it has false positives is that the all pinned logic is deep in
the lowest function of can_migrate_task() and is too low level:

load_balance_fair() iterates each task group and calls balance_tasks() to
migrate target load. Along the way, balance_tasks() will also set a
all_pinned variable. Given that task-groups are iterated, this all_pinned
variable is essentially the status of last group in the scanning process.
Task group can have number of reasons that no load being migrated, none
due to cpu affinity. However, this status bit is being propagated back up
to the higher level load_balance(), which incorrectly think that no tasks
were moved.  It kick off the all pinned logic and start multiple passes
attempt to move load onto puller CPU.

To fix this, move the all_pinned aggregation up at the iterator level.
This ensures that the status is aggregated over all task-groups, not just
last one in the list.

Signed-off-by: Ken Chen &lt;kenchen@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/BANLkTi=ernzNawaR5tJZEsV_QVnfxqXmsQ@mail.gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>futex: Set FLAGS_HAS_TIMEOUT during futex_wait restart setup</title>
<updated>2011-04-21T21:32:41Z</updated>
<author>
<name>Darren Hart</name>
<email>dvhart@linux.intel.com</email>
</author>
<published>2011-04-14T22:41:57Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=db06752c7f64c445aeae2ae520a48c35e78789ed'/>
<id>urn:sha1:db06752c7f64c445aeae2ae520a48c35e78789ed</id>
<content type='text'>
commit 0cd9c6494ee5c19aef085152bc37f3a4e774a9e1 upstream.

The FLAGS_HAS_TIMEOUT flag was not getting set, causing the restart_block to
restart futex_wait() without a timeout after a signal.

Commit b41277dc7a18ee332d in 2.6.38 introduced the regression by accidentally
removing the the FLAGS_HAS_TIMEOUT assignment from futex_wait() during the setup
of the restart block. Restore the originaly behavior.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=32922

Reported-by: Tim Smith &lt;tsmith201104@yahoo.com&gt;
Reported-by: Torsten Hilbrich &lt;torsten.hilbrich@secunet.com&gt;
Signed-off-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: John Kacur &lt;jkacur@redhat.com&gt;
Link: http://lkml.kernel.org/r/%3Cdaac0eb3af607f72b9a4d3126b2ba8fb5ed3b883.1302820917.git.dvhart%40linux.intel.com%3E
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>perf: Rebase max unprivileged mlock threshold on top of page size</title>
<updated>2011-04-14T20:02:14Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2011-03-31T01:33:29Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f8e3c0bb58a569babfb8d72e7e0655686dbcbab1'/>
<id>urn:sha1:f8e3c0bb58a569babfb8d72e7e0655686dbcbab1</id>
<content type='text'>
commit 20443384fe090c5f8aeb016e7e85659c5bbdd69f upstream.

Ensure we allow 512 kiB + 1 page for user control without
assuming a 4096 bytes page size.

Reported-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
LKML-Reference: &lt;1301535209-9679-1-git-send-email-fweisbec@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>perf: Fix task_struct reference leak</title>
<updated>2011-04-14T20:02:14Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-03-28T11:13:56Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=80b9edca1c11ec8118ab30451af9c1d492770c90'/>
<id>urn:sha1:80b9edca1c11ec8118ab30451af9c1d492770c90</id>
<content type='text'>
commit fd1edb3aa2c1d92618d8f0c6d15d44ea41fcac6a upstream.

sys_perf_event_open() had an imbalance in the number of task refs it
took causing memory leakage

Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;new-submission&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>Relax si_code check in rt_sigqueueinfo and rt_tgsigqueueinfo</title>
<updated>2011-04-14T20:02:02Z</updated>
<author>
<name>Roland Dreier</name>
<email>roland@purestorage.com</email>
</author>
<published>2011-03-28T21:13:35Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=76e61a187353bb6798284196b874938a8fbe3db9'/>
<id>urn:sha1:76e61a187353bb6798284196b874938a8fbe3db9</id>
<content type='text'>
commit 243b422af9ea9af4ead07a8ad54c90d4f9b6081a upstream.

Commit da48524eb206 ("Prevent rt_sigqueueinfo and rt_tgsigqueueinfo
from spoofing the signal code") made the check on si_code too strict.
There are several legitimate places where glibc wants to queue a
negative si_code different from SI_QUEUE:

 - This was first noticed with glibc's aio implementation, which wants
   to queue a signal with si_code SI_ASYNCIO; the current kernel
   causes glibc's tst-aio4 test to fail because rt_sigqueueinfo()
   fails with EPERM.

 - Further examination of the glibc source shows that getaddrinfo_a()
   wants to use SI_ASYNCNL (which the kernel does not even define).
   The timer_create() fallback code wants to queue signals with SI_TIMER.

As suggested by Oleg Nesterov &lt;oleg@redhat.com&gt;, loosen the check to
forbid only the problematic SI_TKILL case.

Reported-by: Klaus Dittrich &lt;kladit@arcor.de&gt;
Acked-by: Julien Tinnes &lt;jln@google.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>perf: Better fit max unprivileged mlock pages for tools needs</title>
<updated>2011-04-14T20:01:58Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2011-03-23T18:29:39Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=6c66673c8abb839c99b9692d02ec8032a11e67af'/>
<id>urn:sha1:6c66673c8abb839c99b9692d02ec8032a11e67af</id>
<content type='text'>
commit 880f57318450dbead6a03f9e31a1468924d6dd88 upstream.

The maximum kilobytes of locked memory that an unprivileged user
can reserve is of 512 kB = 128 pages by default, scaled to the
number of onlined CPUs, which fits well with the tools that use
128 data pages by default.

However tools actually use 129 pages, because they need one more
for the user control page. Thus the default mlock threshold is
not sufficient for the default tools needs and we always end up
to evaluate the constant mlock rlimit policy, which doesn't have
this scaling with the number of online CPUs.

Hence, on systems that have more than 16 CPUs, we overlap the
rlimit threshold and fail to mmap:

	$ perf record ls
	Error: failed to mmap with 1 (Operation not permitted)

Just increase the max unprivileged mlock threshold by one page
so that it supports well perf tools even after 16 CPUs.

Reported-by: Han Pingtian &lt;phan@redhat.com&gt;
Reported-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Reported-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Acked-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
LKML-Reference: &lt;1300904979-5508-1-git-send-email-fweisbec@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>perf: Fix tear-down of inherited group events</title>
<updated>2011-03-27T18:36:35Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-03-15T13:37:10Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=144a4ea712041e818df28a2bea81431143164da8'/>
<id>urn:sha1:144a4ea712041e818df28a2bea81431143164da8</id>
<content type='text'>
commit 38b435b16c36b0d863efcf3f07b34a6fac9873fd upstream.

When destroying inherited events, we need to destroy groups too,
otherwise the event iteration in perf_event_exit_task_context() will
miss group siblings and we leak events with all the consequences.

Reported-and-tested-by: Vince Weaver &lt;vweaver1@eecs.utk.edu&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;1300196470.2203.61.camel@twins&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>sysctl: restrict write access to dmesg_restrict</title>
<updated>2011-03-27T18:36:01Z</updated>
<author>
<name>Richard Weinberger</name>
<email>richard@nod.at</email>
</author>
<published>2011-03-23T23:43:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=bd118a66bddf45265832b8185fa177b3d5fd0200'/>
<id>urn:sha1:bd118a66bddf45265832b8185fa177b3d5fd0200</id>
<content type='text'>
commit bfdc0b497faa82a0ba2f9dddcf109231dd519fcc upstream.

When dmesg_restrict is set to 1 CAP_SYS_ADMIN is needed to read the kernel
ring buffer.  But a root user without CAP_SYS_ADMIN is able to reset
dmesg_restrict to 0.

This is an issue when e.g.  LXC (Linux Containers) are used and complete
user space is running without CAP_SYS_ADMIN.  A unprivileged and jailed
root user can bypass the dmesg_restrict protection.

With this patch writing to dmesg_restrict is only allowed when root has
CAP_SYS_ADMIN.

Signed-off-by: Richard Weinberger &lt;richard@nod.at&gt;
Acked-by: Dan Rosenberg &lt;drosenberg@vsecurity.com&gt;
Acked-by: Serge E. Hallyn &lt;serge@hallyn.com&gt;
Cc: Eric Paris &lt;eparis@redhat.com&gt;
Cc: Kees Cook &lt;kees.cook@canonical.com&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Cc: Eugene Teo &lt;eugeneteo@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>Prevent rt_sigqueueinfo and rt_tgsigqueueinfo from spoofing the signal code</title>
<updated>2011-03-27T18:35:55Z</updated>
<author>
<name>Julien Tinnes</name>
<email>jln@google.com</email>
</author>
<published>2011-03-18T22:05:21Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c93f57068049deb9907ca15ba2501bc5e382e513'/>
<id>urn:sha1:c93f57068049deb9907ca15ba2501bc5e382e513</id>
<content type='text'>
commit da48524eb20662618854bb3df2db01fc65f3070c upstream.

Userland should be able to trust the pid and uid of the sender of a
signal if the si_code is SI_TKILL.

Unfortunately, the kernel has historically allowed sigqueueinfo() to
send any si_code at all (as long as it was negative - to distinguish it
from kernel-generated signals like SIGILL etc), so it could spoof a
SI_TKILL with incorrect siginfo values.

Happily, it looks like glibc has always set si_code to the appropriate
SI_QUEUE, so there are probably no actual user code that ever uses
anything but the appropriate SI_QUEUE flag.

So just tighten the check for si_code (we used to allow any negative
value), and add a (one-time) warning in case there are binaries out
there that might depend on using other si_code values.

Signed-off-by: Julien Tinnes &lt;jln@google.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
</feed>
