<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/futex_compat.c, branch v2.6.27.59</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/kernel/futex_compat.c?h=v2.6.27.59</id>
<link rel='self' href='https://git.amat.us/linux/atom/kernel/futex_compat.c?h=v2.6.27.59'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2008-03-30T21:18:41Z</updated>
<entry>
<title>futex_compat __user annotation</title>
<updated>2008-03-30T21:18:41Z</updated>
<author>
<name>Al Viro</name>
<email>viro@ftp.linux.org.uk</email>
</author>
<published>2008-03-29T03:07:58Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=8481664d373e7e2cea3ea0c2d7a06c9e939b19ee'/>
<id>urn:sha1:8481664d373e7e2cea3ea0c2d7a06c9e939b19ee</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>futex: runtime enable pi and robust functionality</title>
<updated>2008-02-24T01:12:15Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2008-02-23T23:23:57Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a0c1e9073ef7428a14309cba010633a6cd6719ea'/>
<id>urn:sha1:a0c1e9073ef7428a14309cba010633a6cd6719ea</id>
<content type='text'>
Not all architectures implement futex_atomic_cmpxchg_inatomic().  The default
implementation returns -ENOSYS, which is currently not handled inside of the
futex guts.

Futex PI calls and robust list exits with a held futex result in an endless
loop in the futex code on architectures which have no support.

Fixing up every place where futex_atomic_cmpxchg_inatomic() is called would
add a fair amount of extra if/else constructs to the already complex code.  It
is also not possible to disable the robust feature before user space tries to
register robust lists.

Compile time disabling is not a good idea either, as there are already
architectures with runtime detection of futex_atomic_cmpxchg_inatomic support.

Detect the functionality at runtime instead by calling
cmpxchg_futex_value_locked() with a NULL pointer from the futex initialization
code.  This is guaranteed to fail, but the call of
futex_atomic_cmpxchg_inatomic() happens with pagefaults disabled.

On architectures, which use the asm-generic implementation or have a runtime
CPU feature detection, a -ENOSYS return value disables the PI/robust features.

On architectures with a working implementation the call returns -EFAULT and
the PI/robust features are enabled.

The relevant syscalls return -ENOSYS and the robust list exit code is blocked,
when the detection fails.

Fixes http://lkml.org/lkml/2008/2/11/149
Originally reported by: Lennart Buytenhek

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Lennert Buytenhek &lt;buytenh@wantstofly.org&gt;
Cc: Riku Voipio &lt;riku.voipio@movial.fi&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>hrtimer: check relative timeouts for overflow</title>
<updated>2008-02-14T21:08:30Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2008-02-13T08:20:43Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=5a7780e725d1bb4c3094fcc12f1c5c5faea1e988'/>
<id>urn:sha1:5a7780e725d1bb4c3094fcc12f1c5c5faea1e988</id>
<content type='text'>
Various user space callers ask for relative timeouts. While we fixed
that overflow issue in hrtimer_start(), the sites which convert
relative user space values to absolute timeouts themself were uncovered.

Instead of putting overflow checks into each place add a function
which does the sanity checking and convert all affected callers to use
it.

Thanks to Frans Pop, who reported the problem and tested the fixes.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Tested-by: Frans Pop &lt;elendil@planet.nl&gt;

</content>
</entry>
<entry>
<title>futex: Add bitset conditional wait/wakeup functionality</title>
<updated>2008-02-01T16:45:14Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2008-02-01T16:45:14Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=cd689985cf49f6ff5c8eddc48d98b9d581d9475d'/>
<id>urn:sha1:cd689985cf49f6ff5c8eddc48d98b9d581d9475d</id>
<content type='text'>
To allow the implementation of optimized rw-locks in user space, glibc
needs a possibility to select waiters for wakeup depending on a bitset
mask.

This requires two new futex OPs: FUTEX_WAIT_BITS and FUTEX_WAKE_BITS
These OPs are basically the same as FUTEX_WAIT and FUTEX_WAKE plus an
additional argument - a bitset. Further the FUTEX_WAIT_BITS OP is
expecting an absolute timeout value instead of the relative one, which
is used for the FUTEX_WAIT OP.

FUTEX_WAIT_BITS calls into the kernel with a bitset. The bitset is
stored in the futex_q structure, which is used to enqueue the waiter
into the hashed futex waitqueue.

FUTEX_WAKE_BITS also calls into the kernel with a bitset. The wakeup
function logically ANDs the bitset with the bitset stored in each
waiters futex_q structure. If the result is zero (i.e. none of the set
bits in the bitsets is matching), then the waiter is not woken up. If
the result is not zero (i.e. one of the set bits in the bitsets is
matching), then the waiter is woken.

The bitset provided by the caller must be non zero. In case the
provided bitset is zero the kernel returns EINVAL.

Internaly the new OPs are only extensions to the existing FUTEX_WAIT
and FUTEX_WAKE functions. The existing OPs hand a bitset with all bits
set into the futex_wait() and futex_wake() functions.

Signed-off-by: Thomas Gleixner &lt;tgxl@linutronix.de&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>[FUTEX] Fix address computation in compat code.</title>
<updated>2007-11-10T00:13:08Z</updated>
<author>
<name>David Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2007-11-07T05:13:56Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3c5fd9c77d609b51c0bab682c9d40cbb496ec6f1'/>
<id>urn:sha1:3c5fd9c77d609b51c0bab682c9d40cbb496ec6f1</id>
<content type='text'>
compat_exit_robust_list() computes a pointer to the
futex entry in userspace as follows:

	(void __user *)entry + futex_offset

'entry' is a 'struct robust_list __user *', and
'futex_offset' is a 'compat_long_t' (typically a 's32').

Things explode if the 32-bit sign bit is set in futex_offset.

Type promotion sign extends futex_offset to a 64-bit value before
adding it to 'entry'.

This triggered a problem on sparc64 running 32-bit applications which
would lock up a cpu looping forever in the fault handling for the
userspace load in handle_futex_death().

Compat userspace runs with address masking (wherein the cpu zeros out
the top 32-bits of every effective address given to a memory operation
instruction) so the sparc64 fault handler accounts for this by
zero'ing out the top 32-bits of the fault address too.

Since the kernel properly uses the compat_uptr interfaces, kernel side
accesses to compat userspace work too since they will only use
addresses with the top 32-bit clear.

Because of this compat futex layer bug we get into the following loop
when executing the get_user() load near the top of handle_futex_death():

1) load from address '0xfffffffff7f16bd8', FAULT
2) fault handler clears upper 32-bits, processes fault
   for address '0xf7f16bd8' which succeeds
3) goto #1

I want to thank Bernd Zeimetz, Josip Rodin, and Fabio Massimo Di Nitto
for their tireless efforts helping me track down this bug.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Uninline find_task_by_xxx set of functions</title>
<updated>2007-10-19T18:53:40Z</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2007-10-19T06:40:16Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=228ebcbe634a30aec35132ea4375721bcc41bec0'/>
<id>urn:sha1:228ebcbe634a30aec35132ea4375721bcc41bec0</id>
<content type='text'>
The find_task_by_something is a set of macros are used to find task by pid
depending on what kind of pid is proposed - global or virtual one.  All of
them are wrappers above the most generic one - find_task_by_pid_type_ns() -
and just substitute some args for it.

It turned out, that dereferencing the current-&gt;nsproxy-&gt;pid_ns construction
and pushing one more argument on the stack inline cause kernel text size to
grow.

This patch moves all this stuff out-of-line into kernel/pid.c.  Together
with the next patch it saves a bit less than 400 bytes from the .text
section.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pid namespaces: changes to show virtual ids to user</title>
<updated>2007-10-19T18:53:40Z</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2007-10-19T06:40:14Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b488893a390edfe027bae7a46e9af8083e740668'/>
<id>urn:sha1:b488893a390edfe027bae7a46e9af8083e740668</id>
<content type='text'>
This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.

The idea is:
 - all in-kernel data structures must store either struct pid itself
   or the pid's global nr, obtained with pid_nr() call;
 - when seeking the task from kernel code with the stored id one
   should use find_task_by_pid() call that works with global pids;
 - when showing pid's numerical value to the user the virtual one
   should be used, but however when one shows task's pid outside this
   task's namespace the global one is to be used;
 - when getting the pid from userspace one need to consider this as
   the virtual one and use appropriate task/pid-searching functions.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Signed-off-by: Alexey Dobriyan &lt;adobriyan@openvz.org&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>robust futex thread exit race</title>
<updated>2007-10-01T14:52:23Z</updated>
<author>
<name>Martin Schwidefsky</name>
<email>schwidefsky@de.ibm.com</email>
</author>
<published>2007-10-01T08:20:13Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9f96cb1e8bca179a92afa40dfc3c49990f1cfc71'/>
<id>urn:sha1:9f96cb1e8bca179a92afa40dfc3c49990f1cfc71</id>
<content type='text'>
Calling handle_futex_death in exit_robust_list for the different robust
mutexes of a thread basically frees the mutex.  Another thread might grab
the lock immediately which updates the next pointer of the mutex.
fetch_robust_entry over the next pointer might therefore branch into the
robust mutex list of a different thread.  This can cause two problems: 1)
some mutexes held by the dead thread are not getting freed and 2) some
mutexs held by a different thread are freed.

The next point need to be read before calling handle_futex_death.

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>futex_compat: fix list traversal bugs</title>
<updated>2007-09-12T00:21:20Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2007-09-11T22:23:49Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=179c85ea53bef807621f335767e41e23f86f01df'/>
<id>urn:sha1:179c85ea53bef807621f335767e41e23f86f01df</id>
<content type='text'>
The futex list traversal on the compat side appears to have
a bug.

It's loop termination condition compares:

        while (compat_ptr(uentry) != &amp;head-&gt;list)

But that can't be right because "uentry" has the special
"pi" indicator bit still potentially set at bit 0.  This
is cleared by fetch_robust_entry() into the "entry"
return value.

What this seems to mean is that the list won't terminate
when list iteration gets back to the the head.  And we'll
also process the list head like a normal entry, which could
cause all kinds of problems.

So we should check for equality with "entry".  That pointer
is of the non-compat type so we have to do a little casting
to keep the compiler and sparse happy.

The same problem can in theory occur with the 'pending'
variable, although that has not been reported from users
so far.

Based on the original patch from David Miller.

Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "futex_requeue_pi optimization"</title>
<updated>2007-06-18T16:48:41Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2007-06-17T19:11:10Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=bd197234b0a616c8f04f6b682326a5a24b33ca92'/>
<id>urn:sha1:bd197234b0a616c8f04f6b682326a5a24b33ca92</id>
<content type='text'>
This reverts commit d0aa7a70bf03b9de9e995ab272293be1f7937822.

It not only introduced user space visible changes to the futex syscall,
it is also non-functional and there is no way to fix it proper before
the 2.6.22 release.

The breakage report ( http://lkml.org/lkml/2007/5/12/17 ) went
unanswered, and unfortunately it turned out that the concept is not
feasible at all.  It violates the rtmutex semantics badly by introducing
a virtual owner, which hacks around the coupling of the user-space
pi_futex and the kernel internal rt_mutex representation.

At the moment the only safe option is to remove it fully as it contains
user-space visible changes to broken kernel code, which we do not want
to expose in the 2.6.22 release.

The patch reverts the original patch mostly 1:1, but contains a couple
of trivial manual cleanups which were necessary due to patches, which
touched the same area of code later.

Verified against the glibc tests and my own PI futex tests.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Acked-by: Ulrich Drepper &lt;drepper@redhat.com&gt;
Cc: Pierre Peiffer &lt;pierre.peiffer@bull.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
