<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/ipc, branch v3.11.5</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/ipc?h=v3.11.5</id>
<link rel='self' href='https://git.amat.us/linux/atom/ipc?h=v3.11.5'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2013-10-14T01:14:29Z</updated>
<entry>
<title>ipc: fix race with LSMs</title>
<updated>2013-10-14T01:14:29Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr@hp.com</email>
</author>
<published>2013-09-24T00:04:45Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=38d7d8fe82963c73447177676d32494140652690'/>
<id>urn:sha1:38d7d8fe82963c73447177676d32494140652690</id>
<content type='text'>
commit 53dad6d3a8e5ac1af8bacc6ac2134ae1a8b085f1 upstream.

Currently, IPC mechanisms do security and auditing related checks under
RCU.  However, since security modules can free the security structure,
for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
race if the structure is freed before other tasks are done with it,
creating a use-after-free condition.  Manfred illustrates this nicely,
for instance with shared mem and selinux:

 -&gt; do_shmat calls rcu_read_lock()
 -&gt; do_shmat calls shm_object_check().
     Checks that the object is still valid - but doesn't acquire any locks.
     Then it returns.
 -&gt; do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
 -&gt; selinux_shm_shmat calls ipc_has_perm()
 -&gt; ipc_has_perm accesses ipc_perms-&gt;security

shm_close()
 -&gt; shm_close acquires rw_mutex &amp; shm_lock
 -&gt; shm_close calls shm_destroy
 -&gt; shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
 -&gt; selinux_shm_free_security calls ipc_free_security(&amp;shp-&gt;shm_perm)
 -&gt; ipc_free_security calls kfree(ipc_perms-&gt;security)

This patch delays the freeing of the security structures after all RCU
readers are done.  Furthermore it aligns the security life cycle with
that of the rest of IPC - freeing them based on the reference counter.
For situations where we need not free security, the current behavior is
kept.  Linus states:

 "... the old behavior was suspect for another reason too: having the
  security blob go away from under a user sounds like it could cause
  various other problems anyway, so I think the old code was at least
  _prone_ to bugs even if it didn't have catastrophic behavior."

I have tested this patch with IPC testcases from LTP on both my
quad-core laptop and on a 64 core NUMA server.  In both cases selinux is
enabled, and tests pass for both voluntary and forced preemption models.
While the mentioned races are theoretical (at least no one as reported
them), I wanted to make sure that this new logic doesn't break anything
we weren't aware of.

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Acked-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ipc,msg: prevent race with rmid in msgsnd,msgrcv</title>
<updated>2013-10-14T01:14:29Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr@hp.com</email>
</author>
<published>2013-09-30T20:45:26Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f4e9906e4bf46437d7bc59f093190a8de643c91b'/>
<id>urn:sha1:f4e9906e4bf46437d7bc59f093190a8de643c91b</id>
<content type='text'>
commit 4271b05a227dc6175b66c3d9941aeab09048aeb2 upstream.

This fixes a race in both msgrcv() and msgsnd() between finding the msg
and actually dealing with the queue, as another thread can delete shmid
underneath us if we are preempted before acquiring the
kern_ipc_perm.lock.

Manfred illustrates this nicely:

Assume a preemptible kernel that is preempted just after

    msq = msq_obtain_object_check(ns, msqid)

in do_msgrcv().  The only lock that is held is rcu_read_lock().

Now the other thread processes IPC_RMID.  When the first task is
resumed, then it will happily wait for messages on a deleted queue.

Fix this by checking for if the queue has been deleted after taking the
lock.

Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Reported-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ipc/sem.c: fix race in sem_lock()</title>
<updated>2013-10-14T01:14:29Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2013-09-30T20:45:04Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a4e89936a1067a220c2e3946a89233fb8b7abd61'/>
<id>urn:sha1:a4e89936a1067a220c2e3946a89233fb8b7abd61</id>
<content type='text'>
commit 5e9d527591421ccdb16acb8c23662231135d8686 upstream.

The exclusion of complex operations in sem_lock() is insufficient: after
acquiring the per-semaphore lock, a simple op must first check that
sem_perm.lock is not locked and only after that test check
complex_count.  The current code does it the other way around - and that
creates a race.  Details are below.

The patch is a complete rewrite of sem_lock(), based in part on the code
from Mike Galbraith.  It removes all gotos and all loops and thus the
risk of livelocks.

I have tested the patch (together with the next one) on my i3 laptop and
it didn't cause any problems.

The bug is probably also present in 3.10 and 3.11, but for these kernels
it might be simpler just to move the test of sma-&gt;complex_count after
the spin_is_locked() test.

Details of the bug:

Assume:
 - sma-&gt;complex_count = 0.
 - Thread 1: semtimedop(complex op that must sleep)
 - Thread 2: semtimedop(simple op).

Pseudo-Trace:

Thread 1: sem_lock(): acquire sem_perm.lock
Thread 1: sem_lock(): check for ongoing simple ops
			Nothing ongoing, thread 2 is still before sem_lock().
Thread 1: try_atomic_semop()
	&lt;&lt;&lt; preempted.

Thread 2: sem_lock():
        static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                      int nsops)
        {
                int locknum;
         again:
                if (nsops == 1 &amp;&amp; !sma-&gt;complex_count) {
                        struct sem *sem = sma-&gt;sem_base + sops-&gt;sem_num;

                        /* Lock just the semaphore we are interested in. */
                        spin_lock(&amp;sem-&gt;lock);

                        /*
                         * If sma-&gt;complex_count was set while we were spinning,
                         * we may need to look at things we did not lock here.
                         */
                        if (unlikely(sma-&gt;complex_count)) {
                                spin_unlock(&amp;sem-&gt;lock);
                                goto lock_array;
                        }
        &lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;
	&lt;&lt;&lt; complex_count is still 0.
	&lt;&lt;&lt;
        &lt;&lt;&lt; Here it is preempted
        &lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;

Thread 1: try_atomic_semop() returns, notices that it must sleep.
Thread 1: increases sma-&gt;complex_count.
Thread 1: drops sem_perm.lock
Thread 2:
                /*
                 * Another process is holding the global lock on the
                 * sem_array; we cannot enter our critical section,
                 * but have to wait for the global lock to be released.
                 */
                if (unlikely(spin_is_locked(&amp;sma-&gt;sem_perm.lock))) {
                        spin_unlock(&amp;sem-&gt;lock);
                        spin_unlock_wait(&amp;sma-&gt;sem_perm.lock);
                        goto again;
                }
	&lt;&lt;&lt; sem_perm.lock already dropped, thus no "goto again;"

                locknum = sops-&gt;sem_num;

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Mike Galbraith &lt;bitbucket@online.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ipc/msg.c: Fix lost wakeup in msgsnd().</title>
<updated>2013-09-27T00:21:22Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2013-09-03T14:00:08Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f33e422584bed14b9f194f6a7bf61db68dd95e12'/>
<id>urn:sha1:f33e422584bed14b9f194f6a7bf61db68dd95e12</id>
<content type='text'>
commit bebcb928c820d0ee83aca4b192adc195e43e66a2 upstream.

The check if the queue is full and adding current to the wait queue of
pending msgsnd() operations (ss_add()) must be atomic.

Otherwise:
 - the thread that performs msgsnd() finds a full queue and decides to
   sleep.
 - the thread that performs msgrcv() first reads all messages from the
   queue and then sleeps, because the queue is empty.
 - the msgrcv() calls do not perform any wakeups, because the msgsnd()
   task has not yet called ss_add().
 - then the msgsnd()-thread first calls ss_add() and then sleeps.

Net result: msgsnd() and msgrcv() both sleep forever.

Observed with msgctl08 from ltp with a preemptible kernel.

Fix: Call ipc_lock_object() before performing the check.

The patch also moves security_msg_queue_msgsnd() under ipc_lock_object:
 - msgctl(IPC_SET) explicitely mentions that it tries to expunge any
   pending operations that are not allowed anymore with the new
   permissions.  If security_msg_queue_msgsnd() is called without locks,
   then there might be races.
 - it makes the patch much simpler.

Reported-and-tested-by: Vineet Gupta &lt;Vineet.Gupta1@synopsys.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>IPC: bugfix for msgrcv with msgtyp &lt; 0</title>
<updated>2013-08-29T02:26:38Z</updated>
<author>
<name>Svenning Sørensen</name>
<email>sss@secomea.dk</email>
</author>
<published>2013-08-28T23:35:17Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=368ae537e056acd3f751fa276f48423f06803922'/>
<id>urn:sha1:368ae537e056acd3f751fa276f48423f06803922</id>
<content type='text'>
According to 'man msgrcv': "If msgtyp is less than 0, the first message of
the lowest type that is less than or equal to the absolute value of msgtyp
shall be received."

Bug: The kernel only returns a message if its type is 1; other messages
with type &lt; abs(msgtype) will never get returned.

Fix: After having traversed the list to find the first message with the
lowest type, we need to actually return that message.

This regression was introduced by commit daaf74cf0867 ("ipc: refactor
msg list search into separate function")

Signed-off-by: Svenning Soerensen &lt;sss@secomea.dk&gt;
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/sem.c: rename try_atomic_semop() to perform_atomic_semop(), docu update</title>
<updated>2013-07-09T17:33:28Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2013-07-08T23:01:26Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=758a6ba39ef6df4cdc615e5edd7bd86eab81a5f7'/>
<id>urn:sha1:758a6ba39ef6df4cdc615e5edd7bd86eab81a5f7</id>
<content type='text'>
Cleanup: Some minor points that I noticed while writing the previous
patches

1) The name try_atomic_semop() is misleading: The function performs the
   operation (if it is possible).

2) Some documentation updates.

No real code change, a rename and documentation changes.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/sem.c: replace shared sem_otime with per-semaphore value</title>
<updated>2013-07-09T17:33:28Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2013-07-08T23:01:25Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d12e1e50e47e0900dbbf52237b7e171f4f15ea1e'/>
<id>urn:sha1:d12e1e50e47e0900dbbf52237b7e171f4f15ea1e</id>
<content type='text'>
sem_otime contains the time of the last semaphore operation that
completed successfully.  Every operation updates this value, thus access
from multiple cpus can cause thrashing.

Therefore the patch replaces the variable with a per-semaphore variable.
The per-array sem_otime is only calculated when required.

No performance improvement on a single-socket i3 - only important for
larger systems.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/sem.c: always use only one queue for alter operations</title>
<updated>2013-07-09T17:33:28Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2013-07-08T23:01:24Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f269f40ad5aeee229ed70044926f44318abe41ef'/>
<id>urn:sha1:f269f40ad5aeee229ed70044926f44318abe41ef</id>
<content type='text'>
There are two places that can contain alter operations:
 - the global queue: sma-&gt;pending_alter
 - the per-semaphore queues: sma-&gt;sem_base[].pending_alter.

Since one of the queues must be processed first, this causes an odd
priorization of the wakeups: complex operations have priority over
simple ops.

The patch restores the behavior of linux &lt;=3.0.9: The longest waiting
operation has the highest priority.

This is done by using only one queue:
 - if there are complex ops, then sma-&gt;pending_alter is used.
 - otherwise, the per-semaphore queues are used.

As a side effect, do_smart_update_queue() becomes much simpler: no more
goto logic.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/sem: separate wait-for-zero and alter tasks into seperate queues</title>
<updated>2013-07-09T17:33:28Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2013-07-08T23:01:23Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1a82e9e1d0f1b45f47a97c9e2349020536ff8987'/>
<id>urn:sha1:1a82e9e1d0f1b45f47a97c9e2349020536ff8987</id>
<content type='text'>
Introduce separate queues for operations that do not modify the
semaphore values.  Advantages:

 - Simpler logic in check_restart().
 - Faster update_queue(): Right now, all wait-for-zero operations are
   always tested, even if the semaphore value is not 0.
 - wait-for-zero gets again priority, as in linux &lt;=3.0.9

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/sem.c: cacheline align the semaphore structures</title>
<updated>2013-07-09T17:33:28Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2013-07-08T23:01:22Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f5c936c0f267ec58641451cf8b8d39b4c207ee4d'/>
<id>urn:sha1:f5c936c0f267ec58641451cf8b8d39b4c207ee4d</id>
<content type='text'>
As now each semaphore has its own spinlock and parallel operations are
possible, give each semaphore its own cacheline.

On a i3 laptop, this gives up to 28% better performance:

  #semscale 10 | grep "interleave 2"
  - before:
  Cpus 1, interleave 2 delay 0: 36109234 in 10 secs
  Cpus 2, interleave 2 delay 0: 55276317 in 10 secs
  Cpus 3, interleave 2 delay 0: 62411025 in 10 secs
  Cpus 4, interleave 2 delay 0: 81963928 in 10 secs

  -after:
  Cpus 1, interleave 2 delay 0: 35527306 in 10 secs
  Cpus 2, interleave 2 delay 0: 70922909 in 10 secs &lt;&lt;&lt; + 28%
  Cpus 3, interleave 2 delay 0: 80518538 in 10 secs
  Cpus 4, interleave 2 delay 0: 89115148 in 10 secs &lt;&lt;&lt; + 8.7%

i3, with 2 cores and with hyperthreading enabled.  Interleave 2 in order
use first the full cores.  HT partially hides the delay from cacheline
trashing, thus the improvement is "only" 8.7% if 4 threads are running.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
