<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/ipc/util.h, branch v3.12.14</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/ipc/util.h?h=v3.12.14</id>
<link rel='self' href='https://git.amat.us/linux/atom/ipc/util.h?h=v3.12.14'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2013-12-04T19:05:22Z</updated>
<entry>
<title>ipc, msg: fix message length check for negative values</title>
<updated>2013-12-04T19:05:22Z</updated>
<author>
<name>Mathias Krause</name>
<email>minipli@googlemail.com</email>
</author>
<published>2013-11-12T23:11:47Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=620ff33d7cb1d446c1e24fbfe1a033106b99e5c4'/>
<id>urn:sha1:620ff33d7cb1d446c1e24fbfe1a033106b99e5c4</id>
<content type='text'>
commit 4e9b45a19241354daec281d7a785739829b52359 upstream.

On 64 bit systems the test for negative message sizes is bogus as the
size, which may be positive when evaluated as a long, will get truncated
to an int when passed to load_msg().  So a long might very well contain a
positive value but when truncated to an int it would become negative.

That in combination with a small negative value of msg_ctlmax (which will
be promoted to an unsigned type for the comparison against msgsz, making
it a big positive value and therefore make it pass the check) will lead to
two problems: 1/ The kmalloc() call in alloc_msg() will allocate a too
small buffer as the addition of alen is effectively a subtraction.  2/ The
copy_from_user() call in load_msg() will first overflow the buffer with
userland data and then, when the userland access generates an access
violation, the fixup handler copy_user_handle_tail() will try to fill the
remainder with zeros -- roughly 4GB.  That almost instantly results in a
system crash or reset.

  ,-[ Reproducer (needs to be run as root) ]--
  | #include &lt;sys/stat.h&gt;
  | #include &lt;sys/msg.h&gt;
  | #include &lt;unistd.h&gt;
  | #include &lt;fcntl.h&gt;
  |
  | int main(void) {
  |     long msg = 1;
  |     int fd;
  |
  |     fd = open("/proc/sys/kernel/msgmax", O_WRONLY);
  |     write(fd, "-1", 2);
  |     close(fd);
  |
  |     msgsnd(0, &amp;msg, 0xfffffff0, IPC_NOWAIT);
  |
  |     return 0;
  | }
  '---

Fix the issue by preventing msgsz from getting truncated by consistently
using size_t for the message length.  This way the size checks in
do_msgsnd() could still be passed with a negative value for msg_ctlmax but
we would fail on the buffer allocation in that case and error out.

Also change the type of m_ts from int to size_t to avoid similar nastiness
in other code paths -- it is used in similar constructs, i.e.  signed vs.
unsigned checks.  It should never become negative under normal
circumstances, though.

Setting msg_ctlmax to a negative value is an odd configuration and should
be prevented.  As that might break existing userland, it will be handled
in a separate commit so it could easily be reverted and reworked without
reintroducing the above described bug.

Hardening mechanisms for user copy operations would have catched that bug
early -- e.g.  checking slab object sizes on user copy operations as the
usercopy feature of the PaX patch does.  Or, for that matter, detect the
long vs.  int sign change due to truncation, as the size overflow plugin
of the very same patch does.

[akpm@linux-foundation.org: fix i386 min() warnings]
Signed-off-by: Mathias Krause &lt;minipli@googlemail.com&gt;
Cc: Pax Team &lt;pageexec@freemail.hu&gt;
Cc: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Cc: Brad Spengler &lt;spender@grsecurity.net&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ipc: fix race with LSMs</title>
<updated>2013-09-24T16:36:53Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr@hp.com</email>
</author>
<published>2013-09-24T00:04:45Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=53dad6d3a8e5ac1af8bacc6ac2134ae1a8b085f1'/>
<id>urn:sha1:53dad6d3a8e5ac1af8bacc6ac2134ae1a8b085f1</id>
<content type='text'>
Currently, IPC mechanisms do security and auditing related checks under
RCU.  However, since security modules can free the security structure,
for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
race if the structure is freed before other tasks are done with it,
creating a use-after-free condition.  Manfred illustrates this nicely,
for instance with shared mem and selinux:

 -&gt; do_shmat calls rcu_read_lock()
 -&gt; do_shmat calls shm_object_check().
     Checks that the object is still valid - but doesn't acquire any locks.
     Then it returns.
 -&gt; do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
 -&gt; selinux_shm_shmat calls ipc_has_perm()
 -&gt; ipc_has_perm accesses ipc_perms-&gt;security

shm_close()
 -&gt; shm_close acquires rw_mutex &amp; shm_lock
 -&gt; shm_close calls shm_destroy
 -&gt; shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
 -&gt; selinux_shm_free_security calls ipc_free_security(&amp;shp-&gt;shm_perm)
 -&gt; ipc_free_security calls kfree(ipc_perms-&gt;security)

This patch delays the freeing of the security structures after all RCU
readers are done.  Furthermore it aligns the security life cycle with
that of the rest of IPC - freeing them based on the reference counter.
For situations where we need not free security, the current behavior is
kept.  Linus states:

 "... the old behavior was suspect for another reason too: having the
  security blob go away from under a user sounds like it could cause
  various other problems anyway, so I think the old code was at least
  _prone_ to bugs even if it didn't have catastrophic behavior."

I have tested this patch with IPC testcases from LTP on both my
quad-core laptop and on a 64 core NUMA server.  In both cases selinux is
enabled, and tests pass for both voluntary and forced preemption models.
While the mentioned races are theoretical (at least no one as reported
them), I wanted to make sure that this new logic doesn't break anything
we weren't aware of.

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Acked-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc: drop ipc_lock_check</title>
<updated>2013-09-11T22:59:45Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr.bueso@hp.com</email>
</author>
<published>2013-09-11T21:26:31Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=20b8875abcf2daa1dda5cf70bd6369df5e85d4c1'/>
<id>urn:sha1:20b8875abcf2daa1dda5cf70bd6369df5e85d4c1</id>
<content type='text'>
No remaining users, we now use ipc_obtain_object_check().

Signed-off-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Cc: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc: drop ipc_lock_by_ptr</title>
<updated>2013-09-11T22:59:44Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr.bueso@hp.com</email>
</author>
<published>2013-09-11T21:26:29Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=32a2750010981216fb788c5190fb0e646abfab30'/>
<id>urn:sha1:32a2750010981216fb788c5190fb0e646abfab30</id>
<content type='text'>
After previous cleanups and optimizations, this function is no longer
heavily used and we don't have a good reason to keep it.  Update the few
remaining callers and get rid of it.

Signed-off-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Cc: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc: rename ids-&gt;rw_mutex</title>
<updated>2013-09-11T22:59:42Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr.bueso@hp.com</email>
</author>
<published>2013-09-11T21:26:24Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d9a605e40b1376eb02b067d7690580255a0df68f'/>
<id>urn:sha1:d9a605e40b1376eb02b067d7690580255a0df68f</id>
<content type='text'>
Since in some situations the lock can be shared for readers, we shouldn't
be calling it a mutex, rename it to rwsem.

Signed-off-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Tested-by: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc: drop ipcctl_pre_down</title>
<updated>2013-09-11T22:59:39Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr.bueso@hp.com</email>
</author>
<published>2013-09-11T21:26:17Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3b1c4ad37741e53804ffe0a30dd01e08b2ab6241'/>
<id>urn:sha1:3b1c4ad37741e53804ffe0a30dd01e08b2ab6241</id>
<content type='text'>
Now that sem, msgque and shm, through *_down(), all use the lockless
variant of ipcctl_pre_down(), go ahead and delete it.

[akpm@linux-foundation.org: fix function name in kerneldoc, cleanups]
Signed-off-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Tested-by: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc: close open coded spin lock calls</title>
<updated>2013-07-09T17:33:27Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr.bueso@hp.com</email>
</author>
<published>2013-07-08T23:01:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=cf9d5d78d05bca96df7618dfc3a5ee4414dcae58'/>
<id>urn:sha1:cf9d5d78d05bca96df7618dfc3a5ee4414dcae58</id>
<content type='text'>
Signed-off-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc: introduce ipc object locking helpers</title>
<updated>2013-07-09T17:33:27Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr.bueso@hp.com</email>
</author>
<published>2013-07-08T23:01:10Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1ca7003ab41152d673d9e359632283d05294f3d6'/>
<id>urn:sha1:1ca7003ab41152d673d9e359632283d05294f3d6</id>
<content type='text'>
Simple helpers around the (kern_ipc_perm *)-&gt;lock spinlock.

Signed-off-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc,sem: fine grained locking for semtimedop</title>
<updated>2013-05-01T15:12:58Z</updated>
<author>
<name>Rik van Riel</name>
<email>riel@surriel.com</email>
</author>
<published>2013-05-01T02:15:44Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=6062a8dc0517bce23e3c2f7d2fea5e22411269a3'/>
<id>urn:sha1:6062a8dc0517bce23e3c2f7d2fea5e22411269a3</id>
<content type='text'>
Introduce finer grained locking for semtimedop, to handle the common case
of a program wanting to manipulate one semaphore from an array with
multiple semaphores.

If the call is a semop manipulating just one semaphore in an array with
multiple semaphores, only take the lock for that semaphore itself.

If the call needs to manipulate multiple semaphores, or another caller is
in a transaction that manipulates multiple semaphores, the sem_array lock
is taken, as well as all the locks for the individual semaphores.

On a 24 CPU system, performance numbers with the semop-multi
test with N threads and N semaphores, look like this:

	vanilla		Davidlohr's	Davidlohr's +	Davidlohr's +
threads			patches		rwlock patches	v3 patches
10	610652		726325		1783589		2142206
20	341570		365699		1520453		1977878
30	288102		307037		1498167		2037995
40	290714		305955		1612665		2256484
50	288620		312890		1733453		2650292
60	289987		306043		1649360		2388008
70	291298		306347		1723167		2717486
80	290948		305662		1729545		2763582
90	290996		306680		1736021		2757524
100	292243		306700		1773700		3059159

[davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
[davidlohr.bueso@hp.com: make refcounter atomic]
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Cc: Chegu Vinod &lt;chegu_vinod@hp.com&gt;
Cc: Jason Low &lt;jason.low2@hp.com&gt;
Reviewed-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Cc: Stanislav Kinsbursky &lt;skinsbursky@parallels.com&gt;
Tested-by: Emmanuel Benisty &lt;benisty.e@gmail.com&gt;
Tested-by: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc,sem: do not hold ipc lock more than necessary</title>
<updated>2013-05-01T15:12:58Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr.bueso@hp.com</email>
</author>
<published>2013-05-01T02:15:29Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=16df3674efe39f3ab63e7052f1244dd3d50e7f84'/>
<id>urn:sha1:16df3674efe39f3ab63e7052f1244dd3d50e7f84</id>
<content type='text'>
Instead of holding the ipc lock for permissions and security checks, among
others, only acquire it when necessary.

Some numbers....

1) With Rik's semop-multi.c microbenchmark we can see the following
   results:

Baseline (3.9-rc1):
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 151452270, ops/sec 5048409

+  59.40%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
+   6.14%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
+   3.84%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
+   3.64%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
+   2.06%            a.out  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
+   1.86%            a.out  [kernel.kallsyms]  [k] ipc_lock

With this patchset:
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 273156400, ops/sec 9105213

+  18.54%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
+  11.72%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
+   7.70%            a.out  [kernel.kallsyms]  [k] ipc_has_perm.isra.21
+   6.58%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
+   6.54%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
+   4.71%            a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check

2) While on an Oracle swingbench DSS (data mining) workload the
   improvements are not as exciting as with Rik's benchmark, we can see
   some positive numbers.  For an 8 socket machine the following are the
   percentages of %sys time incurred in the ipc lock:

Baseline (3.9-rc1):
100 swingbench users: 8,74%
400 swingbench users: 21,86%
800 swingbench users: 84,35%

With this patchset:
100 swingbench users: 8,11%
400 swingbench users: 19,93%
800 swingbench users: 77,69%

[riel@redhat.com: fix two locking bugs]
[sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Reviewed-by: Chegu Vinod &lt;chegu_vinod@hp.com&gt;
Acked-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Jason Low &lt;jason.low2@hp.com&gt;
Cc: Emmanuel Benisty &lt;benisty.e@gmail.com&gt;
Cc: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Cc: Stanislav Kinsbursky &lt;skinsbursky@parallels.com&gt;
Tested-by: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
