<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel, branch v3.10.16</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/kernel?h=v3.10.16</id>
<link rel='self' href='https://git.amat.us/linux/atom/kernel?h=v3.10.16'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2013-10-13T23:08:34Z</updated>
<entry>
<title>irq: Force hardirq exit's softirq processing on its own stack</title>
<updated>2013-10-13T23:08:34Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2013-09-23T22:50:25Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=6dcdd5759f76c88ae86f9a98a232199520b6cc22'/>
<id>urn:sha1:6dcdd5759f76c88ae86f9a98a232199520b6cc22</id>
<content type='text'>
commit ded797547548a5b8e7b92383a41e4c0e6b0ecb7f upstream.

The commit facd8b80c67a3cf64a467c4a2ac5fb31f2e6745b
("irq: Sanitize invoke_softirq") converted irq exit
calls of do_softirq() to __do_softirq() on all architectures,
assuming it was only used there for its irq disablement
properties.

But as a side effect, the softirqs processed in the end
of the hardirq are always called on the inline current
stack that is used by irq_exit() instead of the softirq
stack provided by the archs that override do_softirq().

The result is mostly safe if the architecture runs irq_exit()
on a separate irq stack because then softirqs are processed
on that same stack that is near empty at this stage (assuming
hardirq aren't nesting).

Otherwise irq_exit() runs in the task stack and so does the softirq
too. The interrupted call stack can be randomly deep already and
the softirq can dig through it even further. To add insult to the
injury, this softirq can be interrupted by a new hardirq, maximizing
the chances for a stack overrun as reported in powerpc for example:

	do_IRQ: stack overflow: 1920
	CPU: 0 PID: 1602 Comm: qemu-system-ppc Not tainted 3.10.4-300.1.fc19.ppc64p7 #1
	Call Trace:
	[c0000000050a8740] .show_stack+0x130/0x200 (unreliable)
	[c0000000050a8810] .dump_stack+0x28/0x3c
	[c0000000050a8880] .do_IRQ+0x2b8/0x2c0
	[c0000000050a8930] hardware_interrupt_common+0x154/0x180
	--- Exception: 501 at .cp_start_xmit+0x3a4/0x820 [8139cp]
		LR = .cp_start_xmit+0x390/0x820 [8139cp]
	[c0000000050a8d40] .dev_hard_start_xmit+0x394/0x640
	[c0000000050a8e00] .sch_direct_xmit+0x110/0x260
	[c0000000050a8ea0] .dev_queue_xmit+0x260/0x630
	[c0000000050a8f40] .br_dev_queue_push_xmit+0xc4/0x130 [bridge]
	[c0000000050a8fc0] .br_dev_xmit+0x198/0x270 [bridge]
	[c0000000050a9070] .dev_hard_start_xmit+0x394/0x640
	[c0000000050a9130] .dev_queue_xmit+0x428/0x630
	[c0000000050a91d0] .ip_finish_output+0x2a4/0x550
	[c0000000050a9290] .ip_local_out+0x50/0x70
	[c0000000050a9310] .ip_queue_xmit+0x148/0x420
	[c0000000050a93b0] .tcp_transmit_skb+0x4e4/0xaf0
	[c0000000050a94a0] .__tcp_ack_snd_check+0x7c/0xf0
	[c0000000050a9520] .tcp_rcv_established+0x1e8/0x930
	[c0000000050a95f0] .tcp_v4_do_rcv+0x21c/0x570
	[c0000000050a96c0] .tcp_v4_rcv+0x734/0x930
	[c0000000050a97a0] .ip_local_deliver_finish+0x184/0x360
	[c0000000050a9840] .ip_rcv_finish+0x148/0x400
	[c0000000050a98d0] .__netif_receive_skb_core+0x4f8/0xb00
	[c0000000050a99d0] .netif_receive_skb+0x44/0x110
	[c0000000050a9a70] .br_handle_frame_finish+0x2bc/0x3f0 [bridge]
	[c0000000050a9b20] .br_nf_pre_routing_finish+0x2ac/0x420 [bridge]
	[c0000000050a9bd0] .br_nf_pre_routing+0x4dc/0x7d0 [bridge]
	[c0000000050a9c70] .nf_iterate+0x114/0x130
	[c0000000050a9d30] .nf_hook_slow+0xb4/0x1e0
	[c0000000050a9e00] .br_handle_frame+0x290/0x330 [bridge]
	[c0000000050a9ea0] .__netif_receive_skb_core+0x34c/0xb00
	[c0000000050a9fa0] .netif_receive_skb+0x44/0x110
	[c0000000050aa040] .napi_gro_receive+0xe8/0x120
	[c0000000050aa0c0] .cp_rx_poll+0x31c/0x590 [8139cp]
	[c0000000050aa1d0] .net_rx_action+0x1dc/0x310
	[c0000000050aa2b0] .__do_softirq+0x158/0x330
	[c0000000050aa3b0] .irq_exit+0xc8/0x110
	[c0000000050aa430] .do_IRQ+0xdc/0x2c0
	[c0000000050aa4e0] hardware_interrupt_common+0x154/0x180
	 --- Exception: 501 at .bad_range+0x1c/0x110
		 LR = .get_page_from_freelist+0x908/0xbb0
	[c0000000050aa7d0] .list_del+0x18/0x50 (unreliable)
	[c0000000050aa850] .get_page_from_freelist+0x908/0xbb0
	[c0000000050aa9e0] .__alloc_pages_nodemask+0x21c/0xae0
	[c0000000050aaba0] .alloc_pages_vma+0xd0/0x210
	[c0000000050aac60] .handle_pte_fault+0x814/0xb70
	[c0000000050aad50] .__get_user_pages+0x1a4/0x640
	[c0000000050aae60] .get_user_pages_fast+0xec/0x160
	[c0000000050aaf10] .__gfn_to_pfn_memslot+0x3b0/0x430 [kvm]
	[c0000000050aafd0] .kvmppc_gfn_to_pfn+0x64/0x130 [kvm]
	[c0000000050ab070] .kvmppc_mmu_map_page+0x94/0x530 [kvm]
	[c0000000050ab190] .kvmppc_handle_pagefault+0x174/0x610 [kvm]
	[c0000000050ab270] .kvmppc_handle_exit_pr+0x464/0x9b0 [kvm]
	[c0000000050ab320]  kvm_start_lightweight+0x1ec/0x1fc [kvm]
	[c0000000050ab4f0] .kvmppc_vcpu_run_pr+0x168/0x3b0 [kvm]
	[c0000000050ab9c0] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
	[c0000000050aba50] .kvm_arch_vcpu_ioctl_run+0x5c/0x1a0 [kvm]
	[c0000000050abae0] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
	[c0000000050abc90] .do_vfs_ioctl+0x4ec/0x7c0
	[c0000000050abd80] .SyS_ioctl+0xd4/0xf0
	[c0000000050abe30] syscall_exit+0x0/0x98

Since this is a regression, this patch proposes a minimalistic
and low-risk solution by blindly forcing the hardirq exit processing of
softirqs on the softirq stack. This way we should reduce significantly
the opportunities for task stack overflow dug by softirqs.

Longer term solutions may involve extending the hardirq stack coverage to
irq_exit(), etc...

Reported-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@au1.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul Mackerras &lt;paulus@au1.ibm.com&gt;
Cc: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: James E.J. Bottomley &lt;jejb@parisc-linux.org&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>audit: fix endless wait in audit_log_start()</title>
<updated>2013-10-01T16:17:48Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@openvz.org</email>
</author>
<published>2013-09-24T22:27:42Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3ed3690eac0c839c6216fd7f7ce0add6a1f593b2'/>
<id>urn:sha1:3ed3690eac0c839c6216fd7f7ce0add6a1f593b2</id>
<content type='text'>
commit 8ac1c8d5deba65513b6a82c35e89e73996c8e0d6 upstream.

After commit 829199197a43 ("kernel/audit.c: avoid negative sleep
durations") audit emitters will block forever if userspace daemon cannot
handle backlog.

After the timeout the waiting loop turns into busy loop and runs until
daemon dies or returns back to work.  This is a minimal patch for that
bug.

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Cc: Richard Guy Briggs &lt;rgb@redhat.com&gt;
Cc: Eric Paris &lt;eparis@redhat.com&gt;
Cc: Chuck Anderson &lt;chuck.anderson@oracle.com&gt;
Cc: Dan Duval &lt;dan.duval@oracle.com&gt;
Cc: Dave Kleikamp &lt;dave.kleikamp@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Jonghwan Choi &lt;jhbird.choi@samsung.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/fair: Fix small race where child-&gt;se.parent,cfs_rq might point to invalid ones</title>
<updated>2013-10-01T16:17:45Z</updated>
<author>
<name>Daisuke Nishimura</name>
<email>nishimura@mxp.nes.nec.co.jp</email>
</author>
<published>2013-09-10T09:16:36Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=51f5294797f14185f9b25ad0972a4abbbeff70c6'/>
<id>urn:sha1:51f5294797f14185f9b25ad0972a4abbbeff70c6</id>
<content type='text'>
commit 6c9a27f5da9609fca46cb2b183724531b48f71ad upstream.

There is a small race between copy_process() and cgroup_attach_task()
where child-&gt;se.parent,cfs_rq points to invalid (old) ones.

        parent doing fork()      | someone moving the parent to another cgroup
  -------------------------------+---------------------------------------------
    copy_process()
      + dup_task_struct()
        -&gt; parent-&gt;se is copied to child-&gt;se.
           se.parent,cfs_rq of them point to old ones.

                                     cgroup_attach_task()
                                       + cgroup_task_migrate()
                                         -&gt; parent-&gt;cgroup is updated.
                                       + cpu_cgroup_attach()
                                         + sched_move_task()
                                           + task_move_group_fair()
                                             +- set_task_rq()
                                                -&gt; se.parent,cfs_rq of parent
                                                   are updated.

      + cgroup_fork()
        -&gt; parent-&gt;cgroup is copied to child-&gt;cgroup. (*1)
      + sched_fork()
        + task_fork_fair()
          -&gt; se.parent,cfs_rq of child are accessed
             while they point to old ones. (*2)

In the worst case, this bug can lead to "use-after-free" and cause a panic,
because it's new cgroup's refcount that is incremented at (*1),
so the old cgroup(and related data) can be freed before (*2).

In fact, a panic caused by this bug was originally caught in RHEL6.4.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [&lt;ffffffff81051e3e&gt;] sched_slice+0x6e/0xa0
    [...]
    Call Trace:
     [&lt;ffffffff81051f25&gt;] place_entity+0x75/0xa0
     [&lt;ffffffff81056a3a&gt;] task_fork_fair+0xaa/0x160
     [&lt;ffffffff81063c0b&gt;] sched_fork+0x6b/0x140
     [&lt;ffffffff8106c3c2&gt;] copy_process+0x5b2/0x1450
     [&lt;ffffffff81063b49&gt;] ? wake_up_new_task+0xd9/0x130
     [&lt;ffffffff8106d2f4&gt;] do_fork+0x94/0x460
     [&lt;ffffffff81072a9e&gt;] ? sys_wait4+0xae/0x100
     [&lt;ffffffff81009598&gt;] sys_clone+0x28/0x30
     [&lt;ffffffff8100b393&gt;] stub_clone+0x13/0x20
     [&lt;ffffffff8100b072&gt;] ? system_call_fastpath+0x16/0x1b

Signed-off-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/039601ceae06$733d3130$59b79390$@mxp.nes.nec.co.jp
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/cputime: Do not scale when utime == 0</title>
<updated>2013-10-01T16:17:45Z</updated>
<author>
<name>Stanislaw Gruszka</name>
<email>sgruszka@redhat.com</email>
</author>
<published>2013-09-04T13:16:03Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=013b14c3067f0d55ab115405647d9f035f67737e'/>
<id>urn:sha1:013b14c3067f0d55ab115405647d9f035f67737e</id>
<content type='text'>
commit 5a8e01f8fa51f5cbce8f37acc050eb2319d12956 upstream.

scale_stime() silently assumes that stime &lt; rtime, otherwise
when stime == rtime and both values are big enough (operations
on them do not fit in 32 bits), the resulting scaling stime can
be bigger than rtime. In consequence utime = rtime - stime
results in negative value.

User space visible symptoms of the bug are overflowed TIME
values on ps/top, for example:

 $ ps aux | grep rcu
 root         8  0.0  0.0      0     0 ?        S    12:42   0:00 [rcuc/0]
 root         9  0.0  0.0      0     0 ?        S    12:42   0:00 [rcub/0]
 root        10 62422329  0.0  0     0 ?        R    12:42 21114581:37 [rcu_preempt]
 root        11  0.1  0.0      0     0 ?        S    12:42   0:02 [rcuop/0]
 root        12 62422329  0.0  0     0 ?        S    12:42 21114581:35 [rcuop/1]
 root        10 62422329  0.0  0     0 ?        R    12:42 21114581:37 [rcu_preempt]

or overflowed utime values read directly from /proc/$PID/stat

Reference:

  https://lkml.org/lkml/2013/8/20/259

Reported-and-tested-by: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Signed-off-by: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Cc: stable@vger.kernel.org
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Link: http://lkml.kernel.org/r/20130904131602.GC2564@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>timekeeping: Fix HRTICK related deadlock from ntp lock changes</title>
<updated>2013-10-01T16:17:45Z</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2013-09-11T23:50:56Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=63946e8616205dafe39b4d88f9fc3dc7c4fd79aa'/>
<id>urn:sha1:63946e8616205dafe39b4d88f9fc3dc7c4fd79aa</id>
<content type='text'>
commit 7bd36014460f793c19e7d6c94dab67b0afcfcb7f upstream.

Gerlando Falauto reported that when HRTICK is enabled, it is
possible to trigger system deadlocks. These were hard to
reproduce, as HRTICK has been broken in the past, but seemed
to be connected to the timekeeping_seq lock.

Since seqlock/seqcount's aren't supported w/ lockdep, I added
some extra spinlock based locking and triggered the following
lockdep output:

[   15.849182] ntpd/4062 is trying to acquire lock:
[   15.849765]  (&amp;(&amp;pool-&gt;lock)-&gt;rlock){..-...}, at: [&lt;ffffffff810aa9b5&gt;] __queue_work+0x145/0x480
[   15.850051]
[   15.850051] but task is already holding lock:
[   15.850051]  (timekeeper_lock){-.-.-.}, at: [&lt;ffffffff810df6df&gt;] do_adjtimex+0x7f/0x100

&lt;snip&gt;

[   15.850051] Chain exists of: &amp;(&amp;pool-&gt;lock)-&gt;rlock --&gt; &amp;p-&gt;pi_lock --&gt; timekeeper_lock
[   15.850051]  Possible unsafe locking scenario:
[   15.850051]
[   15.850051]        CPU0                    CPU1
[   15.850051]        ----                    ----
[   15.850051]   lock(timekeeper_lock);
[   15.850051]                                lock(&amp;p-&gt;pi_lock);
[   15.850051] lock(timekeeper_lock);
[   15.850051] lock(&amp;(&amp;pool-&gt;lock)-&gt;rlock);
[   15.850051]
[   15.850051]  *** DEADLOCK ***

The deadlock was introduced by 06c017fdd4dc48451a ("timekeeping:
Hold timekeepering locks in do_adjtimex and hardpps") in 3.10

This patch avoids this deadlock, by moving the call to
schedule_delayed_work() outside of the timekeeper lock
critical section.

Reported-by: Gerlando Falauto &lt;gerlando.falauto@keymile.com&gt;
Tested-by: Lin Ming &lt;minggr@gmail.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Link: http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>pidns: fix vfork() after unshare(CLONE_NEWPID)</title>
<updated>2013-09-27T00:18:28Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-09-11T21:19:38Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f608ebd760cb9b48a333700fdc10245e257d9fce'/>
<id>urn:sha1:f608ebd760cb9b48a333700fdc10245e257d9fce</id>
<content type='text'>
commit e79f525e99b04390ca4d2366309545a836c03bf1 upstream.

Commit 8382fcac1b81 ("pidns: Outlaw thread creation after
unshare(CLONE_NEWPID)") nacks CLONE_VM if the forking process unshared
pid_ns, this obviously breaks vfork:

	int main(void)
	{
		assert(unshare(CLONE_NEWUSER | CLONE_NEWPID) == 0);
		assert(vfork() &gt;= 0);
		_exit(0);
		return 0;
	}

fails without this patch.

Change this check to use CLONE_SIGHAND instead.  This also forbids
CLONE_THREAD automatically, and this is what the comment implies.

We could probably even drop CLONE_SIGHAND and use CLONE_THREAD, but it
would be safer to not do this.  The current check denies CLONE_SIGHAND
implicitely and there is no reason to change this.

Eric said "CLONE_SIGHAND is fine.  CLONE_THREAD would be even better.
Having shared signal handling between two different pid namespaces is
the case that we are fundamentally guarding against."

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Colin Walters &lt;walters@redhat.com&gt;
Acked-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Reviewed-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup</title>
<updated>2013-09-27T00:18:27Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2013-08-29T20:56:50Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=5a48788ca4d6dc78d1856d9ddeea1b1160097cc9'/>
<id>urn:sha1:5a48788ca4d6dc78d1856d9ddeea1b1160097cc9</id>
<content type='text'>
commit a606488513543312805fab2b93070cefe6a3016c upstream.

Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt; writes:

&gt; Since commit af4b8a83add95ef40716401395b44a1b579965f4 it's been
&gt; possible to get into a situation where a pidns reaper is
&gt; &lt;defunct&gt;, reparented to host pid 1, but never reaped.  How to
&gt; reproduce this is documented at
&gt;
&gt; https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526
&gt; (and see
&gt; https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526/comments/13)
&gt; In short, run repeated starts of a container whose init is
&gt;
&gt; Process.exit(0);
&gt;
&gt; sysrq-t when such a task is playing zombie shows:
&gt;
&gt; [  131.132978] init            x ffff88011fc14580     0  2084   2039 0x00000000
&gt; [  131.132978]  ffff880116e89ea8 0000000000000002 ffff880116e89fd8 0000000000014580
&gt; [  131.132978]  ffff880116e89fd8 0000000000014580 ffff8801172a0000 ffff8801172a0000
&gt; [  131.132978]  ffff8801172a0630 ffff88011729fff0 ffff880116e14650 ffff88011729fff0
&gt; [  131.132978] Call Trace:
&gt; [  131.132978]  [&lt;ffffffff816f6159&gt;] schedule+0x29/0x70
&gt; [  131.132978]  [&lt;ffffffff81064591&gt;] do_exit+0x6e1/0xa40
&gt; [  131.132978]  [&lt;ffffffff81071eae&gt;] ? signal_wake_up_state+0x1e/0x30
&gt; [  131.132978]  [&lt;ffffffff8106496f&gt;] do_group_exit+0x3f/0xa0
&gt; [  131.132978]  [&lt;ffffffff810649e4&gt;] SyS_exit_group+0x14/0x20
&gt; [  131.132978]  [&lt;ffffffff8170102f&gt;] tracesys+0xe1/0xe6
&gt;
&gt; Further debugging showed that every time this happened, zap_pid_ns_processes()
&gt; started with nr_hashed being 3, while we were expecting it to drop to 2.
&gt; Any time it didn't happen, nr_hashed was 1 or 2.  So the reaper was
&gt; waiting for nr_hashed to become 2, but free_pid() only wakes the reaper
&gt; if nr_hashed hits 1.

The issue is that when the task group leader of an init process exits
before other tasks of the init process when the init process finally
exits it will be a secondary task sleeping in zap_pid_ns_processes and
waiting to wake up when the number of hashed pids drops to two.  This
case waits forever as free_pid only sends a wake up when the number of
hashed pids drops to 1.

To correct this the simple strategy of sending a possibly unncessary
wake up when the number of hashed pids drops to 2 is adopted.

Sending one extraneous wake up is relatively harmless, at worst we
waste a little cpu time in the rare case when a pid namespace
appropaches exiting.

We can detect the case when the pid namespace drops to just two pids
hashed race free in free_pid.

Dereferencing pid_ns-&gt;child_reaper with the pidmap_lock held is safe
without out the tasklist_lock because it is guaranteed that the
detach_pid will be called on the child_reaper before it is freed and
detach_pid calls __change_pid which calls free_pid which takes the
pidmap_lock.  __change_pid only calls free_pid if this is the
last use of the pid.  For a thread that is not the thread group leader
the threads pid will only ever have one user because a threads pid
is not allowed to be the pid of a process, of a process group or
a session.  For a thread that is a thread group leader all of
the other threads of that process will be reaped before it is allowed
for the thread group leader to be reaped ensuring there will only
be one user of the threads pid as a process pid.  Furthermore
because the thread is the init process of a pid namespace all of the
other processes in the pid namespace will have also been already freed
leading to the fact that the pid will not be used as a session pid or
a process group pid for any other running process.

Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Tested-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Reported-by: Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>uprobes: Fix utask-&gt;depth accounting in handle_trampoline()</title>
<updated>2013-09-27T00:18:27Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-09-11T15:47:26Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=73e2c2b7c105e46344eb409575a4508f24a82eee'/>
<id>urn:sha1:73e2c2b7c105e46344eb409575a4508f24a82eee</id>
<content type='text'>
commit 878b5a6efd38030c7a90895dc8346e8fb1e09b4c upstream.

Currently utask-&gt;depth is simply the number of allocated/pending
return_instance's in uprobe_task-&gt;return_instances list.

handle_trampoline() should decrement this counter every time we
handle/free an instance, but due to typo it does this only if
-&gt;chained == T. This means that in the likely case this counter
is never decremented and the probed task can't report more than
MAX_URETPROBE_DEPTH events.

Reported-by: Mikhail Kulemin &lt;Mikhail.Kulemin@ru.ibm.com&gt;
Reported-by: Hemant Kumar Shaw &lt;hkshaw@linux.vnet.ibm.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Anton Arapov &lt;anton@redhat.com&gt;
Cc: masami.hiramatsu.pt@hitachi.com
Cc: srikar@linux.vnet.ibm.com
Cc: systemtap@sourceware.org
Link: http://lkml.kernel.org/r/20130911154726.GA8093@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: cond_resched() after processing each work item</title>
<updated>2013-09-08T05:09:58Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-08-28T21:33:37Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=6ff96f7340e8a0b7b7e2c40a26bc47fb320e6475'/>
<id>urn:sha1:6ff96f7340e8a0b7b7e2c40a26bc47fb320e6475</id>
<content type='text'>
commit b22ce2785d97423846206cceec4efee0c4afd980 upstream.

If !PREEMPT, a kworker running work items back to back can hog CPU.
This becomes dangerous when a self-requeueing work item which is
waiting for something to happen races against stop_machine.  Such
self-requeueing work item would requeue itself indefinitely hogging
the kworker and CPU it's running on while stop_machine would wait for
that CPU to enter stop_machine while preventing anything else from
happening on all other CPUs.  The two would deadlock.

Jamie Liu reports that this deadlock scenario exists around
scsi_requeue_run_queue() and libata port multiplier support, where one
port may exclude command processing from other ports.  With the right
timing, scsi_requeue_run_queue() can end up requeueing itself trying
to execute an IO which is asked to be retried while another device has
an exclusive access, which in turn can't make forward progress due to
stop_machine.

Fix it by invoking cond_resched() after executing each work item.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Jamie Liu &lt;jamieliu@google.com&gt;
References: http://thread.gmane.org/gmane.linux.kernel/1552567
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>timer_list: correct the iterator for timer_list</title>
<updated>2013-09-08T05:09:58Z</updated>
<author>
<name>Nathan Zimmer</name>
<email>nzimmer@sgi.com</email>
</author>
<published>2013-08-28T23:35:14Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b3772c81e3490f1ddc0547ee479598f3f4221699'/>
<id>urn:sha1:b3772c81e3490f1ddc0547ee479598f3f4221699</id>
<content type='text'>
commit 84a78a6504f5c5394a8e558702e5b54131f01d14 upstream.

Correct an issue with /proc/timer_list reported by Holger.

When reading from the proc file with a sufficiently small buffer, 2k so
not really that small, there was one could get hung trying to read the
file a chunk at a time.

The timer_list_start function failed to account for the possibility that
the offset was adjusted outside the timer_list_next.

Signed-off-by: Nathan Zimmer &lt;nzimmer@sgi.com&gt;
Reported-by: Holger Hans Peter Freyther &lt;holger@freyther.de&gt;
Cc: John Stultz &lt;john.stultz@linaro.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Berke Durak &lt;berke.durak@xiphos.com&gt;
Cc: Jeff Layton &lt;jlayton@redhat.com&gt;
Tested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
