Age  Commit message  Author
2009-04-27  sched: do not count frozen tasks toward load  Nathan Lynch
upstream commit: e3c8ca8336707062f3f7cb1cd7e6b3c753baccdd Freezing tasks via the cgroup freezer causes the load average to climb because the freezer's current implementation puts frozen tasks in uninterruptible sleep (D state). Some applications which perform job-scheduling functions consult the load average when making decisions. If a cgroup is frozen, the load average does not provide a useful measure of the system's utilization to such applications. This is especially inconvenient if the job scheduler employs the cgroup freezer as a mechanism for preempting low priority jobs. Contrast this with using SIGSTOP for the same purpose: the stopped tasks do not count toward system load. Change task_contributes_to_load() to return false if the task is frozen. This results in /proc/loadavg behavior that better meets users' expectations. Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Nigel Cunningham <nigel@tuxonice.net> Tested-by: Nigel Cunningham <nigel@tuxonice.net> Cc: <stable@kernel.org> Cc: containers@lists.linux-foundation.org Cc: linux-pm@lists.linux-foundation.org Cc: Matt Helsley <matthltc@us.ibm.com> LKML-Reference: <20090408194512.47a99b95@manatee.lan> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  dm kcopyd: fix callback race  Mikulas Patocka
upstream commit: 340cd44451fb0bfa542365e6b4b565bbd44836e2 If the thread calling dm_kcopyd_copy is delayed due to scheduling inside split_job/segment_complete and the subjobs complete before the loop in split_job completes, the kcopyd callback could be invoked from the thread that called dm_kcopyd_copy instead of the kcopyd workqueue. dm_kcopyd_copy -> split_job -> segment_complete -> job->fn() Snapshots depend on the fact that callbacks are called from the singlethreaded kcopyd workqueue and expect that there is no racing between individual callbacks. The racing between callbacks can lead to corruption of exception store and it can also mean that exception store callbacks are called twice for the same exception - a likely reason for crashes reported inside pending_complete() / remove_exception(). This patch fixes two problems: 1. job->fn being called from the thread that submitted the job (see above). - Fix: hand over the completion callback to the kcopyd thread. 2. job->fn(read_err, write_err, job->context); in segment_complete reports the error of the last subjob, not the union of all errors. - Fix: pass job->write_err to the callback to report all error bits (it is done already in run_complete_job) Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
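The second fix in the dm kcopyd commit above (report the union of all error bits rather than only the last sub-job's error) can be illustrated with a minimal sketch. The struct and function names below are hypothetical stand-ins, not the kernel's definitions; only the idea of OR-ing each sub-job's error bits into the parent job is taken from the commit message.

```c
/* Hypothetical, simplified stand-in for a kcopyd job; not the kernel struct. */
struct job_sketch {
	int read_err;            /* set once any sub-job fails to read */
	unsigned long write_err; /* union of write-error bits from all sub-jobs */
};

/* Record the result of one completed sub-job in the parent job. */
static void segment_done_sketch(struct job_sketch *job, int read_err,
				unsigned long write_err)
{
	if (read_err)
		job->read_err = 1;

	/* Accumulate rather than overwrite, so earlier failures are kept. */
	job->write_err |= write_err;
}
```

Passing the accumulated `write_err` field to the completion callback, instead of the last sub-job's local value, is what lets the caller see every destination that failed.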
2009-04-27  dm kcopyd: prepare for callback race fix  Mikulas Patocka
upstream commit: 73830857bca6f6c9dbd48e906daea50bea42d676 Use a variable in segment_complete() to point to the dm_kcopyd_client struct and only release job->pages in run_complete_job() if any are defined. These changes are needed by the next patch. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  posix-timers: fix RLIMIT_CPU && setitimer(CPUCLOCK_PROF)  Oleg Nesterov
upstream commit: 8f2e586567b1bad72dac7c3810fe9a2ef7117506

update_rlimit_cpu() tries to optimize out set_process_cpu_timer() when we already have a CPUCLOCK_PROF timer which should expire first. But it uses cputime_lt() instead of cputime_gt().

Test case:

	#include <assert.h>
	#include <stddef.h>
	#include <sys/time.h>
	#include <sys/resource.h>

	int main(void)
	{
		/* Arm a profiling timer that expires long after the CPU limit. */
		struct itimerval it = {
			.it_value = { .tv_sec = 1000 },
		};

		assert(!setitimer(ITIMER_PROF, &it, NULL));

		/* Limit the process to one second of CPU time. */
		struct rlimit rl = {
			.rlim_cur = 1,
			.rlim_max = 1,
		};

		assert(!setrlimit(RLIMIT_CPU, &rl));

		/* Burn CPU; the kernel should kill the task. */
		for (;;)
			;

		return 0;
	}

Without this patch, the task is not killed as RLIMIT_CPU demands.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Peter Lojkin <ia6432@inbox.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: stable@kernel.org
LKML-Reference: <20090327000610.GA10108@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
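The comparison that the commit above corrects can be modelled in a few lines. This is a simplified sketch under stated assumptions: times are plain seconds of CPU time, 0 stands for "no profiling timer armed", and the function name is illustrative rather than the kernel's.

```c
#include <stdbool.h>

/*
 * Decide whether the CPU-limit timer must be (re)armed.
 * prof_expires: expiry of the existing profiling timer, 0 if none (assumption).
 * rlim_cur:     the new RLIMIT_CPU soft limit.
 */
static bool must_arm_for_rlimit(unsigned long prof_expires,
				unsigned long rlim_cur)
{
	/*
	 * Arming can be skipped only when an existing timer already fires
	 * no later than the limit. "Greater than" is the correct test;
	 * the buggy version effectively used "less than".
	 */
	return prof_expires == 0 || prof_expires > rlim_cur;
}
```

With the test case's values (timer at 1000 seconds, limit at 1 second) the function returns true, which matches the behaviour the patch restores.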
2009-04-27  posix-timers: fix RLIMIT_CPU && fork()  Oleg Nesterov
upstream commit: 6279a751fe096a21dc7704e918d570d3ff06e769 See http://bugzilla.kernel.org/show_bug.cgi?id=12911 copy_signal() copies signal->rlim, but RLIMIT_CPU is "lost". Because posix_cpu_timers_init_group() sets cputime_expires.prof_exp = 0 and thus fastpath_timer_check() returns false unless we have other expired cpu timers. Change copy_signal() to set cputime_expires.prof_exp if we have RLIMIT_CPU. Also, set cputimer.running = 1 in that case. This is not strictly necessary, but imho makes sense. Reported-by: Peter Lojkin <ia6432@inbox.ru> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Peter Lojkin <ia6432@inbox.ru> Cc: Roland McGrath <roland@redhat.com> Cc: stable@kernel.org LKML-Reference: <20090327000607.GA10104@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  posixtimers, sched: Fix posix clock monotonicity  Hidetoshi Seto
upstream commit: c5f8d99585d7b5b7e857fabf8aefd0174903a98c Impact: Regression fix (against clock_gettime() backwarding bug) This patch re-introduces a couple of functions, task_sched_runtime and thread_group_sched_runtime, which was once removed at the time of 2.6.28-rc1. These functions protect the sampling of thread/process clock with rq lock. This rq lock is required not to update rq->clock during the sampling. i.e. The clock_gettime() may return ((accounted runtime before update) + (delta after update)) that is less than what it should be. v2 -> v3: - Rename static helper function __task_delta_exec() to do_task_delta_exec() since -tip tree already has a __task_delta_exec() of different version. v1 -> v2: - Revises comments of function and patch description. - Add note about accuracy of thread group's runtime. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org [2.6.28.x][2.6.29.x] LKML-Reference: <49D1CC93.4080401@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  cap_prctl: don't set error to 0 at 'no_change'  Serge E. Hallyn
upstream commit: 5bf37ec3e0f5eb79f23e024a7fbc8f3557c087f0 One-liner: capsh --print is broken without this patch. In certain cases, cap_prctl returns error > 0 for success. However, the 'no_change' label was always setting error to 0. As a result, for example, 'prctl(CAP_BSET_READ, N)' would always return 0. It should return 1 if a process has N in its bounding set (as by default it does). I'm keeping the no_change label even though it's now functionally the same as 'error'. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  SCSI: libiscsi: fix iscsi pool error path  Jean Delvare
upstream commit: fd6e1c14b73dbab89cb76af895d5612e4a8b5522

On Monday 30 March 2009, Chris Wright wrote:
> q->queue could be ERR_PTR(-ENOMEM) which will break unwinding
> on error. Make iscsi_pool_free more defensive.
>

Making the freeing of q->queue dependent on q->pool being set looks really weird (although it is correct at the moment). But this seems to be fixable in a much simpler way, with the benefit that only the error case is slowed down.

In both cases we have a problem if q->queue contains an error value but it's not -ENOMEM. Apparently this can't happen today, but it doesn't feel right to assume this will always be true. Maybe it's the right time to fix this as well.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[chrisw: this is a fixlet to f474a37b, also in -stable]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  SCSI: libiscsi: fix iscsi pool error path  Jean Delvare
upstream commit: f474a37bc48667595b5653a983b635c95ed82a3b Memory freeing in iscsi_pool_free() looks wrong to me. Either q->pool can be NULL and this should be tested before dereferencing it, or it can't be NULL and it shouldn't be tested at all. As far as I can see, the only case where q->pool is NULL is on early error in iscsi_pool_init(). One possible way to fix the bug is thus to not call iscsi_pool_free() in this case (nothing needs to be freed anyway) and then we can get rid of the q->pool check. Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  sparc64: Fix bug in ("sparc64: Flush TLB before releasing pages.")  David Miller
[ No upstream commit, this regression was added only to 2.6.29.1 ] Unfortunately I merged an earlier version of commit b6816b706138c3870f03115071872cad824f90b4 ("sparc64: Flush TLB before releasing pages.") than what I actually tested and merged upstream. Simply diffing asm/tlb_64.h in Linus's tree vs. what ended up in 2.6.29.1 confirms this. Sync things up to fix BUG() triggers some users are seeing. Reported-by: Dennis Gilmore <dennis@ausil.us> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  ALSA: hda - add missing comma in ad1884_slave_vols  Akinobu Mita
upstream commit: bca68467b59a24396554d8dd5979ee363c174854 Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  splice: fix deadlock in splicing to file  Miklos Szeredi
upstream commit: 7bfac9ecf0585962fe13584f5cf526d8c8e76f17

There's a possible deadlock in generic_file_splice_write(), splice_from_pipe() and ocfs2_file_splice_write():

 - task A calls generic_file_splice_write()
 - this calls inode_double_lock(), which locks i_mutex on both pipe->inode and target inode
 - ordering depends on inode pointers, can happen that pipe->inode is locked first
 - __splice_from_pipe() needs more data, calls pipe_wait()
 - this releases lock on pipe->inode, goes to interruptible sleep
 - task B calls generic_file_splice_write(), similarly to the first
 - this locks pipe->inode, then tries to lock inode, but that is already held by task A
 - task A is interrupted, it tries to lock pipe->inode, but fails, as it is already held by task B
 - ABBA deadlock

Fix this by explicitly ordering locks: the outer lock must be on target inode and the inner lock (which is later unlocked and relocked) must be on pipe->inode. This is OK, pipe inodes and target inodes form two nonoverlapping sets, generic_file_splice_write() and friends are not called with a target which is a pipe.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  netfilter: {ip, ip6, arp}_tables: fix incorrect loop detection  Patrick McHardy
upstream commit: 1f9352ae2253a97b07b34dcf16ffa3b4ca12c558 Commit e1b4b9f ([NETFILTER]: {ip,ip6,arp}_tables: fix exponential worst-case search for loops) introduced a regression in the loop detection algorithm, causing sporadic incorrectly detected loops. When a chain has already been visited during the check, it is treated as having a standard target containing a RETURN verdict directly at the beginning in order to not check it again. The real target of the first rule is then incorrectly treated as STANDARD target and checked not to contain invalid verdicts. Fix by making sure the rule does actually contain a standard target. Based on patch by Francis Dupont <Francis_Dupont@isc.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  kprobes: Fix locking imbalance in kretprobes  Ananth N Mavinakayanahalli
upstream commit: f02b8624fedca39886b0eef770dca70c2f0749b3

Fix locking imbalance in kretprobes:

=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
kthreadd/2 is trying to release lock (&rp->lock) at:
[<c06b3080>] pre_handler_kretprobe+0xea/0xf4
but there are no more locks to release!

other info that might help us debug this:
1 lock held by kthreadd/2:
 #0: (rcu_read_lock){..--}, at: [<c06b2b24>] __atomic_notifier_call_chain+0x0/0x5a

stack backtrace:
Pid: 2, comm: kthreadd Not tainted 2.6.29-rc8 #1
Call Trace:
 [<c06ae498>] ? printk+0xf/0x17
 [<c06b3080>] ? pre_handler_kretprobe+0xea/0xf4
 [<c044ce6c>] print_unlock_inbalance_bug+0xc3/0xce
 [<c0444d4b>] ? clocksource_read+0x7/0xa
 [<c04450a4>] ? getnstimeofday+0x5f/0xf6
 [<c044a9ca>] ? register_lock_class+0x17/0x293
 [<c044b72c>] ? mark_lock+0x1e/0x30b
 [<c0448956>] ? tick_dev_program_event+0x4a/0xbc
 [<c0498100>] ? __slab_alloc+0xa5/0x415
 [<c06b2fbe>] ? pre_handler_kretprobe+0x28/0xf4
 [<c06b3080>] ? pre_handler_kretprobe+0xea/0xf4
 [<c044cf1b>] lock_release_non_nested+0xa4/0x1a5
 [<c06b3080>] ? pre_handler_kretprobe+0xea/0xf4
 [<c044d15d>] lock_release+0x141/0x166
 [<c06b07dd>] _spin_unlock_irqrestore+0x19/0x50
 [<c06b3080>] pre_handler_kretprobe+0xea/0xf4
 [<c06b20b5>] kprobe_exceptions_notify+0x1c9/0x43e
 [<c06b2b02>] notifier_call_chain+0x26/0x48
 [<c06b2b5b>] __atomic_notifier_call_chain+0x37/0x5a
 [<c06b2b24>] ? __atomic_notifier_call_chain+0x0/0x5a
 [<c06b2b8a>] atomic_notifier_call_chain+0xc/0xe
 [<c0442d0d>] notify_die+0x2d/0x2f
 [<c06b0f9c>] do_int3+0x1f/0x71
 [<c06b0e84>] int3+0x2c/0x34
 [<c042d476>] ? do_fork+0x1/0x288
 [<c040221b>] ? kernel_thread+0x71/0x79
 [<c043ed1b>] ? kthread+0x0/0x60
 [<c043ed1b>] ? kthread+0x0/0x60
 [<c04040b8>] ? kernel_thread_helper+0x0/0x10
 [<c043ec7f>] kthreadd+0xac/0x148
 [<c043ebd3>] ? kthreadd+0x0/0x148
 [<c04040bf>] kernel_thread_helper+0x7/0x10

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org> [2.6.29.x, 2.6.28.x, 2.6.27.x]
LKML-Reference: <20090318113621.GB4129@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  acer-wmi: Blacklist Acer Aspire One  Carlos Corbacho
upstream commit: a74dd5fdabcd34c93e17e9c7024eeb503c92b048 The Aspire One's ACPI-WMI interface is a placeholder that does nothing, and the invalid results that we get from it are now causing userspace problems as acer-wmi always returns that the rfkill is enabled (i.e. the radio is off, when it isn't). As it's hardware controlled, acer-wmi isn't needed on the Aspire One either. Thanks to Andy Whitcroft at Canonical for tracking down Ubuntu's userspace issues to this. Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk> Reported-by: Andy Whitcroft <apw@canonical.com> Cc: stable@kernel.org Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  crypto: shash - Fix unaligned calculation with short length  Yehuda Sadeh
upstream commit: f4f689933c63e0fbfba62f2a80efb2b424b139ae When the total length is shorter than the calculated number of unaligned bytes, the call to shash->update breaks. For example, calling crc32c on unaligned buffer with length of 1 can result in a system crash. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
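The short-length case described in the shash commit above comes down to clamping the number of leading unaligned bytes to the total length. The following is a minimal sketch of that arithmetic, assuming an alignment mask of the usual "power of two minus one" form; the function name is hypothetical and this is not the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Number of bytes to process separately before the buffer reaches the
 * required alignment. Intended for an address that is NOT already
 * aligned; alignmask is (alignment - 1), e.g. 3 for 4-byte alignment.
 */
static size_t unaligned_head_len(uintptr_t addr, size_t alignmask, size_t len)
{
	size_t head = alignmask + 1 - (addr & alignmask);

	/* Never claim more bytes than the caller actually supplied. */
	if (head > len)
		head = len;

	return head;
}
```

For a 1-byte buffer at an address one past a 4-byte boundary, the unclamped value would be 3; the clamp reduces it to 1, which is the situation the commit message gives as its crash example.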
2009-04-27  net/netrom: Fix socket locking  Jean Delvare
upstream commit: cc29c70dd581f85ee7a3e7980fb031f90b90a2ab Patch "af_rose/x25: Sanity check the maximum user frame size" (commit 83e0bbcbe2145f160fbaa109b0439dae7f4a38a9) from Alan Cox got locking wrong. If we bail out due to user frame size being too large, we must unlock the socket beforehand. Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
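The locking rule stated in the netrom commit above (unlock the socket before bailing out on an oversized frame) is commonly written as a single exit path. The sketch below models the socket lock with a simple depth counter so that it runs in user space; every name, and the 65535-byte limit, is an assumption made for illustration and is not taken from the patch.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_FRAME_SKETCH 65535 /* hypothetical limit for this sketch */

/* Toy socket: lock_depth is 0 when unlocked, 1 when held. */
struct sock_sketch {
	int lock_depth;
};

static void lock_sock_sketch(struct sock_sketch *sk)    { sk->lock_depth++; }
static void release_sock_sketch(struct sock_sketch *sk) { sk->lock_depth--; }

/* Returns the number of bytes "sent", or a negative errno value. */
static long sendmsg_sketch(struct sock_sketch *sk, size_t len)
{
	long err;

	lock_sock_sketch(sk);

	if (len > MAX_FRAME_SKETCH) {
		err = -EMSGSIZE;
		goto out; /* a bare return here would leave the lock held */
	}

	err = (long)len; /* pretend the frame was queued */

out:
	release_sock_sketch(sk);
	return err;
}
```

Routing the error through the same label as the success path keeps the lock balanced regardless of which check fails.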
2009-04-27  af_rose/x25: Sanity check the maximum user frame size  Alan Cox
upstream commit: 83e0bbcbe2145f160fbaa109b0439dae7f4a38a9 CVE-2009-0795. Otherwise we can wrap the sizes and end up sending garbage. Closes #10423 Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  dm table: fix upgrade mode race  Alasdair G Kergon
upstream commit: 570b9d968bf9b16974252ef7cbce73fa6dac34f3 upgrade_mode() sets bdev to NULL temporarily, and does not have any locking to exclude anything from seeing that NULL. In dm_table_any_congested() bdev_get_queue() can dereference that NULL and cause a reported oops. Fix this by not changing that field during the mode upgrade. Cc: stable@kernel.org Cc: Neil Brown <neilb@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  dm: path selector use module refcount directly  Jun'ichi Nomura
upstream commit: aea9058801c0acfa2831af1714da412dfb0018c2 Fix refcount corruption in dm-path-selector Refcounting with non-atomic ops under shared lock will corrupt the counter in multi-processor system and may trigger BUG_ON(). Use module refcount. # same approach as dm-target-use-module-refcount-directly.patch here # https://www.redhat.com/archives/dm-devel/2008-December/msg00075.html Typical oops: kernel BUG at linux-2.6.29-rc3/drivers/md/dm-path-selector.c:90! Pid: 11148, comm: dmsetup Not tainted 2.6.29-rc3-nm #1 dm_put_path_selector+0x4d/0x61 [dm_multipath] Call Trace: [<ffffffffa031d3f9>] free_priority_group+0x33/0xb3 [dm_multipath] [<ffffffffa031d4aa>] free_multipath+0x31/0x67 [dm_multipath] [<ffffffffa031d50d>] multipath_dtr+0x2d/0x32 [dm_multipath] [<ffffffffa015d6c2>] dm_table_destroy+0x64/0xd8 [dm_mod] [<ffffffffa015b73a>] __unbind+0x46/0x4b [dm_mod] [<ffffffffa015b79f>] dm_swap_table+0x60/0x14d [dm_mod] [<ffffffffa015f963>] dev_suspend+0xfd/0x177 [dm_mod] [<ffffffffa0160250>] dm_ctl_ioctl+0x24c/0x29c [dm_mod] [<ffffffff80288cd3>] ? get_page_from_freelist+0x49c/0x61d [<ffffffffa015f866>] ? dev_suspend+0x0/0x177 [dm_mod] [<ffffffff802bf05c>] vfs_ioctl+0x2a/0x77 [<ffffffff802bf4f1>] do_vfs_ioctl+0x448/0x4a0 [<ffffffff802bf5a0>] sys_ioctl+0x57/0x7a [<ffffffff8020c05b>] system_call_fastpath+0x16/0x1b Cc: stable@kernel.org Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  dm target: use module refcount directly  Cheng Renquan
upstream commit: 5642b8a61a15436231adf27b2b1bd96901b623dd The tt_internal's 'use' field is superfluous: the module's refcount can do the work properly. An acceptable side-effect is that this increases the reference counts reported by 'lsmod'. Remove the superfluous test when removing a target module. [Crash possible without this on SMP - agk] Cc: stable@kernel.org Signed-off-by: Cheng Renquan <crquan@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com> Reviewed-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  dm snapshot: avoid having two exceptions for the same chunk  Mikulas Patocka
upstream commit: 35bf659b008e83e725dcd30f542e38461dbb867c We need to check if the exception was completed after dropping the lock. After regaining the lock, __find_pending_exception checks if the exception was already placed into &s->pending hash. But we don't check if the exception was already completed and placed into &s->complete hash. If the process waiting in alloc_pending_exception was delayed at this point because of a scheduling latency and the exception was meanwhile completed, we'd miss that and allocate another pending exception for already completed chunk. It would lead to a situation where two records for the same chunk exist and potential data corruption because multiple snapshot I/Os to the affected chunk could be redirected to different locations in the snapshot. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  dm snapshot: avoid dropping lock in __find_pending_exception  Mikulas Patocka
upstream commit: c66213921c816f6b1b16a84911618ba9a363b134 It is uncommon and bug-prone to drop a lock in a function that is called with the lock held, so this is moved to the caller. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  dm snapshot: refactor __find_pending_exception  Mikulas Patocka
upstream commit: 2913808eb56a6445a7b277eb8d17651c8defb035 Move looking-up of a pending exception from __find_pending_exception to another function. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  dm io: make sync_io uninterruptible  Mikulas Patocka
upstream commit: b64b6bf4fd8b678a9f8477c11773c38a0a246a6d If someone sends signal to a process performing synchronous dm-io call, the kernel may crash. The function sync_io attempts to exit with -EINTR if it has pending signal, however the structure "io" is allocated on stack, so already submitted io requests end up touching unallocated stack space and corrupting kernel memory. sync_io sets its state to TASK_UNINTERRUPTIBLE, so the signal can't break out of io_schedule() --- however, if the signal was pending before sync_io entered while (1) loop, the corruption of kernel memory will happen. There is no way to cancel in-progress IOs, so the best solution is to ignore signals at this point. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  dm raid1: switch read_record from kmalloc to slab to save memory  Mikulas Patocka
upstream commit: 95f8fac8dc6139fedfb87746e0c8fda9b803cb46 With my previous patch to save bi_io_vec, the size of dm_raid1_read_record is significantly increased (the vector list takes 3072 bytes on 32-bit machines and 4096 bytes on 64-bit machines). The structure dm_raid1_read_record used to be allocated with kmalloc, but kmalloc aligns the size on the next power-of-two so an object slightly greater than 4096 will allocate 8192 bytes of memory and half of that memory will be wasted. This patch turns kmalloc into a slab cache which doesn't have this padding so it will reduce the memory consumed. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
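The memory waste described in the dm raid1 commit above follows from rounding an allocation up to the next power of two. A small sketch of that arithmetic, treating the allocator as pure power-of-two buckets (a simplification; the helper names are hypothetical):

```c
#include <stddef.h>

/* Smallest power of two that is >= size (for size >= 1). */
static size_t pow2_bucket(size_t size)
{
	size_t bucket = 1;

	while (bucket < size)
		bucket <<= 1;

	return bucket;
}

/* Bytes lost to rounding when 'size' bytes are requested. */
static size_t wasted_bytes(size_t size)
{
	return pow2_bucket(size) - size;
}
```

An object of 4112 bytes, for example, lands in the 8192-byte bucket and wastes 4080 bytes, which is roughly the "half of that memory" the commit message refers to. A dedicated slab cache sized for the object avoids this rounding.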
2009-04-27  vfs: skip I_CLEAR state inodes  Wu Fengguang
upstream commit: b6fac63cc1f52ec27f29fe6c6c8494a2ffac33fd

clear_inode() will switch inode state from I_FREEING to I_CLEAR, and do so _outside_ of inode_lock. So any I_FREEING testing is incomplete without a coupled testing of I_CLEAR. So add I_CLEAR tests to drop_pagecache_sb(), generic_sync_sb_inodes() and add_dquot_ref().

Masayoshi MIZUMA discovered the bug in drop_pagecache_sb() and Jan Kara pointed out that the other two cases needed fixing as well.

Masayoshi MIZUMA has a nice panic flow:

=====================================================================
            [process A]               |        [process B]
 |                                    |
 |    prune_icache()                  | drop_pagecache()
 |      spin_lock(&inode_lock)        |   drop_pagecache_sb()
 |      inode->i_state |= I_FREEING;  |       |
 |      spin_unlock(&inode_lock)      |       V
 |          |                         |     spin_lock(&inode_lock)
 |          V                         |         |
 |      dispose_list()                |         |
 |        list_del()                  |         |
 |        clear_inode()               |         |
 |          inode->i_state = I_CLEAR  |         |
 |            |                       |         V
 |            |                       |      if (inode->i_state & (I_FREEING|I_WILL_FREE))
 |            |                       |              continue;           <==== NOT MATCH
 |            |                       |
 |            |                       | (DANGER from here on! Accessing disposing inode!)
 |            |                       |
 |            |                       |      __iget()
 |            |                       |        list_move() <===== PANIC on poisoned list !!
 V            V                       |
(time)
=====================================================================

Reported-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[chrisw: backport to 2.6.29]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
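The state test that the vfs commit above extends can be sketched as a pair of predicates, one with the old mask and one with I_CLEAR added. The bit values and names here are placeholders chosen for the sketch, not the kernel's definitions.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only. */
enum {
	SK_I_WILL_FREE = 1 << 0,
	SK_I_FREEING   = 1 << 1,
	SK_I_CLEAR     = 1 << 2,
};

/* Old test: misses an inode that has already moved on to I_CLEAR. */
static bool skips_inode_before_fix(unsigned int state)
{
	return (state & (SK_I_FREEING | SK_I_WILL_FREE)) != 0;
}

/* New test: also skips inodes in the I_CLEAR state. */
static bool skips_inode_after_fix(unsigned int state)
{
	return (state & (SK_I_FREEING | SK_I_CLEAR | SK_I_WILL_FREE)) != 0;
}
```

An inode whose state is exactly I_CLEAR passes the old test and is then touched while being disposed of, which is the window process B falls into in the flow described by the commit.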
2009-04-27  dm: preserve bi_io_vec when resubmitting bios  Mikulas Patocka
upstream commit: a920f6b3accc77d9dddbc98a7426be23ee479625 Device mapper saves and restores various fields in the bio, but it doesn't save bi_io_vec. If the device driver modifies this after a partially successful request, dm-raid1 and dm-multipath may attempt to resubmit a bio that has bi_size inconsistent with the size of vector. To make requests resubmittable in dm-raid1 and dm-multipath, we must save and restore the bio vector as well. To reduce the memory overhead involved in this, we do not save the pages in a vector and use a 16-bit field size if the page size is less than 65536. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  ixgbe: Fix potential memory leak/driver panic issue while setting up Tx & Rx ring parameters  Mallikarjuna R Chilakala
upstream commit: f9ed88549e2ec73922b788e3865282d221233662 While setting up the ring parameters using ethtool the driver can panic or leak memory as ixgbe_open tries to setup tx & rx resources. The updated logic will use ixgbe_down/up after successful allocation of tx & rx resources. Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  mm: do_xip_mapping_read: fix length calculation  Martin Schwidefsky
upstream commit: 58984ce21d315b70df1a43644df7416ea7c9bfd8 The calculation of the value nr in do_xip_mapping_read is incorrect. If the copy required more than one iteration in the do while loop the copied variable will be non-zero. The maximum length that may be passed to the call to copy_to_user(buf+copied, xip_mem+offset, nr) is len-copied but the check only compares against (nr > len). This bug is the cause of the heap corruption Carsten has been chasing for so long: *** glibc detected *** /bin/bash: free(): invalid next size (normal): 0x00000000800e39f0 *** ======= Backtrace: ========= /lib64/libc.so.6[0x200000b9b44] /lib64/libc.so.6(cfree+0x8e)[0x200000bdade] /bin/bash(free_buffered_stream+0x32)[0x80050e4e] /bin/bash(close_buffered_stream+0x1c)[0x80050ea4] /bin/bash(unset_bash_input+0x2a)[0x8001c366] /bin/bash(make_child+0x1d4)[0x8004115c] /bin/bash[0x8002fc3c] /bin/bash(execute_command_internal+0x656)[0x8003048e] /bin/bash(execute_command+0x5e)[0x80031e1e] /bin/bash(execute_command_internal+0x79a)[0x800305d2] /bin/bash(execute_command+0x5e)[0x80031e1e] /bin/bash(reader_loop+0x270)[0x8001efe0] /bin/bash(main+0x1328)[0x8001e960] /lib64/libc.so.6(__libc_start_main+0x100)[0x200000592a8] /bin/bash(clearerr+0x5e)[0x8001c092] With this bug fix the commit 0e4a9b59282914fe057ab17027f55123964bc2e2 "ext2/xip: refuse to change xip flag during remount with busy inodes" can be removed again. Cc: Carsten Otte <cotte@de.ibm.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Jared Hulbert <jaredeh@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
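The length check discussed in the do_xip_mapping_read commit above can be compared side by side in a short sketch. The two helpers are hypothetical and only model the clamp applied to nr on each loop iteration; they are not the kernel function.

```c
#include <stddef.h>

/* Clamp as described for the buggy code: compares only against len. */
static size_t chunk_before_fix(size_t nr, size_t len, size_t copied)
{
	(void)copied; /* the bug: progress so far is ignored */
	if (nr > len)
		nr = len;
	return nr;
}

/* Clamp against what is still left to copy: len - copied. */
static size_t chunk_after_fix(size_t nr, size_t len, size_t copied)
{
	if (nr > len - copied)
		nr = len - copied;
	return nr;
}
```

With len = 6000, copied = 4096 and a candidate chunk of 4096 bytes, the old clamp still allows 4096 bytes (overrunning the user buffer by 2192 bytes), while the corrected clamp allows only the remaining 1904.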
2009-04-27  mm: define a UNIQUE value for AS_UNEVICTABLE flag  Lee Schermerhorn
upstream commit: 9a896c9a48ac6704c0ce8ee081b836644d0afe40 A new "address_space flag"--AS_MM_ALL_LOCKS--was defined to use the next available AS flag while the Unevictable LRU was under development. The Unevictable LRU was using the same flag and "no one" noticed. Current mainline, since 2.6.28, has same value for two symbolic flag names. So, define a unique flag value for AS_UNEVICTABLE--up close to the other flags, [at the cost of an additional #ifdef] so we'll notice next time. Note that #ifdef is not actually required, if we don't mind having the unused flag value defined. Replace #defines with an enum. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: <stable@kernel.org> [2.6.28.x, 2.6.29.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
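The closing remark of the AS_UNEVICTABLE commit above ("Replace #defines with an enum") relies on enumerators being numbered consecutively, so two flags cannot silently share a value. A minimal sketch of that idea follows; the base value and the SK_-prefixed names are assumptions for illustration, and only AS_MM_ALL_LOCKS and AS_UNEVICTABLE correspond to names in the commit message.

```c
/* Hypothetical starting bit for this sketch. */
#define SK_AS_FLAG_BASE 20

enum sk_mapping_flags {
	SK_AS_OTHER_0      = SK_AS_FLAG_BASE, /* placeholder for an earlier flag */
	SK_AS_OTHER_1,                        /* placeholder for an earlier flag */
	SK_AS_MM_ALL_LOCKS,                   /* next free value */
	SK_AS_UNEVICTABLE,                    /* distinct from the one above */
};
```

Because each enumerator without an explicit value takes the previous value plus one, adding a new flag to the list gives it a fresh bit number without anyone having to pick one by hand.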
2009-04-27  sysctl: fix suid_dumpable and lease-break-time sysctls  Matthew Wilcox
upstream commit: 8e654fba4a376f436bdfe361fc5cdbc87ac09b35 Arne de Bruijn points out that commit 76fdbb25f963de5dc1e308325f0578a2f92b1c2d ("coredump masking: bound suid_dumpable sysctl") mistakenly limits lease-break-time instead of suid_dumpable. Signed-off-by: Matthew Wilcox <matthew@wil.cx> Reported-by: Arne de Bruijn <kernelbt@arbruijn.dds.nl> Cc: Kawai, Hidehiro <hidehiro.kawai.ez@hitachi.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  cpumask: fix slab corruption caused by alloc_cpumask_var_node()  Jack Steiner
upstream commit: 4f032ac4122a77dbabf7a24b2739b2790448180f Fix slab corruption caused by alloc_cpumask_var_node() overwriting the tail end of an off-stack cpumask. The function zeros out cpumask bits beyond the last possible cpu. The starting point for zeroing should be the beginning of the mask offset by a byte count derived from the number of possible cpus. The offset was calculated in bits instead of bytes. This resulted in overwriting the end of the cpumask. Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Mike Travis <travis.sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: <stable@kernel.org> [2.6.29.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  ide-atapi: start DMA after issuing a packet command  Borislav Petkov
upstream commit: 2eba08270990b99fb5429b76ee97184ddd272f7f Apparently¹, some ATAPI devices want to see the packet command first before enabling DMA otherwise they simply hang indefinitely. Reorder the two steps and start DMA only after having issued the command first. [1] http://marc.info/?l=linux-kernel&m=123835520317235&w=2 Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Reported-by: Michael Roth <mroth@nessie.de> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27  ide: drivers/ide/ide-atapi.c needs <linux/scatterlist.h>  Geert Uytterhoeven
upstream commit: 479edf065576aeed7ac99d10838bb3b4f870b5f9 On m68k: | drivers/ide/ide-atapi.c: In function 'ide_io_buffers': | drivers/ide/ide-atapi.c:87: error: implicit declaration of function 'sg_page' | drivers/ide/ide-atapi.c:87: warning: passing argument 1 of 'PageHighMem' makes pointer from integer without a cast | drivers/ide/ide-atapi.c:91: warning: passing argument 1 of 'kmap_atomic' makes pointer from integer without a cast | drivers/ide/ide-atapi.c:96: error: implicit declaration of function 'sg_virt' | drivers/ide/ide-atapi.c:96: warning: assignment makes pointer from integer without a cast | drivers/ide/ide-atapi.c:107: error: implicit declaration of function 'sg_next' | drivers/ide/ide-atapi.c:107: warning: assignment makes pointer from integer without a cast [bart: Dmitri Vorobiev submitted similar patch fixing MIPS] Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Dmitri Vorobiev <dmitri.vorobiev@movial.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27V4L/DVB (10943): cx88: Prevent general protection fault on rmmodJean Delvare
upstream commit: 569b7ec73abf576f9a9e4070d213aadf2cce73cb When unloading the cx8800 driver I sometimes get a general protection fault. Analysis revealed a race in cx88_ir_stop(). It can be solved by using a delayed work instead of a timer for infrared input polling. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27r8169: Reset IntrStatus after chip resetFrancois Romieu
upstream commit: d78ad8cbfe73ad568de38814a75e9c92ad0a907c Original comment (Karsten): On a MSI MS-6702E mainboard, when in rtl8169_init_one() for the first time after BIOS has run, IntrStatus reads 5 after chip has been reset. IntrStatus should equal 0 there, so patch changes IntrStatus reset to happen after chip reset instead of before. Remark (Francois): Assuming that the loglevel of the driver is increased above NETIF_MSG_INTR, the bug reveals itself with a typical "interrupt 0025 in poll" message at startup. In retrospect, the message should have been read as a hint of an unexpected hardware state several months ago :o( Fixes (at least part of) https://bugzilla.redhat.com/show_bug.cgi?id=460747 Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Tested-by: Josep <josep.puigdemont@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27md/raid1 - don't assume newly allocated bvecs are initialised.NeilBrown
upstream commit: 303a0e11d0ee136ad8f53f747f3c377daece763b Since commit d3f761104b097738932afcc310fbbbbfb007ef92 newly allocated bvecs aren't initialised to NULL, so we have to be more careful about freeing a bio which only managed to get a few pages allocated to it. Otherwise the resync process crashes. This patch is appropriate for 2.6.29-stable. Cc: stable@kernel.org Cc: "Jens Axboe" <jens.axboe@oracle.com> Reported-by: Gabriele Tozzi <gabriele@tozzi.eu> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27SCSI: sg: fix iovec bugs introduced by the block layer conversionFUJITA Tomonori
upstream commit: 0fdf96b67ac2649cc1ddb29b316a0db11586c6a8 - needs to use copy_from_user for iovec before passing it to blk_rq_map_user_iov(). - before the block layer conversion, if ->dxfer_len and the sum of the iovec lengths disagree, the shorter one wins. However, currently sg returns -EINVAL. This restores the old behavior. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Cc: stable@kernel.org Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
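The "shorter one wins" rule can be sketched in userspace as follows. This is an illustration only, not the sg driver's code: `effective_xfer_len` is a hypothetical helper, and the real driver works on a kernel copy of the user's iovec array.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: return the transfer length to use when the
 * caller-supplied dxfer_len and the summed iovec lengths disagree.
 * The smaller of the two is chosen instead of failing with -EINVAL.
 */
static size_t effective_xfer_len(const struct iovec *iov, int count,
                                 size_t dxfer_len)
{
    size_t sum = 0;

    for (int i = 0; i < count; i++)
        sum += iov[i].iov_len;

    return sum < dxfer_len ? sum : dxfer_len;
}
```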
2009-04-27drm/i915: fix TV mode setting in property changeZhenyu Wang
upstream commit: 7d6ff7851c23740c3813bdf457be638381774b69 Setting only the TV DAC on a property change does not seem to work; we have to set up the whole crtc pipe that is assigned to the TV. Signed-off-by: Zhenyu Wang <zhenyu.z.wang@intel.com> [anholt: Note that this should also fix the oops at startup with new 2D] Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27drm/i915: only set TV mode when any property changedZhenyu Wang
upstream commit: ebcc8f2eade76946dbb5d5c545b91f8157051aa8 If there is no real property change, there is no need to set the TV mode again. Signed-off-by: Zhenyu Wang <zhenyu.z.wang@intel.com> [anholt: checkpatch.pl fix] Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27drm: Use pgprot_writecombine in GEM GTT mapping to get the right bits for !PAT.Jesse Barnes
upstream commit: 1055f9ddad093f54dfd708a0f976582034d4ce1a Otherwise, the PAGE_CACHE_WC would end up getting us a UC-only mapping, and the write performance of GTT maps dropped 10x. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> [anholt: cleaned up unused var] Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27drm/i915: check for -EINVAL from vm_insert_pfnJesse Barnes
upstream commit: 959b887cf42fd63cf10e28a7f26126f78aa1c0b0 Indicates something is wrong with the mapping; and apparently triggers in current kernels. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27drm/i915: Check for dev->primary->master before dereference.Chris Wilson
upstream commit: 98787c057fdefdce6230ff46f2c1105835005a4c I've hit the occasional oops inside i915_wait_ring() with an indication of a NULL dereference of dev->primary->master. Adding a NULL check is consistent with the other potential users of dev->primary->master. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27drm/i915: Sync crt hotplug detection with intel video driverZhao Yakui
upstream commit: 771cb081354161eea21534ba58e5cc1a2db94a25 This covers: Use long crt hotplug activation time on GM45. Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27drm/i915: Read the right SDVO register when detecting SDVO/HDMI.Kristian Høgsberg
upstream commit: 13520b051e8888dd3af9bda639d83e7df76613d1 This fixes incorrect detection of the second SDVO/HDMI output on G4X, and extra boot time on pre-G4X. Signed-off-by: Kristian Høgsberg <krh@redhat.com> Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27drm/i915: Change DCC tiling detection case to cover only mobile parts.Eric Anholt
upstream commit: 568d9a8f6d4bf81e0672c74573dc02981d31e3ea Later spec investigation has revealed that every 9xx mobile part has had this register in this format. Also, no non-mobile parts have been shown to have this register. So make all mobile use the same code, and all non-mobile use the hack 965 detection. Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2009-04-27dock: fix dereference after kfree()Dan Carpenter
upstream commit: f240729832dff3785104d950dad2d3ced4387f6d dock_remove() calls kfree() on dock_station so we should use list_for_each_entry_safe() to avoid dereferencing freed memory. Found by smatch (http://repo.or.cz/w/smatch.git/). Compile tested. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
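The reason a "safe" iterator is needed can be shown with a minimal userspace sketch. It uses a plain singly linked list rather than the kernel's `list_head`, and `struct station`, `push_station` and `free_all_stations` are hypothetical names chosen for illustration. The key step is the same as in list_for_each_entry_safe(): fetch the next pointer before the current entry is freed.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a dock station on a singly linked list. */
struct station {
    int id;
    struct station *next;
};

/* Prepend a new station; returns the old head unchanged if malloc fails. */
static struct station *push_station(struct station *head, int id)
{
    struct station *s = malloc(sizeof(*s));

    if (!s)
        return head;
    s->id = id;
    s->next = head;
    return s;
}

/* Free every entry and return how many were freed. */
static int free_all_stations(struct station *head)
{
    struct station *cur = head;
    struct station *next;
    int freed = 0;

    while (cur) {
        next = cur->next;  /* read the link BEFORE freeing cur */
        free(cur);
        cur = next;        /* never touches the freed entry again */
        freed++;
    }
    return freed;
}
```

An unsafe loop would advance with `cur = cur->next` after `free(cur)`, which reads freed memory.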
2009-04-27ACPI: cap off P-state transition latency from buggy BIOSesPallipadi, Venkatesh
upstream commit: a59d1637eb0e0a37ee0e5c92800c60abe3624e24 Some BIOSes report very high frequency transition latencies that are plainly wrong on CPUs that can change frequency using the native MSR interface. One such system is the IBM T42 (2327-8ZU), as reported by Owen Taylor and Rik van Riel. The cpufreq_ondemand driver uses this transition latency to derive a reasonable sampling interval for CPU usage. With such a high latency value, the ondemand sampling interval ends up very long (0.5 sec in this particular case), which hurts performance because the driver is slow to raise the frequency. Fix it by capping the transition latency at 20uS for native MSR based frequency transitions. mjg: We've confirmed that this also helps on the X31 Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
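The capping logic amounts to a clamp applied only on the native MSR path. A minimal sketch follows, assuming the latency is expressed in nanoseconds; `cap_transition_latency`, `MSR_LATENCY_CAP_NS` and the `uses_native_msr` flag are hypothetical names, not the driver's own.

```c
/* Assumed cap: 20 microseconds, expressed in nanoseconds. */
#define MSR_LATENCY_CAP_NS (20u * 1000u)

/*
 * Hypothetical helper: clamp a BIOS-reported transition latency when the
 * CPU changes frequency through the native MSR interface. Other transition
 * methods keep the reported value unchanged.
 */
static unsigned int cap_transition_latency(unsigned int bios_latency_ns,
                                           int uses_native_msr)
{
    if (uses_native_msr && bios_latency_ns > MSR_LATENCY_CAP_NS)
        return MSR_LATENCY_CAP_NS;
    return bios_latency_ns;
}
```

Values already below the cap pass through untouched, so well-behaved BIOSes are unaffected.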
2009-04-27x86, setup: mark %esi as clobbered in E820 BIOS callMichael K. Johnson
upstream commit: 01522df346f846906eaf6ca57148641476209909 Jordan Hargrave diagnosed a BIOS clobbering %esi in the E820 call. That particular BIOS has been fixed, but there is a possibility that this is responsible for other occasional reports of early boot failure, and it does not hurt to add %esi to the clobbers. -stable candidate patch. Cc: Justin Forbes <jmforbes@linuxtx.org> Signed-off-by: Michael K Johnson <johnsonm@rpath.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org Signed-off-by: Chris Wright <chrisw@sous-sol.org>