linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2009-07-24	Linux 2.6.30.3v2.6.30.3	Greg Kroah-Hartman

2009-07-24	fbmon: work around compiler bug in gcc-4.2.4	Linus Torvalds
	commit 3730793d457fed79a6d49bae72996d458c8e4f2d upstream. There's some odd bug in gcc-4.2 where it miscompiles a simple loop whent he loop counter is of type 'unsigned char' and it should count to 128. The compiler will incorrectly decide that a trivial loop like this: unsigned char i, ... for (i = 0; i < 128; i++) { .. is endless, and will compile it to a single instruction that just branches to itself. This was triggered by the addition of '-fno-strict-overflow', and we could play games with compiler versions and go back to '-fwrapv' instead, but the trivial way to avoid it is to just make the loop induction variable be an 'int' instead. Thanks to Krzysztof Oledzki for reporting and testing and to Troy Moure for digging through assembler differences and finding it. Reported-and-tested-by: Krzysztof Oledzki <olel@ans.pl> Found-by: Troy Moure <twmoure@szypr.net> Gcc-bug-acked-by: Ian Lance Taylor <iant@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	Linux 2.6.30.2v2.6.30.2	Greg Kroah-Hartman

2009-07-19	Don't use '-fwrapv' compiler option: it's buggy in gcc-4.1.x	Linus Torvalds
	commit a137802ee839ace40079bebde24cfb416f73208a upstream. This causes kernel images that don't run init to completion with certain broken gcc versions. This fixes kernel bugzilla entry: http://bugzilla.kernel.org/show_bug.cgi?id=13012 I suspect the gcc problem is this: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28230 Fix the problem by using the -fno-strict-overflow flag instead, which not only does not exist in the known-to-be-broken versions of gcc (it was introduced later than fwrapv), but seems to be much less disturbing to gcc too: the difference in the generated code by -fno-strict-overflow are smaller (compared to using neither flag) than when using -fwrapv. Reported-by: Barry K. Nathan <barryn@pobox.com> Pushed-by: Frans Pop <elendil@planet.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	fuse: fix return value of fuse_dev_write()	Csaba Henk
	commit b4c458b3a23d76936e76678f2074b1528f129f7a upstream. On 64 bit systems -- where sizeof(ssize_t) > sizeof(int) -- the following test exposes a bug due to a non-careful return of an int or unsigned value: implement a FUSE filesystem which sends an unsolicited notification to the kernel with invalid opcode. The respective write to /dev/fuse will return (1 << 32) - EINVAL with errno == 0 instead of -1 with errno == EINVAL. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	fuse: fix bad return value in fuse_file_poll()	Miklos Szeredi
	commit 201fa69a2849536ef2912e8e971ec0b01c04eff4 upstream. Fix fuse_file_poll() which returned a -errno value instead of a poll mask. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	Fix iommu address space allocation	David Woodhouse
	commit a15a519ed6e5e644f5a33c213c00b0c1d3cfe683 upstream. This fixes kernel.org bug #13584. The IOVA code attempted to optimise the insertion of new ranges into the rbtree, with the unfortunate result that some ranges just didn't get inserted into the tree at all. Then those ranges would be handed out more than once, and things kind of go downhill from there. Introduced after 2.6.25 by ddf02886cbe665d67ca750750196ea5bf524b10b ("PCI: iova RB tree setup tweak"). Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: mark gross <mgross@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	Fix pci_unmap_addr() et al on i386.	David Woodhouse
	commit 788d84bba47ea3eb377f7a3ae4fd1ee84b84877b upstream. We can run a 32-bit kernel on boxes with an IOMMU, so we need pci_unmap_addr() etc. to work -- without it, drivers will leak mappings. To be honest, this whole thing looks like it's more pain than it's worth; I'm half inclined to remove the no-op #else case altogether. But this is the minimal fix, which just does the right thing if CONFIG_DMAR is set. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	floppy: fix lock imbalance	Jiri Slaby
	commit 8516a500029890a72622d245f8ed32c4e30969b7 upstream. A crappy macro prevents us unlocking on a fail path. Expand the macro and unlock appropriatelly. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	Revert "ipv4: arp announce, arp_proxy and windows ip conflict verification"	Eric W. Biederman
	commit f8a68e752bc4e39644843403168137663c984524 upstream. This reverts commit 73ce7b01b4496a5fbf9caf63033c874be692333f. After discovering that we don't listen to gratuitious arps in 2.6.30 I tracked the failure down to this commit. The patch makes absolutely no sense. RFC2131 RFC3927 and RFC5227. are all in agreement that an arp request with sip == 0 should be used for the probe (to prevent learning) and an arp request with sip == tip should be used for the gratitous announcement that people can learn from. It appears the author of the broken patch got those two cases confused and modified the code to drop all gratuitous arp traffic. Ouch! Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	md: avoid dereferencing NULL pointer when accessing suspend_* sysfs attributes.	NeilBrown
	commit b8d966efd9a46a9a35beac50cbff6e30565125ef upstream. If we try to modify one of the md/ sysfs files suspend_lo or suspend_hi when the array is not active, we dereference a NULL. Protect against that. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	md: fix error path when duplicate name is found on md device creation.	NeilBrown
	commit 1ec22eb2b4a2e1a763106bce36b11c02eaa84e61 upstream. When an md device is created by name (rather than number) we need to check that the name is not already in use. If this check finds a duplicate, we return an error without dropping the lock or freeing the newly create mddev. This patch fixes that. Found-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	md/raid5: suspend shouldn't affect read requests.	NeilBrown
	commit a5c308d4d1659b1f4833b863394e3e24cdbdfc6e upstream. md allows write to regions on an array to be suspended temporarily. This allows user-space to participate is aspects of reshape. In particular, data can be copied with not risk of a race. We should not be blocking read requests though, so don't. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	blocK: Restore barrier support for md and probably other virtual devices.	NeilBrown
	commit db64f680ba4b5c56c4be59f0698000df89ff0281 upstream. The next_ordered flag is only meaningful for devices that use __make_request. So move the test against next_ordered out of generic code and in to __make_request Since this test was added, barriers have not worked on md or any devices that don't use __make_request and so don't bother to set next_ordered. (dm explicitly sets something other than QUEUE_ORDERED_NONE since commit 99360b4c18f7675b50d283301d46d755affe75fd but notes in the comments that it is otherwise meaningless). Cc: Ken Milmore <ken.milmore@googlemail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	dma-debug: fix off-by-one error in overlap function	Joerg Roedel
	commit c79ee4e466dd12347f112e2af306dca35198458f upstream. This patch fixes a bug in the overlap function which returned true if one region ends exactly before the second region begins. This is no overlap but the function returned true in that case. Reported-by: Andrew Randrianasulu <randrik@mail.ru> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	alpha: fix percpu build breakage	Tejun Heo
	commit b01e8dc34379f4ba2f454390e340a025edbaaa7e upstream. alpha percpu access requires custom SHIFT_PERCPU_PTR() definition for modules to work around addressing range limitation. This is done via generating inline assembly using C preprocessing which forces the assembler to generate external reference. This happens behind the compiler's back and makes the compiler think that static percpu variables in modules are unused. This used to be worked around by using __unused attribute for percpu variables which prevent the compiler from omitting the variable; however, recent declare/definition attribute unification change broke this as __used can't be used for declaration. Also, in the process, PER_CPU_ATTRIBUTES definition in alpha percpu.h got broken. This patch adds PER_CPU_DEF_ATTRIBUTES which is only used for definitions and make alpha use it to add __used for percpu variables in modules. This also fixes the PER_CPU_ATTRIBUTES double definition bug. Signed-off-by: Tejun Heo <tj@kernel.org> Tested-by: maximilian attems <max@stro.at> Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	kernel/resource.c: fix sign extension in reserve_setup()	Zhang Rui
	commit 8bc1ad7dd301b7ca7454013519fa92e8c53655ff upstream. When the 32-bit signed quantities get assigned to the u64 resource_size_t, they are incorrectly sign-extended. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13253 Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9905 Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reported-by: Leann Ogasawara <leann@ubuntu.com> Cc: Pierre Ossman <drzeus@drzeus.cx> Reported-by: <pablomme@googlemail.com> Tested-by: <pablomme@googlemail.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	futexes: Fix infinite loop in get_futex_key() on huge page	Sonny Rao
	commit ce2ae53b750abfaa012ce408e93da131a5b5649b upstream. get_futex_key() can infinitely loop if it is called on a virtual address that is within a huge page but not aligned to the beginning of that page. The call to get_user_pages_fast will return the struct page for a sub-page within the huge page and the check for page->mapping will always fail. The fix is to call compound_head on the page before checking that it's mapped. Signed-off-by: Sonny Rao <sonnyrao@us.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: anton@samba.org Cc: rajamony@us.ibm.com Cc: speight@us.ibm.com Cc: mstephen@us.ibm.com Cc: grimm@us.ibm.com Cc: mikey@ozlabs.au.ibm.com LKML-Reference: <20090710231313.GA23572@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	futex: Fix the write access fault problem for real	Thomas Gleixner
	commit d0725992c8a6fb63a16bc9e8b2a50094cc4db3cd and aa715284b4d28cabde6c25c568d769a6be712bc8 upstream commit 64d1304a64 (futex: setup writeable mapping for futex ops which modify user space data) did address only half of the problem of write access faults. The patch was made on two wrong assumptions: 1) access_ok(VERIFY_WRITE,...) would actually check write access. On x86 it does _NOT_. It's a pure address range check. 2) a RW mapped region can not go away under us. That's wrong as well. Nobody can prevent another thread to call mprotect(PROT_READ) on that region where the futex resides. If that call hits between the get_user_pages_fast() verification and the actual write access in the atomic region we are toast again. The solution is to not rely on access_ok and get_user() for any write access related fault on private and shared futexes. Instead we need to fault it in with verification of write access. There is no generic non destructive write mechanism which would fault the user page in trough a #PF, but as we already know that we will fault we can as well call get_user_pages() directly and avoid the #PF overhead. If get_user_pages() returns -EFAULT we know that we can not fix it anymore and need to bail out to user space. Remove a bunch of confusing comments on this issue as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	Blackfin: fix command line corruption with DEBUG_DOUBLEFAULT	Mike Frysinger
	commit 37082511f06108129bd5f96d625a6fae2d5a4ab4 upstream. Commit 6b3087c6 (which introduced Blackfin SMP) broke command line passing when the DEBUG_DOUBLEFAULT config option was enabled. Switch the code to using a scratch register and not R7 which holds the command line. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	Blackfin: fix deadlock in SMP IPI handler	Sonic Zhang
	commit 86f2008bf546af9a434f480710e8d33891616bf5 upstream. When a low priority interrupt (like ethernet) is triggered between 2 high priority IPI messages, a deadlock in disable_irq() is hit by the second IPI handler. This is because the second IPI message is queued within the first IPI handler, but the handler doesn't process all messages, and new ones are inserted rather than appended. So now we process all the pending messages, and append new ones to the pending list. URL: http://blackfin.uclinux.org/gf/tracker/5226 Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	Blackfin: redo handling of bad irqs	Mike Frysinger
	commit 26579216f3cdf1ae05f0af8412b444870a167510 upstream. With the common IRQ code initializing much more of the irq_desc state, we can't blindly initialize it ourselves to the local bad_irq state. If we do, we end up wrongly clobbering many fields. So punt most of the bad irq code as the common layers will handle the default state, and simply call handle_bad_irq() directly when the IRQ we are processing is invalid. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	Blackfin: fix accidental reset in some boot modes	Sonic Zhang
	commit 0de4adfb8c9674fa1572b0ff1371acc94b0be901 upstream. We read the SWRST (Software Reset) register to get at the last reset state, and then we may configure the DOUBLE_FAULT bit to control behavior when a double fault occurs. But if the lower bits of the register is already set (like UART boot mode on a BF54x), we inadvertently make the system reset by writing to the SYSTEM_RESET field at the same time. So make sure the lower 4 bits are always cleared. Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	personality: fix PER_CLEAR_ON_SETID (CVE-2009-1895)	Julien Tinnes
	commit f9fabcb58a6d26d6efde842d1703ac7cfa9427b6 upstream. We have found that the current PER_CLEAR_ON_SETID mask on Linux doesn't include neither ADDR_COMPAT_LAYOUT, nor MMAP_PAGE_ZERO. The current mask is READ_IMPLIES_EXEC\|ADDR_NO_RANDOMIZE. We believe it is important to add MMAP_PAGE_ZERO, because by using this personality it is possible to have the first page mapped inside a process running as setuid root. This could be used in those scenarios: - Exploiting a NULL pointer dereference issue in a setuid root binary - Bypassing the mmap_min_addr restrictions of the Linux kernel: by running a setuid binary that would drop privileges before giving us control back (for instance by loading a user-supplied library), we could get the first page mapped in a process we control. By further using mremap and mprotect on this mapping, we can then completely bypass the mmap_min_addr restrictions. Less importantly, we believe ADDR_COMPAT_LAYOUT should also be added since on x86 32bits it will in practice disable most of the address space layout randomization (only the stack will remain randomized). Signed-off-by: Julien Tinnes <jt@cr0.org> Signed-off-by: Tavis Ormandy <taviso@sdf.lonestar.org> Acked-by: Christoph Hellwig <hch@infradead.org> Acked-by: Kees Cook <kees@ubuntu.com> Acked-by: Eugene Teo <eugene@redhat.com> [ Shortened lines and fixed whitespace as per Christophs' suggestion ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	tun/tap: Fix crashes if open() /dev/net/tun and then poll() it. (CVE-2009-1897)	Mariusz Kozlowski
	commit 3c8a9c63d5fd738c261bd0ceece04d9c8357ca13 upstream. Fix NULL pointer dereference in tun_chr_pool() introduced by commit 33dccbb050bbe35b88ca8cf1228dcf3e4d4b3554 ("tun: Limit amount of queued packets per device") and triggered by this code: int fd; struct pollfd pfd; fd = open("/dev/net/tun", O_RDWR); pfd.fd = fd; pfd.events = POLLIN \| POLLOUT; poll(&pfd, 1, 0); Reported-by: Eugene Kapun <abacabadabacaba@gmail.com> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	security: use mmap_min_addr indepedently of security models	Christoph Lameter
	commit e0a94c2a63f2644826069044649669b5e7ca75d3 upstream. This patch removes the dependency of mmap_min_addr on CONFIG_SECURITY. It also sets a default mmap_min_addr of 4096. mmapping of addresses below 4096 will only be possible for processes with CAP_SYS_RAWIO. Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Eric Paris <eparis@redhat.com> Looks-ok-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	Add '-fno-delete-null-pointer-checks' to gcc CFLAGS	Eugene Teo
	commit a3ca86aea507904148870946d599e07a340b39bf upstream. Turning on this flag could prevent the compiler from optimising away some "useless" checks for null pointers. Such bugs can sometimes become exploitable at compile time because of the -O2 optimisation. See http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Optimize-Options.html An example that clearly shows this 'problem' is commit 6bf67672. static void __devexit agnx_pci_remove(struct pci_dev pdev) { struct ieee80211_hw dev = pci_get_drvdata(pdev); - struct agnx_priv priv = dev->priv; + struct agnx_priv priv; AGNX_TRACE; if (!dev) return; + priv = dev->priv; By reverting this patch, and compile it with and without -fno-delete-null-pointer-checks flag, we can see that the check for dev is compiled away. call printk # - testq %r12, %r12 # dev - je .L94 #, movq %r12, %rdi # dev, Clearly the 'fix' is to stop using dev before it is tested, but building with -fno-delete-null-pointer-checks flag at least makes it harder to abuse. Signed-off-by: Eugene Teo <eugeneteo@kernel.sg> Acked-by: Eric Paris <eparis@redhat.com> Acked-by: Wang Cong <amwang@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	Linux 2.6.30.1v2.6.30.1	Greg Kroah-Hartman

2009-07-02	bsdacct: fix access to invalid filp in acct_on()	Renaud Lottiaux
	commit df279ca8966c3de83105428e3391ab17690802a9 upstream. The file opened in acct_on and freshly stored in the ns->bacct struct can be closed in acct_file_reopen by a concurrent call after we release acct_lock and before we call mntput(file->f_path.mnt). Record file->f_path.mnt in a local variable and use this variable only. Signed-off-by: Renaud Lottiaux <renaud.lottiaux@kerlabs.com> Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	xfs: fix freeing memory in xfs_getbmap()	Felix Blyakher
	commit 7747a0b0af5976ba3828796b4f7a7adc3bb76dbd upstream. Regression from commit 28e211700a81b0a934b6c7a4b8e7dda843634d2f. Need to free temporary buffer allocated in xfs_getbmap(). Signed-off-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Hedi Berriche <hedi@sgi.com> Reported-by: Justin Piszcz <jpiszcz@lucidpixels.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	KVM: x86: silence preempt warning on kvm_write_guest_time	Matt T. Yourst
	commit 2dea4c84bc936731668b5a7a9fba5b436a422668 upstream. This issue just appeared in kvm-84 when running on 2.6.28.7 (x86-64) with PREEMPT enabled. We're getting syslog warnings like this many (but not all) times qemu tells KVM to run the VCPU: BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-x86/28938 caller is kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm] Pid: 28938, comm: qemu-system-x86 2.6.28.7-mtyrel-64bit Call Trace: debug_smp_processor_id+0xf7/0x100 kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm] ? __wake_up+0x4e/0x70 ? wake_futex+0x27/0x40 kvm_vcpu_ioctl+0x2e9/0x5a0 [kvm] enqueue_hrtimer+0x8a/0x110 _spin_unlock_irqrestore+0x27/0x50 vfs_ioctl+0x31/0xa0 do_vfs_ioctl+0x74/0x480 sys_futex+0xb4/0x140 sys_ioctl+0x99/0xa0 system_call_fastpath+0x16/0x1b As it turns out, the call trace is messed up due to gcc's inlining, but I isolated the problem anyway: kvm_write_guest_time() is being used in a non-thread-safe manner on preemptable kernels. Basically kvm_write_guest_time()'s body needs to be surrounded by preempt_disable() and preempt_enable(), since the kernel won't let us query any per-CPU data (indirectly using smp_processor_id()) without preemption disabled. The attached patch fixes this issue by disabling preemption inside kvm_write_guest_time(). [marcelo: surround only __get_cpu_var calls since the warning is harmless] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	drm/i915: correct suspend/resume ordering	Jesse Barnes
	commit 9e06dd39f2b6d7e35981e0d7aded618686b32ccb upstream. We need to save register state after idling GEM, clearing the ring, and uninstalling the IRQ handler, or we might end up saving bogus fence regs, for one. Our restore ordering should already be correct, since we do GEM, ring and IRQ init after restoring the last register state, which prevents us from clobbering things. I put this together to potentially address a bug, but I haven't heard back if it fixes it yet. However I think it stands on its own, so I'm sending it in. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Eric Anholt <eric@anholt.net> Cc: Jie Luo <clotho67@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	ide-cd: prevent null pointer deref via cdrom_newpc_intr	Rainer Weikusat
	commit 39c58f37a10198054c656c28202fb1e6d22fd505 upstream. With 2.6.30, the error handling code in cdrom_newpc_intr was changed to deal with partial request failures by normally completing the 'good' parts of a request and only 'error' the last (and presumably, incompletely transferred) bio associated with a particular request. In order to do this, ide_complete_rq is called over ide_cd_error_cmd() to partially complete the rq. The block layer does partial completion only for requests with bio's and if the rq doesn't have one (eg 'GPCMD_READ_DISC_INFO') the request is completed as a whole and the drive->hwif->rq pointer set to NULL afterwards. When calling ide_complete_rq again to report the error, this null pointer is derefenced, resulting in a kernel crash. This fixes http://bugzilla.kernel.org/show_bug.cgi?id=13399. Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com> Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	ocfs2: Fix ocfs2_osb_dump()	Sunil Mushran
	commit c3d38840abaa45c1c5a5fabbb8ffc9a0d1a764d1 upstream. Skip printing information that is not valid for local mounts. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	serial: bfin_5xx: fix building as module when early printk is enabled	Mike Frysinger
	commit 607c268ef9a4675287e77f732071e426e62c2d86 upstream. Since early printk only makes sense/works when the serial driver is built into the kernel, disable the option for this driver when it is going to be built as a module. Otherwise we get build failures due to the ifdef handling. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	CONFIG_FILE_LOCKING should not depend on CONFIG_BLOCK	Tomas Szepe
	commit 69050eee8e08a6234f29fe71a56f8c7c7d4d7186 upstream. CONFIG_FILE_LOCKING should not depend on CONFIG_BLOCK. This makes it possible to run complete systems out of a CONFIG_BLOCK=n initramfs on current kernels again (this last worked on 2.6.27.*). Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	lib/genalloc.c: remove unmatched write_lock() in gen_pool_destroy	Zygo Blaxell
	commit 8e8a2dea0ca91fe2cb7de7ea212124cfe8c82c35 upstream. There is a call to write_lock() in gen_pool_destroy which is not balanced by any corresponding write_unlock(). This causes problems with preemption because the preemption-disable counter is incremented in the write_lock() call, but never decremented by any call to write_unlock(). This bug is gen_pool_destroy, and one of them is non-x86 arch-specific code. Signed-off-by: Zygo Blaxell <zygo.blaxell@xandros.com> Cc: Jiri Kosina <trivial@kernel.org> Cc: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	vmscan: count the number of times zone_reclaim() scans and fails	Mel Gorman
	commit 24cf72518c79cdcda486ed26074ff8151291cf65 upstream. On NUMA machines, the administrator can configure zone_reclaim_mode that is a more targetted form of direct reclaim. On machines with large NUMA distances for example, a zone_reclaim_mode defaults to 1 meaning that clean unmapped pages will be reclaimed if the zone watermarks are not being met. There is a heuristic that determines if the scan is worthwhile but it is possible that the heuristic will fail and the CPU gets tied up scanning uselessly. Detecting the situation requires some guesswork and experimentation so this patch adds a counter "zreclaim_failed" to /proc/vmstat. If during high CPU utilisation this counter is increasing rapidly, then the resolution to the problem may be to set /proc/sys/vm/zone_reclaim_mode to 0. [akpm@linux-foundation.org: name things consistently] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	vmscan: properly account for the number of page cache pages zone_reclaim() ↵	Mel Gorman
	can reclaim commit 90afa5de6f3fa89a733861e843377302479fcf7e upstream. A bug was brought to my attention against a distro kernel but it affects mainline and I believe problems like this have been reported in various guises on the mailing lists although I don't have specific examples at the moment. The reported problem was that malloc() stalled for a long time (minutes in some cases) if a large tmpfs mount was occupying a large percentage of memory overall. The pages did not get cleaned or reclaimed by zone_reclaim() because the zone_reclaim_mode was unsuitable, but the lists are uselessly scanned frequencly making the CPU spin at near 100%. This patchset intends to address that bug and bring the behaviour of zone_reclaim() more in line with expectations which were noticed during investigation. It is based on top of mmotm and takes advantage of Kosaki's work with respect to zone_reclaim(). Patch 1 fixes the heuristics that zone_reclaim() uses to determine if the scan should go ahead. The broken heuristic is what was causing the malloc() stall as it uselessly scanned the LRU constantly. Currently, zone_reclaim is assuming zone_reclaim_mode is 1 and historically it could not deal with tmpfs pages at all. This fixes up the heuristic so that an unnecessary scan is more likely to be correctly avoided. Patch 2 notes that zone_reclaim() returning a failure automatically means the zone is marked full. This is not always true. It could have failed because the GFP mask or zone_reclaim_mode were unsuitable. Patch 3 introduces a counter zreclaim_failed that will increment each time the zone_reclaim scan-avoidance heuristics fail. If that counter is rapidly increasing, then zone_reclaim_mode should be set to 0 as a temporarily resolution and a bug reported because the scan-avoidance heuristic is still broken. This patch: On NUMA machines, the administrator can configure zone_reclaim_mode that is a more targetted form of direct reclaim. On machines with large NUMA distances for example, a zone_reclaim_mode defaults to 1 meaning that clean unmapped pages will be reclaimed if the zone watermarks are not being met. There is a heuristic that determines if the scan is worthwhile but the problem is that the heuristic is not being properly applied and is basically assuming zone_reclaim_mode is 1 if it is enabled. The lack of proper detection can manfiest as high CPU usage as the LRU list is scanned uselessly. Historically, once enabled it was depending on NR_FILE_PAGES which may include swapcache pages that the reclaim_mode cannot deal with. Patch vmscan-change-the-number-of-the-unmapped-files-in-zone-reclaim.patch by Kosaki Motohiro noted that zone_page_state(zone, NR_FILE_PAGES) included pages that were not file-backed such as swapcache and made a calculation based on the inactive, active and mapped files. This is far superior when zone_reclaim==1 but if RECLAIM_SWAP is set, then NR_FILE_PAGES is a reasonable starting figure. This patch alters how zone_reclaim() works out how many pages it might be able to reclaim given the current reclaim_mode. If RECLAIM_SWAP is set in the reclaim_mode it will either consider NR_FILE_PAGES as potential candidates or else use NR_{IN}ACTIVE}_PAGES-NR_FILE_MAPPED to discount swapcache and other non-file-backed pages. If RECLAIM_WRITE is not set, then NR_FILE_DIRTY number of pages are not candidates. If RECLAIM_SWAP is not set, then NR_FILE_MAPPED are not. [kosaki.motohiro@jp.fujitsu.com: Estimate unmapped pages minus tmpfs pages] [fengguang.wu@intel.com: Fix underflow problem in Kosaki's estimate] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	dm: use i_size_read	Mikulas Patocka
	commit 5657e8fa45cf230df278040c420fb80e06309d8f upstream. Use i_size_read() instead of reading i_size. If someone changes the size of the device simultaneously, i_size_read is guaranteed to return a valid value (either the old one or the new one). i_size can return some intermediate invalid value (on 32-bit computers with 64-bit i_size, the reads to both halves of i_size can be interleaved with updates to i_size, resulting in garbage being returned). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	dm exception store: really fix type lookup	Milan Broz
	commit 874d2f61d31e596c36af7732dc1b3aa2dc233824 upstream. Fix exception store name handling. We need to reference exception store by zero terminated string. Fixes regression introduced in commit f6bd4eb73cdf2a5bf954e497972842f39cabb7e3 Cc: Yi Yang <yi.y.yang@intel.com> Cc: Jonathan Brassow <jbrassow@redhat.com> Cc: stable@kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	dm exception store: fix exstore lookup to be case insensitive	Jonathan Brassow
	commit f6bd4eb73cdf2a5bf954e497972842f39cabb7e3 upstream. When snapshots are created using 'p' instead of 'P' as the exception store type, the device-mapper table loading fails. This patch makes the code case insensitive as intended and fixes some regressions reported with device-mapper snapshots. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	dm mpath: flush keventd queue in destructor	Mikulas Patocka
	commit 53b351f972a882ea8b6cdb19602535f1057c884a upstream. The commit fe9cf30eb8186ef267d1868dc9f12f2d0f40835a moves dm table event submission from kmultipath queue to kernel kevent queue to avoid a deadlock. There is a possibility of race condition because kevent queue is not flushed in the multipath destructor. The scenario is: - some event happens and is queued to keventd - keventd thread is delayed due to scheuling latency or some other work - multipath device is destroyed - keventd now attempts to process work_struct that is residing in already released memory. The patch flushes the keventd queue in multipath constructor. I've already fixed similar bug in dm-raid1. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	dm: sysfs skip output when device is being destroyed	Milan Broz
	commit 4d89b7b4e4726893453d0fb4ddbb5b3e16353994 upstream. Do not process sysfs attributes when device is being destroyed. Otherwise code can cause BUG_ON(test_bit(DMF_FREEING, &md->flags)); in dm_put() call. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	dm mpath: validate table argument count	Mikulas Patocka
	commit 0e0497c0c017664994819f4602dc07fd95896c52 upstream. The parser reads the argument count as a number but doesn't check that sufficient arguments are supplied. This command triggers the bug: dmsetup create mpath --table "0 `blockdev --getsize /dev/mapper/cr0` multipath 0 0 2 1 round-robin 1000 0 1 1 /dev/mapper/cr0 round-robin 0 1 1 /dev/mapper/cr1 1000" kernel BUG at drivers/md/dm-mpath.c:530! Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	dm mpath: validate hw_handler argument count	Mikulas Patocka
	commit e094f4f15f5169526c7200b9bde44b900548a81e upstream. Fix arg count parsing error in hw handlers. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	mm: fix handling of pagesets for downed cpus	Dimitri Sivanich
	commit 364df0ebfbbb1330bfc6ca159f4d6020efc15a12 upstream. After downing/upping a cpu, an attempt to set /proc/sys/vm/percpu_pagelist_fraction results in an oops in percpu_pagelist_fraction_sysctl_handler(). If a processor is downed then we need to set the pageset pointer back to the boot pageset. Updates of the high water marks should not access pagesets of unpopulated zones (those pointer go to the boot pagesets which would be no longer functional if their size would be increased beyond zero). Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	qla2xxx: Correct (again) overflow during dump-processing on large-memory ↵	Andrew Vasquez
	ISP23xx parts. commit e18e963b7e533149676b5d387d0a56160df9f111 upstream. Commit 7b867cf76fbcc8d77867cbec6f509f71dce8a98f ([SCSI] qla2xxx: Refactor qla data structures) inadvertently reverted e792121ec85672c1fa48f79d13986a3f4f56c590 ([SCSI] qla2xxx: Correct overflow during dump-processing on large-memory ISP23xx parts.). Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	PCI PM: Follow PCI_PM_CTRL_NO_SOFT_RESET during transitions from D3	Rafael J. Wysocki
	commit f62795f1e892ca9269849fa83de97621da7e02c0 upstream. According to the PCI PM specification (PCI Bus Power Management Interface Specification, Rev. 1.2, Section 5.4.1) we are supposed to reinitialize devices that have PCI_PM_CTRL_NO_SOFT_RESET clear during all transitions from PCI_D3hot to PCI_D0, but we only do it if the device's current_state field is equal to PCI_UNKNOWN. This may lead to problems if a device with PCI_PM_CTRL_NO_SOFT_RESET unset is put into PCI_D3hot at run time by its driver and pci_set_power_state() is used to put it back into PCI_D0, because in that case the device will remain uninitialized after pci_set_power_state() has returned. Prevent that from happening by modifying pci_raw_set_power_state() to reinitialize devices with PCI_PM_CTRL_NO_SOFT_RESET unset during all transitions from D3 to D0. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	PCI PM: Fix handling of devices without PM support by pci_target_state()	Rafael J. Wysocki
	commit d2abdf62882d982c58e7a6b09ecdcfcc28075e2e upstream. If a PCI device is not power-manageable either by the platform, or with the help of the native PCI PM interface, pci_target_state() will return either PCI_D3hot, or PCI_POWER_ERROR for it, depending on whether or not the device is configured to wake up the system. Alas, none of these return values is correct, because each of them causes pci_prepare_to_sleep() to return error code, although it should complete successfully in such a case. Fix this problem by making pci_target_state() always return PCI_D0 for devices that cannot be power managed. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>