aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2009-10-05Linux 2.6.30.9v2.6.30.9Greg Kroah-Hartman
2009-10-05mmap: avoid unnecessary anon_vma lock acquisition in vma_adjust()Lee Schermerhorn
commit 252c5f94d944487e9f50ece7942b0fbf659c5c31 upstream. We noticed very erratic behavior [throughput] with the AIM7 shared workload running on recent distro [SLES11] and mainline kernels on an 8-socket, 32-core, 256GB x86_64 platform. On the SLES11 kernel [2.6.27.19+] with Barcelona processors, as we increased the load [10s of thousands of tasks], the throughput would vary between two "plateaus"--one at ~65K jobs per minute and one at ~130K jpm. The simple patch below causes the results to smooth out at the ~130k plateau. But wait, there's more: We do not see this behavior on smaller platforms--e.g., 4 socket/8 core. This could be the result of the larger number of cpus on the larger platform--a scalability issue--or it could be the result of the larger number of interconnect "hops" between some nodes in this platform and how the tasks for a given load end up distributed over the nodes' cpus and memories--a stochastic NUMA effect. The variability in the results are less pronounced [on the same platform] with Shanghai processors and with mainline kernels. With 31-rc6 on Shanghai processors and 288 file systems on 288 fibre attached storage volumes, the curves [jpm vs load] are both quite flat with the patched kernel consistently producing ~3.9% better throughput [~80K jpm vs ~77K jpm] than the unpatched kernel. Profiling indicated that the "slow" runs were incurring high[er] contention on an anon_vma lock in vma_adjust(), apparently called from the sbrk() system call. The patch: A comment in mm/mmap.c:vma_adjust() suggests that we don't really need the anon_vma lock when we're only adjusting the end of a vma, as is the case for brk(). The comment questions whether it's worth while to optimize for this case. Apparently, on the newer, larger x86_64 platforms, with interesting NUMA topologies, it is worth while--especially considering that the patch [if correct!] is quite simple. We can detect this condition--no overlap with next vma--by noting a NULL "importer". The anon_vma pointer will also be NULL in this case, so simply avoid loading vma->anon_vma to avoid the lock. However, we DO need to take the anon_vma lock when we're inserting a vma ['insert' non-NULL] even when we have no overlap [NULL "importer"], so we need to check for 'insert', as well. And Hugh points out that we should also take it when adjusting vm_start (so that rmap.c can rely upon vma_address() while it holds the anon_vma lock). akpm: Zhang Yanmin reprts a 150% throughput improvement with aim7, so it might be -stable material even though thiss isn't a regression: "this issue is not clear on dual socket Nehalem machine (2*4*2 cpu), but is severe on large machine (4*8*2 cpu)" [hugh.dickins@tiscali.co.uk: test vma start too] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Nick Piggin <npiggin@suse.de> Cc: Eric Whitney <eric.whitney@hp.com> Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05mm: fix anonymous dirtyingHugh Dickins
commit 1ac0cb5d0e22d5e483f56b2bc12172dec1cf7536 upstream. do_anonymous_page() has been wrong to dirty the pte regardless. If it's not going to mark the pte writable, then it won't help to mark it dirty here, and clogs up memory with pages which will need swap instead of being thrown away. Especially wrong if no overcommit is chosen, and this vma is not yet VM_ACCOUNTed - we could exceed the limit and OOM despite no overcommit. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Acked-by: Rik van Riel <riel@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05thinkpad-acpi: fix incorrect use of TPACPI_BRGHT_MODE_ECNVRAMHenrique de Moraes Holschuh
HBRV-based default selection of backlight control strategy didn't work well, at least the X41 defines it but doesn't use it and I don't think it will stop there. Switch to a blacklist, and make sure only Radeon- based models get ECNVRAM. Symptoms of incorrect backlight mode selection are: 1. Non-working backlight control through sysfs; 2. Backlight gets reset to the lowest level at every shutdown, reboot and when thinkpad-acpi gets unloaded; This fixes a regression in 2.6.30, bugzilla #13826. This fix is already present on 2.6.31. This is a minimal patch for 2.6.30-stable, based on mainline commits: 050df107c408a3df048524b3783a5fc6d4dccfdb, 7d95a3d564901e88ed42810f054e579874151999, 59fe4fe34d7afdf63208124f313be9056feaa2f4, 6da25bf51689a5cc60370d30275dbb9e6852e0cb Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Reported-by: Tobias Diedrich <ranma+kernel@tdiedrich.de> Reported-by: Robert de Rooy <robert.de.rooy@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05PM / yenta: Fix cardbus suspend/resume regressionRafael J. Wysocki
commit 0c570cdeb8fdfcb354a3e9cd81bfc6a09c19de0c upstream. Since 2.6.29 the PCI PM core have been restoring the standard configuration registers of PCI devices in the early phase of resume. In particular, PCI devices without drivers have been handled this way since commit 355a72d75b3b4f4877db4c9070c798238028ecb5 (PCI: Rework default handling of suspend and resume). Unfortunately, this leads to post-resume problems with CardBus devices which cannot be accessed in the early phase of resume, because the sockets they are on have not been woken up yet at that point. To solve this problem, move the yenta socket resume to the early phase of resume and, analogously, move the suspend of it to the late phase of suspend. Additionally, remove some unnecessary PCI code from the yenta socket's resume routine. Fixes http://bugzilla.kernel.org/show_bug.cgi?id=13092, which is a post-2.6.28 regression. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-by: Florian <fs-kernelbugzilla@spline.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05PM / PCMCIA: Drop second argument of pcmcia_socket_dev_suspend()Rafael J. Wysocki
commit 827b4649d4626bf97b203b4bcd69476bb9b4e760 upstream. pcmcia_socket_dev_suspend() doesn't use its second argument, so it may be dropped safely. This change is necessary for the subsequent yenta suspend/resume fix. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05/proc/kcore: work around a BUG()KAMEZAWA Hiroyuki
Not upstream due to other fixes in .32 Works around a BUG() which is triggered when the kernel accesses holes in vmalloc regions. BUG: unable to handle kernel paging request at fa54c000 IP: [<c04f687a>] read_kcore+0x260/0x31a *pde = 3540b067 *pte = 00000000 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:1c.2/0000:03:00.0/ieee80211/phy0/rfkill0/state Modules linked in: fuse sco bridge stp llc bnep l2cap bluetooth sunrpc nf_conntrack_ftp ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_multipath uinput usb_storage arc4 ecb snd_hda_codec_realtek snd_hda_intel ath5k snd_hda_codec snd_hwdep iTCO_wdt snd_pcm iTCO_vendor_support pcspkr i2c_i801 mac80211 joydev snd_timer serio_raw r8169 snd soundcore mii snd_page_alloc ath cfg80211 ata_generic i915 drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan] Sep 4 12:45:16 tuxedu kernel: Pid: 2266, comm: cat Not tainted (2.6.31-rc8 #2) Joybook Lite U101 EIP: 0060:[<c04f687a>] EFLAGS: 00010286 CPU: 0 EIP is at read_kcore+0x260/0x31a EAX: f5e5ea00 EBX: fa54d000 ECX: 00000400 EDX: 00001000 ESI: fa54c000 EDI: f44ad000 EBP: e4533f4c ESP: e4533f24 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process cat (pid: 2266, ti=e4532000 task=f09d19a0 task.ti=e4532000) Stack: 00005000 00000000 f44ad000 09d9c000 00003000 fa54c000 00001000 f6d16f60 e4520b80 fffffffb e4533f70 c04ef8eb e4533f98 00008000 09d97000 c04f661a e4520b80 09d97000 c04ef88c e4533f8c c04ba531 e4533f98 c04c0930 e4520b80 Call Trace: [<c04ef8eb>] ? proc_reg_read+0x5f/0x73 [<c04f661a>] ? read_kcore+0x0/0x31a [<c04ef88c>] ? proc_reg_read+0x0/0x73 [<c04ba531>] ? vfs_read+0x82/0xe1 [<c04c0930>] ? path_put+0x1a/0x1d [<c04ba62e>] ? sys_read+0x40/0x62 [<c0403298>] ? sysenter_do_call+0x12/0x2d Code: 39 f3 89 ca 0f 43 f3 89 fb 29 f2 29 f3 39 cf 0f 46 d3 29 55 dc 8d 1c 32 f6 40 0c 01 75 18 89 d1 89 f7 c1 e9 02 2b 7d ec 03 7d e0 <f3> a5 89 d1 83 e1 03 74 02 f3 a4 8b 00 83 7d dc 00 74 04 85 c0 EIP: [<c04f687a>] read_kcore+0x260/0x31a SS:ESP 0068:e4533f24 CR2: 00000000fa54c000 To access vmalloc area which may have memory holes, copy_from_user is useful. So this: # cat /proc/kcore > /dev/null will not panic. This is a minimal fix, suitable for 2.6.30.x and 2.6.31. More extensive /proc/kcore changes are planned for 2.6.32. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Nick Craig-Wood <nick@craig-wood.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Reported-by: <kbowa@tuxedu.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05powerpc: Fix incorrect setting of __HAVE_ARCH_PTE_SPECIALWeirich, Bernhard
[I'm going to fix upstream differently, by having all CPU types actually support _PAGE_SPECIAL, but I prefer the simple and obvious fix for -stable. -- Ben] The test that decides whether to define __HAVE_ARCH_PTE_SPECIAL on powerpc is bogus and will end up always defining it, even when _PAGE_SPECIAL is not supported (in which case it's 0) such as on 8xx or 40x processors. Signed-off-by: Bernhard Weirich <bernhard.weirich@riedel.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05powerpc/8xx: Fix regression introduced by cache coherency rewriteRex Feany
commit e0908085fc2391c85b85fb814ae1df377c8e0dcb upstream. After upgrading to the latest kernel on my mpc875 userspace started running incredibly slow (hours to get to a shell, even!). I tracked it down to commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36, that patch removed a work-around for the 8xx. Adding it back makes my problem go away. Signed-off-by: Rex Feany <rfeany@mrv.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05hugetlb: restore interleaving of bootmem huge pages (2.6.31)Lee Schermerhorn
Not upstream as it is fixed differently in .32 I noticed that alloc_bootmem_huge_page() will only advance to the next node on failure to allocate a huge page. I asked about this on linux-mm and linux-numa, cc'ing the usual huge page suspects. Mel Gorman responded: I strongly suspect that the same node being used until allocation failure instead of round-robin is an oversight and not deliberate at all. It appears to be a side-effect of a fix made way back in commit 63b4613c3f0d4b724ba259dc6c201bb68b884e1a ["hugetlb: fix hugepage allocation with memoryless nodes"]. Prior to that patch it looked like allocations would always round-robin even when allocation was successful. Andy Whitcroft countered that the existing behavior looked like Andi Kleen's original implementation and suggested that we ask him. We did and Andy replied that his intention was to interleave the allocations. So, ... This patch moves the advance of the hstate next node from which to allocate up before the test for success of the attempted allocation. This will unconditionally advance the next node from which to alloc, interleaving successful allocations over the nodes with sufficient contiguous memory, and skipping over nodes that fail the huge page allocation attempt. Note that alloc_bootmem_huge_page() will only be called for huge pages of order > MAX_ORDER. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05Fix idle time field in /proc/uptimeMichael Abbott
commit 96830a57de1197519b62af6a4c9ceea556c18c3d upstream. Git commit 79741dd changes idle cputime accounting, but unfortunately the /proc/uptime file hasn't caught up. Here the idle time calculation from /proc/stat is copied over. Signed-off-by: Michael Abbott <michael.abbott@diamond.ac.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05netfilter: nf_nat: fix inverted logic for persistent NAT mappingsPatrick McHardy
netfilter: nf_nat: fix inverted logic for persistent NAT mappings Upstream commit cce5a5c3: Kernel 2.6.30 introduced a patch [1] for the persistent option in the netfilter SNAT target. This is exactly what we need here so I had a quick look at the code and noticed that the patch is wrong. The logic is simply inverted. The patch below fixes this. Also note that because of this the default behavior of the SNAT target has changed since kernel 2.6.30 as it now ignores the destination IP in choosing the source IP for nating (which should only be the case if the persistent option is set). [1] http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=98d500d66cb7940747b424b245fc6a51ecfbf005 Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05netfilter: ebt_ulog: fix checkentry return valuePatrick McHardy
netfilter: ebt_ulog: fix checkentry return value Upstream commit 8a56df0a: Commit 19eda87 (netfilter: change return types of check functions for Ebtables extensions) broke the ebtables ulog module by missing a return value conversion. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05netfilter: bridge: refcount fixPatrick McHardy
netfilter: bridge: refcount fix Upstream commit f3abc9b9: commit f216f082b2b37c4943f1e7c393e2786648d48f6f ([NETFILTER]: bridge netfilter: deal with martians correctly) added a refcount leak on in_dev. Instead of using in_dev_get(), we can use __in_dev_get_rcu(), as netfilter hooks are running under rcu_read_lock(), as pointed by Patrick. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05Fix NULL ptr regression in powernow-k8Kurt Roeckx
commit f0adb134d8dc9993a9998dc50845ec4f6ff4fadc upstream. Fixes bugzilla #13780 From: Kurt Roeckx <kurt@roeckx.be> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05net: Make the copy length in af_packet sockopt handler unsignedArjan van de Ven
fixed upstream in commit b7058842c940ad2c08dd829b21e5c92ebe3b8758 in a different way The length of the to-copy data structure is currently stored in a signed integer. However many comparisons are done with sizeof(..) which is unsigned. It's more suitable for this variable to be unsigned to make these comparisons more naturally right. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05net ax25: Fix signed comparison in the sockopt handlerArjan van de Ven
fixed upstream in commit b7058842c940ad2c08dd829b21e5c92ebe3b8758 in a different way The ax25 code tried to use if (optlen < sizeof(int)) return -EINVAL; as a security check against optlen being negative (or zero) in the set socket option. Unfortunately, "sizeof(int)" is an unsigned property, with the result that the whole comparison is done in unsigned, letting negative values slip through. This patch changes this to if (optlen < (int)sizeof(int)) return -EINVAL; so that the comparison is done as signed, and negative values get properly caught. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05ahci: restore pci_intx() handlingTejun Heo
commit 31b239ad1ba7225435e13f5afc47e48eb674c0cc upstream. Commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246 dropped explicit pci_intx() manipulation from ahci because it seemed unnecessary and ahci doesn't seem to be the right place to be tweaking it if it were. This was largely okay but there are exceptions. There was one on an embedded platform which was fixed via firmware and now bko#14124 reports it on a HP DL320. http://bugzilla.kernel.org/show_bug.cgi?id=14124 I still think this isn't something libata drivers should be caring about (the only ones which are calling pci_intx() explicitly are libata ones and one other driver) but for now reverting the change seems to be the right thing to do. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05Revert "KVM: x86: check for cr3 validity in ioctl_set_sregs"Marcelo Tosatti
(cherry picked from commit dc7e795e3dd2a763e5ceaa1615f307e808cf3932) This reverts commit 6c20e1442bb1c62914bb85b7f4a38973d2a423ba. To my understanding, it became obsolete with the advent of the more robust check in mmu_alloc_roots (89da4ff17f). Moreover, it prevents the conceptually safe pattern 1. set sregs 2. register mem-slots 3. run vcpu by setting a sticky triple fault during step 1. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM: fix cpuid E2BIG handling for extended request typesMark McLoughlin
(cherry picked from commit cb007648de83cf226d69ec76e1c01848b4e8e49f) If we run out of cpuid entries for extended request types we should return -E2BIG, just like we do for the standard request types. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM guest: fix bogus wallclock physical address calculationGlauber Costa
(cherry picked from commit a20316d2aa41a8f4fd171648bad8f044f6060826) The use of __pa() to calculate the address of a C-visible symbol is wrong, and can lead to unpredictable results. See arch/x86/include/asm/page.h for details. It should be replaced with __pa_symbol(), that does the correct math here, by taking relocations into account. This ensures the correct wallclock data structure physical address is passed to the hypervisor. Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM: limit lapic periodic timer frequencyMarcelo Tosatti
(cherry picked from commit 1444885a045fe3b1905a14ea1b52540bf556578b) Otherwise its possible to starve the host by programming lapic timer with a very high frequency. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM: MMU: fix bogus alloc_mmu_pages assignmentMarcelo Tosatti
(cherry picked from commit b90c062c65cc8839edfac39778a37a55ca9bda36) Remove the bogus n_free_mmu_pages assignment from alloc_mmu_pages. It breaks accounting of mmu pages, since n_free_mmu_pages is modified but the real number of pages remains the same. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM: MMU: fix missing locking in alloc_mmu_pagesMarcelo Tosatti
(cherry picked from commit 6a1ac77110ee3e8d8dfdef8442f3b30b3d83e6a2) n_requested_mmu_pages/n_free_mmu_pages are used by kvm_mmu_change_mmu_pages to calculate the number of pages to zap. alloc_mmu_pages, called from the vcpu initialization path, modifies this variables without proper locking, which can result in a negative value in kvm_mmu_change_mmu_pages (say, with cpu hotplug). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM: x86: Disallow hypercalls for guest callers in rings > 0Jan Kiszka
(cherry picked from commit 07708c4af1346ab1521b26a202f438366b7bcffd) So far unprivileged guest callers running in ring 3 can issue, e.g., MMU hypercalls. Normally, such callers cannot provide any hand-crafted MMU command structure as it has to be passed by its physical address, but they can still crash the guest kernel by passing random addresses. To close the hole, this patch considers hypercalls valid only if issued from guest ring 0. This may still be relaxed on a per-hypercall base in the future once required. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM: MMU: make __kvm_mmu_free_some_pages handle empty listIzik Eidus
(cherry picked from commit 3b80fffe2b31fb716d3ebe729c54464ee7856723) First check if the list is empty before attempting to look at list entries. Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM: VMX: Fix cr8 exiting control clobbering by EPTGleb Natapov
(cherry picked from commit 5fff7d270bd6a4759b6d663741b729cdee370257) Don't call adjust_vmx_controls() two times for the same control. It restores options that were dropped earlier. This loses us the cr8 exit control, which causes a massive performance regression Windows x64. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05KVM: VMX: Check cpl before emulating debug register accessAvi Kivity
(cherry picked from commit 0a79b009525b160081d75cef5dbf45817956acf2) Debug registers may only be accessed from cpl 0. Unfortunately, vmx will code to emulate the instruction even though it was issued from guest userspace, possibly leading to an unexpected trap later. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05Re-enable Lanman securityChuck Ebbert
commit 20d1752f3d6bd32beb90949559e0d14a0b234445 upstream. commit ac68392460ffefed13020967bae04edc4d3add06 ("[CIFS] Allow raw ntlmssp code to be enabled with sec=ntlmssp") added a new bit to the allowed security flags mask but seems to have inadvertently removed Lanman security from the allowed flags. Add it back. Signed-off-by: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05p54usb: add Zcomax XG-705A usbidChristian Lamparter
commit f7f71173ea69d4dabf166533beffa9294090b7ef upstream. This patch adds a new usbid for Zcomax XG-705A to the device table. Reported-by: Jari Jaakola <jari.jaakola@gmail.com> Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05nilfs2: fix missing zero-fill initialization of btree node cacheRyusuke Konishi
commit 1f28fcd925b2b3157411bbd08f0024b55b70d8dd upstream. This will fix file system corruption which infrequently happens after mount. The problem was reported from users with the title "[NILFS users] Fail to mount NILFS." (Message-ID: <200908211918.34720.yuri@itinteg.net>), and so forth. I've also experienced the corruption multiple times on kernel 2.6.30 and 2.6.31. The problem turned out to be caused due to discordance between mapping->nrpages of a btree node cache and the actual number of pages hung on the cache; if the mapping->nrpages becomes zero even as it has pages, truncate_inode_pages() returns without doing anything. Usually this is harmless except it may cause page leak, but garbage collection fairly infrequently sees a stale page remained in the btree node cache of DAT (i.e. disk address translation file of nilfs), and induces the corruption. I identified a missing initialization in btree node caches was the root cause. This corrects the bug. I've tested this for kernel 2.6.30 and 2.6.31. Reported-by: Yuri Chislov <yuri@itinteg.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05kallsyms: fix segfault in prefix_underscores_count()Paul Mundt
commit a9ece53c4089ef23d4002d34c4c7148d94622a40 upstream. Commit b478b782e110fdb4135caa3062b6d687e989d994 "kallsyms, tracing: output more proper symbol name" introduces a "bugfix" that introduces a segfault in kallsyms in my configurations. The cause is the introduction of prefix_underscores_count() which attempts to count underscores, even in symbols that do not have them. As a result, it just uselessly runs past the end of the buffer until it crashes: CC init/version.o LD init/built-in.o LD .tmp_vmlinux1 KSYM .tmp_kallsyms1.S /bin/sh: line 1: 16934 Done sh-linux-gnu-nm -n .tmp_vmlinux1 16935 Segmentation fault | scripts/kallsyms > .tmp_kallsyms1.S make: *** [.tmp_kallsyms1.S] Error 139 This simplifies the logic and just does a straightforward count. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Paulo Marques <pmarques@grupopie.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05fs: make sure data stored into inode is properly seen before unlocking new inodeJan Kara
commit 580be0837a7a59b207c3d5c661d044d8dd0a6a30 upstream. In theory it could happen that on one CPU we initialize a new inode but clearing of I_NEW | I_LOCK gets reordered before some of the initialization. Thus on another CPU we return not fully uptodate inode from iget_locked(). This seems to fix a corruption issue on ext3 mounted over NFS. [akpm@linux-foundation.org: add some commentary] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05ACPI: pci_slot.ko wants a 64-bit _SUNAlex Chiang
commit 7e24bc1ce669b2876ffa475ea1147f2bb9ffdc52 upstream. Similar to commit b6adc195 (PCI hotplug: acpiphp wants a 64-bit _SUN), pci_slot.ko reads and creates sysfs directories based on the _SUN method. Certain HP platforms return 64 bits in _SUN. This change to pci_slot.ko allows us to see the correct sysfs directories. Reported-by: Chad Smith <chad.smith@hp.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24Linux 2.6.30.8v2.6.30.8Greg Kroah-Hartman
2009-09-24powerpc/pseries: Fix to handle slb resize across migrationBrian King
commit 46db2f86a3b2a94e0b33e0b4548fb7b7b6bdff66 upstream. The SLB can change sizes across a live migration, which was not being handled, resulting in possible machine crashes during migration if migrating to a machine which has a smaller max SLB size than the source machine. Fix this by first reducing the SLB size to the minimum possible value, which is 32, prior to migration. Then during the device tree update which occurs after migration, we make the call to ensure the SLB gets updated. Also add the slb_size to the lparcfg output so that the migration tools can check to make sure the kernel has this capability before allowing migration in scenarios where the SLB size will change. BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid breaking ppc32 build Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24PCI: Unhide the SMBus on the Compaq Evo D510 USDTJean Delvare
commit 6b5096e4d4496e185cd1ada5d1b8e1d941c805ed upstream. One more form factor for Compaq Evo D510, which needs the same quirk as the other form factors. Apparently there's no hardware monitoring chip on that one, but SPD EEPROMs, so it's still worth unhiding the SMBus. Signed-off-by: Jean Delvare <khali@linux-fr.org> Tested-by: Nuzhna Pomoshch <nuzhna_pomoshch@yahoo.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24libata: fix off-by-one error in ata_tf_read_block()Tejun Heo
commit ac8672ea922bde59acf50eaa1eaa1640a6395fd2 upstream. ata_tf_read_block() has off-by-one error when converting CHS address to LBA. The bug isn't very visible because ata_tf_read_block() is used only when generating sense data for a failed RW command and CHS addressing isn't used too often these days. This problem was spotted by Atsushi Nemoto. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24virtio_blk: don't bounce highmem requestsChristoph Hellwig
commit 4eff3cae9c9809720c636e64bc72f212258e0bd5 upstream virtio_blk: don't bounce highmem requests By default a block driver bounces highmem requests, but virtio-blk is perfectly fine with any request that fit into it's 64 bit addressing scheme, mapped in the kernel virtual space or not. Besides improving performance on highmem systems this also makes the reproducible oops in __bounce_end_io go away (but hiding the real cause). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24V4L: em28xx: set up tda9887_conf in em28xx_card_setup()Franklin Meng
V4L: em28xx: set up tda9887_conf in em28xx_card_setup() (cherry picked from commit ae3340cbf59ea362c2016eea762456cc0969fd9e) Added tda9887_conf set up into em28xx_card_setup() Signed-off-by: Franklin Meng <fmeng2002@yahoo.com> Signed-off-by: Douglas Schilling Landgraf <dougsland@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24x86, pat: Fix cacheflush address in change_page_attr_set_clr()Jack Steiner
commit fa526d0d641b5365676a1fb821ce359e217c9b85 upstream. Fix address passed to cpa_flush_range() when changing page attributes from WB to UC. The address (*addr) is modified by __change_page_attr_set_clr(). The result is that the pages being flushed start at the _end_ of the changed range instead of the beginning. This should be considered for 2.6.30-stable and 2.6.31-stable. Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24x86/i386: Make sure stack-protector segment base is cache alignedJeremy Fitzhardinge
commit 1ea0d14e480c245683927eecc03a70faf06e80c8 upstream. The Intel Optimization Reference Guide says: In Intel Atom microarchitecture, the address generation unit assumes that the segment base will be 0 by default. Non-zero segment base will cause load and store operations to experience a delay. - If the segment base isn't aligned to a cache line boundary, the max throughput of memory operations is reduced to one [e]very 9 cycles. [...] Assembly/Compiler Coding Rule 15. (H impact, ML generality) For Intel Atom processors, use segments with base set to 0 whenever possible; avoid non-zero segment base address that is not aligned to cache line boundary at all cost. We can't avoid having a non-zero base for the stack-protector segment, but we can make it cache-aligned. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> LKML-Reference: <4AA01893.6000507@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24x86: Fix x86_model test in es7000_apic_is_cluster()Roel Kluin
commit 005155b1f626d2b2d7932e4afdf4fead168c6888 upstream. For the x86_model to be greater than 6 or less than 12 is logically always true. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24sound: oxygen: work around MCE when changing volumeClemens Ladisch
commit f1bc07af9a9edc5c1d4bdd971f7099316ed2e405 upstream. When the volume is changed continuously (e.g., when the user drags a volume slider with the mouse), the driver does lots of I2C writes. Apparently, the sound chip can get confused when we poll the I2C status register too much, and fails to complete a read from it. On the PCI-E models, the PCI-E/PCI bridge gets upset by this and generates a machine check exception. To avoid this, this patch replaces the polling with an unconditional wait that is guaranteed to be long enough. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Tested-by: Johann Messner <johann.messner at jku.at> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24PCI: apply nv_msi_ht_cap_quirk on resume tooTejun Heo
commit 6dab62ee5a3bf4f71b8320c09db2e6022a19f40e upstream. http://bugzilla.kernel.org/show_bug.cgi?id=12542 reports that with the quirk not applied on resume, msi stops working after resuming and mcp78s ahci fails due to IRQ mis-delivery. Apply it on resume too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peer Chen <pchen@nvidia.com> Cc: Tj <linux@tjworld.net> Reported-by: Nicolas Derive <kalon33@ubuntu.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24mlx4_core: Allocate and map sufficient ICM memory for EQ contextRoland Dreier
commit fa0681d2129732027355d6b7083dd8932b9b799d upstream. The current implementation allocates a single host page for EQ context memory, which was OK when we only allocated a few EQs. However, since we now allocate an EQ for each CPU core, this patch removes the hard-coded limit (which we exceed with 4 KB pages and 128 byte EQ context entries with 32 CPUs) and uses the same ICM table code as all other context tables, which ends up simplifying the code quite a bit while fixing the problem. This problem was actually hit in practice on a dual-socket Nehalem box with 16 real hardware threads and sufficiently odd ACPI tables that it shows on boot SMP: Allowing 32 CPUs, 16 hotplug CPUs so num_possible_cpus() ends up 32, and mlx4 ends up creating 33 MSI-X interrupts and 33 EQs. This mlx4 bug means that mlx4 can't even initialize at all on this quite mainstream system. Reported-by: Eli Cohen <eli@mellanox.co.il> Tested-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24ASoC: Fix WM835x Out4 capture enumerationMark Brown
commit 87831cb660954356d68cebdb1406f3be09e784e9 upstream. It's the 8th enum of a zero indexed array. This is why I don't let new drivers use these arrays of enums... Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24ARM: 5691/1: fix cache aliasing issues between kmap() and kmap_atomic() with ↵Nicolas Pitre
highmem commit 7929eb9cf643ae416e5081b2a6fa558d37b9854c upstream. Let's suppose a highmem page is kmap'd with kmap(). A pkmap entry is used, the page mapped to it, and the virtual cache is dirtied. Then kunmap() is used which does virtually nothing except for decrementing a usage count. Then, let's suppose the _same_ page gets mapped using kmap_atomic(). It is therefore mapped onto a fixmap entry instead, which has a different virtual address unaware of the dirty cache data for that page sitting in the pkmap mapping. Fortunately it is easy to know if a pkmap mapping still exists for that page and use it directly with kmap_atomic(), thanks to kmap_high_get(). And actual testing with a printk in the added code path shows that this condition is actually met *extremely* frequently. Seems that we've been quite lucky that things have worked so well with highmem so far. Signed-off-by: Nicolas Pitre <nico@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24ALSA: cs46xx - Fix minimum period sizeSophie Hamilton
commit 6148b130eb84edc76e4fa88da1877b27be6c2f06 upstream. Fix minimum period size for cs46xx cards. This fixes a problem in the case where neither a period size nor a buffer size is passed to ALSA; this is the case in Audacious, OpenAL, and others. Signed-off-by: Sophie Hamilton <kernel@theblob.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24udf: Use device size when drive reported bogus number of written blocksJan Kara
commit 24a5d59f3477bcff4c069ff4d0ca9a3e037d0235 upstream. Some drives report 0 as the number of written blocks when there are some blocks recorded. Use device size in such case so that we can automagically mount such media. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>