aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2014-05-30x86_64: expand kernel stack to 16KMinchan Kim
While I play inhouse patches with much memory pressure on qemu-kvm, 3.14 kernel was randomly crashed. The reason was kernel stack overflow. When I investigated the problem, the callstack was a little bit deeper by involve with reclaim functions but not direct reclaim path. I tried to diet stack size of some functions related with alloc/reclaim so did a hundred of byte but overflow was't disappeard so that I encounter overflow by another deeper callstack on reclaim/allocator path. Of course, we might sweep every sites we have found for reducing stack usage but I'm not sure how long it saves the world(surely, lots of developer start to add nice features which will use stack agains) and if we consider another more complex feature in I/O layer and/or reclaim path, it might be better to increase stack size( meanwhile, stack usage on 64bit machine was doubled compared to 32bit while it have sticked to 8K. Hmm, it's not a fair to me and arm64 already expaned to 16K. ) So, my stupid idea is just let's expand stack size and keep an eye toward stack consumption on each kernel functions via stacktrace of ftrace. For example, we can have a bar like that each funcion shouldn't exceed 200K and emit the warning when some function consumes more in runtime. Of course, it could make false positive but at least, it could make a chance to think over it. I guess this topic was discussed several time so there might be strong reason not to increase kernel stack size on x86_64, for me not knowing so Ccing x86_64 maintainers, other MM guys and virtio maintainers. Here's an example call trace using up the kernel stack: Depth Size Location (51 entries) ----- ---- -------- 0) 7696 16 lookup_address 1) 7680 16 _lookup_address_cpa.isra.3 2) 7664 24 __change_page_attr_set_clr 3) 7640 392 kernel_map_pages 4) 7248 256 get_page_from_freelist 5) 6992 352 __alloc_pages_nodemask 6) 6640 8 alloc_pages_current 7) 6632 168 new_slab 8) 6464 8 __slab_alloc 9) 6456 80 __kmalloc 10) 6376 376 vring_add_indirect 11) 6000 144 virtqueue_add_sgs 12) 5856 288 __virtblk_add_req 13) 5568 96 virtio_queue_rq 14) 5472 128 __blk_mq_run_hw_queue 15) 5344 16 blk_mq_run_hw_queue 16) 5328 96 blk_mq_insert_requests 17) 5232 112 blk_mq_flush_plug_list 18) 5120 112 blk_flush_plug_list 19) 5008 64 io_schedule_timeout 20) 4944 128 mempool_alloc 21) 4816 96 bio_alloc_bioset 22) 4720 48 get_swap_bio 23) 4672 160 __swap_writepage 24) 4512 32 swap_writepage 25) 4480 320 shrink_page_list 26) 4160 208 shrink_inactive_list 27) 3952 304 shrink_lruvec 28) 3648 80 shrink_zone 29) 3568 128 do_try_to_free_pages 30) 3440 208 try_to_free_pages 31) 3232 352 __alloc_pages_nodemask 32) 2880 8 alloc_pages_current 33) 2872 200 __page_cache_alloc 34) 2672 80 find_or_create_page 35) 2592 80 ext4_mb_load_buddy 36) 2512 176 ext4_mb_regular_allocator 37) 2336 128 ext4_mb_new_blocks 38) 2208 256 ext4_ext_map_blocks 39) 1952 160 ext4_map_blocks 40) 1792 384 ext4_writepages 41) 1408 16 do_writepages 42) 1392 96 __writeback_single_inode 43) 1296 176 writeback_sb_inodes 44) 1120 80 __writeback_inodes_wb 45) 1040 160 wb_writeback 46) 880 208 bdi_writeback_workfn 47) 672 144 process_one_work 48) 528 112 worker_thread 49) 416 240 kthread 50) 176 176 ret_from_fork [ Note: the problem is exacerbated by certain gcc versions that seem to generate much bigger stack frames due to apparently bad coalescing of temporaries and generating too many spills. Rusty saw gcc-4.6.4 using 35% more stack on the virtio path than 4.8.2 does, for example. Minchan not only uses such a bad gcc version (4.6.3 in his case), but some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is what causes that kernel_map_pages() frame, for example). But we're clearly getting too close. The VM code also seems to have excessive stack frames partly for the same compiler reason, triggered by excessive inlining and lots of function arguments. We need to improve on our stack use, but in the meantime let's do this simple stack increase too. Unlike most earlier reports, there is nothing simple that stands out as being really horribly wrong here, apart from the fact that the stack frames are just bigger than they should need to be. - Linus ] Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Jones <davej@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S Tsirkin <mst@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-13x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()Anthony Iliopoulos
The invalidation is required in order to maintain proper semantics under CoW conditions. In scenarios where a process clones several threads, a thread operating on a core whose DTLB entry for a particular hugepage has not been invalidated, will be reading from the hugepage that belongs to the forked child process, even after hugetlb_cow(). The thread will not see the updated page as long as the stale DTLB entry remains cached, the thread attempts to write into the page, the child process exits, or the thread gets migrated to a different processor. Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com> Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp Suggested-by: Shay Goikhman <shay.goikhman@huawei.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v2.6.16+ (!)
2014-05-09x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macroAndres Freund
The spuriously added semicolon didn't have any effect because the macro isn't currently in use. c0a639ad0bc6b178b46996bd1f821a04643e2bde Signed-off-by: Andres Freund <andres@anarazel.de> Link: http://lkml.kernel.org/r/1399598957-7011-3-git-send-email-andres@anarazel.de Cc: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-08x86/hpet: Make boot_hpet_disable externFeng Tang
HPET on some platform has accuracy problem. Making "boot_hpet_disable" extern so that we can runtime disable the HPET timer by using quirk to check the platform. Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1398327498-13163-1-git-send-email-feng.tang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-14Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Marcelo Tosatti: - Fix for guest triggerable BUG_ON (CVE-2014-0155) - CR4.SMAP support - Spurious WARN_ON() fix * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: remove WARN_ON from get_kernel_ns() KVM: Rename variable smep to cr4_smep KVM: expose SMAP feature to guest KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode KVM: Add SMAP support when setting CR4 KVM: Remove SMAP bit from CR4_RESERVED_BITS KVM: ioapic: try to recover if pending_eoi goes out of range KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
2014-04-14KVM: Remove SMAP bit from CR4_RESERVED_BITSFeng Wu
This patch removes SMAP bit from CR4_RESERVED_BITS. Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-12Merge git://git.infradead.org/users/eparis/auditLinus Torvalds
Pull audit updates from Eric Paris. * git://git.infradead.org/users/eparis/audit: (28 commits) AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range audit: do not cast audit_rule_data pointers pointlesly AUDIT: Allow login in non-init namespaces audit: define audit_is_compat in kernel internal header kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c sched: declare pid_alive as inline audit: use uapi/linux/audit.h for AUDIT_ARCH declarations syscall_get_arch: remove useless function arguments audit: remove stray newline from audit_log_execve_info() audit_panic() call audit: remove stray newlines from audit_log_lost messages audit: include subject in login records audit: remove superfluous new- prefix in AUDIT_LOGIN messages audit: allow user processes to log from another PID namespace audit: anchor all pid references in the initial pid namespace audit: convert PPIDs to the inital PID namespace. pid: get pid_t ppid of task in init_pid_ns audit: rename the misleading audit_get_context() to audit_take_context() audit: Add generic compat syscall support audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL ...
2014-04-07x86: use generic early_ioremapMark Salter
Move x86 over to the generic early ioremap implementation. Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07x86/mm: sparse warning fix for early_memremapDave Young
This patch series takes the common bits from the x86 early ioremap implementation and creates a generic implementation which may be used by other architectures. The early ioremap interfaces are intended for situations where boot code needs to make temporary virtual mappings before the normal ioremap interfaces are available. Typically, this means before paging_init() has run. This patch (of 6): There's a lot of sparse warnings for code like below: void *a = early_memremap(phys_addr, size); early_memremap intend to map kernel memory with ioremap facility, the return pointer should be a kernel ram pointer instead of iomem one. For making the function clearer and supressing sparse warnings this patch do below two things: 1. cast to (__force void *) for the return value of early_memremap 2. add early_memunmap function and pass (__force void __iomem *) to iounmap From Boris: "Ingo told me yesterday, it makes sense too. I'd guess we can try it. FWIW, all callers of early_memremap use the memory they get remapped as normal memory so we should be safe" Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07percpu: add raw_cpu_opsChristoph Lameter
The kernel has never been audited to ensure that this_cpu operations are consistently used throughout the kernel. The code generated in many places can be improved through the use of this_cpu operations (which uses a segment register for relocation of per cpu offsets instead of performing address calculations). The patch set also addresses various consistency issues in general with the per cpu macros. A. The semantics of __this_cpu_ptr() differs from this_cpu_ptr only because checks are skipped. This is typically shown through a raw_ prefix. So this patch set changes the places where __this_cpu_ptr() is used to raw_cpu_ptr(). B. There has been the long term wish by some that __this_cpu operations would check for preemption. However, there are cases where preemption checks need to be skipped. This patch set adds raw_cpu operations that do not check for preemption and then adds preemption checks to the __this_cpu operations. C. The use of __get_cpu_var is always a reference to a percpu variable that can also be handled via a this_cpu operation. This patch set replaces all uses of __get_cpu_var with this_cpu operations. D. We can then use this_cpu RMW operations in various places replacing sequences of instructions by a single one. E. The use of this_cpu operations throughout will allow other arches than x86 to implement optimized references and RMV operations to work with per cpu local data. F. The use of this_cpu operations opens up the possibility to further optimize code that relies on synchronization through per cpu data. The patch set works in a couple of stages: I. Patch 1 adds the additional raw_cpu operations and raw_cpu_ptr(). Also converts the existing __this_cpu_xx_# primitive in the x86 code to raw_cpu_xx_#. II. Patch 2-4 use the raw_cpu operations in places that would give us false positives once they are enabled. III. Patch 5 adds preemption checks to __this_cpu operations to allow checking if preemption is properly disabled when these functions are used. IV. Patches 6-20 are patches that simply replace uses of __get_cpu_var with this_cpu_ptr. They do not depend on any changes to the percpu code. No preemption tests are skipped if they are applied. V. Patches 21-46 are conversion patches that use this_cpu operations in various kernel subsystems/drivers or arch code. VI. Patches 47/48 (not included in this series) remove no longer used functions (__this_cpu_ptr and __get_cpu_var). These should only be applied after all the conversion patches have made it and after we have done additional passes through the kernel to ensure that none of the uses of these functions remain. This patch (of 46): The patches following this one will add preemption checks to __this_cpu ops so we need to have an alternative way to use this_cpu operations without preemption checks. raw_cpu_ops will be the basis for all other ops since these will be the operations that do not implement any checks. Primitive operations are renamed by this patch from __this_cpu_xxx to raw_cpu_xxxx. Also change the uses of the x86 percpu primitives in preempt.h. These depend directly on asm/percpu.h (header #include nesting issue). Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Alex Shi <alex.shi@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bryan Wu <cooloney@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: David Daney <david.daney@cavium.com> Cc: David Miller <davem@davemloft.net> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Hedi Berriche <hedi@sgi.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: John Stultz <john.stultz@linaro.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Mike Travis <travis@sgi.com> Cc: Neil Brown <neilb@suse.de> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Robert Richter <rric@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07x86: always define BUG() and HAVE_ARCH_BUG, even with !CONFIG_BUGJosh Triplett
This ensures that BUG() always has a definition that causes a trap (via an undefined instruction), and that the compiler still recognizes the code following BUG() as unreachable, avoiding warnings that would otherwise appear (such as on non-void functions that don't return a value after BUG()). In addition to saving a few bytes over the generic infinite-loop implementation, this implementation traps rather than looping, which potentially allows for better error-recovery behavior (such as by rebooting). Signed-off-by: Josh Triplett <josh@joshtriplett.org> Reported-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-04Merge tag 'random_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull /dev/random changes from Ted Ts'o: "A number of cleanups plus support for the RDSEED instruction, which will be showing up in Intel Broadwell CPU's" * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: Add arch_has_random[_seed]() random: If we have arch_get_random_seed*(), try it before blocking random: Use arch_get_random_seed*() at init time and once a second x86, random: Enable the RDSEED instruction random: use the architectural HWRNG for the SHA's IV in extract_buf() random: clarify bits/bytes in wakeup thresholds random: entropy_bytes is actually bits random: simplify accounting code random: tighten bound on random_read_wakeup_thresh random: forget lock in lockless accounting random: simplify accounting logic random: fix comment on "account" random: simplify loop in random_read random: fix description of get_random_bytes random: fix comment on proc_do_uuid random: fix typos / spelling errors in comments
2014-04-03Merge tag 'stable/for-linus-3.15-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen features and fixes from David Vrabel: "Support PCI devices with multiple MSIs, performance improvement for kernel-based backends (by not populated m2p overrides when mapping), and assorted minor bug fixes and cleanups" * tag 'stable/for-linus-3.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/acpi-processor: fix enabling interrupts on syscore_resume xen/grant-table: Refactor gnttab_[un]map_refs to avoid m2p_override xen: remove XEN_PRIVILEGED_GUEST xen: add support for MSI message groups xen-pciback: Use pci_enable_msix_exact() instead of pci_enable_msix() xen/xenbus: remove unused xenbus_bind_evtchn() xen/events: remove unnecessary call to bind_evtchn_to_cpu() xen/events: remove the unused resend_irq_on_evtchn() drivers:xen-selfballoon:reset 'frontswap_inertia_counter' after frontswap_shrink drivers: xen: Include appropriate header file in pcpu.c drivers: xen: Mark function as static in platform-pci.c
2014-04-02Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "PPC and ARM do not have much going on this time. Most of the cool stuff, instead, is in s390 and (after a few releases) x86. ARM has some caching fixes and PPC has transactional memory support in guests. MIPS has some fixes, with more probably coming in 3.16 as QEMU will soon get support for MIPS KVM. For x86 there are optimizations for debug registers, which trigger on some Windows games, and other important fixes for Windows guests. We now expose to the guest Broadwell instruction set extensions and also Intel MPX. There's also a fix/workaround for OS X guests, nested virtualization features (preemption timer), and a couple kvmclock refinements. For s390, the main news is asynchronous page faults, together with improvements to IRQs (floating irqs and adapter irqs) that speed up virtio devices" * tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits) KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8 KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode KVM: PPC: Book3S HV: Return ENODEV error rather than EIO KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state KVM: PPC: Book3S HV: Add transactional memory support KVM: Specify byte order for KVM_EXIT_MMIO KVM: vmx: fix MPX detection KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write KVM: s390: clear local interrupts at cpu initial reset KVM: s390: Fix possible memory leak in SIGP functions KVM: s390: fix calculation of idle_mask array size KVM: s390: randomize sca address KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP KVM: Bump KVM_MAX_IRQ_ROUTES for s390 KVM: s390: irq routing for adapter interrupts. KVM: s390: adapter interrupt sources ...
2014-04-02Merge branch 'x86-nuke-platforms-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 old platform removal from Peter Anvin: "This patchset removes support for several completely obsolete platforms, where the maintainers either have completely vanished or acked the removal. For some of them it is questionable if there even exists functional specimens of the hardware" Geert Uytterhoeven apparently thought this was a April Fool's pull request ;) * 'x86-nuke-platforms-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, platforms: Remove NUMAQ x86, platforms: Remove SGI Visual Workstation x86, apic: Remove support for IBM Summit/EXA chipset x86, apic: Remove support for ia32-based Unisys ES7000
2014-04-02Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso changes from Peter Anvin: "This is the revamp of the 32-bit vdso and the associated cleanups. This adds timekeeping support to the 32-bit vdso that we already have in the 64-bit vdso. Although 32-bit x86 is legacy, it is likely to remain in the embedded space for a very long time to come. This removes the traditional COMPAT_VDSO support; the configuration variable is reused for simply removing the 32-bit vdso, which will produce correct results but obviously suffer a performance penalty. Only one beta version of glibc was affected, but that version was unfortunately included in one OpenSUSE release. This is not the end of the vdso cleanups. Stefani and Andy have agreed to continue work for the next kernel cycle; in fact Andy has already produced another set of cleanups that came too late for this cycle. An incidental, but arguably important, change is that this ensures that unused space in the VVAR page is properly zeroed. It wasn't before, and would contain whatever garbage was left in memory by BIOS or the bootloader. Since the VVAR page is accessible to user space this had the potential of information leaks" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86, vdso: Fix the symbol versions on the 32-bit vDSO x86, vdso, build: Don't rebuild 32-bit vdsos on every make x86, vdso: Actually discard the .discard sections x86, vdso: Fix size of get_unmapped_area() x86, vdso: Finish removing VDSO32_PRELINK x86, vdso: Move more vdso definitions into vdso.h x86: Load the 32-bit vdso in place, just like the 64-bit vdsos x86, vdso32: handle 32 bit vDSO larger one page x86, vdso32: Disable stack protector, adjust optimizations x86, vdso: Zero-pad the VVAR page x86, vdso: Add 32 bit VDSO time support for 64 bit kernel x86, vdso: Add 32 bit VDSO time support for 32 bit kernel x86, vdso: Patch alternatives in the 32-bit VDSO x86, vdso: Introduce VVAR marco for vdso32 x86, vdso: Cleanup __vdso_gettimeofday() x86, vdso: Replace VVAR(vsyscall_gtod_data) by gtod macro x86, vdso: __vdso_clock_gettime() cleanup x86, vdso: Revamp vclock_gettime.c mm: Add new func _install_special_mapping() to mmap.c x86, vdso: Make vsyscall_gtod_data handling x86 generic ...
2014-04-01Merge tag 'driver-core-3.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and sysfs updates from Greg KH: "Here's the big driver core / sysfs update for 3.15-rc1. Lots of kernfs updates to make it useful for other subsystems, and a few other tiny driver core patches. All have been in linux-next for a while" * tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (42 commits) Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()" kernfs: cache atomic_write_len in kernfs_open_file numa: fix NULL pointer access and memory leak in unregister_one_node() Revert "driver core: synchronize device shutdown" kernfs: fix off by one error. kernfs: remove duplicate dir.c at the top dir x86: align x86 arch with generic CPU modalias handling cpu: add generic support for CPU feature based module autoloading sysfs: create bin_attributes under the requested group driver core: unexport static function create_syslog_header firmware: use power efficient workqueue for unloading and aborting fw load firmware: give a protection when map page failed firmware: google memconsole driver fixes firmware: fix google/gsmi duplicate efivars_sysfs_init() drivers/base: delete non-required instances of include <linux/init.h> kernfs: fix kernfs_node_from_dentry() ACPI / platform: drop redundant ACPI_HANDLE check kernfs: fix hash calculation in kernfs_rename_ns() kernfs: add CONFIG_KERNFS sysfs, kobject: add sysfs wrapper for kernfs_enable_ns() ...
2014-04-01Merge tag 'pci-v3.15-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Enumeration - Increment max correctly in pci_scan_bridge() (Andreas Noever) - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever) - Assign CardBus bus number only during the second pass (Andreas Noever) - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever) - Make sure bus number resources stay within their parents bounds (Andreas Noever) - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever) - Check for child busses which use more bus numbers than allocated (Andreas Noever) - Don't scan random busses in pci_scan_bridge() (Andreas Noever) - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas) - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas) - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas) NUMA - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas) - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas) - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas) - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas) - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas) - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas) Resource management - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas) - Add resource_contains() (Bjorn Helgaas) - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas) - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas) - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas) - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas) - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas) - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas) - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas) - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas) - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas) - s390: Use generic pci_enable_resources() (Bjorn Helgaas) - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas) - Set type in __request_region() (Bjorn Helgaas) - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas) - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas) - Log IDE resource quirk in dmesg (Bjorn Helgaas) - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas) PCI device hotplug - Make check_link_active() non-static (Rajat Jain) - Use link change notifications for hot-plug and removal (Rajat Jain) - Enable link state change notifications (Rajat Jain) - Don't disable the link permanently during removal (Rajat Jain) - Don't check adapter or latch status while disabling (Rajat Jain) - Disable link notification across slot reset (Rajat Jain) - Ensure very fast hotplug events are also processed (Rajat Jain) - Add hotplug_lock to serialize hotplug events (Rajat Jain) - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain) - Don't turn slot off when hot-added device already exists (Yijing Wang) MSI - Keep pci_enable_msi() documentation (Alexander Gordeev) - ahci: Fix broken single MSI fallback (Alexander Gordeev) - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev) - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman) - Fix leak of msi_attrs (Greg Kroah-Hartman) - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida) Virtualization - Device-specific ACS support (Alex Williamson) Freescale i.MX6 - Wait for retraining (Marek Vasut) Marvell MVEBU - Use Device ID and revision from underlying endpoint (Andrew Lunn) - Fix incorrect size for PCI aperture resources (Jason Gunthorpe) - Call request_resource() on the apertures (Jason Gunthorpe) - Fix potential issue in range parsing (Jean-Jacques Hiblot) Renesas R-Car - Check platform_get_irq() return code (Ben Dooks) - Add error interrupt handling (Ben Dooks) - Fix bridge logic configuration accesses (Ben Dooks) - Register each instance independently (Magnus Damm) - Break out window size handling (Magnus Damm) - Make the Kconfig dependencies more generic (Magnus Damm) Synopsys DesignWare - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar) Miscellaneous - Remove unused SR-IOV VF Migration support (Bjorn Helgaas) - Enable INTx if BIOS left them disabled (Bjorn Helgaas) - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter) - Clean up par-arch object file list (Liviu Dudau) - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom) - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang) - Fix pci_bus_b() build failure (Paul Gortmaker)" * tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (108 commits) Revert "[PATCH] Insert GART region into resource map" PCI: Log IDE resource quirk in dmesg PCI: Change pci_bus_alloc_resource() type_mask to unsigned long PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() resources: Set type in __request_region() PCI: Don't check resource_size() in pci_bus_alloc_resource() s390/PCI: Use generic pci_enable_resources() tile PCI RC: Use default pcibios_enable_device() sparc/PCI: Use default pcibios_enable_device() (Leon only) sh/PCI: Use default pcibios_enable_device() microblaze/PCI: Use default pcibios_enable_device() alpha/PCI: Use default pcibios_enable_device() PCI: Add "weak" generic pcibios_enable_device() implementation PCI: Don't enable decoding if BAR hasn't been assigned an address PCI: Enable INTx in pci_reenable_device() only when MSI/MSI-X not enabled PCI: Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit PCI: Don't try to claim IORESOURCE_UNSET resources PCI: Check IORESOURCE_UNSET before updating BAR PCI: Don't clear IORESOURCE_UNSET when updating BAR PCI: Mark resources as IORESOURCE_UNSET if we can't assign them ... Conflicts: arch/x86/include/asm/topology.h drivers/ata/ahci.c
2014-04-01Merge branch 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq code updates from Thomas Gleixner: "The irq department proudly presents: - Another tree wide sweep of irq infrastructure abuse. Clear winner of the trainwreck engineering contest was: #include "../../../kernel/irq/settings.h" - Tree wide update of irq_set_affinity() callbacks which miss a cpu online check when picking a single cpu out of the affinity mask. - Tree wide consolidation of interrupt statistics. - Updates to the threaded interrupt infrastructure to allow explicit wakeup of the interrupt thread and a variant of synchronize_irq() which synchronizes only the hard interrupt handler. Both are needed to replace the homebrewn thread handling in the mmc/sdhci code. - New irq chip callbacks to allow proper support for GPIO based irqs. The GPIO based interrupts need to request/release GPIO resources from request/free_irq. - A few new ARM interrupt chips. No revolutionary new hardware, just differently wreckaged variations of the scheme. - Small improvments, cleanups and updates all over the place" I was hoping that that trainwreck engineering contest was a April Fools' joke. But no. * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits) irqchip: sun7i/sun6i: Disable NMI before registering the handler ARM: sun7i/sun6i: dts: Fix IRQ number for sun6i NMI controller ARM: sun7i/sun6i: irqchip: Update the documentation ARM: sun7i/sun6i: dts: Add NMI irqchip support ARM: sun7i/sun6i: irqchip: Add irqchip driver for NMI controller genirq: Export symbol no_action() arm: omap: Fix typo in ams-delta-fiq.c m68k: atari: Fix the last kernel_stat.h fallout irqchip: sun4i: Simplify sun4i_irq_ack irqchip: sun4i: Use handle_fasteoi_irq for all interrupts genirq: procfs: Make smp_affinity values go+r softirq: Add linux/irq.h to make it compile again m68k: amiga: Add linux/irq.h to make it compile again irqchip: sun4i: Don't ack IRQs > 0, fix acking of IRQ 0 irqchip: sun4i: Fix a comment about mask register initialization irqchip: sun4i: Fix irq 0 not working genirq: Add a new IRQCHIP_EOI_THREADED flag genirq: Document IRQCHIP_ONESHOT_SAFE flag ARM: sunxi: dt: Convert to the new irq controller compatibles irqchip: sunxi: Change compatibles ...
2014-04-01Merge branch 'x86-threadinfo-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 threadinfo changes from Ingo Molnar: "The main change here is the consolidation/unification of 32 and 64 bit thread_info handling methods, from Steve Rostedt" * 'x86-threadinfo-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, threadinfo: Redo "x86: Use inline assembler to get sp" x86: Clean up dumpstack_64.c code x86: Keep thread_info on thread stack in x86_32 x86: Prepare removal of previous_esp from i386 thread_info structure x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386 x86: Nuke the supervisor_stack field in i386 thread_info
2014-04-01Merge branch 'timers-nohz-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Ingo Molnar: "The main purpose is to fix a full dynticks bug related to virtualization, where steal time accounting appears to be zero in /proc/stat even after a few seconds of competing guests running busy loops in a same host CPU. It's not a regression though as it was there since the beginning. The other commits are preparatory work to fix the bug and various cleanups" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arch: Remove stub cputime.h headers sched: Remove needless round trip nsecs <-> tick conversion of steal time cputime: Fix jiffies based cputime assumption on steal accounting cputime: Bring cputime -> nsecs conversion cputime: Default implementation of nsecs -> cputime conversion cputime: Fix nsecs_to_cputime() return type cast
2014-04-01Merge branch 'x86-cpufeature-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpufeature update from Ingo Molnar: "Two refinements to clflushopt support" * 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpufeature: If we disable CLFLUSH, we should disable CLFLUSHOPT x86, cpufeature: Rename X86_FEATURE_CLFLSH to X86_FEATURE_CLFLUSH
2014-03-31Merge branch 'compat' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 compat wrapper rework from Heiko Carstens: "S390 compat system call wrapper simplification work. The intention of this work is to get rid of all hand written assembly compat system call wrappers on s390, which perform proper sign or zero extension, or pointer conversion of compat system call parameters. Instead all of this should be done with C code eg by using Al's COMPAT_SYSCALL_DEFINEx() macro. Therefore all common code and s390 specific compat system calls have been converted to the COMPAT_SYSCALL_DEFINEx() macro. In order to generate correct code all compat system calls may only have eg compat_ulong_t parameters, but no unsigned long parameters. Those patches which change parameter types from unsigned long to compat_ulong_t parameters are separate in this series, but shouldn't cause any harm. The only compat system calls which intentionally have 64 bit parameters (preadv64 and pwritev64) in support of the x86/32 ABI haven't been changed, but are now only available if an architecture defines __ARCH_WANT_COMPAT_SYS_PREADV64/PWRITEV64. System calls which do not have a compat variant but still need proper zero extension on s390, like eg "long sys_brk(unsigned long brk)" will get a proper wrapper function with the new s390 specific COMPAT_SYSCALL_WRAPx() macro: COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk); which generates the following code (simplified): asmlinkage long sys_brk(unsigned long brk); asmlinkage long compat_sys_brk(long brk) { return sys_brk((u32)brk); } Given that the C file which contains all the COMPAT_SYSCALL_WRAP lines includes both linux/syscall.h and linux/compat.h, it will generate build errors, if the declaration of sys_brk() doesn't match, or if there exists a non-matching compat_sys_brk() declaration. In addition this will intentionally result in a link error if somewhere else a compat_sys_brk() function exists, which probably should have been used instead. Two more BUILD_BUG_ONs make sure the size and type of each compat syscall parameter can be handled correctly with the s390 specific macros. I converted the compat system calls step by step to verify the generated code is correct and matches the previous code. In fact it did not always match, however that was always a bug in the hand written asm code. In result we get less code, less bugs, and much more sanity checking" * 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits) s390/compat: add copyright statement compat: include linux/unistd.h within linux/compat.h s390/compat: get rid of compat wrapper assembly code s390/compat: build error for large compat syscall args mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types ipc/compat: convert to COMPAT_SYSCALL_DEFINE fs/compat: convert to COMPAT_SYSCALL_DEFINE security/compat: convert to COMPAT_SYSCALL_DEFINE mm/compat: convert to COMPAT_SYSCALL_DEFINE net/compat: convert to COMPAT_SYSCALL_DEFINE kernel/compat: convert to COMPAT_SYSCALL_DEFINE fs/compat: optional preadv64/pwrite64 compat system calls ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_t s390/compat: partial parameter conversion within syscall wrappers s390/compat: automatic zero, sign and pointer conversion of syscalls s390/compat: add sync_file_range and fallocate compat syscalls ...
2014-03-31Merge branch 'x86-efi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI changes from Ingo Molnar: "The main changes: - Add debug code to the dump EFI pagetable - Borislav Petkov - Make 1:1 runtime mapping robust when booting on machines with lots of memory - Borislav Petkov - Move the EFI facilities bits out of 'x86_efi_facility' and into efi.flags which is the standard architecture independent place to keep EFI state, by Matt Fleming. - Add 'EFI mixed mode' support: this allows 64-bit kernels to be booted from 32-bit firmware. This needs a bootloader that supports the 'EFI handover protocol'. By Matt Fleming" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86, efi: Abstract x86 efi_early calls x86/efi: Restore 'attr' argument to query_variable_info() x86/efi: Rip out phys_efi_get_time() x86/efi: Preserve segment registers in mixed mode x86/boot: Fix non-EFI build x86, tools: Fix up compiler warnings x86/efi: Re-disable interrupts after calling firmware services x86/boot: Don't overwrite cr4 when enabling PAE x86/efi: Wire up CONFIG_EFI_MIXED x86/efi: Add mixed runtime services support x86/efi: Firmware agnostic handover entry points x86/efi: Split the boot stub into 32/64 code paths x86/efi: Add early thunk code to go from 64-bit to 32-bit x86/efi: Build our own EFI services pointer table efi: Add separate 32-bit/64-bit definitions x86/efi: Delete dead code when checking for non-native x86/mm/pageattr: Always dump the right page table in an oops x86, tools: Consolidate #ifdef code x86/boot: Cleanup header.S by removing some #ifdefs efi: Use NULL instead of 0 for pointer ...
2014-03-31Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu handling changes from Ingo Molnar: "Bigger changes: - Intel CPU hardware-enablement: new vector instructions support (AVX-512), by Fenghua Yu. - Support the clflushopt instruction and use it in appropriate places. clflushopt is similar to clflush but with more relaxed ordering, by Ross Zwisler. - MSR accessor cleanups, by Borislav Petkov. - 'forcepae' boot flag for those who have way too much time to spend on way too old Pentium-M systems and want to live way too dangerously, by Chris Bainbridge" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC x86, intel: Make MSR_IA32_MISC_ENABLE bit constants systematic x86, Intel: Convert to the new bit access MSR accessors x86, AMD: Convert to the new bit access MSR accessors x86: Add another set of MSR accessor functions x86: Use clflushopt in drm_clflush_virt_range x86: Use clflushopt in drm_clflush_page x86: Use clflushopt in clflush_cache_range x86: Add support for the clflushopt instruction x86, AVX-512: Enable AVX-512 States Context Switch x86, AVX-512: AVX-512 Feature Detection
2014-03-31Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic changes from Ingo Molnar: "An xAPIC CPU hotplug race fix, plus cleanups and minor fixes" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Plug racy xAPIC access of CPU hotplug code x86/apic: Always define nox2apic and define it as initdata x86/apic: Remove unused function prototypes x86/apic: Switch wait_for_init_deassert() to a bool flag x86/apic: Only use default_wait_for_init_deassert()
2014-03-31Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "Bigger changes: - sched/idle restructuring: they are WIP preparation for deeper integration between the scheduler and idle state selection, by Nicolas Pitre. - add NUMA scheduling pseudo-interleaving, by Rik van Riel. - optimize cgroup context switches, by Peter Zijlstra. - RT scheduling enhancements, by Thomas Gleixner. The rest is smaller changes, non-urgnt fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits) sched: Clean up the task_hot() function sched: Remove double calculation in fix_small_imbalance() sched: Fix broken setscheduler() sparc64, sched: Remove unused sparc64_multi_core sched: Remove unused mc_capable() and smt_capable() sched/numa: Move task_numa_free() to __put_task_struct() sched/fair: Fix endless loop in idle_balance() sched/core: Fix endless loop in pick_next_task() sched/fair: Push down check for high priority class task into idle_balance() sched/rt: Fix picking RT and DL tasks from empty queue trace: Replace hardcoding of 19 with MAX_NICE sched: Guarantee task priority in pick_next_task() sched/idle: Remove stale old file sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED cpuidle/arm64: Remove redundant cpuidle_idle_call() cpuidle/powernv: Remove redundant cpuidle_idle_call() sched, nohz: Exclude isolated cores from load balancing sched: Fix select_task_rq_fair() description comments workqueue: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE sys: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE ...
2014-03-31Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Main changes: Kernel side changes: - Add SNB/IVB/HSW client uncore memory controller support (Stephane Eranian) - Fix various x86/P4 PMU driver bugs (Don Zickus) Tooling, user visible changes: - Add several futex 'perf bench' microbenchmarks (Davidlohr Bueso) - Speed up thread map generation (Don Zickus) - Introduce 'perf kvm --list-cmds' command line option for use by scripts (Ramkumar Ramachandra) - Print the evsel name in the annotate stdio output, prep to fix support outputting annotation for multiple events, not just for the first one (Arnaldo Carvalho de Melo) - Allow setting preferred callchain method in .perfconfig (Jiri Olsa) - Show in what binaries/modules 'perf probe's are set (Masami Hiramatsu) - Support distro-style debuginfo for uprobe in 'perf probe' (Masami Hiramatsu) Tooling, internal changes and fixes: - Use tid in mmap/mmap2 events to find maps (Don Zickus) - Record the reason for filtering an address_location (Namhyung Kim) - Apply all filters to an addr_location (Namhyung Kim) - Merge al->filtered with hist_entry->filtered in report/hists (Namhyung Kim) - Fix memory leak when synthesizing thread records (Namhyung Kim) - Use ui__has_annotation() in 'report' (Namhyung Kim) - hists browser refactorings to reuse code accross UIs (Namhyung Kim) - Add support for the new DWARF unwinder library in elfutils (Jiri Olsa) - Fix build race in the generation of bison files (Jiri Olsa) - Further streamline the feature detection display, trimming it a bit to show just the libraries detected, using VF=1 gets a more verbose output, showing the less interesting feature checks as well (Jiri Olsa). - Check compatible symtab type before loading dso (Namhyung Kim) - Check return value of filename__read_debuglink() (Stephane Eranian) - Move some hashing and fs related code from tools/perf/util/ to tools/lib/ so that it can be used by more tools/ living utilities (Borislav Petkov) - Prepare DWARF unwinding code for using an elfutils alternative unwinding library (Jiri Olsa) - Fix DWARF unwind max_stack processing (Jiri Olsa) - Add dwarf unwind 'perf test' entry (Jiri Olsa) - 'perf probe' improvements including memory leak fixes, sharing the intlist class with other tools, uprobes/kprobes code sharing and use of ref_reloc_sym (Masami Hiramatsu) - Shorten sample symbol resolving by adding cpumode to struct addr_location (Arnaldo Carvalho de Melo) - Fix synthesizing mmaps for threads (Don Zickus) - Fix invalid output on event group stdio report (Namhyung Kim) - Fixup header alignment in 'perf sched latency' output (Ramkumar Ramachandra) - Fix off-by-one error in 'perf timechart record' argv handling (Ramkumar Ramachandra) Tooling, cleanups: - Remove unused thread__find_map function (Jiri Olsa) - Remove unused simple_strtoul() function (Ramkumar Ramachandra) Tooling, documentation updates: - Update function names in debug messages (Ramkumar Ramachandra) - Update some code references in design.txt (Ramkumar Ramachandra) - Clarify load-latency information in the 'perf mem' docs (Andi Kleen) - Clarify x86 register naming in 'perf probe' docs (Andi Kleen)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (96 commits) perf tools: Remove unused simple_strtoul() function perf tools: Update some code references in design.txt perf evsel: Update function names in debug messages perf tools: Remove thread__find_map function perf annotate: Print the evsel name in the stdio output perf report: Use ui__has_annotation() perf tools: Fix memory leak when synthesizing thread records perf tools: Use tid in mmap/mmap2 events to find maps perf report: Merge al->filtered with hist_entry->filtered perf symbols: Apply all filters to an addr_location perf symbols: Record the reason for filtering an address_location perf sched: Fixup header alignment in 'latency' output perf timechart: Fix off-by-one error in 'record' argv handling perf machine: Factor machine__find_thread to take tid argument perf tools: Speed up thread map generation perf kvm: introduce --list-cmds for use by scripts perf ui hists: Pass evsel to hpp->header/width functions explicitly perf symbols: Introduce thread__find_cpumode_addr_location perf session: Change header.misc dump from decimal to hex perf ui/tui: Reuse generic __hpp__fmt() code ...
2014-03-31Merge branch 'core-locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Ingo Molnar: "The biggest change is the MCS spinlock generalization changes from Tim Chen, Peter Zijlstra, Jason Low et al. There's also lockdep fixes/enhancements from Oleg Nesterov, in particular a false negative fix related to lockdep_set_novalidate_class() usage" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) locking/mutex: Fix debug checks locking/mutexes: Add extra reschedule point locking/mutexes: Introduce cancelable MCS lock for adaptive spinning locking/mutexes: Unlock the mutex without the wait_lock locking/mutexes: Modify the way optimistic spinners are queued locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner() locking: Move mcs_spinlock.h into kernel/locking/ m68k: Skip futex_atomic_cmpxchg_inatomic() test futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test Revert "sched/wait: Suppress Sparse 'variable shadowing' warning" lockdep: Change lockdep_set_novalidate_class() to use _and_name lockdep: Change mark_held_locks() to check hlock->check instead of lockdep_no_validate lockdep: Don't create the wrong dependency on hlock->check == 0 lockdep: Make held_lock->check and "int check" argument bool locking/mcs: Allow architecture specific asm files to be used for contended case locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order sched/wait: Suppress Sparse 'variable shadowing' warning hung_task/Documentation: Fix hung_task_warnings description locking/mcs: Allow architectures to hook in to contended paths locking/mcs: Micro-optimize the MCS code, add extra comments ...
2014-03-28x86: fix boot on uniprocessor systemsArtem Fetishev
On x86 uniprocessor systems topology_physical_package_id() returns -1 which causes rapl_cpu_prepare() to leave rapl_pmu variable uninitialized which leads to GPF in rapl_pmu_init(). See arch/x86/kernel/cpu/perf_event_intel_rapl.c. It turns out that physical_package_id and core_id can actually be retreived for uniprocessor systems too. Enabling them also fixes rapl_pmu code. Signed-off-by: Artem Fetishev <artem_fetishev@epam.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-25Revert "xen: properly account for _PAGE_NUMA during xen pte translations"David Vrabel
This reverts commit a9c8e4beeeb64c22b84c803747487857fe424b68. PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT is set and pseudo-physical addresses is _PAGE_PRESENT is clear. This is because during a domain save/restore (migration) the page table entries are "canonicalised" and uncanonicalised". i.e., MFNs are converted to PFNs during domain save so that on a restore the page table entries may be rewritten with the new MFNs on the destination. This canonicalisation is only done for PTEs that are present. This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or _PAGE_NUMA) was set but _PAGE_PRESENT was clear. These PTEs would be migrated as-is which would result in unexpected behaviour in the destination domain. Either a) the MFN would be translated to the wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE because the MFN is no longer owned by the domain; or c) the present bit would not get set. Symptoms include "Bad page" reports when munmapping after migrating a domain. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> [3.12+]
2014-03-20x86, vdso: Finish removing VDSO32_PRELINKAndy Lutomirski
It's a declaration of a nonexistent symbol. We can get rid of the 64-bit versions, too, but that's more intrusive. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/2ce2ce18447d8a0b78d44a278a066b6c0af06b32.1395366931.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-03-20x86, vdso: Move more vdso definitions into vdso.hAndy Lutomirski
This fixes the Xen build and gets rid of a silly header file. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1df77311795aff75f5742c787d277518314a38d3.1395366931.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-03-20x86: Load the 32-bit vdso in place, just like the 64-bit vdsosAndy Lutomirski
This replaces a decent amount of incomprehensible and buggy code with much more straightforward code. It also brings the 32-bit vdso more in line with the 64-bit vdsos, so maybe someday they can share even more code. This wastes a small amount of kernel .data and .text space, but it avoids a couple of allocations on startup, so it should be more or less a wash memory-wise. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/b8093933fad09ce181edb08a61dcd5d2592e9814.1395352498.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-20audit: use uapi/linux/audit.h for AUDIT_ARCH declarationsEric Paris
The syscall.h headers were including linux/audit.h but really only needed the uapi/linux/audit.h to get the requisite defines. Switch to the uapi headers. Signed-off-by: Eric Paris <eparis@redhat.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@linux-mips.org Cc: linux-s390@vger.kernel.org Cc: x86@kernel.org
2014-03-20syscall_get_arch: remove useless function argumentsEric Paris
Every caller of syscall_get_arch() uses current for the task and no implementors of the function need args. So just get rid of both of those things. Admittedly, since these are inline functions we aren't wasting stack space, but it just makes the prototypes better. Signed-off-by: Eric Paris <eparis@redhat.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@linux-mips.org Cc: linux390@de.ibm.com Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linux-arch@vger.kernel.org
2014-03-19random: Add arch_has_random[_seed]()H. Peter Anvin
Add predicate functions for having arch_get_random[_seed]*(). The only current use is to avoid the loop in arch_random_refill() when arch_get_random_seed_long() is unavailable. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-03-19x86, random: Enable the RDSEED instructionH. Peter Anvin
Upcoming Intel silicon adds a new RDSEED instruction, which is similar to RDRAND but provides a stronger guarantee: unlike RDRAND, RDSEED will always reseed the PRNG from the true random number source between each read. Thus, the output of RDSEED is guaranteed to be 100% entropic, unlike RDRAND which is only architecturally guaranteed to be 1/512 entropic (although in practice is much more.) The RDSEED instruction takes the same time to execute as RDRAND, but RDSEED unlike RDRAND can legitimately return failure (CF=0) due to entropy exhaustion if too many threads on too many cores are hammering the RDSEED instruction at the same time. Therefore, we have to be more conservative and only use it in places where we can tolerate failures. This patch introduces the primitives arch_get_random_seed_{int,long}() but does not use it yet. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-03-18x86, vdso: Add 32 bit VDSO time support for 64 bit kernelStefani Seibold
This patch add the VDSO time support for the IA32 Emulation Layer. Due the nature of the kernel headers and the LP64 compiler where the size of a long and a pointer differs against a 32 bit compiler, there is some type hacking necessary for optimal performance. The vsyscall_gtod_data struture must be a rearranged to serve 32- and 64-bit code access at the same time: - The seqcount_t was replaced by an unsigned, this makes the vsyscall_gtod_data intedepend of kernel configuration and internal functions. - All kernel internal structures are replaced by fix size elements which works for 32- and 64-bit access - The inner struct clock was removed to pack the whole struct. The "unsigned seq" would be handled by functions derivated from seqcount_t. Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-11-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-18x86, vdso: Add 32 bit VDSO time support for 32 bit kernelStefani Seibold
This patch add the time support for 32 bit a VDSO to a 32 bit kernel. For 32 bit programs running on a 32 bit kernel, the same mechanism is used as for 64 bit programs running on a 64 bit kernel. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-10-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-18x86, vdso: Patch alternatives in the 32-bit VDSOAndy Lutomirski
We need the alternatives mechanism for rdtsc_barrier() to work. Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-9-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-18x86, vdso: Introduce VVAR marco for vdso32Stefani Seibold
This patch revamps the vvar.h for introduce the VVAR macro for vdso32. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-8-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-18x86, vdso: Make vsyscall_gtod_data handling x86 genericStefani Seibold
This patch move the vsyscall_gtod_data handling out of vsyscall_64.c into an additonal file vsyscall_gtod.c to make the functionality available for x86 32 bit kernel. It also adds a new vsyscall_32.c which setup the VVAR page. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Stefani Seibold <stefani@seibold.net> Link: http://lkml.kernel.org/r/1395094933-14252-2-git-send-email-stefani@seibold.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-18xen/grant-table: Refactor gnttab_[un]map_refs to avoid m2p_overrideZoltan Kiss
The grant mapping API does m2p_override unnecessarily: only gntdev needs it, for blkback and future netback patches it just cause a lock contention, as those pages never go to userspace. Therefore this series does the following: - the bulk of the original function (everything after the mapping hypercall) is moved to arch-dependent set/clear_foreign_p2m_mapping - the "if (xen_feature(XENFEAT_auto_translated_physmap))" branch goes to ARM - therefore the ARM function could be much smaller, the m2p_override stubs could be also removed - on x86 the set_phys_to_machine calls were moved up to this new funcion from m2p_override functions - and m2p_override functions are only called when there is a kmap_ops param It also removes a stray space from arch/x86/include/asm/xen/page.h. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Suggested-by: Anthony Liguori <aliguori@amazon.com> Suggested-by: David Vrabel <david.vrabel@citrix.com> Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-03-13x86_32, mm: Remove user bit from identity map PDEAndy Lutomirski
The only reason that the user bit was set was to support userspace access to the compat vDSO in the fixmap. The compat vDSO is gone, so the user bit can be removed. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/e240a977f3c7cbd525a091fd6521499ec4b8e94f.1394751608.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13x86, vdso: Remove compat vdso supportAndy Lutomirski
The compat vDSO is a complicated hack that's needed to maintain compatibility with a small range of glibc versions. This removes it and replaces it with a much simpler hack: a config option to disable the 32-bit vDSO by default. This also changes the default value of CONFIG_COMPAT_VDSO to n -- users configuring kernels from scratch almost certainly want that choice. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/4bb4690899106eb11430b1186d5cc66ca9d1660c.1394751608.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13x86, intel: Make MSR_IA32_MISC_ENABLE bit constants systematicH. Peter Anvin
Replace somewhat arbitrary constants for bits in MSR_IA32_MISC_ENABLE with verbose but systematic ones. Add _BIT defines for all the rest of them, too. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13x86, Intel: Convert to the new bit access MSR accessorsBorislav Petkov
... and save some lines of code. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1394384725-10796-4-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13x86: Add another set of MSR accessor functionsBorislav Petkov
We very often need to set or clear a bit in an MSR as a result of doing some sort of a hardware configuration. Add generic versions of that repeated functionality in order to save us a bunch of duplicated code in the early CPU vendor detection/config code. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1394384725-10796-2-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13arch: Remove stub cputime.h headersFrederic Weisbecker
Many architectures have a stub cputime.h that only include the default cputime.h Lets remove the useless headers, we only need to mention that we want the default headers on the Kbuild files. Cc: Archs <linux-arch@vger.kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>