Age | Commit message (Collapse) | Author |
|
Ben Tebulin reported:
"Since v3.7.2 on two independent machines a very specific Git
repository fails in 9/10 cases on git-fsck due to an SHA1/memory
failures. This only occurs on a very specific repository and can be
reproduced stably on two independent laptops. Git mailing list ran
out of ideas and for me this looks like some very exotic kernel issue"
and bisected the failure to the backport of commit 53a59fc67f97 ("mm:
limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").
That commit itself is not actually buggy, but what it does is to make it
much more likely to hit the partial TLB invalidation case, since it
introduces a new case in tlb_next_batch() that previously only ever
happened when running out of memory.
The real bug is that the TLB gather virtual memory range setup is subtly
buggered. It was introduced in commit 597e1c3580b7 ("mm/mmu_gather:
enable tlb flush range in generic mmu_gather"), and the range handling
was already fixed at least once in commit e6c495a96ce0 ("mm: fix the TLB
range flushed when __tlb_remove_page() runs out of slots"), but that fix
was not complete.
The problem with the TLB gather virtual address range is that it isn't
set up by the initial tlb_gather_mmu() initialization (which didn't get
the TLB range information), but it is set up ad-hoc later by the
functions that actually flush the TLB. And so any such case that forgot
to update the TLB range entries would potentially miss TLB invalidates.
Rather than try to figure out exactly which particular ad-hoc range
setup was missing (I personally suspect it's the hugetlb case in
zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
did), this patch just gets rid of the problem at the source: make the
TLB range information available to tlb_gather_mmu(), and initialize it
when initializing all the other tlb gather fields.
This makes the patch larger, but conceptually much simpler. And the end
result is much more understandable; even if you want to play games with
partial ranges when invalidating the TLB contents in chunks, now the
range information is always there, and anybody who doesn't want to
bother with it won't introduce subtle bugs.
Ben verified that this fixes his problem.
Reported-bisected-and-tested-by: Ben Tebulin <tebulin@googlemail.com>
Build-testing-by: Stephen Rothwell <sfr@canb.auug.org.au>
Build-testing-by: Richard Weinberger <richard.weinberger@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull ia64 IOH hotplug fixes from Tony Luck:
"Series to fix IOH hotplug in ia64"
* tag 'please-pull-root_bus_hotplug' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
PCI: Replace printks with appropriate pr_*()
PCI/IA64: introduce probe_pci_root_info() to manage _CRS resource
PCI/IA64: Add host bridge resource release for _CRS path
PCI/IA64: fix memleak for create pci root bus fail
PCI/IA64: Allocate pci_root_info instead of using stack
PCI/IA64: embed pci hostbridge resources into pci_root_info
PCI/IA64: SN: use normal resource instead of pci_window
PCI/IA64: SN: remove sn_pci_window_fixup()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull WW mutex support from Ingo Molnar:
"This tree adds support for wound/wait style locks, which the graphics
guys would like to make use of in the TTM graphics subsystem.
Wound/wait mutexes are used when other multiple lock acquisitions of a
similar type can be done in an arbitrary order. The deadlock handling
used here is called wait/wound in the RDBMS literature: The older
tasks waits until it can acquire the contended lock. The younger
tasks needs to back off and drop all the locks it is currently
holding, ie the younger task is wounded.
See this LWN.net description of W/W mutexes:
https://lwn.net/Articles/548909/
The comments there outline specific usecases for this facility (which
have already been implemented for the DRM tree).
Also see Documentation/ww-mutex-design.txt for more details"
* 'core-mutexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking-selftests: Handle unexpected failures more strictly
mutex: Add more w/w tests to test EDEADLK path handling
mutex: Add more tests to lib/locking-selftest.c
mutex: Add w/w tests to lib/locking-selftest.c
mutex: Add w/w mutex slowpath debugging
mutex: Add support for wound/wait style locks
arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS patches (part 1) from Al Viro:
"The major change in this pile is ->readdir() replacement with
->iterate(), dealing with ->f_pos races in ->readdir() instances for
good.
There's a lot more, but I'd prefer to split the pull request into
several stages and this is the first obvious cutoff point."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (67 commits)
[readdir] constify ->actor
[readdir] ->readdir() is gone
[readdir] convert ecryptfs
[readdir] convert coda
[readdir] convert ocfs2
[readdir] convert fatfs
[readdir] convert xfs
[readdir] convert btrfs
[readdir] convert hostfs
[readdir] convert afs
[readdir] convert ncpfs
[readdir] convert hfsplus
[readdir] convert hfs
[readdir] convert befs
[readdir] convert cifs
[readdir] convert freevxfs
[readdir] convert fuse
[readdir] convert hpfs
reiserfs: switch reiserfs_readdir_dentry to inode
reiserfs: is_privroot_deh() needs only directory inode, actually
...
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This will allow me to call functions that have multiple
arguments if fastpath fails. This is required to support ticket
mutexes, because they need to be able to pass an extra argument
to the fail function.
Originally I duplicated the functions, by adding
__mutex_fastpath_lock_retval_arg. This ended up being just a
duplication of the existing function, so a way to test if
fastpath was called ended up being better.
This also cleaned up the reservation mutex patch some by being
able to call an atomic_set instead of atomic_xchg, and making it
easier to detect if the wrong unlock function was previously
used.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: robclark@gmail.com
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113105.4001.83929.stgit@patser
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
If pci_create_root_bus() return fail, we should release
pci root info, pci controller etc.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-ia64@vger.kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Currently, pcibios_resource_to_bus() and pcibios_bus_to_resource()
functions use pci_host_bridge to translate bus side address from/to
cpu side address. The pci_window in pci_controller never be used again.
So we remove pci_window in pci_controller and embed hostbridge resource
into pci_root_info. Bjorn suggested to implement hostbridge resources
release in IA64 like in X86, this patch is to prepare for that.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Thierry Reding <thierry.reding@avionic-design.de>
Cc: linux-ia64@vger.kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
asm/kregs.h isn't always included first, so we need an explicit include.
[Fix build breakage introduced by f21afc25f9ed45b8ffe200d0f071b0caec3ed2ef
smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu().]
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Since the introduction of preemptible mmu_gather TLB fast mode has been
broken. TLB fast mode relies on there being absolutely no concurrency;
it frees pages first and invalidates TLBs later.
However now we can get concurrency and stuff goes *bang*.
This patch removes all tlb_fast_mode() code; it was found the better
option vs trying to patch the hole by entangling tlb invalidation with
the scheduler.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull kvm updates from Gleb Natapov:
"Highlights of the updates are:
general:
- new emulated device API
- legacy device assignment is now optional
- irqfd interface is more generic and can be shared between arches
x86:
- VMCS shadow support and other nested VMX improvements
- APIC virtualization and Posted Interrupt hardware support
- Optimize mmio spte zapping
ppc:
- BookE: in-kernel MPIC emulation with irqfd support
- Book3S: in-kernel XICS emulation (incomplete)
- Book3S: HV: migration fixes
- BookE: more debug support preparation
- BookE: e6500 support
ARM:
- reworking of Hyp idmaps
s390:
- ioeventfd for virtio-ccw
And many other bug fixes, cleanups and improvements"
* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
kvm: Add compat_ioctl for device control API
KVM: x86: Account for failing enable_irq_window for NMI window request
KVM: PPC: Book3S: Add API for in-kernel XICS emulation
kvm/ppc/mpic: fix missing unlock in set_base_addr()
kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
kvm/ppc/mpic: remove users
kvm/ppc/mpic: fix mmio region lists when multiple guests used
kvm/ppc/mpic: remove default routes from documentation
kvm: KVM_CAP_IOMMU only available with device assignment
ARM: KVM: iterate over all CPUs for CPU compatibility check
KVM: ARM: Fix spelling in error message
ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
KVM: ARM: Fix API documentation for ONE_REG encoding
ARM: KVM: promote vfp_host pointer to generic host cpu context
ARM: KVM: add architecture specific hook for capabilities
ARM: KVM: perform HYP initilization for hotplugged CPUs
ARM: KVM: switch to a dual-step HYP init code
ARM: KVM: rework HYP page table freeing
ARM: KVM: enforce maximum size for identity mapped code
ARM: KVM: move to a KVM provided HYP idmap
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull compat cleanup from Al Viro:
"Mostly about syscall wrappers this time; there will be another pile
with patches in the same general area from various people, but I'd
rather push those after both that and vfs.git pile are in."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
syscalls.h: slightly reduce the jungles of macros
get rid of union semop in sys_semctl(2) arguments
make do_mremap() static
sparc: no need to sign-extend in sync_file_range() wrapper
ppc compat wrappers for add_key(2) and request_key(2) are pointless
x86: trim sys_ia32.h
x86: sys32_kill and sys32_mprotect are pointless
get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
merge compat sys_ipc instances
consolidate compat lookup_dcookie()
convert vmsplice to COMPAT_SYSCALL_DEFINE
switch getrusage() to COMPAT_SYSCALL_DEFINE
switch epoll_pwait to COMPAT_SYSCALL_DEFINE
convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect
make HAVE_SYSCALL_WRAPPERS unconditional
consolidate cond_syscall and SYSCALL_ALIAS declarations
teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long
get rid of duplicate logics in __SC_....[1-6] definitions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP/hotplug changes from Ingo Molnar:
"This is a pretty large, multi-arch series unifying and generalizing
the various disjunct pieces of idle routines that architectures have
historically copied from each other and have grown in random, wildly
inconsistent and sometimes buggy directions:
101 files changed, 455 insertions(+), 1328 deletions(-)
this went through a number of review and test iterations before it was
committed, it was tested on various architectures, was exposed to
linux-next for quite some time - nevertheless it might cause problems
on architectures that don't read the mailing lists and don't regularly
test linux-next.
This cat herding excercise was motivated by the -rt kernel, and was
brought to you by Thomas "the Whip" Gleixner."
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
idle: Remove GENERIC_IDLE_LOOP config switch
um: Use generic idle loop
ia64: Make sure interrupts enabled when we "safe_halt()"
sparc: Use generic idle loop
idle: Remove unused ARCH_HAS_DEFAULT_IDLE
bfin: Fix typo in arch_cpu_idle()
xtensa: Use generic idle loop
x86: Use generic idle loop
unicore: Use generic idle loop
tile: Use generic idle loop
tile: Enter idle with preemption disabled
sh: Use generic idle loop
score: Use generic idle loop
s390: Use generic idle loop
powerpc: Use generic idle loop
parisc: Use generic idle loop
openrisc: Use generic idle loop
mn10300: Use generic idle loop
mips: Use generic idle loop
microblaze: Use generic idle loop
...
|
|
Commit abf09bed3cce ("s390/mm: implement software dirty bits")
introduced another difference in the pte layout vs. the pmd layout on
s390, thoroughly breaking the s390 support for hugetlbfs. This requires
replacing some more pte_xxx functions in mm/hugetlbfs.c with a
huge_pte_xxx version.
This patch introduces those huge_pte_xxx functions and their generic
implementation in asm-generic/hugetlb.h, which will now be included on
all architectures supporting hugetlbfs apart from s390. This change
will be a no-op for those architectures.
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz> [for !s390 parts]
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We changed a few things in non-ia64 code paths. This patch blindly applies
the changes to the ia64 code as well, hoping it proves useful in case anyone
revives the ia64 kvm code.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
In commit d166991234347215dc23fc9dc15a63a83a1a54e1
idle: Implement generic idle function
Thomas Gleixner cleaned up many things but perturbed some
fragile code that was keeping ia64 alive. So we started
seeing:
WARNING: at kernel/cpu/idle.c:94 cpu_idle_loop+0x360/0x380()
and other unpleasantness like system hangs during boot.
We really shouldn't ever halt with interrupts disabled.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: magnus.damm@gmail.com
Link: http://lkml.kernel.org/r/516d9a0c26048eae9c@agluck-desk.sc.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Move it to a common place. Preparatory patch for implementing
set/clear for the idle need_resched poll implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130321215233.446034505@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
numa_clear_node() function is not implemented under IA64,
it will be called in unmap_cpu_on_node() in mm/memory_hotplug.c.
This cause build error under IA64, this patch adds numa_clear_node()
in IA64 to fix this problem.
[Added __cpuinit notation to numa_clear_node() to keep linker happy -Tony]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Back 2010 during a revamp of the irq code some initializations
were moved from ia64_mca_init() to ia64_mca_late_init() in
commit c75f2aa13f5b268aba369b5dc566088b5194377c
Cannot use register_percpu_irq() from ia64_mca_init()
But this was hideously wrong. First of all these initializations
are now down far too late. Specifically after all the other cpus
have been brought up and initialized their own CMC vectors from
smp_callin(). Also ia64_mca_late_init() may be called from any cpu
so the line:
ia64_mca_cmc_vector_setup(); /* Setup vector on BSP */
is generally not executed on the BSP, and so the CMC vector isn't
setup at all on that processor.
Make use of the arch_early_irq_init() hook to get this code executed
at just the right moment: not too early, not too late.
Reported-by: Fred Hartnett <fred.hartnett@hp.com>
Tested-by: Fred Hartnett <fred.hartnett@hp.com>
Cc: stable@kernel.org # v2.6.37+
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
The Linux Kernel contains some inline assembly source code which has
wrong asm register constraints in arch/ia64/include/asm/futex.h.
I observed this on Kernel 3.2.23 but it is also true on the most
recent Kernel 3.9-rc1.
File arch/ia64/include/asm/futex.h:
static inline int
futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
u32 oldval, u32 newval)
{
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;
{
register unsigned long r8 __asm ("r8");
unsigned long prev;
__asm__ __volatile__(
" mf;; \n"
" mov %0=r0 \n"
" mov ar.ccv=%4;; \n"
"[1:] cmpxchg4.acq %1=[%2],%3,ar.ccv \n"
" .xdata4 \"__ex_table\", 1b-., 2f-. \n"
"[2:]"
: "=r" (r8), "=r" (prev)
: "r" (uaddr), "r" (newval),
"rO" ((long) (unsigned) oldval)
: "memory");
*uval = prev;
return r8;
}
}
The list of output registers is
: "=r" (r8), "=r" (prev)
The constraint "=r" means that the GCC has to maintain that these vars
are in registers and contain valid info when the program flow leaves
the assembly block (output registers).
But "=r" also means that GCC can put them in registers that are used
as input registers. Input registers are uaddr, newval, oldval on the
example.
The second assembly instruction
" mov %0=r0 \n"
is the first one which writes to a register; it sets %0 to 0. %0 means
the first register operand; it is r8 here. (The r0 is read-only and
always 0 on the Itanium; it can be used if an immediate zero value is
needed.)
This instruction might overwrite one of the other registers which are
still needed.
Whether it really happens depends on how GCC decides what registers it
uses and how it optimizes the code.
The objdump utility can give us disassembly.
The futex_atomic_cmpxchg_inatomic() function is inline, so we have to
look for a module that uses the funtion. This is the
cmpxchg_futex_value_locked() function in
kernel/futex.c:
static int cmpxchg_futex_value_locked(u32 *curval, u32 __user *uaddr,
u32 uval, u32 newval)
{
int ret;
pagefault_disable();
ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
pagefault_enable();
return ret;
}
Now the disassembly. At first from the Kernel package 3.2.23 which has
been compiled with GCC 4.4, remeber this Kernel seemed to work:
objdump -d linux-3.2.23/debian/build/build_ia64_none_mckinley/kernel/futex.o
0000000000000230 <cmpxchg_futex_value_locked>:
230: 0b 18 80 1b 18 21 [MMI] adds r3=3168,r13;;
236: 80 40 0d 00 42 00 adds r8=40,r3
23c: 00 00 04 00 nop.i 0x0;;
240: 0b 50 00 10 10 10 [MMI] ld4 r10=[r8];;
246: 90 08 28 00 42 00 adds r9=1,r10
24c: 00 00 04 00 nop.i 0x0;;
250: 09 00 00 00 01 00 [MMI] nop.m 0x0
256: 00 48 20 20 23 00 st4 [r8]=r9
25c: 00 00 04 00 nop.i 0x0;;
260: 08 10 80 06 00 21 [MMI] adds r2=32,r3
266: 00 00 00 02 00 00 nop.m 0x0
26c: 02 08 f1 52 extr.u r16=r33,0,61
270: 05 40 88 00 08 e0 [MLX] addp4 r8=r34,r0
276: ff ff 0f 00 00 e0 movl r15=0xfffffffbfff;;
27c: f1 f7 ff 65
280: 09 70 00 04 18 10 [MMI] ld8 r14=[r2]
286: 00 00 00 02 00 c0 nop.m 0x0
28c: f0 80 1c d0 cmp.ltu p6,p7=r15,r16;;
290: 08 40 fc 1d 09 3b [MMI] cmp.eq p8,p9=-1,r14
296: 00 00 00 02 00 40 nop.m 0x0
29c: e1 08 2d d0 cmp.ltu p10,p11=r14,r33
2a0: 56 01 10 00 40 10 [BBB] (p10) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
2a6: 02 08 00 80 21 03 (p08) br.cond.dpnt.few 2b0
<cmpxchg_futex_value_locked+0x80>
2ac: 40 00 00 41 (p06) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
2b0: 0a 00 00 00 22 00 [MMI] mf;;
2b6: 80 00 00 00 42 00 mov r8=r0
2bc: 00 00 04 00 nop.i 0x0
2c0: 0b 00 20 40 2a 04 [MMI] mov.m ar.ccv=r8;;
2c6: 10 1a 85 22 20 00 cmpxchg4.acq r33=[r33],r35,ar.ccv
2cc: 00 00 04 00 nop.i 0x0;;
2d0: 10 00 84 40 90 11 [MIB] st4 [r32]=r33
2d6: 00 00 00 02 00 00 nop.i 0x0
2dc: 20 00 00 40 br.few 2f0
<cmpxchg_futex_value_locked+0xc0>
2e0: 09 40 c8 f9 ff 27 [MMI] mov r8=-14
2e6: 00 00 00 02 00 00 nop.m 0x0
2ec: 00 00 04 00 nop.i 0x0;;
2f0: 0b 58 20 1a 19 21 [MMI] adds r11=3208,r13;;
2f6: 20 01 2c 20 20 00 ld4 r18=[r11]
2fc: 00 00 04 00 nop.i 0x0;;
300: 0b 88 fc 25 3f 23 [MMI] adds r17=-1,r18;;
306: 00 88 2c 20 23 00 st4 [r11]=r17
30c: 00 00 04 00 nop.i 0x0;;
310: 11 00 00 00 01 00 [MIB] nop.m 0x0
316: 00 00 00 02 00 80 nop.i 0x0
31c: 08 00 84 00 br.ret.sptk.many b0;;
The lines
2b0: 0a 00 00 00 22 00 [MMI] mf;;
2b6: 80 00 00 00 42 00 mov r8=r0
2bc: 00 00 04 00 nop.i 0x0
2c0: 0b 00 20 40 2a 04 [MMI] mov.m ar.ccv=r8;;
2c6: 10 1a 85 22 20 00 cmpxchg4.acq r33=[r33],r35,ar.ccv
2cc: 00 00 04 00 nop.i 0x0;;
are the instructions of the assembly block.
The line
2b6: 80 00 00 00 42 00 mov r8=r0
sets the r8 register to 0 and after that
2c0: 0b 00 20 40 2a 04 [MMI] mov.m ar.ccv=r8;;
prepares the 'oldvalue' for the cmpxchg but it takes it from r8. This
is wrong.
What happened here is what I explained above: An input register is
overwritten which is still needed.
The register operand constraints in futex.h are wrong.
(The problem doesn't occur when the Kernel is compiled with GCC 4.6.)
The attached patch fixes the register operand constraints in futex.h.
The code after patching of it:
static inline int
futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
u32 oldval, u32 newval)
{
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;
{
register unsigned long r8 __asm ("r8") = 0;
unsigned long prev;
__asm__ __volatile__(
" mf;; \n"
" mov ar.ccv=%4;; \n"
"[1:] cmpxchg4.acq %1=[%2],%3,ar.ccv \n"
" .xdata4 \"__ex_table\", 1b-., 2f-. \n"
"[2:]"
: "+r" (r8), "=&r" (prev)
: "r" (uaddr), "r" (newval),
"rO" ((long) (unsigned) oldval)
: "memory");
*uval = prev;
return r8;
}
}
I also initialized the 'r8' var with the C programming language.
The _asm qualifier on the definition of the 'r8' var forces GCC to use
the r8 processor register for it.
I don't believe that we should use inline assembly for zeroing out a
local variable.
The constraint is
"+r" (r8)
what means that it is both an input register and an output register.
Note that the page fault handler will modify the r8 register which
will be the return value of the function.
The real fix is
"=&r" (prev)
The & means that GCC must not use any of the input registers to place
this output register in.
Patched the Kernel 3.2.23 and compiled it with GCC4.4:
0000000000000230 <cmpxchg_futex_value_locked>:
230: 0b 18 80 1b 18 21 [MMI] adds r3=3168,r13;;
236: 80 40 0d 00 42 00 adds r8=40,r3
23c: 00 00 04 00 nop.i 0x0;;
240: 0b 50 00 10 10 10 [MMI] ld4 r10=[r8];;
246: 90 08 28 00 42 00 adds r9=1,r10
24c: 00 00 04 00 nop.i 0x0;;
250: 09 00 00 00 01 00 [MMI] nop.m 0x0
256: 00 48 20 20 23 00 st4 [r8]=r9
25c: 00 00 04 00 nop.i 0x0;;
260: 08 10 80 06 00 21 [MMI] adds r2=32,r3
266: 20 12 01 10 40 00 addp4 r34=r34,r0
26c: 02 08 f1 52 extr.u r16=r33,0,61
270: 05 40 00 00 00 e1 [MLX] mov r8=r0
276: ff ff 0f 00 00 e0 movl r15=0xfffffffbfff;;
27c: f1 f7 ff 65
280: 09 70 00 04 18 10 [MMI] ld8 r14=[r2]
286: 00 00 00 02 00 c0 nop.m 0x0
28c: f0 80 1c d0 cmp.ltu p6,p7=r15,r16;;
290: 08 40 fc 1d 09 3b [MMI] cmp.eq p8,p9=-1,r14
296: 00 00 00 02 00 40 nop.m 0x0
29c: e1 08 2d d0 cmp.ltu p10,p11=r14,r33
2a0: 56 01 10 00 40 10 [BBB] (p10) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
2a6: 02 08 00 80 21 03 (p08) br.cond.dpnt.few 2b0
<cmpxchg_futex_value_locked+0x80>
2ac: 40 00 00 41 (p06) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
2b0: 0b 00 00 00 22 00 [MMI] mf;;
2b6: 00 10 81 54 08 00 mov.m ar.ccv=r34
2bc: 00 00 04 00 nop.i 0x0;;
2c0: 09 58 8c 42 11 10 [MMI] cmpxchg4.acq r11=[r33],r35,ar.ccv
2c6: 00 00 00 02 00 00 nop.m 0x0
2cc: 00 00 04 00 nop.i 0x0;;
2d0: 10 00 2c 40 90 11 [MIB] st4 [r32]=r11
2d6: 00 00 00 02 00 00 nop.i 0x0
2dc: 20 00 00 40 br.few 2f0
<cmpxchg_futex_value_locked+0xc0>
2e0: 09 40 c8 f9 ff 27 [MMI] mov r8=-14
2e6: 00 00 00 02 00 00 nop.m 0x0
2ec: 00 00 04 00 nop.i 0x0;;
2f0: 0b 88 20 1a 19 21 [MMI] adds r17=3208,r13;;
2f6: 30 01 44 20 20 00 ld4 r19=[r17]
2fc: 00 00 04 00 nop.i 0x0;;
300: 0b 90 fc 27 3f 23 [MMI] adds r18=-1,r19;;
306: 00 90 44 20 23 00 st4 [r17]=r18
30c: 00 00 04 00 nop.i 0x0;;
310: 11 00 00 00 01 00 [MIB] nop.m 0x0
316: 00 00 00 02 00 80 nop.i 0x0
31c: 08 00 84 00 br.ret.sptk.many b0;;
Much better.
There is a
270: 05 40 00 00 00 e1 [MLX] mov r8=r0
which was generated by C code r8 = 0. Below
2b6: 00 10 81 54 08 00 mov.m ar.ccv=r34
what means that oldval is no longer overwritten.
This is Debian bug#702641
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702641).
The patch is applicable on Kernel 3.9-rc1, 3.2.23 and many other versions.
Signed-off-by: Stephan Schreiber <info@fs-driver.org>
Cc: stable@vger.kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
take them to asm/linkage.h, with default in linux/linkage.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Pull KVM updates from Marcelo Tosatti:
"KVM updates for the 3.9 merge window, including x86 real mode
emulation fixes, stronger memory slot interface restrictions, mmu_lock
spinlock hold time reduction, improved handling of large page faults
on shadow, initial APICv HW acceleration support, s390 channel IO
based virtio, amongst others"
* tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
Revert "KVM: MMU: lazily drop large spte"
x86: pvclock kvm: align allocation size to page size
KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
x86 emulator: fix parity calculation for AAD instruction
KVM: PPC: BookE: Handle alignment interrupts
booke: Added DBCR4 SPR number
KVM: PPC: booke: Allow multiple exception types
KVM: PPC: booke: use vcpu reference from thread_struct
KVM: Remove user_alloc from struct kvm_memory_slot
KVM: VMX: disable apicv by default
KVM: s390: Fix handling of iscs.
KVM: MMU: cleanup __direct_map
KVM: MMU: remove pt_access in mmu_set_spte
KVM: MMU: cleanup mapping-level
KVM: MMU: lazily drop large spte
KVM: VMX: cleanup vmx_set_cr0().
KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
KVM: VMX: disable SMEP feature when guest is in non-paging mode
KVM: Remove duplicate text in api.txt
Revert "KVM: MMU: split kvm_mmu_free_page"
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull signal handling cleanups from Al Viro:
"This is the first pile; another one will come a bit later and will
contain SYSCALL_DEFINE-related patches.
- a bunch of signal-related syscalls (both native and compat)
unified.
- a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE
(fixing several potential problems with missing argument
validation, while we are at it)
- a lot of now-pointless wrappers killed
- a couple of architectures (cris and hexagon) forgot to save
altstack settings into sigframe, even though they used the
(uninitialized) values in sigreturn; fixed.
- microblaze fixes for delivery of multiple signals arriving at once
- saner set of helpers for signal delivery introduced, several
architectures switched to using those."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits)
x86: convert to ksignal
sparc: convert to ksignal
arm: switch to struct ksignal * passing
alpha: pass k_sigaction and siginfo_t using ksignal pointer
burying unused conditionals
make do_sigaltstack() static
arm64: switch to generic old sigaction() (compat-only)
arm64: switch to generic compat rt_sigaction()
arm64: switch compat to generic old sigsuspend
arm64: switch to generic compat rt_sigqueueinfo()
arm64: switch to generic compat rt_sigpending()
arm64: switch to generic compat rt_sigprocmask()
arm64: switch to generic sigaltstack
sparc: switch to generic old sigsuspend
sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE
sparc: kill sign-extending wrappers for native syscalls
kill sparc32_open()
sparc: switch to use of generic old sigaction
sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE
mips: switch to generic sys_fork() and sys_clone()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
- Rework of the ACPI namespace scanning code from Rafael J. Wysocki
with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg,
Toshi Kani, and Yinghai Lu.
- ACPI power resources handling and ACPI device PM update from Rafael
J Wysocki.
- ACPICA update to version 20130117 from Bob Moore and Lv Zheng with
contributions from Aaron Lu, Chao Guan, Jesper Juhl, and Tim Gardner.
- Support for Intel Lynxpoint LPSS from Mika Westerberg.
- cpuidle update from Len Brown including Intel Haswell support, C1
state for intel_idle, removal of global pm_idle.
- cpuidle fixes and cleanups from Daniel Lezcano.
- cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri with
contributions from Stratos Karafotis and Rickard Andersson.
- Intel P-states driver for Sandy Bridge processors from Dirk
Brandewie.
- cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn.
- cpufreq fixes related to ordering issues between acpi-cpufreq and
powernow-k8 from Borislav Petkov and Matthew Garrett.
- cpufreq support for Calxeda Highbank processors from Mark Langsdorf
and Rob Herring.
- cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update
from Shawn Guo.
- cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat,
and Inderpal Singh.
- Support for "lightweight suspend" from Zhang Rui.
- Removal of the deprecated power trace API from Paul Gortmaker.
- Assorted updates from Andreas Fleig, Colin Ian King, Davidlohr Bueso,
Joseph Salisbury, Kees Cook, Li Fei, Nishanth Menon, ShuoX Liu,
Srinivas Pandruvada, Tejun Heo, Thomas Renninger, and Yasuaki
Ishimatsu.
* tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (267 commits)
PM idle: remove global declaration of pm_idle
unicore32 idle: delete stray pm_idle comment
openrisc idle: delete pm_idle
mn10300 idle: delete pm_idle
microblaze idle: delete pm_idle
m32r idle: delete pm_idle, and other dead idle code
ia64 idle: delete pm_idle
cris idle: delete idle and pm_idle
ARM64 idle: delete pm_idle
ARM idle: delete pm_idle
blackfin idle: delete pm_idle
sparc idle: rename pm_idle to sparc_idle
sh idle: rename global pm_idle to static sh_idle
x86 idle: rename global pm_idle to static x86_idle
APM idle: register apm_cpu_idle via cpuidle
cpufreq / intel_pstate: Add kernel command line option disable intel_pstate.
cpufreq / intel_pstate: Change to disallow module build
tools/power turbostat: display SMI count by default
intel_idle: export both C1 and C1E
ACPI / hotplug: Fix concurrency issues and memory leaks
...
|
|
__ARCH_WANT_SYS_RT_SIGACTION,
__ARCH_WANT_SYS_RT_SIGSUSPEND,
__ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND,
__ARCH_WANT_COMPAT_SYS_SCHED_RR_GET_INTERVAL - not used anymore
CONFIG_GENERIC_{SIGALTSTACK,COMPAT_RT_SIG{ACTION,QUEUEINFO,PENDING,PROCMASK}} -
can be assumed always set.
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Only alpha and sparc are unusual - they have ka_restorer in it.
And nobody needs that exposed to userland.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Switch from __ARCH_WANT_SYS_RT_SIGACTION to opposite
(!CONFIG_ODD_RT_SIGACTION); the only two architectures that
need it are alpha and sparc. The reason for use of CONFIG_...
instead of __ARCH_... is that it's needed only kernel-side
and doing it that way avoids a mess with include order on many
architectures.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
If we want to stop the tick further idle, we need to be
able to account the cputime without using the tick.
Virtual based cputime accounting solves that problem by
hooking into kernel/user boundaries.
However implementing CONFIG_VIRT_CPU_ACCOUNTING require
low level hooks and involves more overhead. But we already
have a generic context tracking subsystem that is required
for RCU needs by archs which plan to shut down the tick
outside idle.
This patch implements a generic virtual based cputime
accounting that relies on these generic kernel/user hooks.
There are some upsides of doing this:
- This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
if context tracking is already built (already necessary for RCU in full
tickless mode).
- We can rely on the generic context tracking subsystem to dynamically
(de)activate the hooks, so that we can switch anytime between virtual
and tick based accounting. This way we don't have the overhead
of the virtual accounting when the tick is running periodically.
And one downside:
- There is probably more overhead than a native virtual based cputime
accounting. But this relies on hooks that are already set anyway.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
|
|
The full dynticks cputime accounting that we'll soon introduce
will rely on sched_clock(). And its clock can have a per
nanosecond granularity.
To prepare for this, we need to have a cputime_t implementation
that has this precision.
ia64 virtual cputime accounting already uses that granularity
so all we need is to librarize its implementation in the asm
generic headers.
Also librarize the default per jiffy granularity cputime_t
as well so that we can easily pick either implementation
depending on the cputime accounting config we choose.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
|
|
This patch does not affect the generation of the Linux binary.
This patch decreases 300 lines of 20121018 divergence.diff.
This patch updates architecture specific environment settings for compiling
ACPICA as such enhancement already has been done in ACPICA.
Note that the appended compiler default settings in the
<acpi/platform/acenv.h> will deprecate some of the macros defined in the
architecture specific <asm/acpi.h>. Thus two of the <asm/acpi.h> headers
have been cleaned up in this patch accordingly.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core __dev* removal patches - take 3 - from Greg Kroah-Hartman:
"Here are the remaining __dev* removal patches against the 3.8-rc2
tree. All of these patches were previously sent to the subsystem
maintainers, most of them were picked up and pushed to you, but there
were a number that fell through the cracks, and new drivers were added
during the merge window, so this series cleans up the rest of the
instances of these markings.
Third time's the charm...
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
Fixed up trivial conflict with the pinctrl pull in pinctrl-sirf.c.
* tag 'driver-core-3.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (54 commits)
misc: remove __dev* attributes.
include: remove __dev* attributes.
Documentation: remove __dev* attributes.
Drivers: misc: remove __dev* attributes.
Drivers: block: remove __dev* attributes.
Drivers: bcma: remove __dev* attributes.
Drivers: char: remove __dev* attributes.
Drivers: clocksource: remove __dev* attributes.
Drivers: ssb: remove __dev* attributes.
Drivers: dma: remove __dev* attributes.
Drivers: gpu: remove __dev* attributes.
Drivers: infinband: remove __dev* attributes.
Drivers: memory: remove __dev* attributes.
Drivers: mmc: remove __dev* attributes.
Drivers: iommu: remove __dev* attributes.
Drivers: power: remove __dev* attributes.
Drivers: message: remove __dev* attributes.
Drivers: macintosh: remove __dev* attributes.
Drivers: mfd: remove __dev* attributes.
pstore: remove __dev* attributes.
...
|
|
CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
markings need to be removed.
This change removes the use of __devinit, __devexit_p, __devinitdata,
and __devexit from these drivers.
Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Linux was granted a new system call to load modules by file descriptor
in commit 34e1169d996a ("module: add syscall to load module from fd").
Wire it up for ia64 (ready for the Chrome port :-)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull signal handling cleanups from Al Viro:
"sigaltstack infrastructure + conversion for x86, alpha and um,
COMPAT_SYSCALL_DEFINE infrastructure.
Note that there are several conflicts between "unify
SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
resolution is trivial - just remove definitions of SS_ONSTACK and
SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and
include/uapi/linux/signal.h contains the unified variant."
Fixed up conflicts as per Al.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
alpha: switch to generic sigaltstack
new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
generic compat_sys_sigaltstack()
introduce generic sys_sigaltstack(), switch x86 and um to it
new helper: compat_user_stack_pointer()
new helper: restore_altstack()
unify SS_ONSTACK/SS_DISABLE definitions
new helper: current_user_stack_pointer()
missing user_stack_pointer() instances
Bury the conditionals from kernel_thread/kernel_execve series
COMPAT_SYSCALL_DEFINE: infrastructure
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"A few new features this merge-window. The most important one is
probably, that dma-debug now warns if a dma-handle is not checked with
dma_mapping_error by the device driver. This requires minor changes
to some architectures which make use of dma-debug. Most of these
changes have the respective Acks by the Arch-Maintainers.
Besides that there are updates to the AMD IOMMU driver for refactor
the IOMMU-Groups support and to make sure it does not trigger a
hardware erratum.
The OMAP changes (for which I pulled in a branch from Tony Lindgren's
tree) have a conflict in linux-next with the arm-soc tree. The
conflict is in the file arch/arm/mach-omap2/clock44xx_data.c which is
deleted in the arm-soc tree. It is safe to delete the file too so
solve the conflict. Similar changes are done in the arm-soc tree in
the common clock framework migration. A missing hunk from the patch
in the IOMMU tree will be submitted as a seperate patch when the
merge-window is closed."
* tag 'iommu-updates-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (29 commits)
ARM: dma-mapping: support debug_dma_mapping_error
ARM: OMAP4: hwmod data: ipu and dsp to use parent clocks instead of leaf clocks
iommu/omap: Adapt to runtime pm
iommu/omap: Migrate to hwmod framework
iommu/omap: Keep mmu enabled when requested
iommu/omap: Remove redundant clock handling on ISR
iommu/amd: Remove obsolete comment
iommu/amd: Don't use 512GB pages
iommu/tegra: smmu: Move bus_set_iommu after probe for multi arch
iommu/tegra: gart: Move bus_set_iommu after probe for multi arch
iommu/tegra: smmu: Remove unnecessary PTC/TLB flush all
tile: dma_debug: add debug_dma_mapping_error support
sh: dma_debug: add debug_dma_mapping_error support
powerpc: dma_debug: add debug_dma_mapping_error support
mips: dma_debug: add debug_dma_mapping_error support
microblaze: dma-mapping: support debug_dma_mapping_error
ia64: dma_debug: add debug_dma_mapping_error support
c6x: dma_debug: add debug_dma_mapping_error support
ARM64: dma_debug: add debug_dma_mapping_error support
intel-iommu: Prevent devices with RMRRs from being placed into SI Domain
...
|
|
Cross-architecture equivalent of rdusp(); default is
user_stack_pointer(current_pt_regs()) - that works for almost all
platforms that have usp saved in pt_regs. The only exception from
that is ia64 - we want memory stack, not the backing store for
register one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
All architectures have
CONFIG_GENERIC_KERNEL_THREAD
CONFIG_GENERIC_KERNEL_EXECVE
__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
'arm/tegra' and 'arm/omap' into next
|
|
Seems like everyone copied x86 and defined 4 private memory slots
that never actually get used. Even x86 only uses 3 of the 4. These
aren't exposed so there's no need to add padding.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM. One is
the user accessible slots and the other is user + private. Make this
more obvious.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull big execve/kernel_thread/fork unification series from Al Viro:
"All architectures are converted to new model. Quite a bit of that
stuff is actually shared with architecture trees; in such cases it's
literally shared branch pulled by both, not a cherry-pick.
A lot of ugliness and black magic is gone (-3KLoC total in this one):
- kernel_thread()/kernel_execve()/sys_execve() redesign.
We don't do syscalls from kernel anymore for either kernel_thread()
or kernel_execve():
kernel_thread() is essentially clone(2) with callback run before we
return to userland, the callbacks either never return or do
successful do_execve() before returning.
kernel_execve() is a wrapper for do_execve() - it doesn't need to
do transition to user mode anymore.
As a result kernel_thread() and kernel_execve() are
arch-independent now - they live in kernel/fork.c and fs/exec.c
resp. sys_execve() is also in fs/exec.c and it's completely
architecture-independent.
- daemonize() is gone, along with its parts in fs/*.c
- struct pt_regs * is no longer passed to do_fork/copy_process/
copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.
- sys_fork()/sys_vfork()/sys_clone() unified; some architectures
still need wrappers (ones with callee-saved registers not saved in
pt_regs on syscall entry), but the main part of those suckers is in
kernel/fork.c now."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
do_coredump(): get rid of pt_regs argument
print_fatal_signal(): get rid of pt_regs argument
ptrace_signal(): get rid of unused arguments
get rid of ptrace_signal_deliver() arguments
new helper: signal_pt_regs()
unify default ptrace_signal_deliver
flagday: kill pt_regs argument of do_fork()
death to idle_regs()
don't pass regs to copy_process()
flagday: don't pass regs to copy_thread()
bfin: switch to generic vfork, get rid of pointless wrappers
xtensa: switch to generic clone()
openrisc: switch to use of generic fork and clone
unicore32: switch to generic clone(2)
score: switch to generic fork/vfork/clone
c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
mn10300: switch to generic fork/vfork/clone
h8300: switch to generic fork/vfork/clone
tile: switch to generic clone()
...
Conflicts:
arch/microblaze/include/asm/Kbuild
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The biggest change affects group scheduling: we now track the runnable
average on a per-task entity basis, allowing a smoother, exponential
decay average based load/weight estimation instead of the previous
binary on-the-runqueue/off-the-runqueue load weight method.
This will inevitably disturb workloads that were in some sort of
borderline balancing state or unstable equilibrium, so an eye has to
be kept on regressions.
For that reason the new load average is only limited to group
scheduling (shares distribution) at the moment (which was also hurting
the most from the prior, crude weight calculation and whose scheduling
quality wins most from this change) - but we plan to extend this to
regular SMP balancing as well in the future, which will simplify and
speed up things a bit.
Other changes involve ongoing preparatory work to extend NOHZ to the
scheduler as well, eventually allowing completely irq-free user-space
execution."
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled"
cputime: Comment cputime's adjusting code
cputime: Consolidate cputime adjustment code
cputime: Rename thread_group_times to thread_group_cputime_adjusted
cputime: Move thread_group_cputime() to sched code
vtime: Warn if irqs aren't disabled on system time accounting APIs
vtime: No need to disable irqs on vtime_account()
vtime: Consolidate a bit the ctx switch code
vtime: Explicitly account pending user time on process tick
vtime: Remove the underscore prefix invasion
sched/autogroup: Fix crash on reboot when autogroup is disabled
cputime: Separate irqtime accounting from generic vtime
cputime: Specialize irq vtime hooks
kvm: Directly account vtime to system on guest switch
vtime: Make vtime_account_system() irqsafe
vtime: Gather vtime declarations to their own header file
sched: Describe CFS load-balancer
sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking
sched: Make __update_entity_runnable_avg() fast
sched: Update_cfs_shares at period edge
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Lots of activity:
211 files changed, 8328 insertions(+), 4116 deletions(-)
most of it on the tooling side.
Main changes:
* ftrace enhancements and fixes from Steve Rostedt.
* uprobes fixes, cleanups and preparation for the ARM port from Oleg
Nesterov.
* UAPI fixes, from David Howels - prepares the arch/x86 UAPI
transition
* Separate perf tests into multiple objects, one per test, from Jiri
Olsa.
* Make hardware event translations available in sysfs, from Jiri
Olsa.
* Fixes to /proc/pid/maps parsing, preparatory to supporting data
maps, from Namhyung Kim
* Implement ui_progress for GTK, from Namhyung Kim
* Add framework for automated perf_event_attr tests, where tools with
different command line options will be run from a 'perf test', via
python glue, and the perf syscall will be intercepted to verify
that the perf_event_attr fields set by the tool are those expected,
from Jiri Olsa
* Add a 'link' method for hists, so that we can have the leader with
buckets for all the entries in all the hists. This new method is
now used in the default 'diff' output, making the sum of the
'baseline' column be 100%, eliminating blind spots.
* libtraceevent fixes for compiler warnings trying to make perf it
build on some distros, like fedora 14, 32-bit, some of the warnings
really pointed to real bugs.
* Add a browser for 'perf script' and make it available from the
report and annotate browsers. It does filtering to find the
scripts that handle events found in the perf.data file used. From
Feng Tang
* perf inject changes to allow showing where a task sleeps, from
Andrew Vagin.
* Makefile improvements from Namhyung Kim.
* Add --pre and --post command hooks in 'stat', from Peter Zijlstra.
* Don't stop synthesizing threads when one vanishes, this is for the
existing threads when we start a tool like trace.
* Use sched:sched_stat_runtime to provide a thread summary, this
produces the same output as the 'trace summary' subcommand of
tglx's original "trace" tool.
* Support interrupted syscalls in 'trace'
* Add an event duration column and filter in 'trace'.
* There are references to the man pages in some tools, so try to
build Documentation when installing, warning the user if that is
not possible, from Borislav Petkov.
* Give user better message if precise is not supported, from David
Ahern.
* Try to find cross-built objdump path by using the session
environment information in the perf.data file header, from Irina
Tirdea, original patch and idea by Namhyung Kim.
* Diplays more output on features check for make V=1, so that one can
figure out what is happening by looking at gcc output, etc. From
Jiri Olsa.
* Add on_exit implementation for systems without one, e.g. Android,
from Bernhard Rosenkraenzer.
* Only process events for vcpus of interest, helps handling large
number of events, from David Ahern.
* Cross compilation fixes for Android, from Irina Tirdea.
* Add documentation on compiling for Android, from Irina Tirdea.
* perf diff improvements from Jiri Olsa.
* Target (task/user/cpu/syswide) handling improvements, from Namhyung
Kim.
* Add support in 'trace' for tracing workload given by command line,
from Namhyung Kim.
* ... and much more."
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (194 commits)
uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race
perf evsel: Introduce is_group_member method
perf powerpc: Use uapi/unistd.h to fix build error
tools: Pass the target in descend
tools: Honour the O= flag when tool build called from a higher Makefile
tools: Define a Makefile function to do subdir processing
perf ui: Always compile browser setup code
perf ui: Add ui_progress__finish()
perf ui gtk: Implement ui_progress functions
perf ui: Introduce generic ui_progress helper
perf ui tui: Move progress.c under ui/tui directory
perf tools: Add basic event modifier sanity check
perf tools: Omit group members from perf_evlist__disable/enable
perf tools: Ensure single disable call per event in record comand
perf tools: Fix 'disabled' attribute config for record command
perf tools: Fix attributes for '{}' defined event groups
perf tools: Use sscanf for parsing /proc/pid/maps
perf tools: Add gtk.<command> config option for launching GTK browser
perf tools: Fix compile error on NO_NEWT=1 build
perf hists: Initialize all of he->stat with zeroes
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull Char/Misc driver merge from Greg Kroah-Hartman:
"Here is the "big" char/misc driver patches for 3.8-rc1. I'm starting
to put random driver subsystems that I had previously sent you through
the driver-core tree in this tree, as it makes more sense to do so.
Nothing major here, the various __dev* removals, some mei driver
updates, and other random driver-specific things from the different
maintainers and developers.
Note, some MFD drivers got added through this tree, and they are also
coming in through the "real" MFD tree as well, due to some major
mis-communication between me and the different developers. If you
have any merge conflicts, take the ones from the MFD tree, not these
ones, sorry about that.
All of this has been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
Fix up trivial conflict in drivers/mmc/host/Kconfig due to new drivers
having been added (both at the end, as usual..)
* tag 'char-misc-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (84 commits)
MAINTAINERS: remove drivers/staging/hv/
misc/st_kim: Free resources in the error path of probe()
drivers/char: for hpet, add count checking, and ~0UL instead of -1
w1-gpio: Simplify & get rid of defines
w1-gpio: Pinctrl-fy
extcon: remove use of __devexit_p
extcon: remove use of __devinit
extcon: remove use of __devexit
drivers: uio: Only allocate new private data when probing device tree node
drivers: uio_dmem_genirq: Allow partial success when opening device
drivers: uio_dmem_genirq: Don't use DMA_ERROR_CODE to indicate unmapped regions
drivers: uio_dmem_genirq: Don't mix address spaces for dynamic region vaddr
uio: remove use of __devexit
uio: remove use of __devinitdata
uio: remove use of __devinit
uio: remove use of __devexit_p
char: remove use of __devexit
char: remove use of __devinitconst
char: remove use of __devinitdata
char: remove use of __devinit
...
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Add dma-debug interface debug_dma_mapping_error() to debug drivers that fail
to check dma mapping errors on addresses returned by dma_map_single() and
dma_map_page() interfaces.
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
|
|
On ia64 and powerpc, vtime context switch only consists
in flushing system and user pending time, plus a few
arch housekeeping.
Consolidate that into a generic implementation. s390 is
a special case because pending user and system time accounting
there is hard to dissociate. So it's keeping its own implementation.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
|