| Age | Commit message | Author |
|
Use the new vma-manager infrastructure. This doesn't change any
implementation details as the vma-offset-manager is nearly copied 1-to-1
from TTM.
The vm_lock is moved into the offset manager so we can drop it from TTM.
During lookup, we use the vma locking helpers to take a reference to the
found object.
In all other scenarios, locking stays the same as before. We always
guarantee that drm_vma_offset_remove() is called only during destruction.
Hence, helpers like drm_vma_node_offset_addr() are always safe as long as
the node has a valid offset.
This also drops the addr_space_offset member as it is a copy of vm_start
in vma_node objects. Use the accessor functions instead.
v4:
- remove vm_lock
- use drm_vma_offset_lock_lookup() to protect lookup (instead of vm_lock)
Cc: Dave Airlie <airlied@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Martin Peres <martin.peres@labri.fr>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@gmail.com>
|
|
Use the new vma manager instead of the old hashtable. Also convert all
drivers to use the new convenience helpers. This drops all the
(map_list.hash.key << PAGE_SHIFT) nonsense.
Locking and access-management is exactly the same as before with an
additional lock inside of the vma-manager, which strictly wouldn't be
needed for gem.
v2:
- rebase on drm-next
- init nodes via drm_vma_node_reset() in drm_gem.c
v3:
- fix tegra
v4:
- remove duplicate if (drm_vma_node_has_offset()) checks
- inline now trivial drm_vma_node_offset_addr() calls
v5:
- skip node-reset on gem-init due to kzalloc()
- do not allow mapping gem-objects with offsets (backwards compat)
- remove unnecessary casts
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@gmail.com>
|
|
If we want to map GPU memory into user-space, we need to linearize the
addresses to not confuse mm-core. Currently, GEM and TTM both implement
their own offset-managers to assign a pgoff to each object for user-space
CPU access. GEM uses a hash-table, TTM uses an rbtree.
This patch provides a unified implementation that can be used to replace
both. TTM allows partial mmaps with a given offset, so we cannot use
hashtables as the start address may not be known at mmap time. Hence, we
use the rbtree-implementation of TTM.
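The range lookup described above can be sketched in userspace as follows. This is only an illustration: struct node_model and lookup_model() are hypothetical names, and a binary search over a sorted array stands in for the rbtree walk. All values are page-based, not byte-based.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an offset node: a contiguous range of pages. */
struct node_model {
    unsigned long start; /* first page of the range */
    unsigned long pages; /* number of pages in the range */
};

/*
 * Return the node that fully contains [start, start + pages), or NULL.
 * The array must be sorted by start and must not contain overlapping ranges.
 */
static const struct node_model *
lookup_model(const struct node_model *nodes, size_t count,
             unsigned long start, unsigned long pages)
{
    const struct node_model *best = NULL;
    size_t lo = 0, hi = count;

    assert(nodes != NULL || count == 0);

    /* Find the last node whose start is <= the requested start. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (nodes[mid].start <= start) {
            best = &nodes[mid];
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (!best)
        return NULL;

    /* The requested start must lie inside the node ... */
    if (start - best->start >= best->pages)
        return NULL;
    /* ... and the request must not run past the end of the node. */
    if (pages > best->pages - (start - best->start))
        return NULL;
    return best;
}
```

A partial mmap that begins four pages into a 16-page node and covers four pages resolves to that node, while a request that would run past the node's end is rejected. A hash table keyed on the exact start address could not answer the first query.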
We could easily update drm_mm to use an rbtree instead of a linked list
for its object list and thus drop the rbtree from the vma-manager.
However, this would slow down drm_mm object allocation for all other
use-cases (rbtree insertion) and add another 4-8 bytes to each mm node.
Hence, use the separate tree but allow for later migration.
This is a rewrite of the 2012-proposal by David Airlie <airlied@linux.ie>
v2:
- fix Docbook integration
- drop drm_mm_node_linked() and use drm_mm_node_allocated()
- remove unjustified likely/unlikely usage (but keep for rbtree paths)
- remove BUG_ON() as drm_mm already does that
- clarify page-based vs. byte-based addresses
- use drm_vma_node_reset() for initialization, too
v4:
- allow external locking via drm_vma_offset_un/lock_lookup()
- add locked lookup helper drm_vma_offset_lookup_locked()
v5:
- fix drm_vma_offset_lookup() to correctly validate range-mismatches
(fix (offset > start + pages))
- fix drm_vma_offset_exact_lookup() to actually do what it says
- remove redundant vm_pages member (add drm_vma_node_size() helper)
- remove unneeded goto
- fix documentation
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@gmail.com>
|
|
When building the htmldocs (in verbose mode), scripts/kernel-doc reports the
following type of warnings:
Warning(include/linux/ktime.h:75): No description found for return value of
'ktime_set'
Fix them by using a "Return:" section to describe the return values.
(Also apply some minor reformatting along the way.)
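For illustration, a kernel-doc comment with a "Return:" section might look like the sketch below. The function ns_from_parts() and the constant NSEC_PER_SEC_MODEL are hypothetical userspace stand-ins, not the code touched by this patch.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_MODEL 1000000000LL /* hypothetical name for this sketch */

/**
 * ns_from_parts - Combine seconds and nanoseconds into a nanosecond count
 * @secs:  seconds to convert
 * @nsecs: nanoseconds to add, expected to be in [0, 1 second)
 *
 * Return: The total number of nanoseconds.
 */
static int64_t ns_from_parts(int64_t secs, int64_t nsecs)
{
    assert(nsecs >= 0 && nsecs < NSEC_PER_SEC_MODEL);
    return secs * NSEC_PER_SEC_MODEL + nsecs;
}
```

The "Return:" line is what removes the "No description found for return value" warning for a function like this one.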
Signed-off-by: Yacine Belkadi <yacine.belkadi.1@gmail.com>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Add pfuze100 regulator driver.
Signed-off-by: Robin Gong <b38343@freescale.com>
Tested-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
|
|
Since everybody sets kstrdup()ed constant string to "struct xattr"->name but
nobody modifies "struct xattr"->name , we can omit kstrdup() and its failure
checking by constifying ->name member of "struct xattr".
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Joel Becker <jlbec@evilplan.org> [ocfs2]
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Tested-by: Paul Moore <paul@paul-moore.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
No users outside net/core/dev.c.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The idea of this patch is to add an optional limit on the number of
unsent bytes in TCP sockets, to reduce usage of kernel memory.
TCP receiver might announce a big window, and TCP sender autotuning
might allow a large amount of bytes in write queue, but this has little
performance impact if a large part of this buffering is wasted :
Write queue needs to be large only to deal with large BDP, not
necessarily to cope with scheduling delays (incoming ACKS make room
for the application to queue more bytes)
For most workloads, a value of 128 KB or less still gives
applications enough time to react to POLLOUT events
(or to be woken up in a blocking sendmsg())
This patch adds two ways to set the limit :
1) Per socket option TCP_NOTSENT_LOWAT
2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets
not using TCP_NOTSENT_LOWAT socket option (or setting a zero value)
Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect.
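From user space, the per-socket option could be set roughly as follows. This is a minimal sketch assuming a Linux system; set_notsent_lowat() is a hypothetical wrapper name, and the fallback define is only there for libc headers that predate the option.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Older libc headers may lack the constant; 25 is the Linux UAPI value. */
#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25
#endif

/*
 * Limit the number of unsent bytes queued on one TCP socket.
 * Returns 0 on success, -1 on error with errno set by setsockopt().
 * Per the description above, a value of zero falls back to the sysctl.
 */
static int set_notsent_lowat(int fd, unsigned int bytes)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
                      &bytes, sizeof(bytes));
}
```

Calling set_notsent_lowat(fd, 131072) on a connected TCP socket would apply the 128 KB value discussed here to that socket only, leaving the system-wide sysctl untouched.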
This changes poll()/select()/epoll() to report POLLOUT
only if the number of unsent bytes is below tp->notsent_lowat
Note this might increase number of sendmsg()/sendfile() calls
when using non blocking sockets,
and increase number of context switches for blocking sockets.
Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is
defined as :
Specify the minimum number of bytes in the buffer until
the socket layer will pass the data to the protocol)
Tested:
netperf sessions, and watching /proc/net/protocols "memory" column for TCP
With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory
used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458)
lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6 1880 2 45458 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y
TCP 1696 508 45458 no 208 yes kernel y y y y y y y y y y y y y n y y y y y
lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6 1880 2 20567 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y
TCP 1696 508 20567 no 208 yes kernel y y y y y y y y y y y y y n y y y y y
Using 128KB has no bad effect on the throughput or cpu usage
of a single flow, although there is an increase of context switches.
A bonus is that we hold socket lock for a shorter amount
of time and should improve latencies of ACK processing.
lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1651584 6291456 16384 20.00 17447.90 10^6bits/s 3.13 S -1.00 U 0.353 -1.000 usec/KB
Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':
412,514 context-switches
200.034645535 seconds time elapsed
lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1593240 6291456 16384 20.00 17321.16 10^6bits/s 3.35 S -1.00 U 0.381 -1.000 usec/KB
Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':
2,675,818 context-switches
200.029651391 seconds time elapsed
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-By: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Several call sites hardcode the following condition:
sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)
Let's use a helper because TCP_NOTSENT_LOWAT support will change this
condition for TCP sockets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The SCTP mailing list address to send patches or questions
to is linux-sctp@vger.kernel.org and not
lksctp-developers@lists.sourceforge.net anymore. Therefore,
update all occurrences.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The USB spec states that a short packet can only appear at the end
of a transfer. Because lots of HCs (EHCI/UHCI/OHCI/...) can't
build a full packet from discontinuous buffers, we introduce
the limit in usb_submit_urb() to avoid such bad sg buffers
coming from drivers.
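The rule amounts to requiring that every sg entry except the last has a length that is a multiple of the endpoint's max packet size. A userspace sketch of that check, with the hypothetical name sg_lengths_ok() and a plain array of lengths standing in for the scatterlist:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if no entry other than the last one would end in a short
 * packet, i.e. every entry but the last is a multiple of maxpacket.
 */
static bool sg_lengths_ok(const unsigned int *lengths, size_t count,
                          unsigned int maxpacket)
{
    assert(lengths != NULL || count == 0);

    if (maxpacket == 0)
        return false;

    /* The last entry is allowed to be short, so stop one before it. */
    for (size_t i = 0; i + 1 < count; i++) {
        if (lengths[i] % maxpacket != 0)
            return false;
    }
    return true;
}
```

With a 512-byte max packet size, lengths of 512, 1024 and 100 are acceptable because only the final entry is short. Lengths of 512, 100 and 512 are not, because the short packet would fall in the middle of the transfer.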
The limit might be a bit strict:
- the platform may have an IOMMU to do sg list mapping
- some host controllers may support building a full packet from
discontinuous buffers.
But considering that most HCs don't support that, and that drivers
need to work well and stay consistent across different HCs and ARCHs, we
have to introduce the limit.
Currently, only usbtest is reported to pass such sg buffers to HC,
and other users(mass storage, usbfs) don't have the problem.
We don't check it on wireless USB devices, because:
- wireless devices can't be attached to a common USB
bus (EHCI/UHCI/OHCI/...)
- the max packet size of an endpoint may be odd, and often does not
divide 4KB evenly, which is a typical buffer size in USB mass storage applications
Reported-by: Konstantin Filatov <kfilatov@parallels.com>
Reported-by: Denis V. Lunev <den@openvz.org>
Cc: Felipe Balbi <balbi@ti.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commits 6a1c0680cf3ba94356ecd58833e1540c93472a57 and
9356b535fcb71db494fc434acceb79f56d15bda2, respectively
'tty: Convert termios_mutex to termios_rwsem' and
'n_tty: Access termios values safely'
introduced a circular lock dependency with console_lock and
termios_rwsem.
The lockdep report [1] shows that n_tty_write() will attempt
to claim console_lock while holding the termios_rwsem, whereas
tty_do_resize() may already hold the console_lock while
claiming the termios_rwsem.
Since n_tty_write() and tty_do_resize() do not contend
over the same data -- the tty->winsize structure -- correct
the lock dependency by introducing a new lock which
specifically serializes access to tty->winsize only.
[1] Lockdep report
======================================================
[ INFO: possible circular locking dependency detected ]
3.10.0-0+tip-xeon+lockdep #0+tip Not tainted
-------------------------------------------------------
modprobe/277 is trying to acquire lock:
(&tty->termios_rwsem){++++..}, at: [<ffffffff81452656>] tty_do_resize+0x36/0xe0
but task is already holding lock:
((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 ((fb_notifier_list).rwsem){.+.+.+}:
[<ffffffff810b6d62>] lock_acquire+0x92/0x1f0
[<ffffffff8175b797>] down_read+0x47/0x5c
[<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0
[<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20
[<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320
[<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper]
[<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau]
[<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau]
[<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm]
[<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau]
[<ffffffff813b13db>] local_pci_probe+0x4b/0x80
[<ffffffff813b1701>] pci_device_probe+0x111/0x120
[<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0
[<ffffffff81497bab>] __driver_attach+0xab/0xb0
[<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0
[<ffffffff814971fe>] driver_attach+0x1e/0x20
[<ffffffff81496cc1>] bus_add_driver+0x111/0x290
[<ffffffff814982b7>] driver_register+0x77/0x170
[<ffffffff813b0454>] __pci_register_driver+0x64/0x70
[<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm]
[<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau]
[<ffffffff810002ea>] do_one_initcall+0xea/0x1a0
[<ffffffff810c54cb>] load_module+0x123b/0x1bf0
[<ffffffff810c5f57>] SyS_init_module+0xd7/0x120
[<ffffffff817677c2>] system_call_fastpath+0x16/0x1b
-> #1 (console_lock){+.+.+.}:
[<ffffffff810b6d62>] lock_acquire+0x92/0x1f0
[<ffffffff810430a7>] console_lock+0x77/0x80
[<ffffffff8146b2a1>] con_flush_chars+0x31/0x50
[<ffffffff8145780c>] n_tty_write+0x1ec/0x4d0
[<ffffffff814541b9>] tty_write+0x159/0x2e0
[<ffffffff814543f5>] redirected_tty_write+0xb5/0xc0
[<ffffffff811ab9d5>] vfs_write+0xc5/0x1f0
[<ffffffff811abec5>] SyS_write+0x55/0xa0
[<ffffffff817677c2>] system_call_fastpath+0x16/0x1b
-> #0 (&tty->termios_rwsem){++++..}:
[<ffffffff810b65c3>] __lock_acquire+0x1c43/0x1d30
[<ffffffff810b6d62>] lock_acquire+0x92/0x1f0
[<ffffffff8175b724>] down_write+0x44/0x70
[<ffffffff81452656>] tty_do_resize+0x36/0xe0
[<ffffffff8146c841>] vc_do_resize+0x3e1/0x4c0
[<ffffffff8146c99f>] vc_resize+0x1f/0x30
[<ffffffff813e4535>] fbcon_init+0x385/0x5a0
[<ffffffff8146a4bc>] visual_init+0xbc/0x120
[<ffffffff8146cd13>] do_bind_con_driver+0x163/0x320
[<ffffffff8146cfa1>] do_take_over_console+0x61/0x70
[<ffffffff813e2b93>] do_fbcon_takeover+0x63/0xc0
[<ffffffff813e67a5>] fbcon_event_notify+0x715/0x820
[<ffffffff81762f9d>] notifier_call_chain+0x5d/0x110
[<ffffffff8107aadc>] __blocking_notifier_call_chain+0x6c/0xc0
[<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20
[<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320
[<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper]
[<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau]
[<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau]
[<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm]
[<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau]
[<ffffffff813b13db>] local_pci_probe+0x4b/0x80
[<ffffffff813b1701>] pci_device_probe+0x111/0x120
[<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0
[<ffffffff81497bab>] __driver_attach+0xab/0xb0
[<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0
[<ffffffff814971fe>] driver_attach+0x1e/0x20
[<ffffffff81496cc1>] bus_add_driver+0x111/0x290
[<ffffffff814982b7>] driver_register+0x77/0x170
[<ffffffff813b0454>] __pci_register_driver+0x64/0x70
[<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm]
[<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau]
[<ffffffff810002ea>] do_one_initcall+0xea/0x1a0
[<ffffffff810c54cb>] load_module+0x123b/0x1bf0
[<ffffffff810c5f57>] SyS_init_module+0xd7/0x120
[<ffffffff817677c2>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
Chain exists of:
&tty->termios_rwsem --> console_lock --> (fb_notifier_list).rwsem
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock((fb_notifier_list).rwsem);
                               lock(console_lock);
                               lock((fb_notifier_list).rwsem);
  lock(&tty->termios_rwsem);
*** DEADLOCK ***
7 locks held by modprobe/277:
#0: (&__lockdep_no_validate__){......}, at: [<ffffffff81497b5b>] __driver_attach+0x5b/0xb0
#1: (&__lockdep_no_validate__){......}, at: [<ffffffff81497b69>] __driver_attach+0x69/0xb0
#2: (drm_global_mutex){+.+.+.}, at: [<ffffffffa008a6dd>] drm_get_pci_dev+0xbd/0x2a0 [drm]
#3: (registration_lock){+.+.+.}, at: [<ffffffff813d93f5>] register_framebuffer+0x25/0x320
#4: (&fb_info->lock){+.+.+.}, at: [<ffffffff813d8116>] lock_fb_info+0x26/0x60
#5: (console_lock){+.+.+.}, at: [<ffffffff813d95a4>] register_framebuffer+0x1d4/0x320
#6: ((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0
stack backtrace:
CPU: 0 PID: 277 Comm: modprobe Not tainted 3.10.0-0+tip-xeon+lockdep #0+tip
Hardware name: Dell Inc. Precision WorkStation T5400 /0RW203, BIOS A11 04/30/2012
ffffffff8213e5e0 ffff8802aa2fb298 ffffffff81755f19 ffff8802aa2fb2e8
ffffffff8174f506 ffff8802aa2fa000 ffff8802aa2fb378 ffff8802aa2ea8e8
ffff8802aa2ea910 ffff8802aa2ea8e8 0000000000000006 0000000000000007
Call Trace:
[<ffffffff81755f19>] dump_stack+0x19/0x1b
[<ffffffff8174f506>] print_circular_bug+0x1fb/0x20c
[<ffffffff810b65c3>] __lock_acquire+0x1c43/0x1d30
[<ffffffff810b775e>] ? mark_held_locks+0xae/0x120
[<ffffffff810b78d5>] ? trace_hardirqs_on_caller+0x105/0x1d0
[<ffffffff810b6d62>] lock_acquire+0x92/0x1f0
[<ffffffff81452656>] ? tty_do_resize+0x36/0xe0
[<ffffffff8175b724>] down_write+0x44/0x70
[<ffffffff81452656>] ? tty_do_resize+0x36/0xe0
[<ffffffff81452656>] tty_do_resize+0x36/0xe0
[<ffffffff8146c841>] vc_do_resize+0x3e1/0x4c0
[<ffffffff8146c99f>] vc_resize+0x1f/0x30
[<ffffffff813e4535>] fbcon_init+0x385/0x5a0
[<ffffffff8146a4bc>] visual_init+0xbc/0x120
[<ffffffff8146cd13>] do_bind_con_driver+0x163/0x320
[<ffffffff8146cfa1>] do_take_over_console+0x61/0x70
[<ffffffff813e2b93>] do_fbcon_takeover+0x63/0xc0
[<ffffffff813e67a5>] fbcon_event_notify+0x715/0x820
[<ffffffff81762f9d>] notifier_call_chain+0x5d/0x110
[<ffffffff8107aadc>] __blocking_notifier_call_chain+0x6c/0xc0
[<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20
[<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320
[<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper]
[<ffffffff8173cbcb>] ? kmemleak_alloc+0x5b/0xc0
[<ffffffff81198874>] ? kmem_cache_alloc_trace+0x104/0x290
[<ffffffffa01035e1>] ? drm_fb_helper_single_add_all_connectors+0x81/0xf0 [drm_kms_helper]
[<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau]
[<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau]
[<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm]
[<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau]
[<ffffffff8175f162>] ? _raw_spin_unlock_irqrestore+0x42/0x80
[<ffffffff813b13db>] local_pci_probe+0x4b/0x80
[<ffffffff813b1701>] pci_device_probe+0x111/0x120
[<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0
[<ffffffff81497bab>] __driver_attach+0xab/0xb0
[<ffffffff81497b00>] ? driver_probe_device+0x3a0/0x3a0
[<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0
[<ffffffff814971fe>] driver_attach+0x1e/0x20
[<ffffffff81496cc1>] bus_add_driver+0x111/0x290
[<ffffffffa022a000>] ? 0xffffffffa0229fff
[<ffffffff814982b7>] driver_register+0x77/0x170
[<ffffffffa022a000>] ? 0xffffffffa0229fff
[<ffffffff813b0454>] __pci_register_driver+0x64/0x70
[<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm]
[<ffffffffa022a000>] ? 0xffffffffa0229fff
[<ffffffffa022a000>] ? 0xffffffffa0229fff
[<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau]
[<ffffffff810002ea>] do_one_initcall+0xea/0x1a0
[<ffffffff810c54cb>] load_module+0x123b/0x1bf0
[<ffffffff81399a50>] ? ddebug_proc_open+0xb0/0xb0
[<ffffffff813855ae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff810c5f57>] SyS_init_module+0xd7/0x120
[<ffffffff817677c2>] system_call_fastpath+0x16/0x1b
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull crypto fixes from Herbert Xu:
"This push fixes a memory corruption issue in caam, as well as
reverting the new optimised crct10dif implementation as it breaks boot
on initrd systems.
Hopefully crct10dif will be reinstated once the supporting code is
added so that it doesn't break boot"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
Revert "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework"
crypto: caam - Fixed the memory out of bound overwrite issue
|
|
Replace the SATA_PHY_# by the more readable definitions.
tj: Being routed through libata branch to enable implementation of
ahci_imx.
Signed-off-by: Richard Zhu <r65037@freescale.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
When creating a less privileged mount namespace or propagating mounts
from a more privileged to a less privileged mount namespace, lock the
submounts so they may not be unmounted individually in the child mount
namespace, revealing what is under them.
This enforces the reasonable expectation that it is not possible to
see under a mount point. Most of the time mounts are on empty
directories and revealing that does not matter; however, I have seen the
occasional sloppy configuration where there were interesting things
concealed under a mount point that probably should not be revealed.
Expirable submounts are not locked because they will eventually
unmount automatically so whatever is under them already needs
to be safe for unprivileged users to access.
From a practical standpoint these restrictions do not appear to be
significant for unprivileged users of the mount namespace. Recursive
bind mounts and pivot_root continue to work, and mounts that are
created in a mount namespace may be unmounted there. All of which
means that the common idiom of keeping a directory of interesting
files and using pivot_root to throw everything else away continues to
work just fine.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
|
|
This patch makes mask/wake_invert bool and puts all flags into a bitfield
for consistency and to save some space.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
|
|
In order to avoid race conditions the assignment of dapm->update should happen
while card->dapm_mutex is being held. To allow CODEC drivers to run a register
update when using snd_soc_dapm_mux_update_power() or
snd_soc_dapm_mixer_update_power() add an update parameter to these two functions.
The update parameter will be assigned to dapm->update while card->dapm_mutex is
locked.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
|
|
Currently when updating a control that is shared between multiple widgets the
whole power-up/power-down sequence is being run once for each widget. The
control register is updated during the first run, which means the CODEC internal
routing is also updated for all widgets during this first run. The input and
output paths for each widget are only updated though during the respective run
for that widget. This leads to a slight inconsistency between the CODEC's
internal state and ASoC's state, which causes non-optimal behavior in regard to
click and pop avoidance.
E.g. consider the following setup where two MUXs share the same control.
         +------+
A1 ------|      |
         | MUX1 |----- C1
B1 ------|      |
         +------+
            |
control ----+
            |
         +------+
A2 ------|      |
         | MUX2 |----- C2
B2 ------|      |
         +------+
If the control is updated to switch the MUXs from input A to input B with the
current code the power-up/power-down sequence will look like this:
  Run soc_dapm_mux_update_power for MUX1
    Power-down A1
    Update MUXing
    Power-up B1
  Run soc_dapm_mux_update_power for MUX2
    Power-down A2
    (Update MUXing)
    Power-up B2
Note that the second 'Update Muxing' is a no-op, since the register was already
updated.
While the preferred order for avoiding pops and clicks should be:
  Run soc_dapm_mux_update_power for control
    Power-down A1
    Power-down A2
    Update MUXing
    Power-up B1
    Power-up B2
This patch changes the behavior to the latter by running the updates for all
widgets that the control is attached to at the same time.
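The batched ordering can be modelled with a short userspace sketch. The types and the function batched_sequence() are hypothetical and only record the order of steps; they are not the DAPM code.

```c
#include <assert.h>
#include <stddef.h>

enum step_kind { POWER_DOWN, UPDATE_MUX, POWER_UP };

/* One entry of the recorded sequence; widget is 0 for the register update. */
struct step {
    enum step_kind kind;
    int widget;
};

/*
 * Record the sequence for `widgets` muxes that share one control:
 * power down every old input, update the routing register once,
 * then power up every new input.
 * Returns the number of steps written, or 0 if `out` is too small.
 */
static size_t batched_sequence(int widgets, struct step *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL);
    if (widgets < 0 || 2 * (size_t)widgets + 1 > cap)
        return 0;

    for (int w = 1; w <= widgets; w++)
        out[n++] = (struct step){ POWER_DOWN, w };
    out[n++] = (struct step){ UPDATE_MUX, 0 };
    for (int w = 1; w <= widgets; w++)
        out[n++] = (struct step){ POWER_UP, w };
    return n;
}
```

For two muxes this yields five steps with a single register update in the middle, instead of two interleaved down/update/up runs.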
The new code is also a bit simpler since callers of
soc_dapm_{mux,mixer}_update_power don't have to loop over each widget anymore
and neither do we need to keep track for which of the kcontrol's widgets the
current update is.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
|
|
soc_dpcm_runtime_update() operates on an ASoC card as a whole. Currently it takes
a snd_soc_dapm_widget as its only parameter though. The widget is then used to
look up the card and is otherwise unused. This patch changes the function to
take a pointer to the card directly. This makes it possible to call
soc_dpcm_runtime_update() for updates which are not related to one specific
widget.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
|
|
On some PAE architectures, the entire range of physical memory could reside
outside the 32-bit limit. These systems need the ability to specify the
initrd location using 64-bit numbers.
This patch globally modifies the early_init_dt_setup_initrd_arch() function to
use 64-bit numbers instead of the current unsigned long.
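The truncation problem can be shown with a small userspace sketch. uint32_t stands in for unsigned long on a 32-bit architecture, and fits_in_32_bits() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a physical address survives a round trip through a
 * 32-bit type, i.e. nothing is lost when it is stored in a 32-bit
 * unsigned long.
 */
static bool fits_in_32_bits(uint64_t phys_addr)
{
    return (uint64_t)(uint32_t)phys_addr == phys_addr;
}
```

An initrd placed at 0x80000000 (2 GiB) fits, while one at 0x800000000 (32 GiB) would be silently truncated to 0 in a 32-bit variable. Both addresses are made-up examples.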
There has been quite a bit of debate about whether to use u64 or phys_addr_t.
It was concluded to stick to u64 to be consistent with rest of the device
tree code. As summarized by Geert, "The address to load the initrd is decided
by the bootloader/user and set at that point later in time. The dtb should not
be tied to the kernel you are booting"
More details on the discussion can be found here:
https://lkml.org/lkml/2013/6/20/690
https://lkml.org/lkml/2012/9/13/544
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
|
|
I am getting a few
|warning: unused variable ‘p’ [-Wunused-variable]
|warning: unused variable ‘prop’ [-Wunused-variable]
in the case where CONFIG_OF is not defined and the parameters are only
used in the loop macro.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
|
|
Revert "crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework"
This reverts commits
67822649d7305caf3dd50ed46c27b99c94eff996
39761214eefc6b070f29402aa1165f24d789b3f7
0b95a7f85718adcbba36407ef88bba0a7379ed03
31d939625a9a20b1badd2d4e6bf6fd39fa523405
2d31e518a42828df7877bca23a958627d60408bc
Unfortunately this change broke boot on some systems that used an
initrd which does not include the newly created crct10dif modules.
As these modules are required by sd_mod under certain configurations
this is a serious problem.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Tony Luck:
"Fix EDAC lockdep splat"
* tag 'please-pull-bp-edac' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC: Fix lockdep splat
|
|
In canonical mode, an EOF which is not the first character of the line
causes read() to complete and return the number of characters read so
far (commonly referred to as EOF push). However, if the previous read()
returned because the user buffer was full _and_ the next character
is an EOF not at the beginning of the line, read() must not return 0,
thus mistakenly indicating the end-of-file condition.
The TTY_PUSH flag is used to indicate an EOF was received which is not
at the beginning of the line. Because the EOF push condition is
evaluated by a thread other than the read(), multiple EOF pushes can
cause a premature end-of-file to be indicated.
Instead, discover the 'EOF push as first read character' condition
from the read() thread itself, and restart the i/o loop if detected.
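The intended read() behavior can be modelled in userspace. The sketch below is a simplified model rather than the n_tty code: struct canon_model, canon_read() and WOULD_BLOCK are hypothetical names, Ctrl-D is assumed to be the EOF character, and the whole decision is made by the reader itself.

```c
#include <assert.h>
#include <stddef.h>

#define EOF_CHAR    '\x04' /* Ctrl-D, assumed here to be the VEOF character */
#define WOULD_BLOCK (-1L)  /* hypothetical: nothing more is queued yet */

struct canon_model {
    const char *data;  /* queued input; may contain '\n' and EOF_CHAR */
    size_t len;
    size_t pos;
    int at_line_start; /* nonzero if the next character begins a line */
};

/* Returns bytes copied, 0 for a genuine end-of-file, or WOULD_BLOCK. */
static long canon_read(struct canon_model *m, char *buf, size_t size)
{
    size_t copied = 0;

    assert(m != NULL && buf != NULL && size > 0);

    while (m->pos < m->len) {
        char c = m->data[m->pos++];

        if (c == EOF_CHAR) {
            int first_on_line = m->at_line_start;

            m->at_line_start = 1;
            if (first_on_line)
                return 0;            /* EOF at line start: end-of-file */
            if (copied > 0)
                return (long)copied; /* EOF push: deliver the partial line */
            continue;                /* push with nothing left: restart */
        }
        buf[copied++] = c;
        m->at_line_start = (c == '\n');
        if (c == '\n' || copied == size)
            return (long)copied;
    }
    /* Simplification: at the end of queued input, return what was copied. */
    return copied > 0 ? (long)copied : WOULD_BLOCK;
}
```

With the input "abc" followed by one EOF and a 3-byte buffer, the first read returns 3 because the buffer is full. The second read sees the EOF as its first character, recognizes it as a push rather than an end-of-file, and restarts instead of returning 0.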
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/davidb/linux-msm into next/cleanup
From Stephen Boyd:
Now that we have a generic arch hook for broadcast we can remove the
local timer API entirely. Doing so will reduce code in ARM core, reduce
the architecture dependencies of our timer drivers, and simplify the code
because we no longer go through an architecture layer that is essentially
a hotplug notifier.
* tag 'remove-local-timers' of git://git.kernel.org/pub/scm/linux/kernel/git/davidb/linux-msm:
ARM: smp: Remove local timer API
clocksource: time-armada-370-xp: Divorce from local timer API
clocksource: time-armada-370-xp: Fix sparse warning
ARM: msm: Divorce msm_timer from local timer API
ARM: PRIMA2: Divorce timer-marco from local timer API
ARM: EXYNOS4: Divorce mct from local timer API
ARM: OMAP2+: Divorce from local timer API
ARM: smp_twd: Divorce smp_twd from local timer API
ARM: smp: Remove duplicate dummy timer implementation
Resolved a large number of conflicts due to __cpuinit cleanups, etc.
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
Similar to what is implemented in bonding, the user is able to ask the team
driver to send IGMP rejoins when a port is enabled or disabled. This uses
the previously introduced netdev notifier.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Until now, bond_resend_igmp_join_requests() manually looks for vlans attached
to the bonding device and for a bridge where bonding acts as a port. It does not
take care of other scenarios, like stacked bonds or a team device above. Make
this more generic and use a netdev notifier to propagate the event to
upper devices and to actually call ip_mc_rejoin_groups().
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a port is enabled or disabled, allow notifying peers via unsolicited
NAs or gratuitous ARPs. Disabled by default.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
TTY_BUFFER_PAGE is only used within drivers/tty/tty_buffer.c;
relocate to that file scope.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Convert the tty_buffer_flush() exclusion mechanism to a
public interface - tty_buffer_lock/unlock_exclusive() - and use
the interface to safely write the paste selection to the line
discipline.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Atomic bit ops are no longer required to indicate a flip buffer
flush is pending, as the flush_mutex is sufficient barrier.
Remove the unnecessary port .iflags field and localize flip buffer
state to struct tty_bufhead.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Separate the head and tail ptrs to avoid cache-line contention
(so called 'false-sharing') between concurrent threads.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
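
Note on the head/tail separation above: a minimal sketch of keeping two
pointers on different cache lines, assuming a 64-byte line. The struct and
helper names are hypothetical, and explicit alignment is only one way to get
the separation; the patch itself may simply reorder fields.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumed cache-line size; the real value is CPU-specific */

struct flip_buf;	/* opaque here; stands in for a flip buffer */

/* Hypothetical buffer head: consumer and producer fields on separate lines. */
struct buf_head {
	alignas(CACHE_LINE) struct flip_buf *head;	/* touched by the consumer */
	alignas(CACHE_LINE) struct flip_buf *tail;	/* touched by the producer */
};

/* Returns nonzero when head and tail cannot share one cache line. */
static int on_separate_lines(void)
{
	size_t gap = offsetof(struct buf_head, tail) -
		     offsetof(struct buf_head, head);

	assert(gap % CACHE_LINE == 0);
	return gap >= CACHE_LINE;
}
```

The cost is a larger structure (two full lines here), which is the usual
trade-off for removing false sharing.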
|
|
Now that dropping the buffer lock is not necessary (as a result of
converting the spin lock to a mutex), the flip buffer flush no
longer needs to be handled by the buffer work.
Simply signal a flush is required; the buffer work will exit the
i/o loop, which allows tty_buffer_flush() to proceed.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The buffer work may race with parallel tty_buffer_flush. Use a
mutex to guarantee exclusive modify access to the head flip
buffer.
Remove the unneeded spin lock.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Lockless flip buffers require atomically updating the bytes-in-use
watermark.
The pty driver also peeks at the watermark value to limit
memory consumption to a much lower value than the default; query
the watermark with new fn, tty_buffer_space_avail().
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Use a 0-sized sentinel to avoid assigning the head ptr from
the driver side thread. This also eliminates testing head/tail
for NULL.
When the sentinel is first 'consumed' by the buffer work
(or by tty_buffer_flush()), it is detached from the list but not
freed nor added to the free list. Both buffer work and
tty_buffer_flush() continue to preserve at least 1 flip buffer
to which head & tail is pointed.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
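
Note on the sentinel described above: a minimal single-threaded sketch of a
buffer list that starts with a 0-sized sentinel, so that head and tail are
never NULL and the producer only ever writes the tail. All names are
hypothetical; memory ordering between the two sides is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

struct buf {
	struct buf *next;
	size_t size;
};

struct buf_head {
	struct buf *head;	/* consumer side */
	struct buf *tail;	/* producer side */
	struct buf sentinel;	/* 0-sized placeholder, never freed */
};

static void buf_head_init(struct buf_head *h)
{
	h->sentinel.next = NULL;
	h->sentinel.size = 0;
	h->head = h->tail = &h->sentinel;
}

/* Producer: append touches only the tail, never the head. */
static void buf_append(struct buf_head *h, struct buf *b)
{
	assert(h->tail != NULL);
	b->next = NULL;
	h->tail->next = b;
	h->tail = b;
}

/*
 * Consumer: step past the current head only if a successor exists, so at
 * least one buffer always stays on the list. Returns the detached buffer,
 * or NULL if nothing could be detached.
 */
static struct buf *buf_advance(struct buf_head *h)
{
	struct buf *old = h->head;

	if (old->next == NULL)
		return NULL;
	h->head = old->next;
	return old;
}
```

The first buffer detached by the consumer is the sentinel itself, which the
caller must recognize and skip instead of freeing.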
|
|
In preparation for lockless flip buffers, make the flip buffer
free list lockless.
NB: using llist is not the optimal solution, as the driver and
buffer work may contend over the llist head unnecessarily. However,
test measurements indicate this contention is low.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
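
Note on the lockless free list above: a minimal sketch of the idea in
portable C11 atomics, not the kernel's llist API. It shows a
compare-and-swap push and a detach-everything operation; the type and
function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct free_node {
	struct free_node *next;
};

struct free_list {
	_Atomic(struct free_node *) first;
};

/* Push one node; safe to call from several threads at once. */
static void free_list_push(struct free_list *l, struct free_node *n)
{
	struct free_node *old = atomic_load(&l->first);

	assert(n != NULL);
	do {
		n->next = old;	/* on failure, 'old' is refreshed by the CAS */
	} while (!atomic_compare_exchange_weak(&l->first, &old, n));
}

/* Detach the whole list in one step and return its former first node. */
static struct free_node *free_list_take_all(struct free_list *l)
{
	return atomic_exchange(&l->first, (struct free_node *)NULL);
}
```

Every push and every detach hits the same `first` pointer, which is the
contention on the list head that the NB in the changelog refers to.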
|
|
The char_buf_ptr and flag_buf_ptr values are trivially derived from
the .data field offset; compute values as needed.
Fixes a long-standing type-mismatch with the char and flag ptrs.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
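
Note on deriving the pointers above: a minimal sketch, assuming a buffer
whose data area holds `size` characters followed by `size` flag bytes. The
struct layout and helper names are assumptions for illustration, not the
driver's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flip buffer: 'size' chars followed by 'size' flag bytes. */
struct flip_buf {
	size_t size;
	unsigned char data[];	/* chars at [0, size), flags at [size, 2*size) */
};

static struct flip_buf *flip_buf_alloc(size_t size)
{
	struct flip_buf *b = calloc(1, sizeof(*b) + 2 * size);

	if (b)
		b->size = size;
	return b;
}

/* Character pointer is just an offset into the data area. */
static unsigned char *char_ptr(struct flip_buf *b, size_t ofs)
{
	assert(ofs <= b->size);
	return b->data + ofs;
}

/* Flag pointer is the matching character pointer shifted by 'size'. */
static unsigned char *flag_ptr(struct flip_buf *b, size_t ofs)
{
	return char_ptr(b, ofs) + b->size;
}
```

Computing both pointers from one base removes two stored fields and leaves a
single place where their types are decided.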
|
|
No tty driver modifies termios during throttle() or unthrottle().
Therefore, only read safety is required.
However, tty_throttle_safe and tty_unthrottle_safe must still be
mutually exclusive; introduce throttle_mutex for that purpose.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
termios is commonly accessed unsafely (especially by N_TTY)
because the existing mutex forces exclusive access.
Convert existing usage.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Although line discipline receiving is single-producer/single-consumer,
using tty->receive_room to manage flow control creates unnecessary
critical regions requiring additional lock use.
Instead, introduce the optional .receive_buf2() ldisc method which
returns the # of bytes actually received. Serialization is guaranteed
by the caller.
In turn, the line discipline should schedule the buffer work item
whenever space becomes available; i.e., when there is room to receive
data and receive_room() previously returned 0 (the buffer work
item stops processing if receive_buf2() returns 0). Note the
'no room' state need not be atomic despite concurrent use by two
threads because only the buffer work thread can set the state and
only the read() thread can clear the state.
Add n_tty_receive_buf2() as the receive_buf2() method for N_TTY.
Provide a public helper function, tty_ldisc_receive_buf(), to use
when directly accessing the receive_buf() methods.
Line disciplines not using input flow control can continue to set
tty->receive_room to a fixed value and only provide the receive_buf()
method.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
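
Note on the flow-control scheme above: a minimal single-threaded sketch of a
receive method that returns the number of bytes it accepted, with the
'no room' state set by the pushing side and cleared by the reading side. The
struct, the queue size, and every function here are hypothetical stand-ins
and do not reproduce the tty layer's signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical line discipline with a small fixed-size input queue. */
struct ldisc {
	unsigned char queue[8];
	size_t used;
	int no_room;	/* set by the buffer work, cleared by the reader */
};

/* Accept as many bytes as fit and report how many were taken. */
static size_t receive_buf2(struct ldisc *ld, const unsigned char *p, size_t n)
{
	size_t room = sizeof(ld->queue) - ld->used;
	size_t take = n < room ? n : room;

	assert(ld->used <= sizeof(ld->queue));
	memcpy(ld->queue + ld->used, p, take);
	ld->used += take;
	return take;
}

/* Buffer-work side: push until all data is sent or the ldisc is full. */
static size_t push_to_ldisc(struct ldisc *ld, const unsigned char *p, size_t n)
{
	size_t done = 0;

	while (done < n) {
		size_t got = receive_buf2(ld, p + done, n - done);

		if (got == 0) {
			ld->no_room = 1;	/* stop; the reader must kick us */
			break;
		}
		done += got;
	}
	return done;
}

/* Reader side: drain the queue; returns 1 if the buffer work needs a kick. */
static int reader_drain(struct ldisc *ld)
{
	int kick = ld->no_room;

	ld->used = 0;
	ld->no_room = 0;
	return kick;
}
```

Because only one side sets the flag and only the other clears it, the sketch
mirrors the changelog's point that the state needs no atomic update.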
|
|
Line discipline locking was performed with a combination of
a mutex, a status bit, a count, and a waitqueue -- basically,
a rw semaphore.
Replace the existing combination with an ld_semaphore.
Fixes:
1) the 'reference acquire after ldisc locked' bug
2) the over-complicated halt mechanism
3) lock order wrt. tty_lock()
4) dropping locks while changing ldisc
5) previously unidentified deadlock while locking ldisc from
both linked ttys concurrently
6) previously unidentified recursive deadlocks
Adds much-needed lockdep diagnostics.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix the following:
BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
dump_stack
warn_slowpath_common
warn_slowpath_fmt
lockdep_init_map
? trace_hardirqs_on_caller
? trace_hardirqs_on
debug_mutex_init
__mutex_init
bus_register
edac_create_sysfs_mci_device
edac_mc_add_mc
sbridge_probe
pci_device_probe
driver_probe_device
__driver_attach
? driver_probe_device
bus_for_each_dev
driver_attach
bus_add_driver
driver_register
__pci_register_driver
? 0xffffffffa0010fff
sbridge_init
? 0xffffffffa0010fff
do_one_initcall
load_module
? unset_module_init_ro_nx
SyS_init_module
tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.
What happens is that bus_register needs a statically allocated lock_key
because the latter is handed to lockdep. However, struct mem_ctl_info
embeds struct bus_type (the whole struct, not a pointer to it) and the
whole thing gets dynamically allocated.
Fix this by using a statically allocated struct bus_type for the MC bus.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: stable@kernel.org # v3.10
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
"This contains two patches, both of which aren't fixes per-se but I
think it'd be better to fast-track them.
One removes bcache_subsys_id which was added without proper review
through the block tree. Fortunately, bcache cgroup code is
unconditionally disabled, so this was never exposed to userland. The
cgroup subsys_id is removed. Kent will remove the affected (disabled)
code through bcache branch.
The other simplifies task_cgroup_path_from_hierarchy(). The function
doesn't currently have in-kernel users, but external code under
development depends on it, and making the function available for 3.11
would make things go smoother"
* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: replace task_cgroup_path_from_hierarchy() with task_cgroup_path()
cgroup: remove bcache_subsys_id which got added stealthily
|
|
In case the hardware interrupt mask register does not prevent the chip level
irq from being asserted by the corresponding interrupt status bit, already
set interrupt bits should be cleared once after masking them during
initialization. Add a flag to let drivers enable this behavior.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
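
Note on the behavior above: a minimal simulation, assuming a chip whose IRQ
line follows the status bits regardless of the mask register. The chip model,
the `ack_masked` parameter, and the function names are hypothetical and are
not the regmap API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical chip whose mask register does not gate the IRQ line. */
struct fake_chip {
	uint8_t mask;	/* 1 = interrupt masked */
	uint8_t status;	/* 1 = interrupt pending; cleared by acking */
};

/* On this chip the line follows the status bits regardless of the mask. */
static int irq_line_asserted(const struct fake_chip *c)
{
	return c->status != 0;
}

/* Ack (clear) the given status bits. */
static void chip_ack(struct fake_chip *c, uint8_t bits)
{
	c->status &= (uint8_t)~bits;
}

/* Init: mask everything, then optionally ack whatever is already pending. */
static void chip_init(struct fake_chip *c, int ack_masked)
{
	c->mask = 0xff;
	if (ack_masked)
		chip_ack(c, c->mask);
	assert(c->mask == 0xff);
}
```

Without the extra ack, a bit that was pending before initialization keeps the
line asserted even though every interrupt is masked.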
|
|
The em_x270_mci_setpower() and em_x270_usb_hub_init() functions
call regulator_enable(), which may return an error that must
be checked.
This changes the em_x270_usb_hub_init() function to bail out
if it fails, and changes the pxamci_platform_data->setpower
callback so that a failed em_x270_mci_setpower() call
can be propagated by the pxamci driver into the mmc core.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Mike Rapoport <mike@compulab.co.il>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
Acked-by: Chris Ball <cjb@laptop.org>
[olof: fixed order of regulator_enable() and test in em_x270_usb_hub_init]
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
The wake-affine scheduler feature is currently always trying to pull
the wakee close to the waker. In theory this should be beneficial if
the waker's CPU caches hot data for the wakee, and it's also beneficial
in the extreme ping-pong high context switch rate case.
Testing shows it can benefit hackbench up to 15%.
However, the feature is somewhat blind, from which some workloads
such as pgbench suffer. It's also time-consuming algorithmically.
Testing shows it can damage pgbench up to 50% - far more than the
benefit it brings in the best case.
So wake-affine should be smarter and it should realize when to
stop its thankless effort at trying to find a suitable CPU to wake on.
This patch introduces 'wakee_flips', which will be increased each
time the task flips (switches) its wakee target.
So a high 'wakee_flips' value means the task has more than one
wakee, and the bigger the number, the higher the wakeup frequency.
Now when deciding whether to pull or not, pay attention to a wakee with
a high 'wakee_flips' value: pulling such a task may benefit the wakee,
but it also implies that the waker will face competition later. That
competition may be severe or brief, depending on the story behind
'wakee_flips', and the waker suffers as a result.
Furthermore, if waker also has a high 'wakee_flips', that implies that
multiple tasks rely on it, then waker's higher latency will damage all
of them, so pulling wakee seems to be a bad deal.
Thus, when 'waker->wakee_flips / wakee->wakee_flips' becomes
higher and higher, the cost of pulling seems to be worse and worse.
The patch therefore helps the wake-affine feature to stop its pulling
work when:
wakee->wakee_flips > factor &&
waker->wakee_flips > (factor * wakee->wakee_flips)
The 'factor' here is the number of CPUs in the current CPU's NUMA node,
so a bigger node leads to more pulling, because the stop condition
becomes harder to meet.
After applying the patch, pgbench shows up to 40% improvements and no regressions.
Tested with 12 cpu x86 server and tip 3.10.0-rc7.
The percentages in the final column highlight the areas with the biggest wins;
all other areas improved as well:
pgbench               base        smart
| db_size | clients |  tps  |   |  tps  |
+---------+---------+-------+   +-------+
| 22 MB   |       1 | 10598 |   | 10796 |
| 22 MB   |       2 | 21257 |   | 21336 |
| 22 MB   |       4 | 41386 |   | 41622 |
| 22 MB   |       8 | 51253 |   | 57932 |
| 22 MB   |      12 | 48570 |   | 54000 |
| 22 MB   |      16 | 46748 |   | 55982 | +19.75%
| 22 MB   |      24 | 44346 |   | 55847 | +25.93%
| 22 MB   |      32 | 43460 |   | 54614 | +25.66%
| 7484 MB |       1 |  8951 |   |  9193 |
| 7484 MB |       2 | 19233 |   | 19240 |
| 7484 MB |       4 | 37239 |   | 37302 |
| 7484 MB |       8 | 46087 |   | 50018 |
| 7484 MB |      12 | 42054 |   | 48763 |
| 7484 MB |      16 | 40765 |   | 51633 | +26.66%
| 7484 MB |      24 | 37651 |   | 52377 | +39.11%
| 7484 MB |      32 | 37056 |   | 51108 | +37.92%
| 15 GB   |       1 |  8845 |   |  9104 |
| 15 GB   |       2 | 19094 |   | 19162 |
| 15 GB   |       4 | 36979 |   | 36983 |
| 15 GB   |       8 | 46087 |   | 49977 |
| 15 GB   |      12 | 41901 |   | 48591 |
| 15 GB   |      16 | 40147 |   | 50651 | +26.16%
| 15 GB   |      24 | 37250 |   | 52365 | +40.58%
| 15 GB   |      32 | 36470 |   | 50015 | +37.14%
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51D50057.9000809@linux.vnet.ibm.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
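
Note on the stop condition in the changelog above: a minimal sketch of the
check as a standalone function. The struct is a hypothetical stand-in for the
scheduler's task structure, and `factor` is passed in by the caller instead
of being read from the NUMA topology.

```c
#include <assert.h>

/* Hypothetical stand-in for the scheduler's task structure. */
struct task {
	unsigned long wakee_flips;
};

/*
 * Returns 1 when wake-affine should stop pulling, following the condition
 * in the changelog:
 *   wakee->wakee_flips > factor &&
 *   waker->wakee_flips > (factor * wakee->wakee_flips)
 */
static int stop_pulling(const struct task *waker, const struct task *wakee,
			unsigned long factor)
{
	assert(factor > 0);
	return wakee->wakee_flips > factor &&
	       waker->wakee_flips > factor * wakee->wakee_flips;
}
```

With a factor of 12, a wakee with 20 flips stops the pull only once the waker
exceeds 240 flips; a wakee at or below 12 flips never does.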
|
|
For modern CPUs, perf clock is directly related to TSC. TSC
can be calculated from perf clock and vice versa using a simple
calculation. Two of the three components of that calculation
are already exported in struct perf_event_mmap_page. This patch
exports the third.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1372425741-1676-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
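
Note on the calculation mentioned above: a minimal sketch of a
shift-and-multiply conversion between cycle counts and timestamps, in both
directions. The changelog does not name the three components; the field
names below (a shift, a multiplier, and a zero point) and the local struct
are assumptions for illustration, not the exported structure.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed conversion parameters; names are illustrative. */
struct clock_params {
	uint16_t time_shift;
	uint32_t time_mult;
	uint64_t time_zero;
};

/* Cycles to timestamp: scale by mult / 2^shift, then add the zero point. */
static uint64_t tsc_to_perf_time(const struct clock_params *p, uint64_t cyc)
{
	uint64_t quot, rem;

	assert(p->time_shift < 64);
	quot = cyc >> p->time_shift;
	rem = cyc & (((uint64_t)1 << p->time_shift) - 1);
	return p->time_zero + quot * p->time_mult +
	       ((rem * p->time_mult) >> p->time_shift);
}

/* Timestamp to cycles: subtract the zero point, then scale by 2^shift / mult. */
static uint64_t perf_time_to_tsc(const struct clock_params *p, uint64_t timestamp)
{
	uint64_t time = timestamp - p->time_zero;
	uint64_t quot, rem;

	assert(p->time_mult != 0);
	quot = time / p->time_mult;
	rem = time % p->time_mult;
	return (quot << p->time_shift) + (rem << p->time_shift) / p->time_mult;
}
```

Splitting the value into quotient and remainder keeps the intermediate
products small; both directions round down, so a round trip can lose the
low bits.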
|
|
The capabilities bits must not be "union'ed" together.
Put them in a separate struct.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1372425741-1676-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
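
Note on the union problem above: a minimal sketch of why bitfields declared
directly inside a union overlap, and how wrapping them in a struct gives each
bit its own position. The type and member names are hypothetical and much
smaller than the real structure.

```c
#include <assert.h>
#include <stdint.h>

/* Broken layout: each bitfield is its own union member, so all overlap. */
union caps_broken {
	uint64_t capabilities;
	unsigned int cap_a : 1;
	unsigned int cap_b : 1;
};

/* Fixed layout: bits grouped in a struct get distinct positions. */
union caps_fixed {
	uint64_t capabilities;
	struct {
		unsigned int cap_a : 1;
		unsigned int cap_b : 1;
	} bits;
};

static_assert(sizeof(union caps_fixed) == sizeof(uint64_t),
	      "the bit struct must not enlarge the union");

/* Sets cap_a only and reports what cap_b reads back as. */
static unsigned int broken_cap_b_after_setting_a(void)
{
	union caps_broken c = { 0 };

	c.cap_a = 1;
	return c.cap_b;
}

static unsigned int fixed_cap_b_after_setting_a(void)
{
	union caps_fixed c = { 0 };

	c.bits.cap_a = 1;
	return c.bits.cap_b;
}
```

In the broken layout, setting one capability makes the other read as set as
well, since all members of a union start at the same storage.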
|