aboutsummaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2013-06-17Merge 3.10-rc6 into char-misc-nextGreg Kroah-Hartman
We want the fixes in here.
2013-06-16percpu-refcount: use RCU-sched insted of normal RCUTejun Heo
percpu-refcount was incorrectly using preempt_disable/enable() for RCU critical sections against call_rcu(). 6a24474da8 ("percpu-refcount: consistently use plain (non-sched) RCU") fixed it by converting the preepmtion operations with rcu_read_[un]lock() citing that there isn't any advantage in using sched-RCU over using the usual one; however, rcu_read_[un]lock() for the preemptible RCU implementation - CONFIG_TREE_PREEMPT_RCU, chosen when CONFIG_PREEMPT - are slightly more expensive than preempt_disable/enable(). In a contrived microbench which repeats the followings, - percpu_ref_get() - copy 32 bytes of data into percpu buffer - percpu_put_get() - copy 32 bytes of data into percpu buffer rcu_read_[un]lock() used in percpu_ref_get/put() makes it go slower by about 15% when compared to using sched-RCU. As the RCU critical sections are extremely short, using sched-RCU shouldn't have any latency implications. Convert to RCU-sched. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Kent Overstreet <koverstreet@google.com> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rusty Russell <rusty@rustcorp.com.au>
2013-06-13percpu-refcount: implement percpu_tryget() along with ↵Tejun Heo
percpu_ref_kill_and_confirm() Implement percpu_tryget() which stops giving out references once the percpu_ref is visible as killed. Because the refcnt is per-cpu, different CPUs will start to see a refcnt as killed at different points in time and tryget() may continue to succeed on subset of cpus for a while after percpu_ref_kill() returns. For use cases where it's necessary to know when all CPUs start to see the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new function takes an extra argument @confirm_kill which is invoked when the refcnt is guaranteed to be viewed as killed on all CPUs. While this isn't the prettiest interface, it doesn't force synchronous wait and is much safer than requiring the caller to do its own call_rcu(). v2: Patch description rephrased to emphasize that tryget() may continue to succeed on some CPUs after kill() returns as suggested by Kent. v3: Function comment in percpu_ref_kill_and_confirm() updated warning people to not depend on the implied RCU grace period from the confirm callback as it's an implementation detail. Signed-off-by: Tejun Heo <tj@kernel.org> Slightly-Grumpily-Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-13percpu-refcount: implement percpu_ref_cancel_init()Tejun Heo
Normally, percpu_ref_init() initializes and percpu_ref_kill() initiates destruction which completes asynchronously. The asynchronous destruction can be problematic in init failure path where the caller wants to destroy half-constructed object - distinguishing half-constructed objects from the usual release method can be painful for complex objects. This patch implements percpu_ref_cancel_init() which synchronously destroys the percpu_ref without invoking release. To avoid unintentional misuses, the function requires the ref to have finished percpu_ref_init() but never used and triggers WARN otherwise. v2: Explain the weird name and usage restriction in the function comment. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-13percpu-refcount: add __must_check to percpu_ref_init() and don't use ↵Tejun Heo
ACCESS_ONCE() in percpu_ref_kill_rcu() Two small changes. * Unlike most init functions, percpu_ref_init() allocates memory and may fail. Let's mark it with __must_check in case the caller forgets. * percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to dereference @ref->pcpu_count, which can be misleading. The pointer is guaranteed to be valid and visible and can't change underneath the function. Drop ACCESS_ONCE(). Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-12percpu-refcount: cosmetic updatesTejun Heo
* s/percpu_ref_release/percpu_ref_func_t/ as it's customary to have _t postfix for types and the type is gonna be used for a different type of callback too. * Add @ARG to function comments. * Drop unnecessary and unaligned indentation from percpu_ref_init() function comment. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-12lib/mpi/mpicoder.c: looping issue, need stop when equal to zero, found by ↵Chen Gang
'EXTRA_FLAGS=-W'. For 'while' looping, need stop when 'nbytes == 0', or will cause issue. ('nbytes' is size_t which is always bigger or equal than zero). The related warning: (with EXTRA_CFLAGS=-W) lib/mpi/mpicoder.c:40:2: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] Signed-off-by: Chen Gang <gang.chen@asianux.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-07kobject: sanitize argument for format stringKees Cook
Unlike kobject_set_name(), the kset_create_and_add() interface does not provide a way to use format strings, so make sure that the interface cannot be abused accidentally. It looks like all current callers use static strings, so there's no existing flaw. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-05net: core: move mac_pton() to lib/net_utils.cAndy Shevchenko
Since we have at least one user of this function outside of CONFIG_NET scope, we have to provide this function independently. The proposed solution is to move it under lib/net_utils.c with corresponding configuration variable and select wherever it is needed. Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-05crypto: crct10dif - Use PTR_RETHerbert Xu
lib/crc-t10dif.c:42:1-3: WARNING: PTR_RET can be used Use PTR_RET rather than if(IS_ERR(...)) + PTR_ERR Generated by: coccinelle/api/ptr_ret.cocci Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-06-03percpu-refcount: Don't use silly cmpxchg()Kent Overstreet
The cmpxchg() was just to ensure the debug check didn't race, which was a bit excessive. The caller is supposed to do the appropriate synchronization, which means percpu_ref_kill() can just do a simple store. Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-03percpu: implement generic percpu refcountingKent Overstreet
This implements a refcount with similar semantics to atomic_get()/atomic_dec_and_test() - but percpu. It also implements two stage shutdown, as we need it to tear down the percpu counts. Before dropping the initial refcount, you must call percpu_ref_kill(); this puts the refcount in "shutting down mode" and switches back to a single atomic refcount with the appropriate barriers (synchronize_rcu()). It's also legal to call percpu_ref_kill() multiple times - it only returns true once, so callers don't have to reimplement shutdown synchronization. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: coding-style tweak] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-03debugfs: add get/set for atomic typesSeth Jennings
debugfs currently lack the ability to create attributes that set/get atomic_t values. This patch adds support for this through a new debugfs_create_atomic_t() function. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-29sprintf: hex_string(): fix commentSteven Rostedt
hex_string() had a typo in a comment. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-24MPILIB: disable usage of floating point registers on pariscHelge Deller
The umul_ppmm() macro for parisc uses the xmpyu assembler statement which does calculation via a floating point register. But usage of floating point registers inside the Linux kernel are not allowed and gcc will stop compilation due to the -mdisable-fpregs compiler option. Fix this by disabling the umul_ppmm() and udiv_qrnnd() macros. The mpilib will then use the generic built-in implementations instead. Signed-off-by: Helge Deller <deller@gmx.de>
2013-05-23Merge tag 'driver-core-3.10-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg Kroah-Hartman: "Here are 3 tiny driver core fixes for 3.10-rc2. A needed symbol export, a change to make it easier to track down offending sysfs files with incorrect attributes, and a klist bugfix. All have been in linux-next for a while" * tag 'driver-core-3.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: klist: del waiter from klist_remove_waiters before wakeup waitting process driver core: print sysfs attribute name when warning about bogus permissions driver core: export subsys_virtual_register
2013-05-23lib: make iovec obj instead of libRandy Dunlap
Fix build error io vmw_vmci.ko when CONFIG_VMWARE_VMCI=m by chaning iovec.o from lib-y to obj-y. ERROR: "memcpy_toiovec" [drivers/misc/vmw_vmci/vmw_vmci.ko] undefined! ERROR: "memcpy_fromiovec" [drivers/misc/vmw_vmci/vmw_vmci.ko] undefined! Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-21klist: del waiter from klist_remove_waiters before wakeup waitting processwang, biao
There is a race between klist_remove and klist_release. klist_remove uses a local var waiter saved on stack. When klist_release calls wake_up_process(waiter->process) to wake up the waiter, waiter might run immediately and reuse the stack. Then, klist_release calls list_del(&waiter->list) to change previous wait data and cause prior waiter thread corrupt. The patch fixes it against kernel 3.9. Signed-off-by: wang, biao <biao.wang@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-20crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform ↵Tim Chen
framework When CRC T10 DIF is calculated using the crypto transform framework, we wrap the crc_t10dif function call to utilize it. This allows us to take advantage of any accelerated CRC T10 DIF transform that is plugged into the crypto framework. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-05-20Hoist memcpy_fromiovec/memcpy_toiovec into lib/Rusty Russell
ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined! That function is only present with CONFIG_NET. Turns out that crypto/algif_skcipher.c also uses that outside net, but it actually needs sockets anyway. In addition, commit 6d4f0139d642c45411a47879325891ce2a7c164a added CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist that function and revert that commit too. socket.h already includes uio.h, so no callers need updating; trying only broke things fo x86_64 randconfig (thanks Fengguang!). Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-05-08Merge branch 'for-3.10/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: "It might look big in volume, but when categorized, not a lot of drivers are touched. The pull request contains: - mtip32xx fixes from Micron. - A slew of drbd updates, this time in a nicer series. - bcache, a flash/ssd caching framework from Kent. - Fixes for cciss" * 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits) bcache: Use bd_link_disk_holder() bcache: Allocator cleanup/fixes cciss: bug fix to prevent cciss from loading in kdump crash kernel cciss: add cciss_allow_hpsa module parameter drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions mtip32xx: Workaround for unaligned writes bcache: Make sure blocksize isn't smaller than device blocksize bcache: Fix merge_bvec_fn usage for when it modifies the bvm bcache: Correctly check against BIO_MAX_PAGES bcache: Hack around stuff that clones up to bi_max_vecs bcache: Set ra_pages based on backing device's ra_pages bcache: Take data offset from the bdev superblock. mtip32xx: mtip32xx: Disable TRIM support mtip32xx: fix a smatch warning bcache: Disable broken btree fuzz tester bcache: Fix a format string overflow bcache: Fix a minor memory leak on device teardown bcache: Documentation updates bcache: Use WARN_ONCE() instead of __WARN() bcache: Add missing #include <linux/prefetch.h> ...
2013-05-07rwsem: check counter to avoid cmpxchg callsDavidlohr Bueso
This patch tries to reduce the amount of cmpxchg calls in the writer failed path by checking the counter value first before issuing the instruction. If ->count is not set to RWSEM_WAITING_BIAS then there is no point wasting a cmpxchg call. Furthermore, Michel states "I suppose it helps due to the case where someone else steals the lock while we're trying to acquire sem->wait_lock." Two very different workloads and machines were used to see how this patch improves throughput: pgbench on a quad-core laptop and aim7 on a large 8 socket box with 80 cores. Some results comparing Michel's fast-path write lock stealing (tps-rwsem) on a quad-core laptop running pgbench: | db_size | clients | tps-rwsem | tps-patch | +---------+----------+----------------+--------------+ | 160 MB | 1 | 6906 | 9153 | + 32.5 | 160 MB | 2 | 15931 | 22487 | + 41.1% | 160 MB | 4 | 33021 | 32503 | | 160 MB | 8 | 34626 | 34695 | | 160 MB | 16 | 33098 | 34003 | | 160 MB | 20 | 31343 | 31440 | | 160 MB | 30 | 28961 | 28987 | | 160 MB | 40 | 26902 | 26970 | | 160 MB | 50 | 25760 | 25810 | ------------------------------------------------------ | 1.6 GB | 1 | 7729 | 7537 | | 1.6 GB | 2 | 19009 | 23508 | + 23.7% | 1.6 GB | 4 | 33185 | 32666 | | 1.6 GB | 8 | 34550 | 34318 | | 1.6 GB | 16 | 33079 | 32689 | | 1.6 GB | 20 | 31494 | 31702 | | 1.6 GB | 30 | 28535 | 28755 | | 1.6 GB | 40 | 27054 | 27017 | | 1.6 GB | 50 | 25591 | 25560 | ------------------------------------------------------ | 7.6 GB | 1 | 6224 | 7469 | + 20.0% | 7.6 GB | 2 | 13611 | 12778 | | 7.6 GB | 4 | 33108 | 32927 | | 7.6 GB | 8 | 34712 | 34878 | | 7.6 GB | 16 | 32895 | 33003 | | 7.6 GB | 20 | 31689 | 31974 | | 7.6 GB | 30 | 29003 | 28806 | | 7.6 GB | 40 | 26683 | 26976 | | 7.6 GB | 50 | 25925 | 25652 | ------------------------------------------------------ For the aim7 worloads, they overall improved on top of Michel's patchset. For full graphs on how the rwsem series plus this patch behaves on a large 8 socket machine against a vanilla kernel: http://stgolabs.net/rwsem-aim7-results.tar.gz Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07kref: minor cleanupAnatol Pomozov
- make warning smp-safe - result of atomic _unless_zero functions should be checked by caller to avoid use-after-free error - trivial whitespace fix. Link: https://lkml.org/lkml/2013/4/12/391 Tested: compile x86, boot machine and run xfstests Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> [ Removed line-break, changed to use WARN_ON_ONCE() - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07Merge branch 'rwsem-optimizations'Linus Torvalds
Merge rwsem optimizations from Michel Lespinasse: "These patches extend Alex Shi's work (which added write lock stealing on the rwsem slow path) in order to provide rwsem write lock stealing on the fast path (that is, without taking the rwsem's wait_lock). I have unfortunately been unable to push this through -next before due to Ingo Molnar / David Howells / Peter Zijlstra being busy with other things. However, this has gotten some attention from Rik van Riel and Davidlohr Bueso who both commented that they felt this was ready for v3.10, and Ingo Molnar has said that he was OK with me pushing directly to you. So, here goes :) Davidlohr got the following test results from pgbench running on a quad-core laptop: | db_size | clients | tps-vanilla | tps-rwsem | +---------+----------+----------------+--------------+ | 160 MB | 1 | 5803 | 6906 | + 19.0% | 160 MB | 2 | 13092 | 15931 | | 160 MB | 4 | 29412 | 33021 | | 160 MB | 8 | 32448 | 34626 | | 160 MB | 16 | 32758 | 33098 | | 160 MB | 20 | 26940 | 31343 | + 16.3% | 160 MB | 30 | 25147 | 28961 | | 160 MB | 40 | 25484 | 26902 | | 160 MB | 50 | 24528 | 25760 | ------------------------------------------------------ | 1.6 GB | 1 | 5733 | 7729 | + 34.8% | 1.6 GB | 2 | 9411 | 19009 | + 101.9% | 1.6 GB | 4 | 31818 | 33185 | | 1.6 GB | 8 | 33700 | 34550 | | 1.6 GB | 16 | 32751 | 33079 | | 1.6 GB | 20 | 30919 | 31494 | | 1.6 GB | 30 | 28540 | 28535 | | 1.6 GB | 40 | 26380 | 27054 | | 1.6 GB | 50 | 25241 | 25591 | ------------------------------------------------------ | 7.6 GB | 1 | 5779 | 6224 | | 7.6 GB | 2 | 10897 | 13611 | + 24.9% | 7.6 GB | 4 | 32683 | 33108 | | 7.6 GB | 8 | 33968 | 34712 | | 7.6 GB | 16 | 32287 | 32895 | | 7.6 GB | 20 | 27770 | 31689 | + 14.1% | 7.6 GB | 30 | 26739 | 29003 | | 7.6 GB | 40 | 24901 | 26683 | | 7.6 GB | 50 | 17115 | 25925 | + 51.5% ------------------------------------------------------ (Davidlohr also has one additional patch which further improves throughput, though I will ask him to send it directly to you as I have suggested some minor changes)." * emailed patches from Michel Lespinasse <walken@google.com>: rwsem: no need for explicit signed longs x86 rwsem: avoid taking slow path when stealing write lock rwsem: do not block readers at head of queue if other readers are active rwsem: implement support for write lock stealing on the fastpath rwsem: simplify __rwsem_do_wake rwsem: skip initial trylock in rwsem_down_write_failed rwsem: avoid taking wait_lock in rwsem_down_write_failed rwsem: use cmpxchg for trying to steal write lock rwsem: more agressive lock stealing in rwsem_down_write_failed rwsem: simplify rwsem_down_write_failed rwsem: simplify rwsem_down_read_failed rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed rwsem: shorter spinlocked section in rwsem_down_failed_common() rwsem: make the waiter type an enumeration rather than a bitmask
2013-05-07rwsem: no need for explicit signed longsDavidlohr Bueso
Change explicit "signed long" declarations into plain "long" as suggested by Peter Hurley. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Reviewed-by: Michel Lespinasse <walken@google.com> Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07rwsem: do not block readers at head of queue if other readers are activeMichel Lespinasse
This change fixes a race condition where a reader might determine it needs to block, but by the time it acquires the wait_lock the rwsem has active readers and no queued waiters. In this situation the reader can run in parallel with the existing active readers; it does not need to block until the active readers complete. Thanks to Peter Hurley for noticing this possible race. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07rwsem: implement support for write lock stealing on the fastpathMichel Lespinasse
When we decide to wake up readers, we must first grant them as many read locks as necessary, and then actually wake up all these readers. But in order to know how many read shares to grant, we must first count the readers at the head of the queue. This might take a while if there are many readers, and we want to be protected against a writer stealing the lock while we're counting. To that end, we grant the first reader lock before counting how many more readers are queued. We also require some adjustments to the wake_type semantics. RWSEM_WAKE_NO_ACTIVE used to mean that we had found the count to be RWSEM_WAITING_BIAS, in which case the rwsem was known to be free as nobody could steal it while we hold the wait_lock. This doesn't make sense once we implement fastpath write lock stealing, so we now use RWSEM_WAKE_ANY in that case. Similarly, when rwsem_down_write_failed found that a read lock was active, it would use RWSEM_WAKE_READ_OWNED which signalled that new readers could be woken without checking first that the rwsem was available. We can't do that anymore since the existing readers might release their read locks, and a writer could steal the lock before we wake up additional readers. So, we have to use a new RWSEM_WAKE_READERS value to indicate we only want to wake readers, but we don't currently hold any read lock. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07rwsem: simplify __rwsem_do_wakeMichel Lespinasse
This is mostly for cleanup value: - We don't need several gotos to handle the case where the first waiter is a writer. Two simple tests will do (and generate very similar code). - In the remainder of the function, we know the first waiter is a reader, so we don't have to double check that. We can use do..while loops to iterate over the readers to wake (generates slightly better code). Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07rwsem: skip initial trylock in rwsem_down_write_failedMichel Lespinasse
We can skip the initial trylock in rwsem_down_write_failed() if there are known active lockers already, thus saving one likely-to-fail cmpxchg. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07rwsem: avoid taking wait_lock in rwsem_down_write_failedMichel Lespinasse
In rwsem_down_write_failed(), if there are active locks after we wake up (i.e. the lock got stolen from us), skip taking the wait_lock and go back to sleep immediately. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07rwsem: use cmpxchg for trying to steal write lockMichel Lespinasse
Using rwsem_atomic_update to try stealing the write lock forced us to undo the adjustment in the failure path. We can have simpler and faster code by using cmpxchg instead. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07rwsem: more agressive lock stealing in rwsem_down_write_failedMichel Lespinasse
Some small code simplifications can be achieved by doing more agressive lock stealing: - When rwsem_down_write_failed() notices that there are no active locks (and thus no thread to wake us if we decided to sleep), it used to wake the first queued process. However, stealing the lock is also sufficient to deal with this case, so we don't need this check anymore. - In try_get_writer_sem(), we can steal the lock even when the first waiter is a reader. This is correct because the code path that wakes readers is protected by the wait_lock. As to the performance effects of this change, they are expected to be minimal: readers are still granted the lock (rather than having to acquire it themselves) when they reach the front of the wait queue, so we have essentially the same behavior as in rwsem-spinlock. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07rwsem: simplify rwsem_down_write_failedMichel Lespinasse
When waking writers, we never grant them the lock - instead, they have to acquire it themselves when they run, and remove themselves from the wait_list when they succeed. As a result, we can do a few simplifications in rwsem_down_write_failed(): - We don't need to check for !waiter.task since __rwsem_do_wake() doesn't remove writers from the wait_list - There is no point releaseing the wait_lock before entering the wait loop, as we will need to reacquire it immediately. We can change the loop so that the lock is always held at the start of each loop iteration. - We don't need to get a reference on the task structure, since the task is responsible for removing itself from the wait_list. There is no risk, like in the rwsem_down_read_failed() case, that a task would wake up and exit (thus destroying its task structure) while __rwsem_do_wake() is still running - wait_lock protects against that. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07rwsem: simplify rwsem_down_read_failedMichel Lespinasse
When trying to acquire a read lock, the RWSEM_ACTIVE_READ_BIAS adjustment doesn't cause other readers to block, so we never have to worry about waking them back after canceling this adjustment in rwsem_down_read_failed(). We also never want to steal the lock in rwsem_down_read_failed(), so we don't have to grab the wait_lock either. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failedMichel Lespinasse
Remove the rwsem_down_failed_common function and replace it with two identical copies of its code in rwsem_down_{read,write}_failed. This is because we want to make different optimizations in rwsem_down_{read,write}_failed; we are adding this pure-duplication step as a separate commit in order to make it easier to check the following steps. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07rwsem: shorter spinlocked section in rwsem_down_failed_common()Michel Lespinasse
This change reduces the size of the spinlocked and TASK_UNINTERRUPTIBLE sections in rwsem_down_failed_common(): - We only need the sem->wait_lock to insert ourselves on the wait_list; the waiter node can be prepared outside of the wait_lock. - The task state only needs to be set to TASK_UNINTERRUPTIBLE immediately before checking if we actually need to sleep; it doesn't need to protect the entire function. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07rwsem: make the waiter type an enumeration rather than a bitmaskMichel Lespinasse
We are not planning to add some new waiter flags, so we can convert the waiter type into an enumeration. Background: David Howells suggested I do this back when I tried adding a new waiter type for unfair readers. However, I believe the cleanup applies regardless of that use case. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-05Give the OID registry file module info to avoid kernel taintingDavid Howells
Give the OID registry file module information so that it doesn't taint the kernel when compiled as a module and loaded. Reported-by: Dros Adamson <Weston.Adamson@netapp.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Trond Myklebust <Trond.Myklebust@netapp.com> cc: stable@vger.kernel.org cc: linux-nfs@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-02Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm updates from Dave Airlie: "This is the main drm pull request for 3.10. Wierd bits: - OMAP drm changes required OMAP dss changes, in drivers/video, so I took them in here. - one more fbcon fix for font handover - VT switch avoidance in pm code - scatterlist helpers for gpu drivers - have acks from akpm Highlights: - qxl kms driver - driver for the spice qxl virtual GPU Nouveau: - fermi/kepler VRAM compression - GK110/nvf0 modesetting support. Tegra: - host1x core merged with 2D engine support i915: - vt switchless resume - more valleyview support - vblank fixes - modesetting pipe config rework radeon: - UVD engine support - SI chip tiling support - GPU registers initialisation from golden values. exynos: - device tree changes - fimc block support Otherwise: - bunches of fixes all over the place." * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (513 commits) qxl: update to new idr interfaces. drm/nouveau: fix build with nv50->nvc0 drm/radeon: fix handling of v6 power tables drm/radeon: clarify family checks in pm table parsing drm/radeon: consolidate UVD clock programming drm/radeon: fix UPLL_REF_DIV_MASK definition radeon: add bo tracking debugfs drm/radeon: add new richland pci ids drm/radeon: add some new SI PCI ids drm/radeon: fix scratch reg handling for UVD fence drm/radeon: allocate SA bo in the requested domain drm/radeon: fix possible segfault when parsing pm tables drm/radeon: fix endian bugs in atom_allocate_fb_scratch() OMAPDSS: TFP410: return EPROBE_DEFER if the i2c adapter not found OMAPDSS: VENC: Add error handling for venc_probe_pdata OMAPDSS: HDMI: Add error handling for hdmi_probe_pdata OMAPDSS: RFBI: Add error handling for rfbi_probe_pdata OMAPDSS: DSI: Add error handling for dsi_probe_pdata OMAPDSS: SDI: Add error handling for sdi_probe_pdata OMAPDSS: DPI: Add error handling for dpi_probe_pdata ...
2013-05-02Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "This fixes the cputime scaling overflow problems for good without having bad 32-bit overhead, and gets rid of the div64_u64_rem() helper as well." * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "math64: New div64_u64_rem helper" sched: Avoid prev->stime underflow sched: Do not account bogus utime sched: Avoid cputime scaling overflow
2013-05-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...
2013-04-30Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge third batch of fixes from Andrew Morton: "Most of the rest. I still have two large patchsets against AIO and IPC, but they're a bit stuck behind other trees and I'm about to vanish for six days. - random fixlets - inotify - more of the MM queue - show_stack() cleanups - DMI update - kthread/workqueue things - compat cleanups - epoll udpates - binfmt updates - nilfs2 - hfs - hfsplus - ptrace - kmod - coredump - kexec - rbtree - pids - pidns - pps - semaphore tweaks - some w1 patches - relay updates - core Kconfig changes - sysrq tweaks" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits) Documentation/sysrq: fix inconstistent help message of sysrq key ethernet/emac/sysrq: fix inconstistent help message of sysrq key sparc/sysrq: fix inconstistent help message of sysrq key powerpc/xmon/sysrq: fix inconstistent help message of sysrq key ARM/etm/sysrq: fix inconstistent help message of sysrq key power/sysrq: fix inconstistent help message of sysrq key kgdb/sysrq: fix inconstistent help message of sysrq key lib/decompress.c: fix initconst notifier-error-inject: fix module names in Kconfig kernel/sys.c: make prctl(PR_SET_MM) generally available UAPI: remove empty Kbuild files menuconfig: print more info for symbol without prompts init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display kconfig menu: move Virtualization drivers near other virtualization options Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS relay: use macro PAGE_ALIGN instead of FIX_SIZE kernel/relay.c: move FIX_SIZE macro into relay.c kernel/relay.c: remove unused function argument actor drivers/w1/slaves/w1_ds2760.c: fix the error handling in w1_ds2760_add_slave() drivers/w1/slaves/w1_ds2781.c: fix the error handling in w1_ds2781_add_slave() ...
2013-04-30lib/decompress.c: fix initconstAndi Kleen
Signed-off-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30notifier-error-inject: fix module names in KconfigAkinobu Mita
The Kconfig help text for MEMORY_NOTIFIER_ERROR_INJECT and OF_RECONFIG_NOTIFIER_ERROR_INJECT has mismatched module names. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKSStephen Boyd
The help text for this config is duplicated across the x86, parisc, and s390 Kconfig.debug files. Arnd Bergman noted that the help text was slightly misleading and should be fixed to state that enabling this option isn't a problem when using pre 4.4 gcc. To simplify the rewording, consolidate the text into lib/Kconfig.debug and modify it there to be more explicit about when you should say N to this config. Also, make the text a bit more generic by stating that this option enables compile time checks so we can cover architectures which emit warnings vs. ones which emit errors. The details of how an architecture decided to implement the checks isn't as important as the concept of compile time checking of copy_from_user() calls. While we're doing this, remove all the copy_from_user_overflow() code that's duplicated many times and place it into lib/ so that any architecture supporting this option can get the function for free. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Helge Deller <deller@gmx.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30rbtree_test: add __init/__exit annotationsDavidlohr Bueso
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Reviewed-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30rbtree_test: add extra rbtree integrity checkDavidlohr Bueso
Account for the rbtree having 2**bh(v)-1 internal nodes. While this can be seen as a consequence of other checks, Michel states that it nicely sums up what the other properties are for. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Reviewed-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30dynamic_debug: reuse generic string_unescape functionAndy Shevchenko
There is kernel function to do the job in generic way. Let's use it. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30lib/string_helpers: introduce generic string_unescapeAndy Shevchenko
There are several places in kernel where modules unescapes input to convert C-Style Escape Sequences into byte codes. The patch provides generic implementation of such approach. Test cases are also included into the patch. [akpm@linux-foundation.org: clarify comment] [akpm@linux-foundation.org: export get_random_int() to modules] Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Baron <jbaron@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: William Hubbs <w.d.hubbs@gmail.com> Cc: Chris Brannon <chris@the-brannons.com> Cc: Kirk Reiser <kirk@braille.uwo.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30dump_stack: consolidate dump_stack() implementations and unify their behaviorsTejun Heo
Both dump_stack() and show_stack() are currently implemented by each architecture. show_stack(NULL, NULL) dumps the backtrace for the current task as does dump_stack(). On some archs, dump_stack() prints extra information - pid, utsname and so on - in addition to the backtrace while the two are identical on other archs. The usages in arch-independent code of the two functions indicate show_stack(NULL, NULL) should print out bare backtrace while dump_stack() is used for debugging purposes when something went wrong, so it does make sense to print additional information on the task which triggered dump_stack(). There's no reason to require archs to implement two separate but mostly identical functions. It leads to unnecessary subtle information. This patch expands the dummy fallback dump_stack() implementation in lib/dump_stack.c such that it prints out debug information (taken from x86) and invokes show_stack(NULL, NULL) and drops arch-specific dump_stack() implementations in all archs except blackfin. Blackfin's dump_stack() does something wonky that I don't understand. Debug information can be printed separately by calling dump_stack_print_info() so that arch-specific dump_stack() implementation can still emit the same debug information. This is used in blackfin. This patch brings the following behavior changes. * On some archs, an extra level in backtrace for show_stack() could be printed. This is because the top frame was determined in dump_stack() on those archs while generic dump_stack() can't do that reliably. It can be compensated by inlining dump_stack() but not sure whether that'd be necessary. * Most archs didn't use to print debug info on dump_stack(). They do now. An example WARN dump follows. WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505() Hardware name: empty Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9 0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48 ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c 0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58 Call Trace: [<ffffffff81c614dc>] dump_stack+0x19/0x1b [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20 [<ffffffff8234a071>] init_workqueues+0x35/0x505 ... v2: CPU number added to the generic debug info as requested by s390 folks and dropped the s390 specific dump_stack(). This loses %ksp from the debug message which the maintainers think isn't important enough to keep the s390-specific dump_stack() implementation. dump_stack_print_info() is moved to kernel/printk.c from lib/dump_stack.c. Because linkage is per objecct file, dump_stack_print_info() living in the same lib file as generic dump_stack() means that archs which implement custom dump_stack() - at this point, only blackfin - can't use dump_stack_print_info() as that will bring in the generic version of dump_stack() too. v1 The v1 patch broke build on blackfin due to this issue. The build breakage was reported by Fengguang Wu. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390 bits] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>