| Age | Commit message (Collapse) | Author |
|
decnet/af_decnet.c
Move prototype declaration of functions to header file include/net/dn_route.h
from net/decnet/af_decnet.c because it is used by more than one file.
This eliminates the following warning in net/decnet/dn_route.c:
net/decnet/dn_route.c:629:5: warning: no previous prototype for ‘dn_route_rcv’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some filesystems can handle direct I/O writes beyond i_size safely,
so allow them to opt into receiving them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/nftables/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes, mostly nftables
fixes, most relevantly they are:
* Fix a crash in the h323 conntrack NAT helper due to expectation list
corruption, from Alexey Dobriyan.
* A couple of RCU race fixes for conntrack, one manifests by hitting BUG_ON
in nf_nat_setup_info() and the destroy path, patches from Andrey Vagin and
me.
* Dump direction attribute in nft_ct only if it is set, from Arturo
Borrero.
* Fix IPVS bug in its own connection tracking system that may lead to
copying only 4 bytes of the IPv6 address when initializing the
ip_vs_conn object, from Michal Kubecek.
* Fix -EBUSY errors in nftables when deleting the rules, chain and tables
in a row due mixture of asynchronous and synchronous object releasing,
from me.
* Three fixes for the nf_tables set infrastructure when using intervals and
mappings, from me.
* Four patches to fixing the nf_tables log, reject and ct expressions from
the new inet table, from Patrick McHardy.
* Fix memory overrun in the map that is used to dynamically allocate names
from anonymous sets, also from Patrick.
* Fix a potential oops if you dump a set with NFPROTO_UNSPEC and a table
name, from Patrick McHardy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add prototype declaration to header file include/linux/bio.h because it
is used by more than one file.
This eliminates the following warning in bio-integrity.c:
fs/bio-integrity.c:214:14: warning: no previous prototype for ‘bio_integrity_tag_size’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Cosmetic. This doesn't really matter because a) device->mutex is
the only user of __lockdep_no_validate__ and b) this class should
be never reported as the source of problem, but if something goes
wrong "&dev->mutex" looks better than "&__lockdep_no_validate__"
as the name of the lock.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140120182016.GA26512@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The "int check" argument of lock_acquire() and held_lock->check are
misleading. This is actually a boolean: 2 means "true", everything
else is "false".
And there is no need to pass 1 or 0 to lock_acquire() depending on
CONFIG_PROVE_LOCKING, __lock_acquire() checks prove_locking at the
start and clears "check" if !CONFIG_PROVE_LOCKING.
Note: probably we can simply kill this member/arg. The only explicit
user of check => 0 is rcu_lock_acquire(), perhaps we can change it to
use lock_acquire(trylock =>, read => 2). __lockdep_no_validate means
check => 0 implicitly, but we can change validate_chain() to check
hlock->instance->key instead. Not to mention it would be nice to get
rid of lockdep_set_novalidate_class().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140120182006.GA26495@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This patch allows each architecture to add its specific assembly optimized
arch_mcs_spin_lock_contended and arch_mcs_spinlock_uncontended for
MCS lock and unlock functions.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: AswinChandramouleeswaran <aswin@hp.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rik vanRiel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: MichelLespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Figo.zhang" <figo1802@gmail.com>
Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390347382.3138.67.camel@schen9-DESK
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
when sync_page_range() had been introduced; generic_file_write{,v}() correctly
synced
pos_after_write - written .. pos_after_write - 1
but generic_file_aio_write() synced
pos_before_write .. pos_before_write + written - 1
instead. Which is not the same thing with O_APPEND, obviously.
A couple of years later correct variant had been killed off when
everything switched to use of generic_file_aio_write().
All users of generic_file_aio_write() are affected, and the same bug
has been copied into other instances of ->aio_write().
The fix is trivial; the only subtle point is that generic_write_sync()
ought to be inlined to avoid calculations useless for the majority of
calls.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
As patch "sched: Move the priority specific bits into a new header file" exposes
the priority related macros in linux/sched/prio.h, we don't have to implement
task_nice() in kernel/sched/core.c any more.
This patch implements it in linux/sched/sched.h as static inline function,
saving the kernel stack and enhancing performance a bit.
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Cc: clark.williams@gmail.com
Cc: rostedt@goodmis.org
Cc: raistlin@linux.it
Cc: juri.lelli@gmail.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390878045-7096-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Some drivers use request_any_context_irq() but there isn't a
devm_* function for it. Add one so that these drivers don't need
to explicitly free the irq on driver detach.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: http://lkml.kernel.org/r/1388709460-19222-3-git-send-email-sboyd@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The hrtimer mode of broadcast is supported only when
GENERIC_CLOCKEVENTS_BROADCAST and TICK_ONESHOT config options
are enabled. Hence compile in the functions for hrtimer mode
of broadcast only when these options are selected.
Also fix max_delta_ticks value for the pseudo clock device.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/52F719EE.9010304@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
There isn't an explicit stolen memory base register on gen2.
Some old comment in the i915 code suggests we should get it via
max_low_pfn_mapped, but that's clearly a bad idea on my MGM.
The e820 map in said machine looks like this:
BIOS-e820: [mem 0x0000000000000000-0x000000000009f7ff] usable
BIOS-e820: [mem 0x000000000009f800-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000ce000-0x00000000000cffff] reserved
BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000001f6effff] usable
BIOS-e820: [mem 0x000000001f6f0000-0x000000001f6f7fff] ACPI data
BIOS-e820: [mem 0x000000001f6f8000-0x000000001f6fffff] ACPI NVS
BIOS-e820: [mem 0x000000001f700000-0x000000001fffffff] reserved
BIOS-e820: [mem 0x00000000fec10000-0x00000000fec1ffff] reserved
BIOS-e820: [mem 0x00000000ffb00000-0x00000000ffbfffff] reserved
BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
That makes max_low_pfn_mapped = 1f6f0000, so assuming our stolen
memory would start there would place it on top of some ACPI
memory regions. So not a good idea as already stated.
The 9MB region after the ACPI regions at 0x1f700000 however
looks promising given that the macine reports the stolen memory
size to be 8MB. Looking at the PGTBL_CTL register, the GTT
entries are at offset 0x1fee00000, and given that the GTT
entries occupy 128KB, it looks like the stolen memory could
start at 0x1f700000 and the GTT entries would occupy the last
128KB of the stolen memory.
After some more digging through chipset documentation, I've
determined the BIOS first allocates space for something called
TSEG (something to do with SMM) from the top of memory, and then
it allocates the graphics stolen memory below that. Accordind to
the chipset documentation TSEG has a fixed size of 1MB on 855.
So that explains the top 1MB in the e820 region. And it also
confirms that the GTT entries are in fact at the end of the the
stolen memory region.
Derive the stolen memory base address on gen2 the same as the
BIOS does (TOM-TSEG_SIZE-stolen_size). There are a few
differences between the registers on various gen2 chipsets, so a
few different codepaths are required.
865G is again bit more special since it seems to support enough
memory to hit 4GB address space issues. This means the PCI
allocations will also affect the location of the stolen memory.
Fortunately there appears to be the TOUD register which may give
us the correct answer directly. But the chipset docs are a bit
unclear, so I'm not 100% sure that the graphics stolen memory is
always the last thing the BIOS steals. Someone would need to
verify it on a real system.
I tested this on the my 830 and 855 machines, and so far
everything looks peachy.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1391628540-23072-3-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Enabling '-Wsign-compare' compiler warnings on code that includes
include/linux/bitops.h can generate the following warning:
In file included from include/linux/kernel.h:10:0,
from <random filename>:48:
include/linux/bitops.h: In function 'hweight_long':
include/linux/bitops.h:77:26: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
(converted to an error with -Werror)
This is due to the use of the logical negation operator '!' in the
__const_hweight8 macro in include/asm-generic/bitops/const_hweight.h.
The use of that operator here results in a signed value.
Fix by explicitly casting the __const_hweight8 macro expansion to
'unsigned int'. While here, clean up several checkpatch.pl warnings.
Signed-off-by: Paul Walmsley <pwalmsley@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1312180459580.30198@tamien
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Some macros in kernel/sched/sched.h about priority are
private to kernel/sched. But they are useful to other
parts of the core kernel.
This patch moves these macros from kernel/sched/sched.h to
include/linux/sched/prio.h so that they are available to
other subsystems.
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Cc: raistlin@linux.it
Cc: juri.lelli@gmail.com
Cc: clark.williams@gmail.com
Cc: rostedt@goodmis.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/2b022810905b52d13238466807f4b2a691577180.1390859827.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Some bits about priority are defined in linux/sched/rt.h, but
some of them are not only for rt scheduler, such as MAX_PRIO.
This patch move them all into a new header file, linux/sched/prio.h.
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Cc: clark.williams@gmail.com
Cc: rostedt@goodmis.org
Cc: raistlin@linux.it
Cc: juri.lelli@gmail.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f7549508a1588da2c613d601748ca9de30fa5dcf.1390859827.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Calling printk() from NMI context is bad (TM), so move it to IRQ
context.
This also avoids the problem where the printk() time is measured by
the generic NMI duration goo and triggers a second warning.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-75dv35xf6dhhmeb7nq6fua31@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This warning seems to show up a lot now, since ___wait_event()
is (indirectly) used inside wait_event_timeout(), which also
has a variable called __ret. Rename the one in ___wait_event()
to ___ret (another leading underscore) to suppress the warning.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1391704121.12789.20.camel@jlt4.sipsolutions.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Move the support to perform an HMAC calculation into
the CCP operations file. This eliminates the need to
perform a synchronous SHA operation used to calculate
the HMAC.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"Quite a varied little collection of fixes. Most of them are
relatively small or isolated; the biggest one is Mel Gorman's fixes
for TLB range flushing.
A couple of AMD-related fixes (including not crashing when given an
invalid microcode image) and fix a crash when compiled with gcov"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, microcode, AMD: Unify valid container checks
x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
x86/efi: Allow mapping BGRT on x86-32
x86: Fix the initialization of physnode_map
x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()
x86/intel/mid: Fix X86_INTEL_MID dependencies
arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLIT
mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge
x86: mm: change tlb_flushall_shift for IvyBridge
x86/mm: Eliminate redundant page table walk during TLB range flushing
x86/mm: Clean up inconsistencies when flushing TLB ranges
mm, x86: Account for TLB flushes only when debugging
x86/AMD/NB: Fix amd_set_subcaches() parameter type
x86/quirks: Add workaround for AMD F16h Erratum792
x86, doc, kconfig: Fix dud URL for Microcode data
|
|
Pending kernfs conversion depends on kernfs improvements in
driver-core-next. Pull it into for-3.15.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
It's no longer referenced outside cgroup core, so renaming is easy.
Let's rename it for consistency & brevity.
This patch is pure rename.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
|
|
cgroup_subsys is a bit messier than it needs to be.
* The name of a subsys can be different from its internal identifier
defined in cgroup_subsys.h. Most subsystems use the matching name
but three - cpu, memory and perf_event - use different ones.
* cgroup_subsys_id enums are postfixed with _subsys_id and each
cgroup_subsys is postfixed with _subsys. cgroup.h is widely
included throughout various subsystems, it doesn't and shouldn't
have claim on such generic names which don't have any qualifier
indicating that they belong to cgroup.
* cgroup_subsys->subsys_id should always equal the matching
cgroup_subsys_id enum; however, we require each controller to
initialize it and then BUG if they don't match, which is a bit
silly.
This patch cleans up cgroup_subsys names and initialization by doing
the followings.
* cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
cgroup_subsys with _cgrp_subsys.
* With the above, renaming subsys identifiers to match the userland
visible names doesn't cause any naming conflicts. All non-matching
identifiers are renamed to match the official names.
cpu_cgroup -> cpu
mem_cgroup -> memory
perf -> perf_event
* controllers no longer need to initialize ->subsys_id and ->name.
They're generated in cgroup core and set automatically during boot.
* Redundant cgroup_subsys declarations removed.
* While updating BUG_ON()s in cgroup_init_early(), convert them to
WARN()s. BUGging that early during boot is stupid - the kernel
can't print anything, even through serial console and the trap
handler doesn't even link stack frame properly for back-tracing.
This patch doesn't introduce any behavior changes.
v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs
classid handling into core").
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
|
|
With module supported dropped from net_prio, no controller is using
cgroup module support. None of actual resource controllers can be
built as a module and we aren't gonna add new controllers which don't
control resources. This patch drops module support from cgroup.
* cgroup_[un]load_subsys() and cgroup_subsys->module removed.
* As there's no point in distinguishing IS_BUILTIN() and IS_MODULE(),
cgroup_subsys.h now uses IS_ENABLED() directly.
* enum cgroup_subsys_id now exactly matches the list of enabled
controllers as ordered in cgroup_subsys.h.
* cgroup_subsys[] is now a contiguously occupied array. Size
specification is no longer necessary and dropped.
* for_each_builtin_subsys() is removed and for_each_subsys() is
updated to not require any locking.
* module ref handling is removed from rebind_subsystems().
* Module related comments dropped.
v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs
classid handling into core").
v3: Added {} around the if (need_forkexit_callback) block in
cgroup_post_fork() for readability as suggested by Li.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
|
|
init_netclassid_cgroup()
net_prio is the only cgroup which is allowed to be built as a module.
The savings from allowing one controller to be built as a module are
tiny especially given that cgroup module support itself adds quite a
bit of complexity.
Given that none of other controllers has much chance of being made a
module and that we're unlikely to add new modular controllers, the
added complexity is simply not justifiable.
As a first step to drop cgroup module support, this patch changes the
config option to bool from tristate and drops module related code from
it.
Also, while an earlier commit fe1217c4f3f7 ("net: net_cls: move
cgroupfs classid handling into core") dropped module support from
net_cls cgroup, it retained a call to cgroup_load_subsys(), which is
noop for built-in controllers. Drop it along with
init_netclassid_cgroup().
v2: Removed modular version of task_netprioidx() in
include/net/netprio_cgroup.h as suggested by Li Zefan.
v3: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs
classid handling into core"). net_cls cgroup part is mostly
dropped except for removal of init_netclassid_cgroup().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Thomas Graf <tgraf@suug.ch>
|
|
To prepare to implement byte sized index for managing the freelist
of a slab, we should restrict the number of objects in a slab to be less
or equal to 256, since byte only represent 256 different values.
Setting the size of object to value equal or more than newly introduced
SLAB_OBJ_MIN_SIZE ensures that the number of objects in a slab is less or
equal to 256 for a slab with 1 page.
If page size is rather larger than 4096, above assumption would be wrong.
In this case, we would fall back on 2 bytes sized index.
If minimum size of kmalloc is less than 16, we use it as minimum object
size and give up this optimization.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
As sysfs was kernfs's only user, kernfs has been piggybacking on
CONFIG_SYSFS; however, kernfs is scheduled to grow a new user very
soon. Introduce a separate config option CONFIG_KERNFS which is to be
selected by kernfs users.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently, kobject is invoking kernfs_enable_ns() directly. This is
fine now as sysfs and kernfs are enabled and disabled together. If
sysfs is disabled, kernfs_enable_ns() is switched to dummy
implementation too and everything is fine; however, kernfs will soon
have its own config option CONFIG_KERNFS and !SYSFS && KERNFS will be
possible, which can make kobject call into non-dummy
kernfs_enable_ns() with NULL kernfs_node pointers leading to an oops.
Introduce sysfs_enable_ns() which is a wrapper around
kernfs_enable_ns() so that it can be made a noop depending only on
CONFIG_SYSFS regardless of the planned CONFIG_KERNFS.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
kernfs_node->parent and ->name are currently marked as "published"
indicating that kernfs users may access them directly; however, those
fields may get updated by kernfs_rename[_ns]() and unrestricted access
may lead to erroneous values or oops.
Protect ->parent and ->name updates with a irq-safe spinlock
kernfs_rename_lock and implement the following accessors for these
fields.
* kernfs_name() - format the node's name into the specified buffer
* kernfs_path() - format the node's path into the specified buffer
* pr_cont_kernfs_name() - pr_cont a node's name (doesn't need buffer)
* pr_cont_kernfs_path() - pr_cont a node's path (doesn't need buffer)
* kernfs_get_parent() - pin and return a node's parent
All can be called under any context. The recursive sysfs_pathname()
in fs/sysfs/dir.c is replaced with kernfs_path() and
sysfs_rename_dir_ns() is updated to use kernfs_get_parent() instead of
dereferencing parent directly.
v2: Dummy definition of kernfs_path() for !CONFIG_KERNFS was missing
static inline making it cause a lot of build warnings. Add it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
kernfs_rename()
Implement helpers to determine node from dentry and root from
super_block. Also add a kernfs_rename_ns() wrapper which assumes NULL
namespace. These generally make sense and will be used by cgroup.
v2: Some dummy implementations for !CONFIG_SYSFS was missing. Fixed.
Reported by kbuild test robot.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add a private data field to be used by kernfs file operations. This
generally makes sense and will be used by cgroup.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
A write to a kernfs_node is buffered through a kernel buffer. Writes
<= PAGE_SIZE are performed atomically, while larger ones are executed
in PAGE_SIZE chunks. While this is enough for sysfs, cgroup which is
scheduled to be converted to use kernfs needs a bit more control over
it.
This patch adds kernfs_ops->atomic_write_len. If not set (zero), the
behavior stays the same. If set, writes upto the size are executed
atomically and larger writes are rejected with -E2BIG.
A different implementation strategy would be allowing configuring
chunking size while making the original write size available to the
write method; however, such strategy, while being more complicated,
doesn't really buy anything. If the write implementation has to
handle chunking, the specific chunk size shouldn't matter all that
much.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently, kernfs_nodes are made visible to userland on creation,
which makes it difficult for kernfs users to atomically succeed or
fail creation of multiple nodes. In addition, if something fails
after creating some nodes, the created nodes might already be in use
and their active refs need to be drained for removal, which has the
potential to introduce tricky reverse locking dependency on active_ref
depending on how the error path is synchronized.
This patch introduces per-root flag KERNFS_ROOT_CREATE_DEACTIVATED.
If set, all nodes under the root are created in the deactivated state
and stay invisible to userland until explicitly enabled by the new
kernfs_activate() API. Also, nodes which have never been activated
are guaranteed to bypass draining on removal thus allowing error paths
to not worry about lockding dependency on active_ref draining.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add two super_block related syscall callbacks ->remount_fs() and
->show_options() to kernfs_syscall_ops. These simply forward the
matching super_operations.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We're gonna need non-dir syscall callbacks, which will make dir_ops a
misnomer. Let's rename kernfs_dir_ops to kernfs_syscall_ops.
This is pure rename.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
kernfs_dir_ops are currently being invoked without any active
reference, which makes it tricky for the invoked operations to
determine whether the objects associated those nodes are safe to
access and will remain that way for the duration of such operations.
kernfs already has active_ref mechanism to deal with this which makes
the removal of a given node the synchronization point for gating the
file operations. There's no reason for dir_ops to be any different.
Update the dir_ops handling so that active_ref is held while the
dir_ops are executing. This guarantees that while a dir_ops is
executing the target nodes stay alive.
As kernfs_dir_ops doesn't have any in-kernel user at this point, this
doesn't affect anybody.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
All device_schedule_callback_owner() users are converted to use
device_remove_file_self(). Remove now unused
{sysfs|device}_schedule_callback_owner().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Sometimes it's necessary to implement a node which wants to delete
nodes including itself. This isn't straightforward because of kernfs
active reference. While a file operation is in progress, an active
reference is held and kernfs_remove() waits for all such references to
drain before completing. For a self-deleting node, this is a deadlock
as kernfs_remove() ends up waiting for an active reference that itself
is sitting on top of.
This currently is worked around in the sysfs layer using
sysfs_schedule_callback() which makes such removals asynchronous.
While it works, it's rather cumbersome and inherently breaks
synchronicity of the operation - the file operation which triggered
the operation may complete before the removal is finished (or even
started) and the removal may fail asynchronously. If a removal
operation is immmediately followed by another operation which expects
the specific name to be available (e.g. removal followed by rename
onto the same name), there's no way to make the latter operation
reliable.
The thing is there's no inherent reason for this to be asynchrnous.
All that's necessary to do this synchronous is a dedicated operation
which drops its own active ref and deactivates self. This patch
implements kernfs_remove_self() and its wrappers in sysfs and driver
core. kernfs_remove_self() is to be called from one of the file
operations, drops the active ref the task is holding, removes the self
node, and restores active ref to the dead node so that the ref is
balanced afterwards. __kernfs_remove() is updated so that it takes an
early exit if the target node is already fully removed so that the
active ref restored by kernfs_remove_self() after removal doesn't
confuse the deactivation path.
This makes implementing self-deleting nodes very easy. The normal
removal path doesn't even need to be changed to use
kernfs_remove_self() for the self-deleting node. The method can
invoke kernfs_remove_self() on itself before proceeding the normal
removal path. kernfs_remove() invoked on the node by the normal
deletion path will simply be ignored.
This will replace sysfs_schedule_callback(). A subtle feature of
sysfs_schedule_callback() is that it collapses multiple invocations -
even if multiple removals are triggered, the removal callback is run
only once. An equivalent effect can be achieved by testing the return
value of kernfs_remove_self() - only the one which gets %true return
value should proceed with actual deletion. All other instances of
kernfs_remove_self() will wait till the enclosing kernfs operation
which invoked the winning instance of kernfs_remove_self() finishes
and then return %false. This trivially makes all users of
kernfs_remove_self() automatically show correct synchronous behavior
even when there are multiple concurrent operations - all "echo 1 >
delete" instances will finish only after the whole operation is
completed by one of the instances.
Note that manipulation of active ref is implemented in separate public
functions - kernfs_[un]break_active_protection().
kernfs_remove_self() is the only user at the moment but this will be
used to cater to more complex cases.
v2: For !CONFIG_SYSFS, dummy version kernfs_remove_self() was missing
and sysfs_remove_file_self() had incorrect return type. Fix it.
Reported by kbuild test bot.
v3: kernfs_[un]break_active_protection() separated out from
kernfs_remove_self() and exposed as public API.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
KERNFS_REMOVED is used to mark half-initialized and dying nodes so
that they don't show up in lookups and deny adding new nodes under or
renaming it; however, its role overlaps that of deactivation.
It's necessary to deny addition of new children while removal is in
progress; however, this role considerably intersects with deactivation
- KERNFS_REMOVED prevents new children while deactivation prevents new
file operations. There's no reason to have them separate making
things more complex than necessary.
This patch removes KERNFS_REMOVED.
* Instead of KERNFS_REMOVED, each node now starts its life
deactivated. This means that we now use both atomic_add() and
atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN. The compiler
generates an overflow warnings when negating INT_MIN as the negation
can't be represented as a positive number. Nothing is actually
broken but let's bump BIAS by one to avoid the warnings for archs
which negates the subtrahend..
* A new helper kernfs_active() which tests whether kn->active >= 0 is
added for convenience and lockdep annotation. All KERNFS_REMOVED
tests are replaced with negated kernfs_active() tests.
* __kernfs_remove() is updated to deactivate, but not drain, all nodes
in the subtree instead of setting KERNFS_REMOVED. This removes
deactivation from kernfs_deactivate(), which is now renamed to
kernfs_drain().
* Sanity check on KERNFS_REMOVED in kernfs_put() is replaced with
checks on the active ref.
* Some comment style updates in the affected area.
v2: Reordered before removal path restructuring. kernfs_active()
dropped and kernfs_get/put_active() used instead. RB_EMPTY_NODE()
used in the lookup paths.
v3: Reverted most of v2 except for creating a new node with
KN_DEACTIVATED_BIAS.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There currently are two mechanisms gating active ref lockdep
annotations - KERNFS_LOCKDEP flag and KERNFS_ACTIVE_REF type mask.
The former disables lockdep annotations in kernfs_get/put_active()
while the latter disables all of kernfs_deactivate().
While KERNFS_ACTIVE_REF also behaves as an optimization to skip the
deactivation step for non-file nodes, the benefit is marginal and it
needlessly diverges code paths. Let's drop KERNFS_ACTIVE_REF.
While at it, add a test helper kernfs_lockdep() to test KERNFS_LOCKDEP
flag so that it's more convenient and the related code can be compiled
out when not enabled.
v2: Refreshed on top of ("kernfs: make kernfs_deactivate() honor
KERNFS_LOCKDEP flag"). As the earlier patch already added
KERNFS_LOCKDEP tests to kernfs_deactivate(), those additions are
dropped from this patch and the existing ones are simply converted
to kernfs_lockdep().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were
added because there were operations which should be performed outside
kernfs_mutex after adding and removing kernfs_nodes. The necessary
operations were recorded in kernfs_addrm_cxt and performed by
kernfs_addrm_finish(); however, after the recent changes which
relocated deactivation and unmapping so that they're performed
directly during removal, the only operation kernfs_addrm_finish()
performs is kernfs_put(), which can be moved inside the removal path
too.
This patch moves the kernfs_put() of the base ref to __kernfs_remove()
and remove kernfs_addrm_cxt and kernfs_addrm_start/finish().
* kernfs_add_one() is updated to grab and release kernfs_mutex itself.
sysfs_addrm_start/finish() invocations around it are removed from
all users.
* __kernfs_remove() puts an unlinked node directly instead of chaining
it to kernfs_addrm_cxt. Its callers are updated to grab and release
kernfs_mutex instead of calling kernfs_addrm_start/finish() around
it.
v2: Rebased on top of "kernfs: associate a new kernfs_node with its
parent on creation" which dropped @parent from kernfs_add_one().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
kernfs_node->u.completion is used to notify deactivation completion
from kernfs_put_active() to kernfs_deactivate(). We now allow
multiple racing removals of the same node and the current removal
scheme is no longer correct - kernfs_remove() invocation may return
before the node is properly deactivated if it races against another
removal. The removal path will be restructured to address the issue.
To help such restructure which requires supporting multiple waiters,
this patch replaces kernfs_node->u.completion with
kernfs_root->deactivate_waitq. This makes deactivation event
notifications share a per-root waitqueue_head; however, the wait path
is quite cold and this will also allow shaving one pointer off
kernfs_node.
v2: Refreshed on top of ("kernfs: make kernfs_deactivate() honor
KERNFS_LOCKDEP flag").
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This allows replying only to the requestor portid while still
supporting broadcasting. Pass 0 to portid for the previous behavior.
Signed-off-by: David Fries <David@Fries.net>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
iovcnt is declared as a signed integer in both the userspace API and
as a local variable in mic_virtio.c. The while() loop in mic_virtio.c
iterates until the local variable iovcnt reaches the value 0. If
userspace passes e.g. INT_MIN as iovcnt field, this loop then appears
to depend on an undefined behavior (signed underflow) to complete.
The fix is to use unsigned integers in both the userspace API and
the local variable.
This issue was reported @ https://lkml.org/lkml/2014/1/10/10
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
As we implement Virtual Receive Side Scaling on the networking side
(the VRSS patches are currently under review), it will be useful to have
per-channel state that vmbus drivers can manage. Add support for
managing per-channel state.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The current channel code is using scatterlist abstraction to pass data to the
ringbuffer API on the send path. This causes unnecessary translations
between virtual and physical addresses. Fix this.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
On Gen2 firmware, Hyper-V does not emulate the PCI bus. However, the MMIO
information is packaged up in DSDT. Extract this information and export it
for use by the synthetic framebuffer driver. This is the only driver that
needs this currently.
In this version of the patch mmio, I have updated the hyperv header file
(linux/hyperv.h) with mmio definitions.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This patch adds the hyperv.h header to the uapi folder, and adds it to the Kbuild file.
Doing this enables compiling userspace Hyper-V tools using the installed headers.
Version 2: Split UAPI parts into new header, instead of duplicating.
Signed-off-by: Bjarke Istrup Pedersen <gurligebis@gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit 35773dac5f862cb1c82ea151eba3e2f6de51ec3e. It's a
hack that caused regressions in the usb-storage and userspace USB
drivers that use usbfs and libusb. Commit 70cabb7d992f "xhci 1.0: Limit
arbitrarily-aligned scatter gather." should fix the issues seen with the
ax88179_178a driver on xHCI 1.0 hosts, without causing regressions.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: stable@vger.kernel.org # 3.12
|
|
* Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit).
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
This is neede for proper SG_IO operation as well as various uses of
blk_execute_rq from the SCSI midlayer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|