linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2014-05-19	cgroup: disallow debug controller on the default hierarchy	Tejun Heo
	The debug controller, as its name suggests, exposes cgroup core internals to userland to aid debugging. Unfortunately, except for the name, there's no provision to prevent its usage in production configurations and the controller is widely enabled and mounted leaking internal details to userland. Like most other debug information, the information exposed by debug isn't interesting even for debugging itself once the related parts are working reliably. This controller has no reason for existing. This patch implements cgrp_dfl_root_inhibit_ss_mask which can suppress specific subsystems on the default hierarchy and adds the debug subsystem to it so that it can be gradually deprecated as usages move towards the unified hierarchy. Signed-off-by: Tejun Heo <tj@kernel.org>
2014-05-19	virtio_scsi: use cmd_size	Christoph Hellwig
	Taken almost entirely from Nicholas Bellinger's scsi-mq conversion. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-19	Merge branch 'for-3.16/blk-mq-tagging' into for-3.16/core	Jens Axboe
	Signed-off-by: Jens Axboe <axboe@fb.com> Conflicts: block/blk-mq-tag.c
2014-05-19	rcu: Provide API to suppress stall warnings while sysrc runs	Rik van Riel
	Some sysrq handlers can run for a long time, because they dump a lot of data onto a serial console. Having RCU stall warnings pop up in the middle of them only makes the problem worse. This commit provides rcu_sysrq_start() and rcu_sysrq_end() APIs to temporarily suppress RCU CPU stall warnings while a sysrq request is handled. Signed-off-by: Rik van Riel <riel@redhat.com> [ paulmck: Fix TINY_RCU build error. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-19	blk-mq: switch ctx pending map to the sparser blk_align_bitmap	Jens Axboe
	Each hardware queue has a bitmap of software queues with pending requests. When new IO is queued on a software queue, the bit is set, and when IO is pruned on a hardware queue run, the bit is cleared. This causes a lot of traffic. Switch this from the regular BITS_PER_LONG bitmap to a sparser layout, similarly to what was done for blk-mq tagging. 20% performance increase was observed for single threaded IO, and about 15% performanc increase on multiple threads driving the same device. Signed-off-by: Jens Axboe <axboe@fb.com>
2014-05-19	cfg80211: constify wowlan/coalesce mask/pattern pointers	Johannes Berg
	This requires changing the nl80211 parsing code a bit to use intermediate pointers for the allocation, but clarifies the API towards the drivers. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19	cfg80211: constify more pointers in the cfg80211 API	Johannes Berg
	This also propagates through the drivers. The orinoco driver uses the cfg80211 API structs for internal bookkeeping, and so needs a (void *) cast that removes the const - but that's OK because it allocates those pointers. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19	cfg80211: constify MAC addresses in cfg80211 ops	Johannes Berg
	This propagates through all the drivers and mac80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19	clk: exynos5420: Add 5800 specific clocks	Alim Akhtar
	Exynos5800 clock structure is mostly similar to 5420 with only a small delta changes. So the 5420 clock file is re-used for 5800 also. The common clocks for both are seggreagated and few clocks which are different for both are separately initialized. Signed-off-by: Alim Akhtar <alim.akhtar@samsung.com> Signed-off-by: Arun Kumar K <arun.kk@samsung.com> Acked-by: Tomasz Figa <t.figa@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2014-05-19	perf: Fix perf_event_open(.flags) test	Peter Zijlstra
	Vince noticed that we test the (unsigned long) flags field against an (unsigned int) constant. This would allow setting the high bits on 64bit platforms and not get an error. There is nothing that uses the high bits, so it should be entirely harmless, but we don't want userspace to accidentally set them anyway, so fix the constants. Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140423102254.GL11096@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-19	perf: Fix a race between ring_buffer_detach() and ring_buffer_attach()	Peter Zijlstra
	Alexander noticed that we use RCU iteration on rb->event_list but do not use list_{add,del}_rcu() to add,remove entries to that list, nor do we observe proper grace periods when re-using the entries. Merge ring_buffer_detach() into ring_buffer_attach() such that attaching to the NULL buffer is detaching. Furthermore, ensure that between any 'detach' and 'attach' of the same event we observe the required grace period, but only when strictly required. In effect this means that only ioctl(.request = PERF_EVENT_IOC_SET_OUTPUT) will wait for a grace period, while the normal initial attach and final detach will not be delayed. This patch should, I think, do the right thing under all circumstances, the 'normal' cases all should never see the extra grace period, but the two cases: 1) PERF_EVENT_IOC_SET_OUTPUT on an event which already has a ring_buffer set, will now observe the required grace period between removing itself from the old and attaching itself to the new buffer. This case is 'simple' in that both buffers are present in perf_event_set_output() one could think an unconditional synchronize_rcu() would be sufficient; however... 2) an event that has a buffer attached, the buffer is destroyed (munmap) and then the event is attached to a new/different buffer using PERF_EVENT_IOC_SET_OUTPUT. This case is more complex because the buffer destruction does: ring_buffer_attach(.rb = NULL) followed by the ioctl() doing: ring_buffer_attach(.rb = foo); and we still need to observe the grace period between these two calls due to us reusing the event->rb_entry list_head. In order to make 2 happen we use Paul's latest cond_synchronize_rcu() call. Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140507123526.GD13658@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-19	mac80211: fix csa_counter_offs argument name in docbook	Luciano Coelho
	The csa_counter_offs was erroneously described as csa_offs in the docbook section. This fixes two warnings when making htmldocs (at least): Warning(include/net/mac80211.h:3428): No description found for parameter 'csa_counter_offs[IEEE80211_MAX_CSA_COUNTERS_NUM]' Warning(include/net/mac80211.h:3428): Excess struct/union/enum/typedef member 'csa_offs' description in 'ieee80211_mutable_offsets' Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19	cfg80211: add documentation for max_num_csa_counters	Luciano Coelho
	Move the comment in the structure to a description of the max_num_csa_counters field in the docbook area. This fixes a warning when building htmldocs (at least): Warning(include/net/cfg80211.h:3064): No description found for parameter 'max_num_csa_counters' Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19	scsi: reintroduce scsi_driver.init_command	Christoph Hellwig
	Instead of letting the ULD play games with the prep_fn move back to the model of a central prep_fn with a callback to the ULD. This already cleans up and shortens the code by itself, and will be required to properly support blk-mq in the SCSI midlayer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-05-19	netfilter: nf_tables: defer all object release via rcu	Pablo Neira Ayuso
	Now that all objects are released in the reverse order via the transaction infrastructure, we can enqueue the release via call_rcu to save one synchronize_rcu. For small rule-sets loaded via nft -f, it now takes around 50ms less here. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19	netfilter: nf_tables: remove skb and nlh from context structure	Pablo Neira Ayuso
	Instead of caching the original skbuff that contains the netlink messages, this stores the netlink message sequence number, the netlink portID and the report flag. This helps to prepare the introduction of the object release via call_rcu. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19	netfilter: nf_tables: use new transaction infrastructure to handle elements	Pablo Neira Ayuso
	Leave the set content in consistent state if we fail to load the batch. Use the new generic transaction infrastructure to achieve this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19	netfilter: nf_tables: use new transaction infrastructure to handle table	Pablo Neira Ayuso
	This patch speeds up rule-set updates and it also provides a way to revert updates and leave things in consistent state in case that the batch needs to be aborted. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19	netfilter: nf_tables: use new transaction infrastructure to handle chain	Pablo Neira Ayuso
	This patch speeds up rule-set updates and it also introduces a way to revert chain updates if the batch is aborted. The idea is to store the changes in the transaction to apply that in the commit step. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19	netfilter: nf_tables: use new transaction infrastructure to handle sets	Pablo Neira Ayuso
	This patch reworks the nf_tables API so set updates are included in the same batch that contains rule updates. This speeds up rule-set updates since we skip a dialog of four messages between kernel and user-space (two on each direction), from: 1) create the set and send netlink message to the kernel 2) process the response from the kernel that contains the allocated name. 3) add the set elements and send netlink message to the kernel. 4) process the response from the kernel (to check for errors). To: 1) add the set to the batch. 2) add the set elements to the batch. 3) add the rule that points to the set. 4) send batch to the kernel. This also introduces an internal set ID (NFTA_SET_ID) that is unique in the batch so set elements and rules can refer to new sets. Backward compatibility has been only retained in userspace, this means that new nft versions can talk to the kernel both in the new and the old fashion. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19	netfilter: nf_tables: add message type to transactions	Pablo Neira Ayuso
	The patch adds message type to the transaction to simplify the commit the and abort routines. Yet another step forward in the generalisation of the transaction infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19	netfilter: nf_tables: generalise transaction infrastructure	Pablo Neira Ayuso
	This patch generalises the existing rule transaction infrastructure so it can be used to handle set, table and chain object transactions as well. The transaction provides a data area that stores private information depending on the transaction type. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19	netfilter: nf_tables: deconstify table and chain in context structure	Pablo Neira Ayuso
	The new transaction infrastructure updates the family, table and chain objects in the context structure, so let's deconstify them. While at it, move the context structure initialization routine to the top of the source file as it will be also used from the table and chain routines. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19	can: unify identifiers to ensure unique include processing	Oliver Hartkopp
	Armin pointed me to the fact that the identifier which is used to ensure the unique include processing in lunux/include/uapi/linux/can.h is CAN_H. This clashed with his own source as includes from libraries and APIs should use an underscore '_' at the identifier start. This patch fixes the protection identifiers in all CAN relavant includes. Reported-by: Armin Burchardt <armin@uni-bremen.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2014-05-19	can: add Renesas R-Car CAN driver	Sergei Shtylyov
	Add support for the CAN controller found in Renesas R-Car SoCs. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2014-05-18	Input: atmel_mxt_ts - implement CRC check for configuration data	Nick Dyer
	The configuration is stored in NVRAM on the maXTouch chip. When the device is reset it reports a CRC of the stored configuration values. Therefore it isn't necessary to send the configuration on each probe - we can check the CRC matches and avoid a timeconsuming backup/reset cycle. Signed-off-by: Nick Dyer <nick.dyer@itdev.co.uk> Acked-by: Benson Leung <bleung@chromium.org> Acked-by: Yufeng Shen <miletus@chromium.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2014-05-18	Input: atmel_mxt_ts - improve T19 GPIO keys handling	Nick Dyer
	* The mapping of the GPIO numbers into the T19 status byte varies between different maXTouch chips. Some have up to 7 GPIOs. Allowing a keycode array of up to 8 items is simpler and more generic. So replace #define with configurable number of keys which also allows the removal of is_tp. * Rename platform data parameters to include "t19" to prevent confusion with T15 key array. * Probe aborts early on when pdata is NULL, so no need to check. * Move "int i" to beginning of function (mixed declarations and code) * Use API calls rather than __set_bit() * Remove unused dev variable. Signed-off-by: Nick Dyer <nick.dyer@itdev.co.uk> Acked-by: Yufeng Shen <miletus@chromium.org> Reviewed-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2014-05-18	Input: atmel_mxt_ts - remove unnecessary platform data	Nick Dyer
	It is not necessary to download these values to the maXTouch chip on every probe, since they are stored in NVRAM. It makes life difficult when tuning the device to keep them in sync with the config array/file, and requires a new kernel build for minor tweaks. These parameters only represent a tiny subset of the available configuration options, tracking all of these options in platform data would be a endless task. In addition, different versions of maXTouch chips may have these values in different places or may not even have them at all. Having these values also makes life more complex for device tree and other platforms where having to define a static configuration isn't helpful. Signed-off-by: Nick Dyer <nick.dyer@itdev.co.uk> Acked-by: Benson Leung <bleung@chromium.org> Acked-by: Yufeng Shen <miletus@chromium.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2014-05-18	Input: pixcir_i2c_ts - get rid of pdata->attb_read_val()	Roger Quadros
	Get rid of the attb_read_val() platform hook. Instead, read the ATTB gpio directly from the driver. Fail if valid ATTB gpio is not provided by patform data. Signed-off-by: Roger Quadros <rogerq@ti.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2014-05-18	Input: pixcir_i2c_ts - initialize interrupt mode and power mode	Roger Quadros
	Introduce helper functions to configure power and interrupt registers. Default to IDLE mode on probe as device supports auto wakeup to ACVIE mode on detecting finger touch. Configure interrupt mode and polarity on start up. Power down on device closure or module removal. Signed-off-by: Roger Quadros <rogerq@ti.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2014-05-18	sparc: fix sparse warnings in smp_32.c + smp_64.c	Sam Ravnborg
	Fix following warnings: smp_32.c:177:5: warning: symbol 'setup_profiling_timer' was not declared. Should it be static? smp_64.c:1202:5: warning: symbol 'setup_profiling_timer' was not declared. Should it be static? smp_64.c:989:6: warning: symbol 'kgdb_roundup_cpus' was not declared. Should it be static? Add prototype to include/linux/profile.h of setup_profiling_timer Add missing include to smp_64.c Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-19	irqchip: gic: Use mask field in GICC_IAR	Haojian Zhuang
	Bit[9:0] is interrupt ID field in GICC_IAR. Bit[12:10] is CPU ID field, and others are reserved. So we should use GICC_IAR_INT_ID_MASK to get interrupt ID. It's not a good way to use ~0x1c00 (CPU ID field) to get interrupt ID. Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org> Link: https://lkml.kernel.org/r/1399795571-17231-3-git-send-email-haojian.zhuang@linaro.org Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2014-05-19	ethtool, be2net: constify array pointer parameters to ethtool_ops::set_rxfh	Ben Hutchings
	Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-05-19	ethtool: Disallow ETHTOOL_SRSSH with both indir table and hash key unchanged	Ben Hutchings
	This would be a no-op, so there is no reason to request it. This also allows conversion of the current implementations of ethtool_ops::{get,set}_rxfh_indir to ethtool_ops::{get,set}_rxfh with no change other than their parameters. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-05-19	ethtool: Expand documentation of ethtool_ops::{get,set}_rxfh()	Ben Hutchings
	Some corner-cases are not explained properly. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-05-19	ethtool: Improve explanation of the two arrays following struct ethtool_rxfh	Ben Hutchings
	The use of two variable-length arrays is unusual so deserves a bit more explanation. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-05-19	ethtool: Name the 'no change' value for setting RSS hash key but not indir table	Ben Hutchings
	We usually allocate special values of u32 fields starting from the top down, so also change the value to 0xffffffff. As these operations haven't been included in a stable release yet, it's not too late to change. Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-05-19	virtio-rng: fixes for device registration/unregistration	Sasha Levin
	There are several fixes in this patch (mostly because it's hard splitting them up): - Revert the name field in struct hwrng back to 'const'. Also, don't do an extra kmalloc for the name - just wasteful. - Deal with allocation failures properly. - Use IDA to allocate device number instead of brute forcing one. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-05-19	Merge tag 'drm-intel-next-2014-05-06' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm-intel into drm-next - ring init improvements (Chris) - vebox2 support (Zhao Yakui) - more prep work for runtime pm on Baytrail (Imre) - eDram support for BDW (Ben) - prep work for userptr support (Chris) - first parts of the encoder->mode_set callback removal (Daniel) - 64b reloc fixes (Ben) - first part of atomic plane updates (Ville) * tag 'drm-intel-next-2014-05-06' of git://anongit.freedesktop.org/drm-intel: (75 commits) drm/i915: Remove useless checks from primary enable/disable drm/i915: Merge LP1+ watermarks in safer way drm/i915: Make sure computed watermarks never overflow the registers drm/i915: Add pipe update trace points drm/i915: Perform primary enable/disable atomically with sprite updates drm/i915: Make sprite updates atomic drm/i915: Support 64b relocations drm/i915: Support 64b execbuf drm/i915/sdvo: Remove ->mode_set callback drm/i915/crt: Remove ->mode_set callback drm/i915/tv: Remove ->mode_set callback drm/i915/tv: Rip out pipe-disabling nonsense from ->mode_set drm/i915/tv: De-magic device check drm/i915/tv: extract set_color_conversion drm/i915/tv: extract set_tv_mode_timings drm/i915/dvo: Remove ->mode_set callback drm/i915: Make encoder->mode_set callbacks optional drm/i915: Make primary_enabled match the actual hardware state drm/i915: Move ring_begin to signal() drm/i915: Virtualize the ringbuffer signal func ...
2014-05-16	net: cdc_ncm: remove redundant "disconnected" flag	Bjørn Mork
	Calling netif_carrier_{on,off} is sufficient. There is no need to duplicate the carrier state in a driver specific flag. Acked-by: Enrico Mioso <mrkiko.rs@gmail.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16	net: cdc_ncm: use sane defaults for rx/tx buffers	Bjørn Mork
	Lots of devices request much larger buffers than reasonable. This cause real problems for users of hosts with limited resources. Reducing the default buffer size to 16kB for such devices is a reasonable trade-off between allowing them to aggregate traffic and avoiding memory exhaustion on resource restrained hosts. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16	net: cdc_ncm/cdc_mbim: adding NCM protocol statistics	Bjørn Mork
	To have an idea of the effects of the protocol coalescing it's useful to have some counters showing the different aspects. Due to the asymmetrical usbnet interface the netdev rx_bytes counter has been counting real received payload, while the tx_bytes counter has included the NCM/MBIM framing overhead. This overhead can be many times the payload because of the aggressive padding strategy of this driver, and will vary a lot depending on device and traffic. With very few exceptions, users are only interested in the payload size. Having an somewhat accurate payload byte counter is particularly important for mobile broadband devices, which many NCM devices and of course all MBIM devices are. Users and userspace applications will use this counter to monitor account quotas. Having protocol specific counters for the overhead, we are now able to correct the tx_bytes netdev counter so that it shows the real payload Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16	net: cdc_ncm: set reasonable padding limits	Bjørn Mork
	We pad frames larger than X to maximum size for devices which don't need a ZLP after maximum sized frames. This allows the device to optimize its transfers for one fixed buffer size. X was arbitrarily set at 512 bytes regardless of real buffer maximum, causing extreme overheads due to excessive padding of larger tx buffers. Limit the padding to at most 3 full USB packets, still allowing the overhead to payload ratio of 3/1. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16	net: cdc_ncm: use true max dgram count for header estimates	Bjørn Mork
	Many newer NCM and MBIM devices will request a maximum tx datagram count which is much smaller than our hard-coded absolute max. We can reduce the overhead without sacrificing any of the simplicity for these devices, by simply using the true negotiated count in when calculated the maximum NTH and NDP header sizes. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16	net: cdc_ncm: use ethtool to tune coalescing settings	Bjørn Mork
	Datagram coalescing is an integral part of the NCM and MBIM protocols, intended to reduce the interrupt load primarily on the device end of the USB link. As with all coalescing solutions, there is a trade-off between buffering and interrupts. The current defaults are based on the assumption that device side buffers should be the limiting factor. However, many modern high speed LTE modems suffers from buffer-bloat, making this assumption fail. This results in sub-optimal performance due to excessive coalescing. And in cases where such modems are connected to cheap embedded hosts there is often severe buffer allocation issues, giving very noticeable performance degradation . A start on improving this is going from build time hard coded limits to per device user configurable limits. The ethtool coalescing API was selected as user interface because, although the tuned values are buffer sizes, these settings directly control datagram coalescing. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16	bonding: Fix stacked device detection in arp monitoring	Vlad Yasevich
	Prior to commit fbd929f2dce460456807a51e18d623db3db9f077 bonding: support QinQ for bond arp interval the arp monitoring code allowed for proper detection of devices stacked on top of vlans. Since the above commit, the code can still detect a device stacked on top of single vlan, but not a device stacked on top of Q-in-Q configuration. The search will only set the inner vlan tag if the route device is the vlan device. However, this is not always the case, as it is possible to extend the stacked configuration. With this patch it is possible to provision devices on top Q-in-Q vlan configuration that should be used as a source of ARP monitoring information. For example: ip link add link bond0 vlan10 type vlan proto 802.1q id 10 ip link add link vlan10 vlan100 type vlan proto 802.1q id 100 ip link add link vlan100 type macvlan Note: This patch limites the number of stacked VLANs to 2, just like before. The original, however had another issue in that if we had more then 2 levels of VLANs, we would end up generating incorrectly tagged traffic. This is no longer possible. Fixes: fbd929f2dce460456807a51e18d623db3db9f077 (bonding: support QinQ for bond arp interval) CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@redhat.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Ding Tianhong <dingtianhong@huawei.com> CC: Patric McHardy <kaber@trash.net> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16	macvlan: Fix lockdep warnings with stacked macvlan devices	Vlad Yasevich
	Macvlan devices try to avoid stacking, but that's not always successfull or even desired. As an example, the following configuration is perefectly legal and valid: eth0 <--- macvlan0 <---- vlan0.10 <--- macvlan1 However, this configuration produces the following lockdep trace: [ 115.620418] ====================================================== [ 115.620477] [ INFO: possible circular locking dependency detected ] [ 115.620516] 3.15.0-rc1+ #24 Not tainted [ 115.620540] ------------------------------------------------------- [ 115.620577] ip/1704 is trying to acquire lock: [ 115.620604] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80 [ 115.620686] but task is already holding lock: [ 115.620723] (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40 [ 115.620795] which lock already depends on the new lock. [ 115.620853] the existing dependency chain (in reverse order) is: [ 115.620894] -> #1 (&macvlan_netdev_addr_lock_key){+.....}: [ 115.620935] [<ffffffff810d57f2>] lock_acquire+0xa2/0x130 [ 115.620974] [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50 [ 115.621019] [<ffffffffa07296c3>] vlan_dev_set_rx_mode+0x53/0x110 [8021q] [ 115.621066] [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0 [ 115.621105] [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40 [ 115.621143] [<ffffffff815da6be>] __dev_open+0xde/0x140 [ 115.621174] [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170 [ 115.621174] [<ffffffff815daaa9>] dev_change_flags+0x29/0x60 [ 115.621174] [<ffffffff815e7f11>] do_setlink+0x321/0x9a0 [ 115.621174] [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730 [ 115.621174] [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0 [ 115.621174] [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0 [ 115.621174] [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380 [ 115.621174] [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80 [ 115.621174] [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20 [ 115.621174] [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b [ 115.621174] -> #0 (&vlan_netdev_addr_lock_key/1){+.....}: [ 115.621174] [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60 [ 115.621174] [<ffffffff810d57f2>] lock_acquire+0xa2/0x130 [ 115.621174] [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50 [ 115.621174] [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan] [ 115.621174] [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0 [ 115.621174] [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40 [ 115.621174] [<ffffffff815da6be>] __dev_open+0xde/0x140 [ 115.621174] [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170 [ 115.621174] [<ffffffff815daaa9>] dev_change_flags+0x29/0x60 [ 115.621174] [<ffffffff815e7f11>] do_setlink+0x321/0x9a0 [ 115.621174] [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730 [ 115.621174] [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0 [ 115.621174] [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0 [ 115.621174] [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380 [ 115.621174] [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80 [ 115.621174] [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20 [ 115.621174] [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b [ 115.621174] other info that might help us debug this: [ 115.621174] Possible unsafe locking scenario: [ 115.621174] CPU0 CPU1 [ 115.621174] ---- ---- [ 115.621174] lock(&macvlan_netdev_addr_lock_key); [ 115.621174] lock(&vlan_netdev_addr_lock_key/1); [ 115.621174] lock(&macvlan_netdev_addr_lock_key); [ 115.621174] lock(&vlan_netdev_addr_lock_key/1); [ 115.621174] * DEADLOCK * [ 115.621174] 2 locks held by ip/1704: [ 115.621174] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff815e6dbb>] rtnetlink_rcv+0x1b/0x40 [ 115.621174] #1: (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40 [ 115.621174] stack backtrace: [ 115.621174] CPU: 3 PID: 1704 Comm: ip Not tainted 3.15.0-rc1+ #24 [ 115.621174] Hardware name: Hewlett-Packard HP xw8400 Workstation/0A08h, BIOS 786D5 v02.38 10/25/2010 [ 115.621174] ffffffff82339ae0 ffff880465f79568 ffffffff816ee20c ffffffff82339ae0 [ 115.621174] ffff880465f795a8 ffffffff816e9e1b ffff880465f79600 ffff880465b019c8 [ 115.621174] 0000000000000001 0000000000000002 ffff880465b019c8 ffff880465b01230 [ 115.621174] Call Trace: [ 115.621174] [<ffffffff816ee20c>] dump_stack+0x4d/0x66 [ 115.621174] [<ffffffff816e9e1b>] print_circular_bug+0x200/0x20e [ 115.621174] [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60 [ 115.621174] [<ffffffff810d3172>] ? trace_hardirqs_on_caller+0xb2/0x1d0 [ 115.621174] [<ffffffff810d57f2>] lock_acquire+0xa2/0x130 [ 115.621174] [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50 [ 115.621174] [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan] [ 115.621174] [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0 [ 115.621174] [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40 [ 115.621174] [<ffffffff815da6be>] __dev_open+0xde/0x140 [ 115.621174] [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170 [ 115.621174] [<ffffffff815daaa9>] dev_change_flags+0x29/0x60 [ 115.621174] [<ffffffff811e1db1>] ? mem_cgroup_bad_page_check+0x21/0x30 [ 115.621174] [<ffffffff815e7f11>] do_setlink+0x321/0x9a0 [ 115.621174] [<ffffffff810d394c>] ? __lock_acquire+0x37c/0x1a60 [ 115.621174] [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730 [ 115.621174] [<ffffffff815ea169>] ? rtnl_newlink+0xe9/0x730 [ 115.621174] [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [<ffffffff810d329d>] ? trace_hardirqs_on+0xd/0x10 [ 115.621174] [<ffffffff815e6dbb>] ? rtnetlink_rcv+0x1b/0x40 [ 115.621174] [<ffffffff815e6de0>] ? rtnetlink_rcv+0x40/0x40 [ 115.621174] [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0 [ 115.621174] [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0 [ 115.621174] [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0 [ 115.621174] [<ffffffff8119d4f8>] ? might_fault+0xa8/0xb0 [ 115.621174] [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0 [ 115.621174] [<ffffffff815cb51e>] ? verify_iovec+0x5e/0xe0 [ 115.621174] [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380 [ 115.621174] [<ffffffff816faa0d>] ? __do_page_fault+0x11d/0x570 [ 115.621174] [<ffffffff810cfe9f>] ? up_read+0x1f/0x40 [ 115.621174] [<ffffffff816fab04>] ? __do_page_fault+0x214/0x570 [ 115.621174] [<ffffffff8120a10b>] ? mntput_no_expire+0x6b/0x1c0 [ 115.621174] [<ffffffff8120a0b7>] ? mntput_no_expire+0x17/0x1c0 [ 115.621174] [<ffffffff8120a284>] ? mntput+0x24/0x40 [ 115.621174] [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80 [ 115.621174] [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20 [ 115.621174] [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b Fix this by correctly providing macvlan lockdep class. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16	vlan: Fix lockdep warning with stacked vlan devices.	Vlad Yasevich
	This reverts commit dc8eaaa006350d24030502a4521542e74b5cb39f. vlan: Fix lockdep warning when vlan dev handle notification Instead we use the new new API to find the lock subclass of our vlan device. This way we can support configurations where vlans are interspersed with other devices: bond -> vlan -> macvlan -> vlan Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16	net: Allow for more then a single subclass for netif_addr_lock	Vlad Yasevich
	Currently netif_addr_lock_nested assumes that there can be only a single nesting level between 2 devices. However, if we have multiple devices of the same type stacked, this fails. For example: eth0 <-- vlan0.10 <-- vlan0.10.20 A more complicated configuration may stack more then one type of device in different order. Ex: eth0 <-- vlan0.10 <-- macvlan0 <-- vlan1.10.20 <-- macvlan1 This patch adds an ndo_* function that allows each stackable device to report its nesting level. If the device doesn't provide this function default subclass of 1 is used. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16	net: Find the nesting level of a given device by type.	Vlad Yasevich
	Multiple devices in the kernel can be stacked/nested and they need to know their nesting level for the purposes of lockdep. This patch provides a generic function that determines a nesting level of a particular device by its type (ex: vlan, macvlan, etc). We only care about nesting of the same type of devices. For example: eth0 <- vlan0.10 <- macvlan0 <- vlan1.20 The nesting level of vlan1.20 would be 1, since there is another vlan in the stack under it. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>