aboutsummaryrefslogtreecommitdiff
path: root/net/core
AgeCommit message (Collapse)Author
2012-04-02net: fix napi_reuse_skb() skb reserveEric Dumazet
[ Upstream commit 2a2a459eeeff48640dc557548ce576d666ab06ed ] napi->skb is allocated in napi_get_frags() using netdev_alloc_skb_ip_align(), with a reserve of NET_SKB_PAD + NET_IP_ALIGN bytes. However, when such skb is recycled in napi_reuse_skb(), it ends with a reserve of NET_IP_ALIGN which is suboptimal. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02rtnetlink: Fix VF IFLA policyGreg Rose
commit 48752f6513012a1b078da08b145d5c40a644f058 upstream. Add VF spoof check to IFLA policy. The original patch I submitted to add the spoof checking feature to rtnl failed to add the proper policy rule that identifies the data type and len. This patch corrects that oversight. No bugs have been reported against this but it may cause some problem for the netlink message parsing that uses the policy table. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-04rtnetlink: fix rtnl_calcit() and rtnl_dump_ifinfo()Eric Dumazet
nlmsg_parse() might return an error, so test its return value before potential random memory accesses. Errors introduced in commit 115c9b81928 (rtnetlink: Fix problem with buffer allocation) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21rtnetlink: Fix problem with buffer allocationGreg Rose
Implement a new netlink attribute type IFLA_EXT_MASK. The mask is a 32 bit value that can be used to indicate to the kernel that certain extended ifinfo values are requested by the user application. At this time the only mask value defined is RTEXT_FILTER_VF to indicate that the user wants the ifinfo dump to send information about the VFs belonging to the interface. This patch fixes a bug in which certain applications do not have large enough buffers to accommodate the extra information returned by the kernel with large numbers of SR-IOV virtual functions. Those applications will not send the new netlink attribute with the interface info dump request netlink messages so they will not get unexpectedly large request buffers returned by the kernel. Modifies the rtnl_calcit function to traverse the list of net devices and compute the minimum buffer size that can hold the info dumps of all matching devices based upon the filter passed in via the new netlink attribute filter mask. If no filter mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE. With this change it is possible to add yet to be defined netlink attributes to the dump request which should make it fairly extensible in the future. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21neighbour: Fixed race condition at tbl->nhtMichel Machado
When the fixed race condition happens: 1. While function neigh_periodic_work scans the neighbor hash table pointed by field tbl->nht, it unlocks and locks tbl->lock between buckets in order to call cond_resched. 2. Assume that function neigh_periodic_work calls cond_resched, that is, the lock tbl->lock is available, and function neigh_hash_grow runs. 3. Once function neigh_hash_grow finishes, and RCU calls neigh_hash_free_rcu, the original struct neigh_hash_table that function neigh_periodic_work was using doesn't exist anymore. 4. Once back at neigh_periodic_work, whenever the old struct neigh_hash_table is accessed, things can go badly. Signed-off-by: Michel Machado <michel@digirati.com.br> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14netpoll: netpoll_poll_dev() should access dev->flagsEric Dumazet
commit 5a698af53f (bond: service netpoll arp queue on master device) tested IFF_SLAVE flag against dev->priv_flags instead of dev->flags Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: WANG Cong <amwang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10netprio_cgroup: fix wrong memory access when NETPRIO_CGROUP=mNeil Horman
When the netprio_cgroup module is not loaded, net_prio_subsys_id is -1, and so sock_update_prioidx() accesses cgroup_subsys array with negative index subsys[-1]. Make the code resembles cls_cgroup code, which is bug free. Origionally-authored-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10netprio_cgroup: don't allocate prio table when a device is registeredNeil Horman
So we delay the allocation till the priority is set through cgroup, and this makes skb_update_priority() faster when it's not set. This also eliminates an off-by-one bug similar with the one fixed in the previous patch. Origionally-authored-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10netprio_cgroup: fix an off-by-one bugNeil Horman
# mount -t cgroup xxx /mnt # mkdir /mnt/tmp # cat /mnt/tmp/net_prio.ifpriomap lo 0 eth0 0 virbr0 0 # echo 'lo 999' > /mnt/tmp/net_prio.ifpriomap # cat /mnt/tmp/net_prio.ifpriomap lo 999 eth0 0 virbr0 4101267344 We got weired output, because we exceeded the boundary of the array. We may even crash the kernel.. Origionally-authored-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-08gro: more generic L2 header checkEric Dumazet
Shlomo Pongratz reported GRO L2 header check was suited for Ethernet only, and failed on IB/ipoib traffic. He provided a patch faking a zeroed header to let GRO aggregates frames. Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header check to be more generic, ie not assuming L2 header is 14 bytes, but taking into account hard_header_len. __napi_gro_receive() has special handling for the common case (Ethernet) to avoid a memcmp() call and use an inline optimized function instead. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Shlomo Pongratz <shlomop@mellanox.com> Cc: Roland Dreier <roland@kernel.org> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-04netprio_cgroup: Fix obo in get_prioidxNeil Horman
It was recently pointed out to me that the get_prioidx function sets a bit in the prioidx map prior to checking to see if the index being set is out of bounds. This patch corrects that, avoiding the possiblity of us writing beyond the end of the array Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Stanislaw Gruszka <sgruszka@redhat.com> CC: Stanislaw Gruszka <sgruszka@redhat.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-01ethtool: Null-terminate filename passed to ethtool_ops::flash_deviceBen Hutchings
The parameters for ETHTOOL_FLASHDEV include a filename, which ought to be null-terminated. Currently the only driver that implements ethtool_ops::flash_device attempts to add a null terminator if necessary, but does it wrongly. Do it in the ethtool core instead. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-26net: RTNETLINK adjusting values of min_ifinfo_dump_sizeStefan Gula
Setting link parameters on a netdevice changes the value of if_nlmsg_size(), therefore it is necessary to recalculate min_ifinfo_dump_size. Signed-off-by: Stefan Gula <steweg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-26netns: fix net_alloc_generic()Eric Dumazet
When a new net namespace is created, we should attach to it a "struct net_generic" with enough slots (even empty), or we can hit the following BUG_ON() : [ 200.752016] kernel BUG at include/net/netns/generic.h:40! ... [ 200.752016] [<ffffffff825c3cea>] ? get_cfcnfg+0x3a/0x180 [ 200.752016] [<ffffffff821cf0b0>] ? lockdep_rtnl_is_held+0x10/0x20 [ 200.752016] [<ffffffff825c41be>] caif_device_notify+0x2e/0x530 [ 200.752016] [<ffffffff810d61b7>] notifier_call_chain+0x67/0x110 [ 200.752016] [<ffffffff810d67c1>] raw_notifier_call_chain+0x11/0x20 [ 200.752016] [<ffffffff821bae82>] call_netdevice_notifiers+0x32/0x60 [ 200.752016] [<ffffffff821c2b26>] register_netdevice+0x196/0x300 [ 200.752016] [<ffffffff821c2ca9>] register_netdev+0x19/0x30 [ 200.752016] [<ffffffff81c1c67a>] loopback_net_init+0x4a/0xa0 [ 200.752016] [<ffffffff821b5e62>] ops_init+0x42/0x180 [ 200.752016] [<ffffffff821b600b>] setup_net+0x6b/0x100 [ 200.752016] [<ffffffff821b6466>] copy_net_ns+0x86/0x110 [ 200.752016] [<ffffffff810d5789>] create_new_namespaces+0xd9/0x190 net_alloc_generic() should take into account the maximum index into the ptr array, as a subsystem might use net_generic() anytime. This also reduces number of reallocations in net_assign_generic() Reported-by: Sasha Levin <levinsasha928@gmail.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Sjur Brændeland <sjur.brandeland@stericsson.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-24net: flow_dissector.c missing include linux/export.hJesper Dangaard Brouer
The file net/core/flow_dissector.c seems to be missing including linux/export.h. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22ethtool: allow ETHTOOL_GSSET_INFO for usersMichał Mirosław
Allow ETHTOOL_GSSET_INFO ethtool ioctl() for unprivileged users. ETHTOOL_GSTRINGS is already allowed, but is unusable without this one. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22net: introduce res_counter_charge_nofail() for socket allocationsGlauber Costa
There is a case in __sk_mem_schedule(), where an allocation is beyond the maximum, but yet we are allowed to proceed. It happens under the following condition: sk->sk_wmem_queued + size >= sk->sk_sndbuf The network code won't revert the allocation in this case, meaning that at some point later it'll try to do it. Since this is never communicated to the underlying res_counter code, there is an inbalance in res_counter uncharge operation. I see two ways of fixing this: 1) storing the information about those allocations somewhere in memcg, and then deducting from that first, before we start draining the res_counter, 2) providing a slightly different allocation function for the res_counter, that matches the original behavior of the network code more closely. I decided to go for #2 here, believing it to be more elegant, since #1 would require us to do basically that, but in a more obscure way. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> CC: Tejun Heo <tj@kernel.org> CC: Li Zefan <lizf@cn.fujitsu.com> CC: Laurent Chavey <chavey@google.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22pktgen: Fix unsigned function that is returning negative valsPaul Gortmaker
Every call to num_args() immediately checks the return value for less than zero, as it will return -EFAULT for a failed get_user() call. So it makes no sense for the function to be declared as an unsigned long. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits) tg3: Fix single-vector MSI-X code openvswitch: Fix multipart datapath dumps. ipv6: fix per device IP snmp counters inetpeer: initialize ->redirect_genid in inet_getpeer() net: fix NULL-deref in WARN() in skb_gso_segment() net: WARN if skb_checksum_help() is called on skb requiring segmentation caif: Remove bad WARN_ON in caif_dev caif: Fix typo in Vendor/Product-ID for CAIF modems bnx2x: Disable AN KR work-around for BCM57810 bnx2x: Remove AutoGrEEEn for BCM84833 bnx2x: Remove 100Mb force speed for BCM84833 bnx2x: Fix PFC setting on BCM57840 bnx2x: Fix Super-Isolate mode for BCM84833 net: fix some sparse errors net: kill duplicate included header net: sh-eth: Fix build error by the value which is not defined net: Use device model to get driver name in skb_gso_segment() bridge: BH already disabled in br_fdb_cleanup() net: move sock_update_memcg outside of CONFIG_INET mwl8k: Fixing Sparse ENDIAN CHECK warning ...
2012-01-17net: fix NULL-deref in WARN() in skb_gso_segment()Michał Mirosław
Bug was introduced in commit c8f44affb7244f2ac3e703cab13d55ede27621bb. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-17net: WARN if skb_checksum_help() is called on skb requiring segmentationBen Hutchings
skb_checksum_help() has never done anything useful with skbs that require segmentation. Setting skb->ip_summed = CHECKSUM_NONE makes them invalid and provokes a later WARNing in skb_gso_segment(). Passing such an skb to skb_checksum_help() indicates a bug, so we should warn about it immediately. Move the warning from skb_gso_segment() into a shared function, and add gso_type and gso_size to it. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-17net: fix some sparse errorsEric Dumazet
make C=2 CF="-D__CHECK_ENDIAN__" M=net And fix flowi4_init_output() prototype for sport Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-17net: Use device model to get driver name in skb_gso_segment()Ben Hutchings
ethtool operations generally require the caller to hold RTNL and are not safe to call in atomic context. The device model provides this information for most devices; we'll only lose it for some old ISA drivers. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-17bql: Fix inconsistency between file mode and attr method.Hiroaki SHIMODA
There is no store() method for inflight attribute in the tx-<n>/byte_queue_limits sysfs directory. So remove S_IWUSR bit. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-14Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-securityLinus Torvalds
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security: capabilities: remove __cap_full_set definition security: remove the security_netlink_recv hook as it is equivalent to capable() ptrace: do not audit capability check when outputing /proc/pid/stat capabilities: remove task_ns_* functions capabitlies: ns_capable can use the cap helpers rather than lsm call capabilities: style only - move capable below ns_capable capabilites: introduce new has_ns_capabilities_noaudit capabilities: call has_ns_capability from has_capability capabilities: remove all _real_ interfaces capabilities: introduce security_capable_noaudit capabilities: reverse arguments to security_capable capabilities: remove the task from capable LSM hook entirely selinux: sparse fix: fix several warnings in the security server cod selinux: sparse fix: fix warnings in netlink code selinux: sparse fix: eliminate warnings for selinuxfs selinux: sparse fix: declare selinux_disable() in security.h selinux: sparse fix: move selinux_complete_init selinux: sparse fix: make selinux_secmark_refcount static SELinux: Fix RCU deref check warning in sel_netport_insert() Manually fix up a semantic mis-merge wrt security_netlink_recv(): - the interface was removed in commit fd7784615248 ("security: remove the security_netlink_recv hook as it is equivalent to capable()") - a new user of it appeared in commit a38f7907b926 ("crypto: Add userspace configuration API") causing no automatic merge conflict, but Eric Paris pointed out the issue.
2012-01-12net: reintroduce missing rcu_assign_pointer() callsEric Dumazet
commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER) did a lot of incorrect changes, since it did a complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x, y). We miss needed barriers, even on x86, when y is not NULL. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-09net: Fix build with INET disabled.David S. Miller
> net/core/sock.c: In function 'sk_update_clone': > net/core/sock.c:1278:3: error: implicit declaration of function 'sock_update_memcg' Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-09net: introduce netif_addr_lock_nested() and call if when appropriateJiri Pirko
dev_uc_sync() and dev_mc_sync() are acquiring netif_addr_lock for destination device of synchronization. Since netif_addr_lock is already held at the time for source device, this triggers lockdep deadlock warning. There's no way this deadlock can happen so use spin_lock_nested() to silence the warning. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-09net: correct lock name in dev_[uc/mc]_sync documentations.Jiri Pirko
Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-08net: sk_update_clone is only used in net/core/sock.cStephen Rothwell
so move it there. Fixes build errors when CONFIG_INET is not defined: In file included from include/linux/tcp.h:211:0, from include/linux/ipv6.h:221, from include/net/ipv6.h:16, from include/linux/sunrpc/clnt.h:26, from include/linux/nfs_fs.h:50, from init/do_mounts.c:20: include/net/sock.h: In function 'sk_update_clone': include/net/sock.h:1109:3: error: implicit declaration of function 'sock_update_memcg' [-Werror=implicit-function-declaration] Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-07pktgen: set correct max and min in pktgen_setup_inject()Dan Carpenter
In 882716604ec "pktgen: fix multiple queue warning" we added special logic to handle the case where ntxq is zero. It's not clear to me that ntxq can actually be zero. But if it were then we would set ->queue_map_min and ->queue_map_max to USHRT_MAX when probably we want to set them to zero? Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-07net: fix sock_clone reference mismatch with tcp memcontrolGlauber Costa
Sockets can also be created through sock_clone. Because it copies all data in the sock structure, it also copies the memcg-related pointer, and all should be fine. However, since we now use reference counts in socket creation, we are left with some sockets that have no reference counts. It matters when we destroy them, since it leads to a mismatch. Signed-off-by: Glauber Costa <glommer@parallels.com> CC: David S. Miller <davem@davemloft.net> CC: Greg Thelen <gthelen@google.com> CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: Laurent Chavey <chavey@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-05security: remove the security_netlink_recv hook as it is equivalent to capable()Eric Paris
Once upon a time netlink was not sync and we had to get the effective capabilities from the skb that was being received. Today we instead get the capabilities from the current task. This has rendered the entire purpose of the hook moot as it is now functionally equivalent to the capable() call. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-04ethtool: Remove ethtool_ops::set_rx_ntuple operationBen Hutchings
All implementations have been converted to implement set_rxnfc instead. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-04ethtool: Allow drivers to select RX NFC rule locationsBen Hutchings
Define special location values for RX NFC that request the driver to select the actual rule location. This allows for implementation on devices that use hash-based filter lookup, whereas currently the API is more suited to devices with TCAM lookup or linear search. In ethtool_set_rxnfc() and the compat wrapper ethtool_ioctl(), copy the structure back to user-space after insertion so that the actual location is returned. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-30sock_diag: Introduce the meminfo nla core (v2)Pavel Emelyanov
Add a routine that dumps memory-related values of a socket. It's made as an array to make it possible to add more stuff here later without breaking compatibility. Since v1: The SK_MEMINFO_ constants are in userspace visible part of sock_diag.h, the rest is under __KERNEL__. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-28ipv6: Use universal hash for NDISC.David S. Miller
In order to perform a proper universal hash on a vector of integers, we have to use different universal hashes on each vector element. Which means we need 4 different hash randoms for ipv6. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-24rfs: better sizing of dev_flow_tableEric Dumazet
Aim of this patch is to provide full range of rps_flow_cnt on 64bit arches. Theorical limit on number of flows is 2^32 Fix some buggy RPS/RFS macros as well. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> CC: Xi Wang <xi.wang@gmail.com> CC: Laurent Chavey <chavey@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/bluetooth/l2cap_core.c Just two overlapping changes, one added an initialization of a local variable, and another change added a new local variable. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23net: relax rcvbuf limitsEric Dumazet
skb->truesize might be big even for a small packet. Its even bigger after commit 87fb4b7b533 (net: more accurate skb truesize) and big MTU. We should allow queueing at least one packet per receiver, even with a low RCVBUF setting. Reported-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22rps: fix insufficient bounds checking in store_rps_dev_flow_table_cnt()Xi Wang
Setting a large rps_flow_cnt like (1 << 30) on 32-bit platform will cause a kernel oops due to insufficient bounds checking. if (count > 1<<30) { /* Enforce a limit to prevent overflow */ return -EINVAL; } count = roundup_pow_of_two(count); table = vmalloc(RPS_DEV_FLOW_TABLE_SIZE(count)); Note that the macro RPS_DEV_FLOW_TABLE_SIZE(count) is defined as: ... + (count * sizeof(struct rps_dev_flow)) where sizeof(struct rps_dev_flow) is 8. (1 << 30) * 8 will overflow 32 bits. This patch replaces the magic number (1 << 30) with a symbolic bound. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-21net: Add a flow_cache_flush_deferred functionSteffen Klassert
flow_cach_flush() might sleep but can be called from atomic context via the xfrm garbage collector. So add a flow_cache_flush_deferred() function and use this if the xfrm garbage colector is invoked from within the packet path. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19Revert "net: Remove unused neighbour layer ops."David S. Miller
This reverts commit 5c3ddec73d01a1fae9409c197078cb02c42238c3. S390 qeth driver actually still uses the setup ops. Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16net:core: use IS_ENABLEDIgor Maravić
Use IS_ENABLED(CONFIG_FOO) instead of defined(CONFIG_FOO) || defined (CONFIG_FOO_MODULE) Signed-off-by: Igor Maravić <igorm@etf.rs> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16net: fix sleeping while atomic problem in sock mem_cgroup.Glauber Costa
We can't scan the proto_list to initialize sock cgroups, as it holds a rwlock, and we also want to keep the code generic enough to avoid calling the initialization functions of protocols directly, Convert proto_list_lock into a mutex, so we can sleep and do the necessary allocations. This lock is seldom taken, so there shouldn't be any performance penalties associated with that Signed-off-by: Glauber Costa <glommer@parallels.com> CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Rothwell <sfr@canb.auug.org.au> CC: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16ethtool: Define and apply a default policy for RX flow hash indirectionBen Hutchings
All drivers that support modification of the RX flow hash indirection table initialise it in the same way: RX rings are assigned to table entries in rotation. Make that default policy explicit by having them call a ethtool_rxfh_indir_default() function. In the ethtool core, add support for a zero size value for ETHTOOL_SRXFHINDIR, which resets the table to this default. Partly-suggested-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Shreyas N Bhatewara <sbhatewara@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16ethtool: Centralise validation of ETHTOOL_{G, S}RXFHINDIR parametersBen Hutchings
Add a new ethtool operation (get_rxfh_indir_size) to get the indirectional table size. Use this to validate the user buffer size before calling get_rxfh_indir or set_rxfh_indir. Use get_rxnfc to get the number of RX rings, and validate the contents of the new indirection table before calling set_rxfh_indir. Remove this validation from drivers. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Dimitris Michailidis <dm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16sock_diag: Generalize requests cookies managementsPavel Emelyanov
The sk address is used as a cookie between dump/get_exact calls. It will be required for unix socket sdumping, so move it from inet_diag to sock_diag. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16sock_diag: Fix module netlink aliasesPavel Emelyanov
I've made a mistake when fixing the sock_/inet_diag aliases :( 1. The sock_diag layer should request the family-based alias, not just the IPPROTO_IP one; 2. The inet_diag layer should request for AF_INET+protocol alias, not just the protocol one. Thus fix this. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-14rtnetlink: rtnl_link_register() sanity testEric Dumazet
Before adding a struct rtnl_link_ops into link_ops list, check it doesnt clash with a prior one. Based on a previous patch from Alexander Smirnov Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>