aboutsummaryrefslogtreecommitdiff
path: root/net/openvswitch/datapath.c
AgeCommit message (Collapse)Author
2013-04-30openvswitch: Remove unneeded ovs_netdev_get_ifindex()Thomas Graf
The only user is get_dpifindex(), no need to redirect via the port operations. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-25openvswitch: Use parallel_ops genl.Pravin B Shelar
OVS locking was recently changed to have private OVS lock which simplified overall locking. Therefore there is no need to have another global genl lock to protect OVS data structures. Following patch uses of parallel_ops genl family for OVS. This also allows more granual OVS locking using ovs_mutex for protecting OVS data structures, which gives more concurrencey. E.g multiple genl operations OVS_PACKET_CMD_EXECUTE can run in parallel, etc. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/emulex/benet/be_main.c drivers/net/ethernet/intel/igb/igb_main.c drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c include/net/scm.h net/batman-adv/routing.c net/ipv4/tcp_input.c The e{uid,gid} --> {uid,gid} credentials fix conflicted with the cleanup in net-next to now pass cred structs around. The be2net driver had a bug fix in 'net' that overlapped with the VLAN interface changes by Patrick McHardy in net-next. An IGB conflict existed because in 'net' the build_skb() support was reverted, and in 'net-next' there was a comment style fix within that code. Several batman-adv conflicts were resolved by making sure that all calls to batadv_is_my_mac() are changed to have a new bat_priv first argument. Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO rewrite in 'net-next', mostly overlapping changes. Thanks to Stephen Rothwell and Antonio Quartulli for help with several of these merge resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19net: vlan: add protocol argument to packet tagging functionsPatrick McHardy
Add a protocol argument to the VLAN packet tagging functions. In case of HW tagging, we need that protocol available in the ndo_start_xmit functions, so it is stored in a new field in the skb. The new field fits into a hole (on 64 bit) and doesn't increase the sks's size. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-15openvswitch: Simplify datapath locking.Pravin B Shelar
Currently OVS uses combination of genl and rtnl lock to protect datapath state. This was done due to networking stack locking. But this has complicated locking and there are few lock ordering issues with new tunneling protocols. Following patch simplifies locking by introducing new ovs mutex and now this lock is used to protect entire ovs state. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-03-29openvswitch: Move common genl notify code into ovs_notify()Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-03-29openvswitch: Refine Netlink message size calculation and kill FLOW_BUFSIZEThomas Graf
Kills the FLOW_BUFSIZE constant which needs to be calculated manually and replaces it with key_attr_size() based on nla_total_size(). Calculates the size of datapath messages instead of relying on NLMSG_DEFAULT_SIZE and moves the existing message size calculations into own functions for clarity. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-03-29openvswitch: Use nla_memcpy() to memcpy() data from attributesThomas Graf
Less error prone as it takes into account the length of both the destination buffer and the source attribute and documents when data is copied from an attribute. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-03-29openvswitch: Specify the minimal length of OVS_PACKET_ATTR_PACKET in the policyThomas Graf
Specifying the minimal length in the policy makes it reuseable and documents the interface. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-03-28net: add ETH_P_802_3_MINSimon Horman
Add a new constant ETH_P_802_3_MIN, the minimum ethernet type for an 802.3 frame. Frames with a lower value in the ethernet type field are Ethernet II. Also update all the users of this value that David Miller and I could find to use the new constant. Also correct a bug in util.c. The comparison with ETH_P_802_3_MIN should be >= not >. As suggested by Jesse Gross. Compile tested only. Cc: David Miller <davem@davemloft.net> Cc: Jesse Gross <jesse@nicira.com> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: John W. Linville <linville@tuxdriver.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Bart De Schuymer <bart.de.schuymer@pandora.be> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: linux-bluetooth@vger.kernel.org Cc: netfilter-devel@vger.kernel.org Cc: bridge@lists.linux-foundation.org Cc: linux-wireless@vger.kernel.org Cc: linux1394-devel@lists.sourceforge.net Cc: linux-media@vger.kernel.org Cc: netdev@vger.kernel.org Cc: dev@openvswitch.org Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27openvswitch: Preallocate reply skb in ovs_vport_cmd_set().Jesse Gross
Allocation of the Netlink notification skb can potentially fail after changing vport configuration. In general, we try to avoid this by undoing any change we made but that is difficult for existing objects. This avoids the problem by preallocating the buffer (which is fixed size). Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-03-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Pull in the 'net' tree to get Daniel Borkmann's flow dissector infrastructure change. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Conflicts: net/openvswitch/vport-internal_dev.c Jesse Gross says: ==================== A couple of minor enhancements for net-next/3.10. The largest is an extension to allow variable length metadata to be passed to userspace with packets. There is a merge conflict in net/openvswitch/vport-internal_dev.c: A existing commit modifies internal_dev_mac_addr() and a new commit deletes it. The new one is correct, so you can just remove that function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-15Merge branch 'fixes' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== A few different bug fixes, including several for issues with userspace communication that have gone unnoticed up until now. These are intended for net/3.9. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-27hlist: drop the node parameter from iteratorsSasha Levin
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22openvswitch: Call genlmsg_end in queue_userspace_packetRich Lane
Without genlmsg_end the upcall message ends (according to nlmsg_len) after the struct ovs_header. Signed-off-by: Rich Lane <rlane@bigswitch.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-02-22openvswitch: Fix ovs_vport_cmd_new return value on successRich Lane
If the pointer does not represent an error then the PTR_ERR macro may still return a nonzero value. Signed-off-by: Rich Lane <rlane@bigswitch.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-02-22openvswitch: Fix ovs_vport_cmd_del return value on successRich Lane
If the pointer does not represent an error then the PTR_ERR macro may still return a nonzero value. The fix is the same as in ovs_vport_cmd_set. Signed-off-by: Rich Lane <rlane@bigswitch.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-02-22openvswitch: Allow OVS_USERSPACE_ATTR_USERDATA to be variable length.Ben Pfaff
Until now, the optional OVS_USERSPACE_ATTR_USERDATA attribute had to be exactly 64 bits long, if it was present. However, 64 bits is not enough space to associate as much information with a flow as would be convenient for some userspace features now under development. This commit generalizes the attribute, allowing it to be any length. This generalization is backward-compatible: if userspace only uses 64-bit attributes, then it will not see any change in behavior. CC: Romain Lenglet <rlenglet@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-02-06net: adjust skb_gso_segment() for calling in rx pathCong Wang
skb_gso_segment() is almost always called in tx path, except for openvswitch. It calls this function when it receives the packet and tries to queue it to user-space. In this special case, the ->ip_summed check inside skb_gso_segment() is no longer true, as ->ip_summed value has different meanings on rx path. This patch adjusts skb_gso_segment() so that we can at least avoid such warnings on checksum. Cc: Jesse Gross <jesse@nicira.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09openvswitch: Use FIELD_SIZEOF() in dp_init().YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09openvswitch: Change ENOENT return value to ENODEV in lookup_vport().Jarno Rajahalme
This reduces the number of valid "no such device" error values that need special attention by the caller. Userspace code will need to keep on checking for both ENODEV and ENOENT as long as older kernel modules are around. Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-11-26openvswitch: add skb mark matching and set actionAnsis Atteka
This patch adds support for skb mark matching and set action. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-11-16net: openvswitch: use this_cpu_ptr per-cpu helperShan Wei
just use more faster this_cpu_ptr instead of per_cpu_ptr(p, smp_processor_id()); Signed-off-by: Shan Wei <davidshan@tencent.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-11-13openvswitch: add ipv6 'set' actionAnsis Atteka
This patch adds ipv6 set action functionality. It allows to change traffic class, flow label, hop-limit, ipv6 source and destination address fields. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/netfilter/nfnetlink_log.c net/netfilter/xt_LOG.c Rather easy conflict resolution, the 'net' tree had bug fixes to make sure we checked if a socket is a time-wait one or not and elide the logging code if so. Whereas on the 'net-next' side we are calculating the UID and GID from the creds using different interfaces due to the user namespace changes from Eric Biederman. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10netlink: Rename pid to portid to avoid confusionEric W. Biederman
It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-03openvswitch: Increase maximum number of datapath ports.Pravin B Shelar
Use hash table to store ports of datapath. Allow 64K ports per switch. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-08-22openvswitch: Add support for network namespaces.Pravin B Shelar
Following patch adds support for network namespace to openvswitch. Since it must release devices when namespaces are destroyed, a side effect of this patch is that the module no longer keeps a refcount but instead cleans up any state when it is unloaded. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-08-06openvswitch: Relax set header validation.Jesse Gross
When installing a flow with an action to set a particular field we need to validate that the packets that are part of the flow actually contain that header. With IP we use zeroed addresses and with TCP/UDP the check is for zeroed ports. This check is overly broad and can catch packets like DHCP requests that have a zero source address in a legitimate header. This changes the check to look for a zeroed protocol number for IP or for both ports be zero for TCP/UDP before considering the header to not exist. Reported-by: Ethan Jackson <ethan@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-07-20Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== A few bug fixes and small enhancements for net-next/3.6. ... Ansis Atteka (1): openvswitch: Do not send notification if ovs_vport_set_options() failed Ben Pfaff (1): openvswitch: Check gso_type for correct sk_buff in queue_gso_packets(). Jesse Gross (2): openvswitch: Enable retrieval of TCP flags from IPv6 traffic. openvswitch: Reset upper layer protocol info on internal devices. Leo Alterman (1): openvswitch: Fix typo in documentation. Pravin B Shelar (1): openvswitch: Check currect return value from skb_gso_segment() Raju Subramanian (1): openvswitch: Replace Nicira Networks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20openvswitch: Check gso_type for correct sk_buff in queue_gso_packets().Ben Pfaff
At the point where it was used, skb_shinfo(skb)->gso_type referred to a post-GSO sk_buff. Thus, it would always be 0. We want to know the pre-GSO gso_type, so we need to obtain it before segmenting. Before this change, the kernel would pass inconsistent data to userspace: packets for UDP fragments with nonzero offset would be passed along with flow keys that indicate a zero offset (that is, the flow key for "later" fragments claimed to be "first" fragments). This inconsistency tended to confuse Open vSwitch userspace, causing it to log messages about "failed to flow_del" the flows with "later" fragments. Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-07-20openvswitch: Check currect return value from skb_gso_segment()Pravin B Shelar
Fix return check typo. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-05-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2012-05-13openvswitch: checking wrong variable in queue_userspace_packet()Dan Carpenter
"skb" is non-NULL here, for example we dereference it in skb_clone(). The intent was to test "nskb" which was just set. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-07openvswitch: Validation of IPv6 set port action uses IPv4 headerPravin B Shelar
When the kernel validates set TCP/UDP port actions, it looks at the ports in the existing flow to make sure that the L4 header exists. However, these actions always use the IPv4 version of the struct. Following patch fixes this by checking for flow ip protocol first. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-05-03openvswitch: Replace Nicira Networks.Raju Subramanian
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc. Signed-off-by: Raju Subramanian <rsubramanian@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-05-03openvswitch: Release rtnl_lock if ovs_vport_cmd_build_info() failed.Ansis Atteka
This patch fixes a possible lock-up bug where rtnl_lock might not get released. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-04-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2012-04-09openvswitch: Do not send notification if ovs_vport_set_options() failedAnsis Atteka
There is no need to send a notification if ovs_vport_set_options() failed and ovs_vport_cmd_set() did not change anything. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-04-01openvswitch: Stop using NLA_PUT*().David S. Miller
These macros contain a hidden goto, and are thus extremely error prone and make code hard to audit. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-28Remove all #inclusions of asm/system.hDavid Howells
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-06openvswitch: Honor dp_ifindex, when specified, for vport lookup by name.Ben Pfaff
When OVS_VPORT_ATTR_NAME is specified and dp_ifindex is nonzero, the logical behavior would be for the vport name lookup scope to be limited to the specified datapath, but in fact the dp_ifindex value was ignored. This commit causes the search scope to be honored. Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-01-17openvswitch: Fix multipart datapath dumps.Ben Pfaff
The logic to split up the list of datapaths into multiple Netlink messages was simply wrong, causing the list to be terminated after the first part. Only about the first 50 datapaths would be dumped. This fixes the problem. Reported-by: Paul Ingram <paul@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-17net: remove version.h includes in net/openvswitch/Devendra Naga
remove version.h includes in net/openswitch/ as reported by make versioncheck. Signed-off-by: Devendra Naga <devendra.aaru@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-03net: Add Open vSwitch kernel components.Jesse Gross
Open vSwitch is a multilayer Ethernet switch targeted at virtualized environments. In addition to supporting a variety of features expected in a traditional hardware switch, it enables fine-grained programmatic extension and flow-based control of the network. This control is useful in a wide variety of applications but is particularly important in multi-server virtualization deployments, which are often characterized by highly dynamic endpoints and the need to maintain logical abstractions for multiple tenants. The Open vSwitch datapath provides an in-kernel fast path for packet forwarding. It is complemented by a userspace daemon, ovs-vswitchd, which is able to accept configuration from a variety of sources and translate it into packet processing rules. See http://openvswitch.org for more information and userspace utilities. Signed-off-by: Jesse Gross <jesse@nicira.com>