aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4/tcp_ipv4.c
diff options
context:
space:
mode:
authorWillem de Bruijn <willemb@google.com>2012-10-26 11:52:08 +0000
committerDavid S. Miller <davem@davemloft.net>2012-10-31 13:56:40 -0400
commitecd5cf5dc5b07bc57aedf4c92453d6c343e3d840 (patch)
tree1e1b88386c2f7ceaabf56d362bf3b7789c91f9e1 /net/ipv4/tcp_ipv4.c
parent9add4d8174907c9977ad29cf6e71460829ec954e (diff)
net: compute skb->rxhash if nic hash may be 3-tuple
Network device drivers can communicate a Toeplitz hash in skb->rxhash, but devices differ in their hashing capabilities. All compute a 5-tuple hash for TCP over IPv4, but for other connection-oriented protocols, they may compute only a 3-tuple. This breaks RPS load balancing, e.g., for TCP over IPv6 flows. Additionally, for GRE and other tunnels, the kernel computes a 5-tuple hash over the inner packet if possible, but devices do not. This patch recomputes the rxhash in software in all cases where it cannot be certain that a 5-tuple was computed. Device drivers can avoid recomputation by setting the skb->l4_rxhash flag. Recomputing adds cycles to each packet when RPS is enabled or the packet arrives over a tunnel. A comparison of 200x TCP_STREAM between two servers running unmodified netnext with rxhash computation in hardware vs software (using ethtool -K eth0 rxhash [on|off]) shows how much time is spent in __skb_get_rxhash in this worst case: 0.03% swapper [kernel.kallsyms] [k] __skb_get_rxhash 0.03% swapper [kernel.kallsyms] [k] __skb_get_rxhash 0.05% swapper [kernel.kallsyms] [k] __skb_get_rxhash With 200x TCP_RR it increases to 0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash 0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash 0.10% netperf [kernel.kallsyms] [k] __skb_get_rxhash I considered having the patch explicitly skips recomputation when it knows that it will not improve the hash (TCP over IPv4), but that conditional complicates code without saving many cycles in practice, because it has to take place after flow dissector. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'net/ipv4/tcp_ipv4.c')
0 files changed, 0 insertions, 0 deletions