aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2013-06-27ipv4: fix redirect handling for TCP packetsMichal Kubecek
[ Upstream commit f96ef988cc603487c03a6de07807b06cbe641829 ] Unlike ipv4_redirect() and ipv4_sk_redirect(), ip_do_redirect() doesn't call __build_flow_key() directly but via ip_rt_build_flow_key() wrapper. This leads to __build_flow_key() getting pointer to IPv4 header of the ICMP redirect packet rather than pointer to the embedded IPv4 header of the packet initiating the redirect. As a result, handling of ICMP redirects initiated by TCP packets is broken. Issue was introduced by 4895c771c ("ipv4: Add FIB nexthop exceptions.") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-27ip_tunnel: fix kernel panic with icmp_dest_unreachEric Dumazet
[ Upstream commit a622260254ee481747cceaaa8609985b29a31565 ] Daniel Petre reported crashes in icmp_dst_unreach() with following call graph: Daniel found a similar problem mentioned in http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html And indeed this is the root cause : skb->cb[] contains data fooling IP stack. We must clear IPCB in ip_tunnel_xmit() sooner in case dst_link_failure() is called. Or else skb->cb[] might contain garbage from GSO segmentation layer. A similar fix was tested on linux-3.9, but gre code was refactored in linux-3.10. I'll send patches for stable kernels as well. Many thanks to Daniel for providing reports, patches and testing ! Reported-by: Daniel Petre <daniel.petre@rcs-rds.ro> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-27tcp: xps: fix reordering issuesEric Dumazet
[ Upstream commit 547669d483e5783d722772af1483fa474da7caf9 ] commit 3853b5841c01a ("xps: Improvements in TX queue selection") introduced ooo_okay flag, but the condition to set it is slightly wrong. In our traces, we have seen ACK packets being received out of order, and RST packets sent in response. We should test if we have any packets still in host queue. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-27tcp: bug fix in proportional rate reduction.Nandita Dukkipati
[ Upstream commit 35f079ebbc860dcd1cca70890c9c8d59c1145525 ] This patch is a fix for a bug triggering newly_acked_sacked < 0 in tcp_ack(.). The bug is triggered by sacked_out decreasing relative to prior_sacked, but packets_out remaining the same as pior_packets. This is because the snapshot of prior_packets is taken after tcp_sacktag_write_queue() while prior_sacked is captured before tcp_sacktag_write_queue(). The problem is: tcp_sacktag_write_queue (tcp_match_skb_to_sack() -> tcp_fragment) adjusts the pcount for packets_out and sacked_out (MSS change or other reason). As a result, this delta in pcount is reflected in (prior_sacked - sacked_out) but not in (prior_packets - packets_out). This patch does the following: 1) initializes prior_packets at the start of tcp_ack() so as to capture the delta in packets_out created by tcp_fragment. 2) introduces a new "previous_packets_out" variable that snapshots packets_out right before tcp_clean_rtx_queue, so pkts_acked can be correctly computed as before. 3) Computes pkts_acked using previous_packets_out, and computes newly_acked_sacked using prior_packets. Signed-off-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-27tcp: fix tcp_md5_hash_skb_data()Eric Dumazet
[ Upstream commit 54d27fcb338bd9c42d1dfc5a39e18f6f9d373c2e ] TCP md5 communications fail [1] for some devices, because sg/crypto code assume page offsets are below PAGE_SIZE. This was discovered using mlx4 driver [2], but I suspect loopback might trigger the same bug now we use order-3 pages in tcp_sendmsg() [1] Failure is giving following messages. huh, entered softirq 3 NET_RX ffffffff806ad230 preempt_count 00000100, exited with 00000101? [2] mlx4 driver uses order-2 pages to allocate RX frags Reported-by: Matt Schnall <mischnal@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Bernhard Beck <bbeck@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-19net: frag, fix race conditions in LRU list maintenanceKonstantin Khlebnikov
[ Upstream commit b56141ab34e2c3e2d7960cea12c20c99530c0c76 ] This patch fixes race between inet_frag_lru_move() and inet_frag_lru_add() which was introduced in commit 3ef0eb0db4bf92c6d2510fe5c4dc51852746f206 ("net: frag, move LRU list maintenance outside of rwlock") One cpu already added new fragment queue into hash but not into LRU. Other cpu found it in hash and tries to move it to the end of LRU. This leads to NULL pointer dereference inside of list_move_tail(). Another possible race condition is between inet_frag_lru_move() and inet_frag_lru_del(): move can happens after deletion. This patch initializes LRU list head before adding fragment into hash and inet_frag_lru_move() doesn't touches it if it's empty. I saw this kernel oops two times in a couple of days. [119482.128853] BUG: unable to handle kernel NULL pointer dereference at (null) [119482.132693] IP: [<ffffffff812ede89>] __list_del_entry+0x29/0xd0 [119482.136456] PGD 2148f6067 PUD 215ab9067 PMD 0 [119482.140221] Oops: 0000 [#1] SMP [119482.144008] Modules linked in: vfat msdos fat 8021q fuse nfsd auth_rpcgss nfs_acl nfs lockd sunrpc ppp_async ppp_generic bridge slhc stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek kvm_amd k10temp kvm snd_hda_intel snd_hda_codec edac_core radeon snd_hwdep ath9k snd_pcm ath9k_common snd_page_alloc ath9k_hw snd_timer snd soundcore drm_kms_helper ath ttm r8169 mii [119482.152692] CPU 3 [119482.152721] Pid: 20, comm: ksoftirqd/3 Not tainted 3.9.0-zurg-00001-g9f95269 #132 To Be Filled By O.E.M. To Be Filled By O.E.M./RS880D [119482.161478] RIP: 0010:[<ffffffff812ede89>] [<ffffffff812ede89>] __list_del_entry+0x29/0xd0 [119482.166004] RSP: 0018:ffff880216d5db58 EFLAGS: 00010207 [119482.170568] RAX: 0000000000000000 RBX: ffff88020882b9c0 RCX: dead000000200200 [119482.175189] RDX: 0000000000000000 RSI: 0000000000000880 RDI: ffff88020882ba00 [119482.179860] RBP: ffff880216d5db58 R08: ffffffff8155c7f0 R09: 0000000000000014 [119482.184570] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020882ba00 [119482.189337] R13: ffffffff81c8d780 R14: ffff880204357f00 R15: 00000000000005a0 [119482.194140] FS: 00007f58124dc700(0000) GS:ffff88021fcc0000(0000) knlGS:0000000000000000 [119482.198928] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [119482.203711] CR2: 0000000000000000 CR3: 00000002155f0000 CR4: 00000000000007e0 [119482.208533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [119482.213371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [119482.218221] Process ksoftirqd/3 (pid: 20, threadinfo ffff880216d5c000, task ffff880216d3a9a0) [119482.223113] Stack: [119482.228004] ffff880216d5dbd8 ffffffff8155dcda 0000000000000000 ffff000200000001 [119482.233038] ffff8802153c1f00 ffff880000289440 ffff880200000014 ffff88007bc72000 [119482.238083] 00000000000079d5 ffff88007bc72f44 ffffffff00000002 ffff880204357f00 [119482.243090] Call Trace: [119482.248009] [<ffffffff8155dcda>] ip_defrag+0x8fa/0xd10 [119482.252921] [<ffffffff815a8013>] ipv4_conntrack_defrag+0x83/0xe0 [119482.257803] [<ffffffff8154485b>] nf_iterate+0x8b/0xa0 [119482.262658] [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40 [119482.267527] [<ffffffff815448e4>] nf_hook_slow+0x74/0x130 [119482.272412] [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40 [119482.277302] [<ffffffff8155d068>] ip_rcv+0x268/0x320 [119482.282147] [<ffffffff81519992>] __netif_receive_skb_core+0x612/0x7e0 [119482.286998] [<ffffffff81519b78>] __netif_receive_skb+0x18/0x60 [119482.291826] [<ffffffff8151a650>] process_backlog+0xa0/0x160 [119482.296648] [<ffffffff81519f29>] net_rx_action+0x139/0x220 [119482.301403] [<ffffffff81053707>] __do_softirq+0xe7/0x220 [119482.306103] [<ffffffff81053868>] run_ksoftirqd+0x28/0x40 [119482.310809] [<ffffffff81074f5f>] smpboot_thread_fn+0xff/0x1a0 [119482.315515] [<ffffffff81074e60>] ? lg_local_lock_cpu+0x40/0x40 [119482.320219] [<ffffffff8106d870>] kthread+0xc0/0xd0 [119482.324858] [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40 [119482.329460] [<ffffffff816c32dc>] ret_from_fork+0x7c/0xb0 [119482.334057] [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40 [119482.338661] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08 [119482.343787] RIP [<ffffffff812ede89>] __list_del_entry+0x29/0xd0 [119482.348675] RSP <ffff880216d5db58> [119482.353493] CR2: 0000000000000000 Oops happened on this path: ip_defrag() -> ip_frag_queue() -> inet_frag_lru_move() -> list_move_tail() -> __list_del_entry() Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Florian Westphal <fw@strlen.de> Cc: Eric Dumazet <edumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-19tcp: reset timer after any SYNACK retransmitYuchung Cheng
[ Upstream commit cd75eff64dae8856afbf6ef0f0ca3c145465d8e0 ] Linux immediately returns SYNACK on (spurious) SYN retransmits, but keeps the SYNACK timer running independently. Thus the timer may fire right after the SYNACK retransmit and causes a SYN-SYNACK cross-fire burst. Adopt the fast retransmit/recovery idea in established state by re-arming the SYNACK timer after the fast (SYNACK) retransmit. The timer may fire late up to 500ms due to the current SYNACK timer wheel, but it's OK to be conservative when network is congested. Eric's new listener design should address this issue. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-19Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== If time allows, please consider pulling the following patchset contains two late Netfilter fixes, they are: * Skip broadcast/multicast locally generated traffic in the rpfilter, (closes netfilter bugzilla #814), from Florian Westphal. * Fix missing elements in the listing of ipset bitmap ip,mac set type with timeout support enabled, from Jozsef Kadlecsik. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19tcp: call tcp_replace_ts_recent() from tcp_ack()Eric Dumazet
commit bd090dfc634d (tcp: tcp_replace_ts_recent() should not be called from tcp_validate_incoming()) introduced a TS ecr bug in slow path processing. 1 A > B P. 1:10001(10000) ack 1 <nop,nop,TS val 1001 ecr 200> 2 B < A . 1:1(0) ack 1 win 257 <sack 9001:10001,TS val 300 ecr 1001> 3 A > B . 1:1001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200> 4 A > B . 1001:2001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200> (ecr 200 should be ecr 300 in packets 3 & 4) Problem is tcp_ack() can trigger send of new packets (retransmits), reflecting the prior TSval, instead of the TSval contained in the currently processed incoming packet. Fix this by calling tcp_replace_ts_recent() from tcp_ack() after the checks, but before the actions. Reported-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netfilter: xt_rpfilter: skip locally generated broadcast/multicast, tooFlorian Westphal
Alex Efros reported rpfilter module doesn't match following packets: IN=br.qemu SRC=192.168.2.1 DST=192.168.2.255 [ .. ] (netfilter bugzilla #814). Problem is that network stack arranges for the locally generated broadcasts to appear on the interface they were sent out, so the IFF_LOOPBACK check doesn't trigger. As -m rpfilter is restricted to PREROUTING, we can check for existing rtable instead, it catches locally-generated broad/multicast case, too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-17net: drop dst before queueing fragmentsEric Dumazet
Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path) added a bug in IP defragmentation handling, as non refcounted dst could escape an RCU protected section. Commit 64f3b9e203bd068 (net: ip_expire() must revalidate route) fixed the case of timeouts, but not the general problem. Tom Parkin noticed crashes in UDP stack and provided a patch, but further analysis permitted us to pinpoint the root cause. Before queueing a packet into a frag list, we must drop its dst, as this dst has limited lifetime (RCU protected) When/if a packet is finally reassembled, we use the dst of the very last skb, still protected by RCU and valid, as the dst of the reassembled packet. Use same logic in IPv6, as there is no need to hold dst references. Reported-by: Tom Parkin <tparkin@katalix.com> Tested-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-15esp4: fix error return code in esp_output()Wei Yongjun
Fix to return a negative error code from the error handling case instead of 0, as returned elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-11tcp: Reallocate headroom if it would overflow csum_startThomas Graf
If a TCP retransmission gets partially ACKed and collapsed multiple times it is possible for the headroom to grow beyond 64K which will overflow the 16bit skb->csum_start which is based on the start of the headroom. It has been observed rarely in the wild with IPoIB due to the 64K MTU. Verify if the acking and collapsing resulted in a headroom exceeding what csum_start can cover and reallocate the headroom if so. A big thank you to Jim Foraker <foraker1@llnl.gov> and the team at LLNL for helping out with the investigation and testing. Reported-by: Jim Foraker <foraker1@llnl.gov> Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-11tcp: incoming connections might use wrong route under synfloodDmitry Popov
There is a bug in cookie_v4_check (net/ipv4/syncookies.c): flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk), RT_SCOPE_UNIVERSE, IPPROTO_TCP, inet_sk_flowi_flags(sk), (opt && opt->srr) ? opt->faddr : ireq->rmt_addr, ireq->loc_addr, th->source, th->dest); Here we do not respect sk->sk_bound_dev_if, therefore wrong dst_entry may be taken. This dst_entry is used by new socket (get_cookie_sock -> tcp_v4_syn_recv_sock), so its packets may take the wrong path. Signed-off-by: Dmitry Popov <dp@highloadlab.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09selinux: add a skb_owned_by() hookEric Dumazet
Commit 90ba9b1986b5ac (tcp: tcp_make_synack() can use alloc_skb()) broke certain SELinux/NetLabel configurations by no longer correctly assigning the sock to the outgoing SYNACK packet. Cost of atomic operations on the LISTEN socket is quite big, and we would like it to happen only if really needed. This patch introduces a new security_ops->skb_owned_by() method, that is a void operation unless selinux is active. Reported-by: Miroslav Vadkerti <mvadkert@redhat.com> Diagnosed-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-security-module@vger.kernel.org Acked-by: James Morris <james.l.morris@oracle.com> Tested-by: Paul Moore <pmoore@redhat.com> Acked-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08net: ipv4: fix schedule while atomic bug in check_lifetime()Jiri Pirko
move might_sleep operations out of the rcu_read_lock() section. Also fix iterating over ifa_dev->ifa_list Introduced by: commit 5c766d642bcaf "ipv4: introduce address lifetime" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08net: ipv4: reset check_lifetime_work after changing lifetimeJiri Pirko
This will result in calling check_lifetime in nearest opportunity and that function will adjust next time to call check_lifetime correctly. Without this, check_lifetime is called in time computed by previous run, not affecting modified lifetime. Introduced by: commit 5c766d642bcaf "ipv4: introduce address lifetime" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05net: ipv4: notify when address lifetime changesJiri Pirko
if userspace changes lifetime of address, send netlink notification and call notifier. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26ipv4: Fix ip-header identification for gso packets.Pravin B Shelar
ip-header id needs to be incremented even if IP_DF flag is set. This behaviour was changed in commit 490ab08127cebc25e3a26 (IP_GRE: Fix IP-Identification). Following patch fixes it so that identification is always incremented. Reported-by: Cong Wang <amwang@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24tcp: undo spurious timeout after SACK renegingYuchung Cheng
On SACK reneging the sender immediately retransmits and forces a timeout but disables Eifel (undo). If the (buggy) receiver does not drop any packet this can trigger a false slow-start retransmit storm driven by the ACKs of the original packets. This can be detected with undo and TCP timestamps. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22tcp: preserve ACK clocking in TSOEric Dumazet
A long standing problem with TSO is the fact that tcp_tso_should_defer() rearms the deferred timer, while it should not. Current code leads to following bad bursty behavior : 20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119 20:11:24.484337 IP B > A: . ack 263721 win 1117 20:11:24.485086 IP B > A: . ack 265241 win 1117 20:11:24.485925 IP B > A: . ack 266761 win 1117 20:11:24.486759 IP B > A: . ack 268281 win 1117 20:11:24.487594 IP B > A: . ack 269801 win 1117 20:11:24.488430 IP B > A: . ack 271321 win 1117 20:11:24.489267 IP B > A: . ack 272841 win 1117 20:11:24.490104 IP B > A: . ack 274361 win 1117 20:11:24.490939 IP B > A: . ack 275881 win 1117 20:11:24.491775 IP B > A: . ack 277401 win 1117 20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119 20:11:24.492620 IP B > A: . ack 278921 win 1117 20:11:24.493448 IP B > A: . ack 280441 win 1117 20:11:24.494286 IP B > A: . ack 281961 win 1117 20:11:24.495122 IP B > A: . ack 283481 win 1117 20:11:24.495958 IP B > A: . ack 285001 win 1117 20:11:24.496791 IP B > A: . ack 286521 win 1117 20:11:24.497628 IP B > A: . ack 288041 win 1117 20:11:24.498459 IP B > A: . ack 289561 win 1117 20:11:24.499296 IP B > A: . ack 291081 win 1117 20:11:24.500133 IP B > A: . ack 292601 win 1117 20:11:24.500970 IP B > A: . ack 294121 win 1117 20:11:24.501388 IP B > A: . ack 295641 win 1117 20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119 While the expected behavior is more like : 20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119 20:19:49.260446 IP B > A: . ack 154281 win 1212 20:19:49.261282 IP B > A: . ack 155801 win 1212 20:19:49.262125 IP B > A: . ack 157321 win 1212 20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119 20:19:49.262958 IP B > A: . ack 158841 win 1212 20:19:49.263795 IP B > A: . ack 160361 win 1212 20:19:49.264628 IP B > A: . ack 161881 win 1212 20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119 20:19:49.265465 IP B > A: . ack 163401 win 1212 20:19:49.265886 IP B > A: . ack 164921 win 1212 20:19:49.266722 IP B > A: . ack 166441 win 1212 20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119 20:19:49.267559 IP B > A: . ack 167961 win 1212 20:19:49.268394 IP B > A: . ack 169481 win 1212 20:19:49.269232 IP B > A: . ack 171001 win 1212 20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20ipconfig: Fix newline handling in log message.Martin Fuzzey
When using ipconfig the logs currently look like: Single name server: [ 3.467270] IP-Config: Complete: [ 3.470613] device=eth0, hwaddr=ac:de:48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1 [ 3.480670] host=infigo-1, domain=, nis-domain=(none) [ 3.486166] bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath= [ 3.492910] nameserver0=172.16.42.1[ 3.496853] ALSA device list: Three name servers: [ 3.496949] IP-Config: Complete: [ 3.500293] device=eth0, hwaddr=ac:de:48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1 [ 3.510367] host=infigo-1, domain=, nis-domain=(none) [ 3.515864] bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath= [ 3.522635] nameserver0=172.16.42.1, nameserver1=172.16.42.100 [ 3.529149] , nameserver2=172.16.42.200 Fix newline handling for these cases Signed-off-by: Martin Fuzzey <mfuzzey@parkeon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20udp: add encap_destroy callbackTom Parkin
Users of udp encapsulation currently have an encap_rcv callback which they can use to hook into the udp receive path. In situations where a encapsulation user allocates resources associated with a udp encap socket, it may be convenient to be able to also hook the proto .destroy operation. For example, if an encap user holds a reference to the udp socket, the destroy hook might be used to relinquish this reference. This patch adds a socket destroy hook into udp, which is set and enabled in the same way as the existing encap_rcv hook. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20Merge branch 'master' of git://1984.lsi.us.es/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains 7 Netfilter/IPVS fixes for 3.9-rc, they are: * Restrict IPv6 stateless NPT targets to the mangle table. Many users are complaining that this target does not work in the nat table, which is the wrong table for it, from Florian Westphal. * Fix possible use before initialization in the netns init path of several conntrack protocol trackers (introduced recently while improving conntrack netns support), from Gao Feng. * Fix incorrect initialization of copy_range in nfnetlink_queue, spotted by Eric Dumazet during the NFWS2013, patch from myself. * Fix wrong calculation of next SCTP chunk in IPVS, from Julian Anastasov. * Remove rcu_read_lock section in IPVS while calling ipv4_update_pmtu not required anymore after change introduced in 3.7, again from Julian. * Fix SYN looping in IPVS state sync if the backup is used a real server in DR/TUN modes, this required a new /proc entry to disable the director function when acting as backup, also from Julian. * Remove leftover IP_NF_QUEUE Kconfig after ip_queue removal, noted by Paul Bolle. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20netfilter: remove unused "config IP_NF_QUEUE"Paul Bolle
Kconfig symbol IP_NF_QUEUE is unused since commit d16cf20e2f2f13411eece7f7fb72c17d141c4a84 ("netfilter: remove ip_queue support"). Let's remove it too. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-19inet: limit length of fragment queue hash table bucket listsHannes Frederic Sowa
This patch introduces a constant limit of the fragment queue hash table bucket list lengths. Currently the limit 128 is choosen somewhat arbitrary and just ensures that we can fill up the fragment cache with empty packets up to the default ip_frag_high_thresh limits. It should just protect from list iteration eating considerable amounts of cpu. If we reach the maximum length in one hash bucket a warning is printed. This is implemented on the caller side of inet_frag_find to distinguish between the different users of inet_fragment.c. I dropped the out of memory warning in the ipv4 fragment lookup path, because we already get a warning by the slab allocator. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-18tcp: dont handle MTU reduction on LISTEN socketEric Dumazet
When an ICMP ICMP_FRAG_NEEDED (or ICMPV6_PKT_TOOBIG) message finds a LISTEN socket, and this socket is currently owned by the user, we set TCP_MTU_REDUCED_DEFERRED flag in listener tsq_flags. This is bad because if we clone the parent before it had a chance to clear the flag, the child inherits the tsq_flags value, and next tcp_release_cb() on the child will decrement sk_refcnt. Result is that we might free a live TCP socket, as reported by Dormando. IPv4: Attempt to release TCP socket in state 1 Fix this issue by testing sk_state against TCP_LISTEN early, so that we set TCP_MTU_REDUCED_DEFERRED on appropriate sockets (not a LISTEN one) This bug was introduced in commit 563d34d05786 (tcp: dont drop MTU reduction indications) Reported-by: dormando <dormando@rydia.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-16Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP ↵Timo Teräs
unconditionally" This reverts commit 412ed94744d16806fbec3bd250fd94e71cde5a1f. The commit is wrong as tiph points to the outer IPv4 header which is installed at ipgre_header() and not the inner one which is protocol dependant. This commit broke succesfully opennhrp which use PF_PACKET socket with ETH_P_NHRP protocol. Additionally ssl_addr is set to the link-layer IPv4 address. This address is written by ipgre_header() to the skb earlier, and this is the IPv4 header tiph should point to - regardless of the inner protocol payload. Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-14tcp: fix skb_availroom()Eric Dumazet
Chrome OS team reported a crash on a Pixel ChromeBook in TCP stack : https://code.google.com/p/chromium/issues/detail?id=182056 commit a21d45726acac (tcp: avoid order-1 allocations on wifi and tx path) did a poor choice adding an 'avail_size' field to skb, while what we really needed was a 'reserved_tailroom' one. It would have avoided commit 22b4a4f22da (tcp: fix retransmit of partially acked frames) and this commit. Crash occurs because skb_split() is not aware of the 'avail_size' management (and should not be aware) Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Mukesh Agrawal <quiche@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12net/ipv4: Ensure that location of timestamp option is storedDavid Ward
This is needed in order to detect if the timestamp option appears more than once in a packet, to remove the option if the packet is fragmented, etc. My previous change neglected to store the option location when the router addresses were prespecified and Pointer > Length. But now the option location is also stored when Flag is an unrecognized value, to ensure these option handling behaviors are still performed. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07Fix: sparse warning in inet_csk_prepare_forced_closeChristoph Paasch
In e337e24d66 (inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock) I introduced the function inet_csk_prepare_forced_close, which does a call to bh_unlock_sock(). This produces a sparse-warning. This patch adds the missing __releases. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06net/ipv4: Timestamp option cannot overflow with prespecified addressesDavid Ward
When a router forwards a packet that contains the IPv4 timestamp option, if there is no space left in the option for the router to add its own timestamp, then the router increments the Overflow value in the option. However, if the addresses of the routers are prespecified in the option, then the overflow condition cannot happen: the option is structured so that each prespecified router has a place to write its timestamp. Other routers do not add a timestamp, so there will never be a lack of space. This fix ensures that the Overflow value in the IPv4 timestamp option is not incremented when the addresses of the routers are prespecified, even if the Pointer value is greater than the Length value. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "A moderately sized pile of fixes, some specifically for merge window introduced regressions although others are for longer standing items and have been queued up for -stable. I'm kind of tired of all the RDS protocol bugs over the years, to be honest, it's way out of proportion to the number of people who actually use it. 1) Fix missing range initialization in netfilter IPSET, from Jozsef Kadlecsik. 2) ieee80211_local->tim_lock needs to use BH disabling, from Johannes Berg. 3) Fix DMA syncing in SFC driver, from Ben Hutchings. 4) Fix regression in BOND device MAC address setting, from Jiri Pirko. 5) Missing usb_free_urb in ISDN Hisax driver, from Marina Makienko. 6) Fix UDP checksumming in bnx2x driver for 57710 and 57711 chips, fix from Dmitry Kravkov. 7) Missing cfgspace_lock initialization in BCMA driver. 8) Validate parameter size for SCTP assoc stats getsockopt(), from Guenter Roeck. 9) Fix SCTP association hangs, from Lee A Roberts. 10) Fix jumbo frame handling in r8169, from Francois Romieu. 11) Fix phy_device memory leak, from Petr Malat. 12) Omit trailing FCS from frames received in BGMAC driver, from Hauke Mehrtens. 13) Missing socket refcount release in L2TP, from Guillaume Nault. 14) sctp_endpoint_init should respect passed in gfp_t, rather than use GFP_KERNEL unconditionally. From Dan Carpenter. 15) Add AISX AX88179 USB driver, from Freddy Xin. 16) Remove MAINTAINERS entries for drivers deleted during the merge window, from Cesar Eduardo Barros. 17) RDS protocol can try to allocate huge amounts of memory, check that the user's request length makes sense, from Cong Wang. 18) SCTP should use the provided KMALLOC_MAX_SIZE instead of it's own, bogus, definition. From Cong Wang. 19) Fix deadlocks in FEC driver by moving TX reclaim into NAPI poll, from Frank Li. Also, fix a build error introduced in the merge window. 20) Fix bogus purging of default routes in ipv6, from Lorenzo Colitti. 21) Don't double count RTT measurements when we leave the TCP receive fast path, from Neal Cardwell." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits) tcp: fix double-counted receiver RTT when leaving receiver fast path CAIF: fix sparse warning for caif_usb rds: simplify a warning message net: fec: fix build error in no MXC platform net: ipv6: Don't purge default router if accept_ra=2 net: fec: put tx to napi poll function to fix dead lock sctp: use KMALLOC_MAX_SIZE instead of its own MAX_KMALLOC_SIZE rds: limit the size allocated by rds_message_alloc() MAINTAINERS: remove eexpress MAINTAINERS: remove drivers/net/wan/cycx* MAINTAINERS: remove 3c505 caif_dev: fix sparse warnings for caif_flow_cb ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver sctp: use the passed in gfp flags instead GFP_KERNEL ipv[4|6]: correct dropwatch false positive in local_deliver_finish l2tp: Restore socket refcount when sendmsg succeeds net/phy: micrel: Disable asymmetric pause for KSZ9021 bgmac: omit the fcs phy: Fix phy_device_free memory leak bnx2x: Fix KR2 work-around condition ...
2013-03-04tcp: fix double-counted receiver RTT when leaving receiver fast pathNeal Cardwell
We should not update ts_recent and call tcp_rcv_rtt_measure_ts() both before and after going to step5. That wastes CPU and double-counts the receiver-side RTT sample. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-01ipv[4|6]: correct dropwatch false positive in local_deliver_finishNeil Horman
I had a report recently of a user trying to use dropwatch to localise some frame loss, and they were getting false positives. Turned out they were using a user space SCTP stack that used raw sockets to grab frames. When we don't have a registered protocol for a given packet, we record it as a drop, even if a raw socket receieves the frame. We should only record the drop in the event a raw socket doesnt exist to receive the frames Tested by the reported successfully Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: William Reich <reich@ulticom.com> Tested-by: William Reich <reich@ulticom.com> CC: "David S. Miller" <davem@davemloft.net> CC: William Reich <reich@ulticom.com> CC: eric.dumazet@gmail.com Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-27hlist: drop the node parameter from iteratorsSasha Levin
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...
2013-02-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) ping_err() ICMP error handler looks at wrong ICMP header, from Li Wei. 2) TCP socket hash function on ipv6 is too weak, from Eric Dumazet. 3) netif_set_xps_queue() forgets to drop mutex on errors, fix from Alexander Duyck. 4) sum_frag_mem_limit() can deadlock due to lack of BH disabling, fix from Eric Dumazet. 5) TCP SYN data is miscalculated in tcp_send_syn_data(), because the amount of TCP option space was not taken into account properly in this code path. Fix from yuchung Cheng. 6) MLX4 driver allocates device queues with the wrong size, from Kleber Sacilotto. 7) sock_diag can access past the end of the sock_diag_handlers[] array, from Mathias Krause. 8) vlan_set_encap_proto() makes incorrect assumptions about where skb->data points, rework the logic so that it works regardless of where skb->data happens to be. From Jesse Gross. 9) Fix gianfar build failure with NET_POLL enabled, from Paul Gortmaker. 10) Fix Ipv4 ID setting and checksum calculations in GRE driver, from Pravin B Shelar. 11) bgmac driver does: int i; for (i = 0; ...; ...) { ... for (i = 0; ...; ...) { effectively corrupting the outer loop index, use a seperate variable for the inner loops. From Rafał Miłecki. 12) Fix suspend bugs in smsc95xx driver, from Ming Lei. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits) usbnet: smsc95xx: rename FEATURE_AUTOSUSPEND usbnet: smsc95xx: fix broken runtime suspend usbnet: smsc95xx: fix suspend failure bgmac: fix indexing of 2nd level loops b43: Fix lockdep splat on module unload Revert "ip_gre: propogate target device GSO capability to the tunnel device" IP_GRE: Fix GRE_CSUM case. VXLAN: Use tunnel_ip_select_ident() for tunnel IP-Identification. IP_GRE: Fix IP-Identification. net/pasemi: Fix missing coding style vmxnet3: fix ethtool ring buffer size setting vmxnet3: make local function static bnx2x: remove dead code and make local funcs static gianfar: fix compile fail for NET_POLL=y due to struct packing vlan: adjust vlan_set_encap_proto() for its callers sock_diag: Simplify sock_diag_handlers[] handling in __sock_diag_rcv_msg sock_diag: Fix out-of-bounds access to sock_diag_handlers[] vxlan: remove depends on CONFIG_EXPERIMENTAL mlx4_en: fix allocation of CPU affinity reverse-map mlx4_en: fix allocation of device tx_cq ...
2013-02-26Merge branch 'next' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds
Pull slave-dmaengine updates from Vinod Koul: "This is fairly big pull by my standards as I had missed last merge window. So we have the support for device tree for slave-dmaengine, large updates to dw_dmac driver from Andy for reusing on different architectures. Along with this we have fixes on bunch of the drivers" Fix up trivial conflicts, usually due to #include line movement next to each other. * 'next' of git://git.infradead.org/users/vkoul/slave-dma: (111 commits) Revert "ARM: SPEAr13xx: Pass DW DMAC platform data from DT" ARM: dts: pl330: Add #dma-cells for generic dma binding support DMA: PL330: Register the DMA controller with the generic DMA helpers DMA: PL330: Add xlate function DMA: PL330: Add new pl330 filter for DT case. dma: tegra20-apb-dma: remove unnecessary assignment edma: do not waste memory for dma_mask dma: coh901318: set residue only if dma is in progress dma: coh901318: avoid unbalanced locking dmaengine.h: remove redundant else keyword dma: of-dma: protect list write operation by spin_lock dmaengine: ste_dma40: do not remove descriptors for cyclic transfers dma: of-dma.c: fix memory leakage dw_dmac: apply default dma_mask if needed dmaengine: ioat - fix spare sparse complain dmaengine: move drivers/of/dma.c -> drivers/dma/of-dma.c ioatdma: fix race between updating ioat->head and IOAT_COMPLETION_PENDING dw_dmac: add support for Lynxpoint DMA controllers dw_dmac: return proper residue value dw_dmac: fill individual length of descriptor ...
2013-02-25Revert "ip_gre: propogate target device GSO capability to the tunnel device"Pravin B Shelar
This reverts commit eb6b9a8cad65e820b145547844b108117cece3a0. Above commit limits GSO capability of gre device to just TSO, but software GRE-GSO is capable of handling all GSO capabilities. This patch also fixes following panic which reverted commit introduced:- BUG: unable to handle kernel NULL pointer dereference at 00000000000000a2 IP: [<ffffffffa0680fd1>] ipgre_tunnel_bind_dev+0x161/0x1f0 [ip_gre] PGD 42bc19067 PUD 42bca9067 PMD 0 Oops: 0000 [#1] SMP Pid: 2636, comm: ip Tainted: GF 3.8.0+ #83 Dell Inc. PowerEdge R620/0KCKR5 RIP: 0010:[<ffffffffa0680fd1>] [<ffffffffa0680fd1>] ipgre_tunnel_bind_dev+0x161/0x1f0 [ip_gre] RSP: 0018:ffff88042bfcb708 EFLAGS: 00010246 RAX: 00000000000005b6 RBX: ffff88042d2fa000 RCX: 0000000000000044 RDX: 0000000000000018 RSI: 0000000000000078 RDI: 0000000000000060 RBP: ffff88042bfcb748 R08: 0000000000000018 R09: 000000000000000c R10: 0000000000000020 R11: 000000000101010a R12: ffff88042d2fa800 R13: 0000000000000000 R14: ffff88042d2fa800 R15: ffff88042cd7f650 FS: 00007fa784f55700(0000) GS:ffff88043fd20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a2 CR3: 000000042d8b9000 CR4: 00000000000407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process ip (pid: 2636, threadinfo ffff88042bfca000, task ffff88042d142a80) Stack: 0000000100000000 002f000000000000 0a01010100000000 000000000b010101 ffff88042d2fa800 ffff88042d2fa000 ffff88042bfcb858 ffff88042f418c00 ffff88042bfcb798 ffffffffa068199a ffff88042bfcb798 ffff88042d2fa830 Call Trace: [<ffffffffa068199a>] ipgre_newlink+0xca/0x160 [ip_gre] [<ffffffff8143b692>] rtnl_newlink+0x532/0x5f0 [<ffffffff8143b2fc>] ? rtnl_newlink+0x19c/0x5f0 [<ffffffff81438978>] rtnetlink_rcv_msg+0x2c8/0x340 [<ffffffff814386b0>] ? rtnetlink_rcv+0x40/0x40 [<ffffffff814560f9>] netlink_rcv_skb+0xa9/0xd0 [<ffffffff81438695>] rtnetlink_rcv+0x25/0x40 [<ffffffff81455ddc>] netlink_unicast+0x1ac/0x230 [<ffffffff81456a45>] netlink_sendmsg+0x265/0x380 [<ffffffff814138c0>] sock_sendmsg+0xb0/0xe0 [<ffffffff8141141e>] ? move_addr_to_kernel+0x4e/0x90 [<ffffffff81420445>] ? verify_iovec+0x85/0xf0 [<ffffffff81414ffd>] __sys_sendmsg+0x3fd/0x420 [<ffffffff8114b701>] ? handle_mm_fault+0x251/0x3b0 [<ffffffff8114f39f>] ? vma_link+0xcf/0xe0 [<ffffffff81415239>] sys_sendmsg+0x49/0x90 [<ffffffff814ffd19>] system_call_fastpath+0x16/0x1b CC: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-25IP_GRE: Fix GRE_CSUM case.Pravin B Shelar
commit "ip_gre: allow CSUM capable devices to handle packets" aa0e51cdda005cd37e2, broke GRE_CSUM case. GRE_CSUM needs checksum computed for inner packet. Therefore csum-calculation can not be offloaded if tunnel device requires GRE_CSUM. Following patch fixes it by computing inner packet checksum for GRE_CSUM type, for all other type of GRE devices csum is offloaded. CC: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-25IP_GRE: Fix IP-Identification.Pravin B Shelar
GRE-GSO generates ip fragments with id 0,2,3,4... for every GSO packet, which is not correct. Following patch fixes it by setting ip-header id unique id of fragments are allowed. As Eric Dumazet suggested it is optimized by using inner ip-header whenever inner packet is ipv4. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-22new helper: file_inode(file)Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22tcp: fix SYN-data space mis-accountingYuchung Cheng