linux/net/ipv4, branch v3.2.38

tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation

2013-01-16T01:13:27Z

[ Upstream commit 354e4aa391ed50a4d827ff6fc11e0667d0859b25 ] RFC 5961 5.2 [Blind Data Injection Attack].[Mitigation] All TCP stacks MAY implement the following mitigation. TCP stacks that implement this mitigation MUST add an additional input check to any incoming segment. The ACK value is considered acceptable only if it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <= SND.NXT). All incoming segments whose ACK value doesn't satisfy the above condition MUST be discarded and an ACK sent back. Move tcp_send_challenge_ack() before tcp_ack() to avoid a forward declaration. Signed-off-by: Eric Dumazet Cc: Neal Cardwell Cc: Yuchung Cheng Cc: Jerry Chu Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings

tcp: tcp_replace_ts_recent() should not be called from tcp_validate_incoming()

2013-01-16T01:13:26Z

[ Upstream commit bd090dfc634ddd711a5fbd0cadc6e0ab4977bcaf ] We added support for RFC 5961 in latest kernels but TCP fails to perform exhaustive check of ACK sequence. We can update our view of peer tsval from a frame that is later discarded by tcp_ack() This makes timestamps enabled sessions vulnerable to injection of a high tsval : peers start an ACK storm, since the victim sends a dupack each time it receives an ACK from the other peer. As tcp_validate_incoming() is called before tcp_ack(), we should not peform tcp_replace_ts_recent() from it, and let callers do it at the right time. Signed-off-by: Eric Dumazet Cc: Neal Cardwell Cc: Yuchung Cheng Cc: Nandita Dukkipati Cc: H.K. Jerry Chu Cc: Romain Francoise Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings

tcp: refine SYN handling in tcp_validate_incoming

2013-01-16T01:13:26Z

[ Upstream commit e371589917011efe6ff8c7dfb4e9e81934ac5855 ] Followup of commit 0c24604b68fc (tcp: implement RFC 5961 4.2) As reported by Vijay Subramanian, we should send a challenge ACK instead of a dup ack if a SYN flag is set on a packet received out of window. This permits the ratelimiting to work as intended, and to increase correct SNMP counters. Suggested-by: Vijay Subramanian Signed-off-by: Eric Dumazet Acked-by: Vijay Subramanian Cc: Kiran Kumar Kella Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings

tcp: implement RFC 5961 4.2

2013-01-16T01:13:25Z

[ Upstream commit 0c24604b68fc7810d429d6c3657b6f148270e528 ] Implement the RFC 5691 mitigation against Blind Reset attack using SYN bit. Section 4.2 of RFC 5961 advises to send a Challenge ACK and drop incoming packet, instead of resetting the session. Add a new SNMP counter to count number of challenge acks sent in response to SYN packets. (netstat -s | grep TCPSYNChallenge) Remove obsolete TCPAbortOnSyn, since we no longer abort a TCP session because of a SYN flag. Signed-off-by: Eric Dumazet Cc: Kiran Kumar Kella Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings

tcp: implement RFC 5961 3.2

2013-01-16T01:13:25Z

[ Upstream commit 282f23c6ee343126156dd41218b22ece96d747e3 ] Implement the RFC 5691 mitigation against Blind Reset attack using RST bit. Idea is to validate incoming RST sequence, to match RCV.NXT value, instead of previouly accepted window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND) If sequence is in window but not an exact match, send a "challenge ACK", so that the other part can resend an RST with the appropriate sequence. Add a new sysctl, tcp_challenge_ack_limit, to limit number of challenge ACK sent per second. Add a new SNMP counter to count number of challenge acks sent. (netstat -s | grep TCPChallengeACK) Signed-off-by: Eric Dumazet Cc: Kiran Kumar Kella Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings

inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock

2013-01-16T01:13:24Z

[ Upstream commit e337e24d6624e74a558aa69071e112a65f7b5758 ] If in either of the above functions inet_csk_route_child_sock() or __inet_inherit_port() fails, the newsk will not be freed: unreferenced object 0xffff88022e8a92c0 (size 1592): comm "softirq", pid 0, jiffies 4294946244 (age 726.160s) hex dump (first 32 bytes): 0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00 ................ 02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [] kmemleak_alloc+0x21/0x3e [] kmem_cache_alloc+0xb5/0xc5 [] sk_prot_alloc.isra.53+0x2b/0xcd [] sk_clone_lock+0x16/0x21e [] inet_csk_clone_lock+0x10/0x7b [] tcp_create_openreq_child+0x21/0x481 [] tcp_v4_syn_recv_sock+0x3a/0x23b [] tcp_check_req+0x29f/0x416 [] tcp_v4_do_rcv+0x161/0x2bc [] tcp_v4_rcv+0x6c9/0x701 [] ip_local_deliver_finish+0x70/0xc4 [] ip_local_deliver+0x4e/0x7f [] ip_rcv_finish+0x1fc/0x233 [] ip_rcv+0x217/0x267 [] __netif_receive_skb+0x49e/0x553 [] netif_receive_skb+0x50/0x82 This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus a single sock_put() is not enough to free the memory. Additionally, things like xfrm, memcg, cookie_values,... may have been initialized. We have to free them properly. This is fixed by forcing a call to tcp_done(), ending up in inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary, because it ends up doing all the cleanup on xfrm, memcg, cookie_values, xfrm,... Before calling tcp_done, we have to set the socket to SOCK_DEAD, to force it entering inet_csk_destroy_sock. To avoid the warning in inet_csk_destroy_sock, inet_num has to be set to 0. As inet_csk_destroy_sock does a dec on orphan_count, we first have to increase it. Calling tcp_done() allows us to remove the calls to tcp_clear_xmit_timer() and tcp_cleanup_congestion_control(). A similar approach is taken for dccp by calling dccp_done(). This is in the kernel since 093d282321 (tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()), thus since version >= 2.6.37. Signed-off-by: Christoph Paasch Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings

ipv4: ip_check_defrag must not modify skb before unsharing

2013-01-03T03:33:52Z

[ Upstream commit 1bf3751ec90cc3174e01f0d701e8449ce163d113 ] ip_check_defrag() might be called from af_packet within the RX path where shared SKBs are used, so it must not modify the input SKB before it has unshared it for defragmentation. Use skb_copy_bits() to get the IP header and only pull in everything later. The same is true for the other caller in macvlan as it is called from dev->rx_handler which can also get a shared SKB. Reported-by: Eric Leblond Signed-off-by: Johannes Berg Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings

ipv4: avoid undefined behavior in do_ip_setsockopt()

2012-12-06T11:20:11Z

[ Upstream commit 0c9f79be295c99ac7e4b569ca493d75fdcc19e4e ] (1< Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings

netfilter: nf_nat: don't check for port change on ICMP tuples

2012-12-06T11:20:10Z

commit 38fe36a248ec3228f8e6507955d7ceb0432d2000 upstream. ICMP tuples have id in src and type/code in dst. So comparing src.u.all with dst.u.all will always fail here and ip_xfrm_me_harder() is called for every ICMP packet, even if there was no NAT. Signed-off-by: Ulrich Weber [Pablo Neira Ayuso: Backported to.3.0] Signed-off-by: Pablo Neira Ayuso Signed-off-by: Ben Hutchings

net: fix divide by zero in tcp algorithm illinois

2012-11-16T16:47:16Z

[ Upstream commit 8f363b77ee4fbf7c3bbcf5ec2c5ca482d396d664 ] Reading TCP stats when using TCP Illinois congestion control algorithm can cause a divide by zero kernel oops. The division by zero occur in tcp_illinois_info() at: do_div(t, ca->cnt_rtt); where ca->cnt_rtt can become zero (when rtt_reset is called) Steps to Reproduce: 1. Register tcp_illinois: # sysctl -w net.ipv4.tcp_congestion_control=illinois 2. Monitor internal TCP information via command "ss -i" # watch -d ss -i 3. Establish new TCP conn to machine Either it fails at the initial conn, or else it needs to wait for a loss or a reset. This is only related to reading stats. The function avg_delay() also performs the same divide, but is guarded with a (ca->cnt_rtt > 0) at its calling point in update_params(). Thus, simply fix tcp_illinois_info(). Function tcp_illinois_info() / get_info() is called without socket lock. Thus, eliminate any race condition on ca->cnt_rtt by using a local stack variable. Simply reuse info.tcpv_rttcnt, as its already set to ca->cnt_rtt. Function avg_delay() is not affected by this race condition, as its called with the socket lock. Cc: Petr Matousek Signed-off-by: Jesper Dangaard Brouer Acked-by: Eric Dumazet Acked-by: Stephen Hemminger Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings