aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2012-04-28net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skbBenjamin LaHaise
This is the first step in reworking the IPv6 UDP code to be structured more like the IPv4 UDP code. This patch creates __udpv6_queue_rcv_skb() with the equivalent sematics to __udp_queue_rcv_skb(), and wires it up to the backlog_rcv method. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28net: Fixed a coding style issue related to spaces.Jeffrin Jose
Fixed a coding style issue relating to spaces in net/core/sock.c Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27net: cleanups in sock_setsockopt()Eric Dumazet
Use min_t()/max_t() macros, reformat two comments, use !!test_bit() to match !!sock_flag() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27crush: include header for global symbolshartleys
Include the header to pickup the definitions of the global symbols. Quiets the following sparse warnings: warning: symbol 'crush_find_rule' was not declared. Should it be static? warning: symbol 'crush_do_rule' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Sage Weil <sage@newdream.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizingEric Dumazet
Quoting Tore Anderson from : https://bugzilla.kernel.org/show_bug.cgi?id=42572 When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment size does not take into account the size of the IPv6 Fragmentation header that needs to be included in outbound packets, causing every transmitted TCP segment to be fragmented across two IPv6 packets, the latter of which will only contain 8 bytes of actual payload. RTAX_FEATURE_ALLFRAG is typically set on a route in response to receving a ICMPv6 Packet Too Big message indicating a Path MTU of less than 1280 bytes. 1280 bytes is the minimum IPv6 MTU, however ICMPv6 PTBs with MTU < 1280 are still valid, in particular when an IPv6 packet is sent to an IPv4 destination through a stateless translator. Any ICMPv4 Need To Fragment packets originated from the IPv4 part of the path will be translated to ICMPv6 PTB which may then indicate an MTU of less than 1280. The Linux kernel refuses to reduce the effective MTU to anything below 1280 bytes, instead it sets it to exactly 1280 bytes, and RTAX_FEATURE_ALLFRAG is also set. However, the TCP segment size appears to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header), instead of 1232 (additionally taking into account the 8 bytes required by the IPv6 Fragmentation extension header). This in turn results in rather inefficient transmission, as every transmitted TCP segment now is split in two fragments containing 1232+8 bytes of payload. After this patch, all the outgoing packets that includes a Fragmentation header all are "atomic" or "non-fragmented" fragments, i.e., they both have Offset=0 and More Fragments=0. With help from David S. Miller Reported-by: Tore Anderson <tore@fud.no> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Tom Herbert <therbert@google.com> Tested-by: Tore Anderson <tore@fud.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
2012-04-26Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: drivers/net/wireless/iwlwifi/iwl-testmode.c
2012-04-26tcp repair: Fix unaligned access when repairing options (v2)Pavel Emelyanov
Don't pick __u8/__u16 values directly from raw pointers, but instead use an array of structures of code:value pairs. This is OK, since the buffer we take options from is not an skb memory, but a user-to-kernel one. For those options which don't require any value now, require this to be zero (for potential future extension of this API). v2: Changed tcp_repair_opt to use two __u32-s as spotted by David Laight. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-266lowpan: duplicate definition of IEEE802154_ALENalex.bluesman.smirnov@gmail.com
The same macros is defined in 'include/net/af_ieee802154.h' and is called IEEE802154_ADDR_LEN. No need another one, so remove it. Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-266lowpan: move frame allocation code to a separate functionalex.bluesman.smirnov@gmail.com
Separate frame allocation routine from data processing function. This makes code more human readable and easier for understanding. Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-25net: sock_diag_handler structs can be constShan Wei
read only, so change it to const. Signed-off-by: Shan Wei <davidshan@tencent.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-25ipv6: call consume_skb() in frag/reassemblyEric Dumazet
Some kfree_skb() calls should be replaced by consume_skb() to avoid drop_monitor/dropwatch false positives. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-25net: dcb: add CEE notify callsJohn Fastabend
This adds code to trigger CEE events when an APP change or setall command is made from user space. This simplifies user space code significantly by creating a single interface to listen on that works with both firmware and userland agents. And if we end up with multiple agents this keeps every thing in sync userland agents, firmware agents, and kernel notifier consumers. For an example agent that listens for these events see: https://github.com/jrfastab/cgdcbxd cgdcbxd is a daemon used to monitor DCB netlink events and manage the net_prio control group sub-system. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Shmulik Ravid <shmulikr@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24mac80211: Adds clean sdata helperAndrei Emeltchenko
Adds hepler to clean sdata ieee80211_clean_sdata similar way as ieee80211_setup_sdata is implemented. The function will be used by other interfaces later. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24mac80211: fix num_mcast_sta counting issuesFelix Fietkau
Moving a STA to an AP VLAN prevents num_mcast_sta from being decremented once the STA leaves, because sta->sdata changes. Fix this by checking for AP VLANs as well. Also exclude 4-addr VLAN stations from num_mcast_sta - remote 4-addr stations ignore 3-address multicast frames anyway. In a typical bridge configuration they receive the same packets as 4-address unicast. This patch also fixes clearing the sdata->u.vlan.sta pointer when the STA is removed from a 4-addr VLAN. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24mac80211: rename AP variable num_sta_authorized to num_mcast_staFelix Fietkau
It is only used to test for BSS multicast receivers. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24mac80211: check for non-managed interfaceWey-Yi Guy
Average beacon signal only keep tracked by managed interface, give warning and return 0 for the others. Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24tipc: remove inline instances from C source files.Paul Gortmaker
Untie gcc's hands and let it do what it wants within the individual source files. There are two files, node.c and port.c -- only the latter effectively changes (gcc-4.5.2). Objdump shows gcc deciding to not inline port_peernode(). Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24af_netlink: drop_monitor/dropwatch friendlyEric Dumazet
Need to consume_skb() instead of kfree_skb() in netlink_dump() and netlink_unicast_kernel() to avoid false dropwatch positives. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24af_netlink: cleanupsEric Dumazet
netlink_destroy_callback() move to avoid forward reference CodingStyle cleanups Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24net: skb_can_coalesce returns a booleanEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23tcp: tcp_try_coalesce returns a booleanEric Dumazet
This clarifies code intention, as suggested by David. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23net: make spd_fill_page() linear argument a boolEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Fix merge between commit 3adadc08cc1e ("net ax25: Reorder ax25_exit to remove races") and commit 0ca7a4c87d27 ("net ax25: Simplify and cleanup the ax25 sysctl handling") The former moved around the sysctl register/unregister calls, the later simply removed them. With help from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23net: Use bool and remove inline in skb_splice_bits() code.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23net: speedup skb_splice_bits()Eric Dumazet
Commit 35f3d14db (pipe: add support for shrinking and growing pipes) added a slowdown for splice(socket -> pipe), as we might grow the spd used in skb_splice_bits() for each skb we process in splice() syscall. Its not needed since skb lengths are capped. The default on-stack arrays are more than enough. Use MAX_SKB_FRAGS instead of PIPE_DEF_BUFFERS to describe the reasonable limit per skb. Add coalescing support to help splicing of GRO skbs built from linear skbs (linked into frag_list) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23tcp: introduce tcp_try_coalesceEric Dumazet
commit c8628155ece3 (tcp: reduce out_of_order memory use) took care of coalescing tcp segments provided by legacy devices (linear skbs) We extend this idea to fragged skbs, as their truesize can be heavy. ixgbe for example uses 256+1024+PAGE_SIZE/2 = 3328 bytes per segment. Use this coalescing strategy for receive queue too. This contributes to reduce number of tcp collapses, at minimal cost, and reduces memory overhead and packets drops. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23tcp: sk_add_backlog() is too agressive for TCPEric Dumazet
While investigating TCP performance problems on 10Gb+ links, we found a tcp sender was dropping lot of incoming ACKS because of sk_rcvbuf limit in sk_add_backlog(), especially if receiver doesnt use GRO/LRO and sends one ACK every two MSS segments. A sender usually tweaks sk_sndbuf, but sk_rcvbuf stays at its default value (87380), allowing a too small backlog. A TCP ACK, even being small, can consume nearly same truesize space than outgoing packets. Using sk_rcvbuf + sk_sndbuf as a limit makes sense and is fast to compute. Performance results on netperf, single flow, receiver with disabled GRO/LRO : 7500 Mbits instead of 6050 Mbits, no more TCPBacklogDrop increments at sender. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23net: add a limit parameter to sk_add_backlog()Eric Dumazet
sk_add_backlog() & sk_rcvqueues_full() hard coded sk_rcvbuf as the memory limit. We need to make this limit a parameter for TCP use. No functional change expected in this patch, all callers still using the old sk_rcvbuf limit. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23cfg80211: Validate legacy rateset.Bala Shanmugam
Legacy rates are not validated while configuring tx rateset using iw. So below cmd is accepted by nl80211. sudo iw wlan2 set bitrates legacy-2.4 1 2 3 Validate legacy rates and return error if any rate in the rateset is not valid. Signed-off-by: Bala Shanmugam <bkamatch@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23mac80211: declare ieee80211_ave_rssi as EXPORTWey-Yi Guy
ieee80211_ave_rssi need to be declare as export for driver to use it. Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23mac80211: fixup for mesh TSF adjustment latency in Toffset setpointJavier Cardona
The original patch defined the correction margin but did not apply it. Signed-off-by: Shinichi Hotori <hotorinn@gmail.com> Signed-off-by: Yu Niiro <yu.niiro@gmail.com> Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23mac80211: fix STA channel width fieldThomas Pedersen
According to IEEE 802.11 8.4.2.59, set the "STA channel width" bit to 0 if transmitting STA is using a 20mhz channel. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23mac80211: don't set mesh peer ht caps if ht disabledThomas Pedersen
Blindly setting ht caps on a mesh peer's station entry would result in MCS rates being used by the rate control algorithm even if no ht had been configured. Fix this by checking the channel type before assigning ht capabilites. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23mac80211: refactor mesh peer rate handlingThomas Pedersen
To avoid passing supp_rates and basic_rates around all the time, just derive these when needed in mesh_matches_local() and mesh_peer_init(). Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23mac80211: refactor mesh peer initializationThomas Pedersen
This patch unifies the previous two paths toward mesh peer creation a bit. It also fixes a bug where a peer's changing rates or HT mode wouldn't register on leaving and then returning to the mesh with a sta entry still present. Also clean up locking and clear possibly stale ht cap. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23mac80211: Support on-channel scan option.Ben Greear
This based on an idea posted by Stanislaw Gruszka, though I accept full blame for the implementation! This has been tested with ath9k. The idea is to let users scan on the current operating channel without interrupting normal traffic more than absolutely necessary (changing power level might reset some hardware, for instance). Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23tcp: Fix build warning after tcp_{v4,v6}_init_sock consolidation.David S. Miller
net/ipv4/tcp_ipv4.c: In function 'tcp_v4_init_sock': net/ipv4/tcp_ipv4.c:1891:19: warning: unused variable 'tp' [-Wunused-variable] net/ipv6/tcp_ipv6.c: In function 'tcp_v6_init_sock': net/ipv6/tcp_ipv6.c:1836:19: warning: unused variable 'tp' [-Wunused-variable] Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-22tcp: fix TCP_MAXSEG for established IPv6 passive socketsNeal Cardwell
Commit f5fff5d forgot to fix TCP_MAXSEG behavior IPv6 sockets, so IPv6 TCP server sockets that used TCP_MAXSEG would find that the advmss of child sockets would be incorrect. This commit mirrors the advmss logic from tcp_v4_syn_recv_sock in tcp_v6_syn_recv_sock. Eventually this logic should probably be shared between IPv4 and IPv6, but this at least fixes this issue. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21af_packet: packet_getsockopt() cleanupEric Dumazet
Factorize code, since most fetched values are int type. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21tcp: move duplicate code from tcp_v4_init_sock()/tcp_v6_init_sock()Neal Cardwell
This commit moves the (substantial) common code shared between tcp_v4_init_sock() and tcp_v6_init_sock() to a new address-family independent function, tcp_init_sock(). Centralizing this functionality should help avoid drift issues, e.g. where the IPv4 side is updated without a corresponding update to IPv6. There was already some drift: IPv4 initialized snd_cwnd to TCP_INIT_CWND, while the IPv6 side was still initializing snd_cwnd to 2 (in this case it should not matter, since snd_cwnd is also initialized in tcp_init_metrics(), but the general risks and maintenance overhead remain). When diffing the old and new code, note that new tcp_init_sock() function uses the order of steps from the tcp_v4_init_sock() implementation (the order is slightly different in tcp_v6_init_sock()). Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21net: allow better page reuse in splice(sock -> pipe)Eric Dumazet
splice() from socket to pipe needs linear_to_page() helper to transfert skb header to part of page. We can reset the offset in the current sk->sk_sndmsg_page if we are the last user of the page. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21drop_monitor: allow more events per secondEric Dumazet
It seems there is a logic error in trace_drop_common(), since we store only 64 drops, even if they are from same location. This fix is a one liner, but we probably need more work to avoid useless atomic dec/inc Now I can watch 1 Mpps drops through dropwatch... Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21net: change big iov allocationsEric Dumazet
iov of more than 8 entries are allocated in sendmsg()/recvmsg() through sock_kmalloc() As these allocations are temporary only and small enough, it makes sense to use plain kmalloc() and avoid sk_omem_alloc atomic overhead. Slightly changed fast path to be even faster. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mike Waychison <mikew@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21tcp: Repair connection-time negotiated parametersPavel Emelyanov
There are options, which are set up on a socket while performing TCP handshake. Need to resurrect them on a socket while repairing. A new sockoption accepts a buffer and parses it. The buffer should be CODE:VALUE sequence of bytes, where CODE is standard option code and VALUE is the respective value. Only 4 options should be handled on repaired socket. To read 3 out of 4 of these options the TCP_INFO sockoption can be used. An ability to get the last one (the mss_clamp) was added by the previous patch. Now the restore. Three of these options -- timestamp_ok, mss_clamp and snd_wscale -- are just restored on a coket. The sack_ok flags has 2 issues. First, whether or not to do sacks at all. This flag is just read and set back. No other sack info is saved or restored, since according to the standart and the code dropping all sack-ed segments is OK, the sender will resubmit them again, so after the repair we will probably experience a pause in connection. Next, the fack bit. It's just set back on a socket if the respective sysctl is set. No collected stats about packets flow is preserved. As far as I see (plz, correct me if I'm wrong) the fack-based congestion algorithm survives dropping all of the stats and repairs itself eventually, probably losing the performance for that period. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21tcp: Report mss_clamp with TCP_MAXSEG option in repair modePavel Emelyanov
The mss_clamp is the only connection-time negotiated option which cannot be obtained from the user space. Make the TCP_MAXSEG sockopt report one in the repair mode. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21tcp: Repair socket queuesPavel Emelyanov
Reading queues under repair mode is done with recvmsg call. The queue-under-repair set by TCP_REPAIR_QUEUE option is used to determine which queue should be read. Thus both send and receive queue can be read with this. Caller must pass the MSG_PEEK flag. Writing to queues is done with sendmsg call and yet again -- the repair-queue option can be used to push data into the receive queue. When putting an skb into receive queue a zero tcp header is appented to its head to address the tcp_hdr(skb)->syn and the ->fin checks by the (after repair) tcp_recvmsg. These flags flags are both set to zero and that's why. The fin cannot be met in the queue while reading the source socket, since the repair only works for closed/established sockets and queueing fin packet always changes its state. The syn in the queue denotes that the respective skb's seq is "off-by-one" as compared to the actual payload lenght. Thus, at the rcv queue refill we can just drop this flag and set the skb's sequences to precice values. When the repair mode is turned off, the write queue seqs are updated so that the whole queue is considered to be 'already sent, waiting for ACKs' (write_seq = snd_nxt <= snd_una). From the protocol POV the send queue looks like it was sent, but the data between the write_seq and snd_nxt is lost in the network. This helps to avoid another sockoption for setting the snd_nxt sequence. Leaving the whole queue in a 'not yet sent' state (as it will be after sendmsg-s) will not allow to receive any acks from the peer since the ack_seq will be after the snd_nxt. Thus even the ack for the window probe will be dropped and the connection will be 'locked' with the zero peer window. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21tcp: Initial repair modePavel Emelyanov
This includes (according the the previous description): * TCP_REPAIR sockoption This one just puts the socket in/out of the repair mode. Allowed for CAP_NET_ADMIN and for closed/establised sockets only. When repair mode is turned off and the socket happens to be in the established state the window probe is sent to the peer to 'unlock' the connection. * TCP_REPAIR_QUEUE sockoption This one sets the queue which we're about to repair. The 'no-queue' is set by default. * TCP_QUEUE_SEQ socoption Sets the write_seq/rcv_nxt of a selected repaired queue. Allowed for TCP_CLOSE-d sockets only. When the socket changes its state the other seq-s are changed by the kernel according to the protocol rules (most of the existing code is actually reused). * Ability to forcibly bind a socket to a port The sk->sk_reuse is set to SK_FORCE_REUSE. * Immediate connect modification The connect syscall initializes the connection, then directly jumps to the code which finalizes it. * Silent close modification The close just aborts the connection (similar to SO_LINGER with 0 time) but without sending any FIN/RST-s to peer. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21tcp: Move code aroundPavel Emelyanov
This is just the preparation patch, which makes the needed for TCP repair code ready for use. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21sock: Introduce named constants for sk_reusePavel Emelyanov
Name them in a "backward compatible" manner, i.e. reuse or not are still 1 and 0 respectively. The reuse value of 2 means that the socket with it will forcibly reuse everyone else's port. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>