aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2010-09-02net: arp: code cleanupChangli Gao
Clean the code up according to Documentation/CodingStyle. Don't initialize the variable dont_send in arp_process(). Remove the temporary varialbe flags in arp_state_to_flags(). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01tcp: update also tcp_output with regard to RFC 5681Gerrit Renker
Thanks to Ilpo Jarvinen, this updates also the initial window setting for tcp_output with regard to RFC 5681. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01net: tunnels should use rcu_dereferenceEric Dumazet
tunnel4_handlers, tunnel64_handlers, tunnel6_handlers and tunnel46_handlers are protected by RCU, but we dont use appropriate rcu primitives to scan them. rcu_lock() is already held by caller. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-31gro: unexport tcp4_gro_receive and tcp4_gro_completeEric Dumazet
tcp4_gro_receive() and tcp4_gro_complete() dont need to be exported. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30net: struct xfrm_tunnel in read_mostly sectionEric Dumazet
tunnel4_handlers chain being scanned for each incoming packet, make sure it doesnt share an often dirtied cache line. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30tcp/dccp: Consolidate common code for RFC 3390 conversionGerrit Renker
This patch consolidates initial-window code common to TCP and CCID-2: * TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and * CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30tcp: Add TCP_USER_TIMEOUT socket option.Jerry Chu
This patch provides a "user timeout" support as described in RFC793. The socket option is also needed for the the local half of RFC5482 "TCP User Timeout Option". TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int, when > 0, to specify the maximum amount of time in ms that transmitted data may remain unacknowledged before TCP will forcefully close the corresponding connection and return ETIMEDOUT to the application. If 0 is given, TCP will continue to use the system default. Increasing the user timeouts allows a TCP connection to survive extended periods without end-to-end connectivity. Decreasing the user timeouts allows applications to "fail fast" if so desired. Otherwise it may take upto 20 minutes with the current system defaults in a normal WAN environment. The socket option can be made during any state of a TCP connection, but is only effective during the synchronized states of a connection (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, or LAST-ACK). Moreover, when used with the TCP keepalive (SO_KEEPALIVE) option, TCP_USER_TIMEOUT will overtake keepalive to determine when to close a connection due to keepalive failure. The option does not change in anyway when TCP retransmits a packet, nor when a keepalive probe will be sent. This option, like many others, will be inherited by an acceptor from its listener. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-24net: ip_append_data() optimEric Dumazet
Compiler is not smart enough to avoid a conditional branch. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-23net: Rename skb_has_frags to skb_has_frag_listDavid S. Miller
SKBs can be "fragmented" in two ways, via a page array (called skb_shinfo(skb)->frags[]) and via a list of SKBs (called skb_shinfo(skb)->frag_list). Since skb_has_frags() tests the latter, it's name is confusing since it sounds more like it's testing the former. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-22tcp: allow effective reduction of TCP's rcv-buffer via setsockoptHagen Paul Pfeifer
Via setsockopt it is possible to reduce the socket RX buffer (SO_RCVBUF). TCP method to select the initial window and window scaling option in tcp_select_initial_window() currently misbehaves and do not consider a reduced RX socket buffer via setsockopt. Even though the server's RX buffer is reduced via setsockopt() to 256 byte (Initial Window 384 byte => 256 * 2 - (256 * 2 / 4)) the window scale option is still 7: 192.168.1.38.40676 > 78.47.222.210.5001: Flags [S], seq 2577214362, win 5840, options [mss 1460,sackOK,TS val 338417 ecr 0,nop,wscale 0], length 0 78.47.222.210.5001 > 192.168.1.38.40676: Flags [S.], seq 1570631029, ack 2577214363, win 384, options [mss 1452,sackOK,TS val 2435248895 ecr 338417,nop,wscale 7], length 0 192.168.1.38.40676 > 78.47.222.210.5001: Flags [.], ack 1, win 5840, options [nop,nop,TS val 338421 ecr 2435248895], length 0 Within tcp_select_initial_window() the original space argument - a representation of the rx buffer size - is expanded during tcp_select_initial_window(). Only sysctl_tcp_rmem[2], sysctl_rmem_max and window_clamp are considered to calculate the initial window. This patch adjust the window_clamp argument if the user explicitly reduce the receive buffer. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Cc: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-21Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2010-08-21PPTP: PPP over IPv4 (Point-to-Point Tunneling Protocol)Dmitry Kozlov
PPP: introduce "pptp" module which implements point-to-point tunneling protocol using pppox framework NET: introduce the "gre" module for demultiplexing GRE packets on version criteria (required to pptp and ip_gre may coexists) NET: ip_gre: update to use the "gre" module This patch introduces then pptp support to the linux kernel which dramatically speeds up pptp vpn connections and decreases cpu usage in comparison of existing user-space implementation (poptop/pptpclient). There is accel-pptp project (https://sourceforge.net/projects/accel-pptp/) to utilize this module, it contains plugin for pppd to use pptp in client-mode and modified pptpd (poptop) to build high-performance pptp NAS. There was many changes from initial submitted patch, most important are: 1. using rcu instead of read-write locks 2. using static bitmap instead of dynamically allocated 3. using vmalloc for memory allocation instead of BITS_PER_LONG + __get_free_pages 4. fixed many coding style issues Thanks to Eric Dumazet. Signed-off-by: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-20net: build_ehash_secret() and rt_bind_peer() cleanupsEric Dumazet
Now cmpxchg() is available on all arches, we can use it in build_ehash_secret() and rt_bind_peer() instead of using spinlocks. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-19netfilter: ipt_CLUSTERIP: use proto_ports_offset() to support AH messageChangli Gao
Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-19net: simplify flags for tx timestampingOliver Hartkopp
This patch removes the abstraction introduced by the union skb_shared_tx in the shared skb data. The access of the different union elements at several places led to some confusion about accessing the shared tx_flags e.g. in skb_orphan_try(). http://marc.info/?l=linux-netdev&m=128084897415886&w=2 Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-17netfilter: {ip,ip6,arp}_tables: avoid lockdep false positiveEric Dumazet
After commit 24b36f019 (netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary), lockdep can raise a warning because we attempt to lock a spinlock with BH enabled, while the same lock is usually locked by another cpu in a softirq context. Disable again BH to avoid these lockdep warnings. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Diagnosed-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-07tcp: no md5sig option size check bugDmitry Popov
tcp_parse_md5sig_option doesn't check md5sig option (TCPOPT_MD5SIG) length, but tcp_v[46]_inbound_md5_hash assume that it's at least 16 bytes long. Signed-off-by: Dmitry Popov <dp@highloadlab.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-02Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/e1000e/hw.h net/bridge/br_device.c net/bridge/br_input.c
2010-08-02ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twiceChangli Gao
6c79bf0f2440fd250c8fce8d9b82fcf03d4e8350 subtracts PPPOE_SES_HLEN from mtu at the front of ip_fragment(). So the later subtraction should be removed. The MTU of 802.1q is also 1500, so MTU should not be changed. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Bart De Schuymer <bdschuym@pandora.bo> ---- net/ipv4/ip_output.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) Signed-off-by: Bart De Schuymer <bdschuym@pandora.bo> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-02net: Add getsockopt support for TCP thin-streamsJosh Hunt
Initial TCP thin-stream commit did not add getsockopt support for the new socket options: TCP_THIN_LINEAR_TIMEOUTS and TCP_THIN_DUPACK. This adds support for them. Signed-off-by: Josh Hunt <johunt@akamai.com> Tested-by: Andreas Petlund <apetlund@simula.no> Acked-by: Andreas Petlund <apetlund@simula.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-02Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2010-08-02netfilter: nf_nat: don't check if the tuple is unique when there isn't any ↵Changli Gao
other choice The tuple got from unique_tuple() doesn't need to be really unique, so the check for the unique tuple isn't necessary, when there isn't any other choice. Eliminating the unnecessary nf_nat_used_tuple() can save some CPU cycles too. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-08-02netfilter: nf_nat: make unique_tuple return voidChangli Gao
The only user of unique_tuple() get_unique_tuple() doesn't care about the return value of unique_tuple(), so make unique_tuple() return void (nothing). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-08-02netfilter: nf_nat: use local variable hdrlenChangli Gao
Use local variable hdrlen instead of ip_hdrlen(skb). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-08-02netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessaryEric Dumazet
We currently disable BH for the whole duration of get_counters() On machines with a lot of cpus and large tables, this might be too long. We can disable preemption during the whole function, and disable BH only while fetching counters for the current cpu. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-30tcp: cookie transactions setsockopt memory leakDmitry Popov
There is a bug in do_tcp_setsockopt(net/ipv4/tcp.c), TCP_COOKIE_TRANSACTIONS case. In some cases (when tp->cookie_values == NULL) new tcp_cookie_values structure can be allocated (at cvp), but not bound to tp->cookie_values. So a memory leak occurs. Signed-off-by: Dmitry Popov <dp@highloadlab.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-23netfilter: iptables: use skb->len for accountingChangli Gao
Use skb->len for accounting as xt_quota does. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-23netfilter: arptables: use arp_hdr_len()Changli Gao
use arp_hdr_len(). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-23netfilter: nf_nat_core: merge the same linesChangli Gao
proto->unique_tuple() will be called finally, if the previous calls fail. This patch checks the false condition of (range->flags &IP_NAT_RANGE_PROTO_RANDOM) instead to avoid duplicate line of code: proto->unique_tuple(). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-22net: RTA_MARK additionEric Dumazet
Add a new rt attribute, RTA_MARK, and use it in rt_fill_info()/inet_rtm_getroute() to support following commands : ip route get 192.168.20.110 mark NUMBER ip route get 192.168.20.108 from 192.168.20.110 iif eth1 mark NUMBER ip route list cache [192.168.20.110] mark NUMBER Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-21net: remove last uses of __attribute__((packed))Gustavo F. Padovan
Network code uses the __packed macro instead of __attribute__((packed)). Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-20Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/vhost/net.c net/bridge/br_device.c Fix merge conflict in drivers/vhost/net.c with guidance from Stephen Rothwell. Revert the effects of net-2.6 commit 573201f36fd9c7c6d5218cdcd9948cee700b277d since net-next-2.6 has fixes that make bridge netpoll work properly thus we don't need it disabled. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-19tcp: fix crash in tcp_xmit_retransmit_queueIlpo Järvinen
It can happen that there are no packets in queue while calling tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns NULL and that gets deref'ed to get sacked into a local var. There is no work to do if no packets are outstanding so we just exit early. This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out guard to make joining diff nicer). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de> Tested-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-15ipmr: Don't leak memory if fib lookup fails.Ben Greear
This was detected using two mcast router tables. The pimreg for the second interface did not have a specific mrule, so packets received by it were handled by the default table, which had nothing configured. This caused the ipmr_fib_lookup to fail, causing the memory leak. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-14rfs: call sock_rps_record_flow() in tcp_splice_read()Changli Gao
rfs: call sock_rps_record_flow() in tcp_splice_read() call sock_rps_record_flow() in tcp_splice_read(), so the applications using splice(2) or sendfile(2) can utilize RFS. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- net/ipv4/tcp.c | 1 + 1 file changed, 1 insertion(+) Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-12inet, inet6: make tcp_sendmsg() and tcp_sendpage() through inet_sendmsg() ↵Changli Gao
and inet_sendpage() a new boolean flag no_autobind is added to structure proto to avoid the autobind calls when the protocol is TCP. Then sock_rps_record_flow() is called int the TCP's sendmsg() and sendpage() pathes. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/inet_common.h | 4 ++++ include/net/sock.h | 1 + include/net/tcp.h | 8 ++++---- net/ipv4/af_inet.c | 15 +++++++++------ net/ipv4/tcp.c | 11 +++++------ net/ipv4/tcp_ipv4.c | 3 +++ net/ipv6/af_inet6.c | 8 ++++---- net/ipv6/tcp_ipv6.c | 3 +++ 8 files changed, 33 insertions(+), 20 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-12net/ipv4: EXPORT_SYMBOL cleanupsEric Dumazet
CodingStyle cleanups EXPORT_SYMBOL should immediately follow the symbol declaration. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-08gre: propagate ipv6 transport classStephen Hemminger
This patch makes IPV6 over IPv4 GRE tunnel propagate the transport class field from the underlying IPV6 header to the IPV4 Type Of Service field. Without the patch, all IPV6 packets in tunnel look the same to QoS. This assumes that IPV6 transport class is exactly the same as IPv4 TOS. Not sure if that is always the case? Maybe need to mask off some bits. The mask and shift to get tclass is copied from ipv6/datagram.c Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-07Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2010-07-07net/ipv4/ip_output.c: Removal of unused variable in ip_fragment()George Kadianakis
Removal of unused integer variable in ip_fragment(). Signed-off-by: George Kadianakis <desnacked@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-05ipv4: use skb_dst_copy() in ip_copy_metadata()Eric Dumazet
Avoid touching dst refcount in ip_fragment(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-05netfilter: ipt_REJECT: avoid touching dst refEric Dumazet
We can avoid a pair of atomic ops in ipt_REJECT send_reset() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-05netfilter: ipt_REJECT: postpone the checksum calculation.Changli Gao
postpone the checksum calculation, then if the output NIC supports checksum offloading, we can utlize it. And though the output NIC doesn't support checksum offloading, but we'll mangle this packet, this can free us from updating the checksum, as the checksum calculation occurs later. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-04xfrm: fix xfrm by MARK logicPeter Kosyh
While using xfrm by MARK feature in 2.6.34 - 2.6.35 kernels, the mark is always cleared in flowi structure via memset in _decode_session4 (net/ipv4/xfrm4_policy.c), so the policy lookup fails. IPv6 code is affected by this bug too. Signed-off-by: Peter Kosyh <p.kosyh@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-02Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2010-06-30fragment: add fast path for in-order fragmentsChangli Gao
add fast path for in-order fragments As the fragments are sent in order in most of OSes, such as Windows, Darwin and FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue. In the fast path, we check if the skb at the end of the inet_frag_queue is the prev we expect. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/inet_frag.h | 1 + net/ipv4/ip_fragment.c | 12 ++++++++++++ net/ipv6/reassembly.c | 11 +++++++++++ 3 files changed, 24 insertions(+) Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-30snmp: 64bit ipstats_mib for all archesEric Dumazet
/proc/net/snmp and /proc/net/netstat expose SNMP counters. Width of these counters is either 32 or 64 bits, depending on the size of "unsigned long" in kernel. This means user program parsing these files must already be prepared to deal with 64bit values, regardless of user program being 32 or 64 bit. This patch introduces 64bit snmp values for IPSTAT mib, where some counters can wrap pretty fast if they are 32bit wide. # netstat -s|egrep "InOctets|OutOctets" InOctets: 244068329096 OutOctets: 244069348848 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-28tcp: tso_fragment() might avoid GFP_ATOMICEric Dumazet
We can pass a gfp argument to tso_fragment() and avoid GFP_ATOMIC allocations sometimes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-28net: use this_cpu_ptr()Eric Dumazet
use this_cpu_ptr(p) instead of per_cpu_ptr(p, smp_processor_id()) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-28netfilter: ipt_LOG/ip6t_LOG: add option to print decoded MAC headerPatrick McHardy
The LOG targets print the entire MAC header as one long string, which is not readable very well: IN=eth0 OUT= MAC=00:15:f2:24:91:f8:00:1b:24:dc:61:e6:08:00 ... Add an option to decode known header formats (currently just ARPHRD_ETHER devices) in their individual fields: IN=eth0 OUT= MACSRC=00:1b:24:dc:61:e6 MACDST=00:15:f2:24:91:f8 MACPROTO=0800 ... IN=eth0 OUT= MACSRC=00:1b:24:dc:61:e6 MACDST=00:15:f2:24:91:f8 MACPROTO=86dd ... The option needs to be explicitly enabled by userspace to avoid breaking existing parsers. Signed-off-by: Patrick McHardy <kaber@trash.net>