aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2010-04-28net: ip_queue_rcv_skb() helperEric Dumazet
When queueing a skb to socket, we can immediately release its dst if target socket do not use IP_CMSG_PKTINFO. tcp_data_queue() can drop dst too. This to benefit from a hot cache line and avoid the receiver, possibly on another cpu, to dirty this cache line himself. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28net: speedup udp receive pathEric Dumazet
Since commit 95766fff ([UDP]: Add memory accounting.), each received packet needs one extra sock_lock()/sock_release() pair. This added latency because of possible backlog handling. Then later, ticket spinlocks added yet another latency source in case of DDOS. This patch introduces lock_sock_bh() and unlock_sock_bh() synchronization primitives, avoiding one atomic operation and backlog processing. skb_free_datagram_locked() uses them instead of full blown lock_sock()/release_sock(). skb is orphaned inside locked section for proper socket memory reclaim, and finally freed outside of it. UDP receive path now take the socket spinlock only once. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28Revert "tcp: bind() fix when many ports are bound"David S. Miller
This reverts two commits: fda48a0d7a8412cedacda46a9c0bf8ef9cd13559 tcp: bind() fix when many ports are bound and a follow-on fix for it: 6443bb1fc2050ca2b6585a3fa77f7833b55329ed ipv6: Fix inet6_csk_bind_conflict() It causes problems with binding listening sockets when time-wait sockets from a previous instance still are alive. It's too late to keep fiddling with this so late in the -rc series, and we'll deal with it in net-next-2.6 instead. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27net: sk_add_backlog() take rmem_alloc into accountEric Dumazet
Current socket backlog limit is not enough to really stop DDOS attacks, because user thread spend many time to process a full backlog each round, and user might crazy spin on socket lock. We should add backlog size and receive_queue size (aka rmem_alloc) to pace writers, and let user run without being slow down too much. Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in stress situations. Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp receiver can now process ~200.000 pps (instead of ~100 pps before the patch) on a 8 core machine. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27net: Make RFS socket operations not be inet specific.David S. Miller
Idea from Eric Dumazet. As for placement inside of struct sock, I tried to choose a place that otherwise has a 32-bit hole on 64-bit systems. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-04-27Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6
2010-04-27TCP: avoid to send keepalive probes if receiving dataFlavio Leitner
RFC 1122 says the following: ... Keep-alive packets MUST only be sent when no data or acknowledgement packets have been received for the connection within an interval. ... The acknowledgement packet is reseting the keepalive timer but the data packet isn't. This patch fixes it by checking the timestamp of the last received data packet too when the keepalive timer expires. Signed-off-by: Flavio Leitner <fleitner@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/e100.c drivers/net/e1000e/netdev.c
2010-04-26net: ipmr: add support for dumping routing tables over netlinkPatrick McHardy
The ipmr /proc interface (ip_mr_cache) can't be extended to dump routes from any tables but the main table in a backwards compatible fashion since the output format ends in a variable amount of output interfaces. Introduce a new netlink interface to dump multicast routes from all tables, similar to the netlink interface for regular routes. Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-26net: rtnetlink: decouple rtnetlink address families from real address familiesPatrick McHardy
Decouple rtnetlink address families from real address families in socket.h to be able to add rtnetlink interfaces to code that is not a real address family without increasing AF_MAX/NPROTO. This will be used to add support for multicast route dumping from all tables as the proc interface can't be extended to support anything but the main table without breaking compatibility. This partialy undoes the patch to introduce independant families for routing rules and converts ipmr routing rules to a new rtnetlink family. Similar to that patch, values up to 127 are reserved for real address families, values above that may be used arbitrarily. Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-26net: fib_rules: mark arguments to fib_rules_register const and __net_initdataPatrick McHardy
fib_rules_register() duplicates the template passed to it without modification, mark the argument as const. Additionally the templates are only needed when instantiating a new namespace, so mark them as __net_initdata, which means they can be discarded when CONFIG_NET_NS=n. Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-23netfilter: nf_conntrack: extend with extra stat counterJesper Dangaard Brouer
I suspect an unfortunatly series of events occuring under a DDoS attack, in function __nf_conntrack_find() nf_contrack_core.c. Adding a stats counter to see if the search is restarted too often. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-22tcp: bind() fix when many ports are boundEric Dumazet
Port autoselection done by kernel only works when number of bound sockets is under a threshold (typically 30000). When this threshold is over, we must check if there is a conflict before exiting first loop in inet_csk_get_port() Change inet_csk_bind_conflict() to forbid two reuse-enabled sockets to bind on same (address,port) tuple (with a non ANY address) Same change for inet6_csk_bind_conflict() Reported-by: Gaspar Chilingarov <gasparch@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23Merge branch 'master' into for-nextJiri Kosina
2010-04-22tcp: fix outsegs stat for TSO segmentsTom Herbert
Account for TSO segments of an skb in TCP_MIB_OUTSEGS counter. Without doing this, the counter can be off by orders of magnitude from the actual number of segments sent. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22netfilter: ip_tables: convert pr_devel() to pr_debug()Patrick McHardy
We want to be able to use CONFIG_DYNAMIC_DEBUG in netfilter code, switch the few existing pr_devel() calls to pr_debug(). Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-21Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-6000.c net/core/dev.c
2010-04-20net: Fix various endianness glitchesEric Dumazet
Sparse can help us find endianness bugs, but we need to make some cleanups to be able to more easily spot real bugs. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20net: sk_sleep() helperEric Dumazet
Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t *sk_sleep(struct sock *sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20netfilter: bridge-netfilter: fix refragmenting IP traffic encapsulated in ↵Bart De Schuymer
PPPoE traffic The MTU for IP traffic encapsulated inside PPPoE traffic is smaller than the MTU of the Ethernet device (1500). Connection tracking gathers all IP packets and sometimes will refragment them in ip_fragment(). We then need to subtract the length of the encapsulating header from the mtu used in ip_fragment(). The check in br_nf_dev_queue_xmit() which determines if ip_fragment() has to be called is also updated for the PPPoE-encapsulated packets. nf_bridge_copy_header() is also updated to make sure the PPPoE data length field has the correct value. Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-20Merge branch 'master' of /repos/git/net-next-2.6Patrick McHardy
Conflicts: Documentation/feature-removal-schedule.txt net/ipv6/netfilter/ip6t_REJECT.c net/netfilter/xt_limit.c Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-19netfilter: xtables: remove old comments about reentrancyJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-19netfilter: xt_TEE: have cloned packet travel through Xtables tooJan Engelhardt
Since Xtables is now reentrant/nestable, the cloned packet can also go through Xtables and be subject to rules itself. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-19netfilter: xtables: make ip_tables reentrantJan Engelhardt
Currently, the table traverser stores return addresses in the ruleset itself (struct ip6t_entry->comefrom). This has a well-known drawback: the jumpstack is overwritten on reentry, making it necessary for targets to return absolute verdicts. Also, the ruleset (which might be heavy memory-wise) needs to be replicated for each CPU that can possibly invoke ip6t_do_table. This patch decouples the jumpstack from struct ip6t_entry and instead puts it into xt_table_info. Not being restricted by 'comefrom' anymore, we can set up a stack as needed. By default, there is room allocated for two entries into the traverser. arp_tables is not touched though, because there is just one/two modules and further patches seek to collapse the table traverser anyhow. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-19netfilter: xtables: inclusion of xt_TEEJan Engelhardt
xt_TEE can be used to clone and reroute a packet. This can for example be used to copy traffic at a router for logging purposes to another dedicated machine. References: http://www.gossamer-threads.com/lists/iptables/devel/68781 Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-16rfs: Receive Flow SteeringTom Herbert
This patch implements receive flow steering (RFS). RFS steers received packets for layer 3 and 4 processing to the CPU where the application for the corresponding flow is running. RFS is an extension of Receive Packet Steering (RPS). The basic idea of RFS is that when an application calls recvmsg (or sendmsg) the application's running CPU is stored in a hash table that is indexed by the connection's rxhash which is stored in the socket structure. The rxhash is passed in skb's received on the connection from netif_receive_skb. For each received packet, the associated rxhash is used to look up the CPU in the hash table, if a valid CPU is set then the packet is steered to that CPU using the RPS mechanisms. The convolution of the simple approach is that it would potentially allow OOO packets. If threads are thrashing around CPUs or multiple threads are trying to read from the same sockets, a quickly changing CPU value in the hash table could cause rampant OOO packets-- we consider this a non-starter. To avoid OOO packets, this solution implements two types of hash tables: rps_sock_flow_table and rps_dev_flow_table. rps_sock_table is a global hash table. Each entry is just a CPU number and it is populated in recvmsg and sendmsg as described above. This table contains the "desired" CPUs for flows. rps_dev_flow_table is specific to each device queue. Each entry contains a CPU and a tail queue counter. The CPU is the "current" CPU for a matching flow. The tail queue counter holds the value of a tail queue counter for the associated CPU's backlog queue at the time of last enqueue for a flow matching the entry. Each backlog queue has a queue head counter which is incremented on dequeue, and so a queue tail counter is computed as queue head count + queue length. When a packet is enqueued on a backlog queue, the current value of the queue tail counter is saved in the hash entry of the rps_dev_flow_table. And now the trick: when selecting the CPU for RPS (get_rps_cpu) the rps_sock_flow table and the rps_dev_flow table for the RX queue are consulted. When the desired CPU for the flow (found in the rps_sock_flow table) does not match the current CPU (found in the rps_dev_flow table), the current CPU is changed to the desired CPU if one of the following is true: - The current CPU is unset (equal to RPS_NO_CPU) - Current CPU is offline - The current CPU's queue head counter >= queue tail counter in the rps_dev_flow table. This checks if the queue tail has advanced beyond the last packet that was enqueued using this table entry. This guarantees that all packets queued using this entry have been dequeued, thus preserving in order delivery. Making each queue have its own rps_dev_flow table has two advantages: 1) the tail queue counters will be written on each receive, so keeping the table local to interrupting CPU s good for locality. 2) this allows lockless access to the table-- the CPU number and queue tail counter need to be accessed together under mutual exclusion from netif_receive_skb, we assume that this is only called from device napi_poll which is non-reentrant. This patch implements RFS for TCP and connected UDP sockets. It should be usable for other flow oriented protocols. There are two configuration parameters for RFS. The "rps_flow_entries" kernel init parameter sets the number of entries in the rps_sock_flow_table, the per rxqueue sysfs entry "rps_flow_cnt" contains the number of entries in the rps_dev_flow table for the rxqueue. Both are rounded to power of two. The obvious benefit of RFS (over just RPS) is that it achieves CPU locality between the receive processing for a flow and the applications processing; this can result in increased performance (higher pps, lower latency). The benefits of RFS are dependent on cache hierarchy, application load, and other factors. On simple benchmarks, we don't necessarily see improvement and sometimes see degradation. However, for more complex benchmarks and for applications where cache pressure is much higher this technique seems to perform very well. Below are some benchmark results which show the potential benfit of this patch. The netperf test has 500 instances of netperf TCP_RR test with 1 byte req. and resp. The RPC test is an request/response test similar in structure to netperf RR test ith 100 threads on each host, but does more work in userspace that netperf. e1000e on 8 core Intel No RFS or RPS 104K tps at 30% CPU No RFS (best RPS config): 290K tps at 63% CPU RFS 303K tps at 61% CPU RPC test tps CPU% 50/90/99% usec latency Latency StdDev No RFS/RPS 103K 48% 757/900/3185 4472.35 RPS only: 174K 73% 415/993/2468 491.66 RFS 223K 73% 379/651/1382 315.61 Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-15net: replace ipfragok with skb->local_dfShan Wei
As Herbert Xu said: we should be able to simply replace ipfragok with skb->local_df. commit f88037(sctp: Drop ipfargok in sctp_xmit function) has droped ipfragok and set local_df value properly. The patch kills the ipfragok parameter of .queue_xmit(). Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-15ip: Fix ip_dev_loopback_xmit()Eric Dumazet
Eric Paris got following trace with a linux-next kernel [ 14.203970] BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon/2093 [ 14.204025] caller is netif_rx+0xfa/0x110 [ 14.204035] Call Trace: [ 14.204064] [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110 [ 14.204070] [<ffffffff8142163a>] netif_rx+0xfa/0x110 [ 14.204090] [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0 [ 14.204095] [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0 [ 14.204099] [<ffffffff8145d610>] ip_local_out+0x20/0x30 [ 14.204105] [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0 [ 14.204119] [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400 [ 14.204125] [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790 [ 14.204137] [<ffffffff814891d5>] inet_sendmsg+0x45/0x80 [ 14.204149] [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110 [ 14.204189] [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380 [ 14.204233] [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b While current linux-2.6 kernel doesnt emit this warning, bug is latent and might cause unexpected failures. ip_dev_loopback_xmit() runs in process context, preemption enabled, so must call netif_rx_ni() instead of netif_rx(), to make sure that we process pending software interrupt. Same change for ip6_dev_loopback_xmit() Reported-by: Eric Paris <eparis@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-15netfilter: ipt_LOG/ip6t_LOG: use more appropriate log level as defaultPatrick McHardy
Use KERN_NOTICE instead of KERN_EMERG by default. This only affects kernel internal logging (like conntrack), user-specified logging rules contain a seperate log level. Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-15ipv4: ipmr: fix NULL pointer deref during unres queue destructionPatrick McHardy
Fix an oversight in ipmr_destroy_unres() - the net pointer is unconditionally initialized to NULL, resulting in a NULL pointer dereference later on. Fix by adding a net pointer to struct mr_table and using it in ipmr_destroy_unres(). Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-15ipv4: ipmr: fix invalid cache resolving when adding a non-matching entryPatrick McHardy
The patch to convert struct mfc_cache to list_heads (ipv4: ipmr: convert struct mfc_cache to struct list_head) introduced a bug when adding new cache entries that don't match any unresolved entries. The unres queue is searched for a matching entry, which is then resolved. When no matching entry is present, the iterator points to the head of the list, but is treated as a matching entry. Use a seperate variable to indicate that a matching entry was found. Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-15ipv4: ipmr: fix IP_MROUTE_MULTIPLE_TABLES Kconfig dependenciesPatrick McHardy
IP_MROUTE_MULTIPLE_TABLES should depend on IP_MROUTE. Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-14fib: suppress lockdep-RCU false positive in FIB trie.Eric Dumazet
Followup of commit 634a4b20 Allow tnode_get_child_rcu() to be called either under rcu_read_lock() protection or with RTNL held. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13ipv4: ipmr: support multiple tablesPatrick McHardy
This patch adds support for multiple independant multicast routing instances, named "tables". Userspace multicast routing daemons can bind to a specific table instance by issuing a setsockopt call using a new option MRT_TABLE. The table number is stored in the raw socket data and affects all following ipmr setsockopt(), getsockopt() and ioctl() calls. By default, a single table (RT_TABLE_DEFAULT) is created with a default routing rule pointing to it. Newly created pimreg devices have the table number appended ("pimregX"), with the exception of devices created in the default table, which are named just "pimreg" for compatibility reasons. Packets are directed to a specific table instance using routing rules, similar to how regular routing rules work. Currently iif, oif and mark are supported as keys, source and destination addresses could be supported additionally. Example usage: - bind pimd/xorp/... to a specific table: uint32_t table = 123; setsockopt(fd, IPPROTO_IP, MRT_TABLE, &table, sizeof(table)); - create routing rules directing packets to the new table: # ip mrule add iif eth0 lookup 123 # ip mrule add oif eth0 lookup 123 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13ipv4: ipmr: move mroute data into seperate structurePatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13ipv4: ipmr: convert struct mfc_cache to struct list_headPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13ipv4: ipmr: remove net pointer from struct mfc_cachePatrick McHardy
Now that cache entries in unres_queue don't need to be distinguished by their network namespace pointer anymore, we can remove it from struct mfc_cache add pass the namespace as function argument to the functions that need it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13ipv4: ipmr: move unres_queue and timer to per-namespace dataPatrick McHardy
The unres_queue is currently shared between all namespaces. Following patches will additionally allow to create multiple multicast routing tables in each namespace. Having a single shared queue for all these users seems to excessive, move the queue and the cleanup timer to the per-namespace data to unshare it. As a side-effect, this fixes a bug in the seq file iteration functions: the first entry returned is always from the current namespace, entries returned after that may belong to any namespace. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13net: fib_rules: decouple address families from real address familiesPatrick McHardy
Decouple the address family values used for fib_rules from the real address families in socket.h. This allows to use fib_rules for code that is not a real address family without increasing AF_MAX/NPROTO. Values up to 127 are reserved for real address families and map directly to the corresponding AF value, values starting from 128 are for other uses. rtnetlink is changed to invoke the AF_UNSPEC dumpit/doit handlers for these families. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13net: fib_rules: set family in fib_rule_hdr centrallyPatrick McHardy
All fib_rules implementations need to set the family in their ->fill() functions. Since the value is available to the generic fib_nl_fill_rule() function, set it there. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13net: fib_rules: consolidate IPv4 and DECnet ->default_pref() functions.Patrick McHardy
Both functions are equivalent, consolidate them since a following patch needs a third implementation for multicast routing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13netfilter: fix some coding styles and remove moduleparam.hZhitong Wang
Fix some coding styles and remove moduleparam.h Signed-off-by: Zhitong Wang <zhitong.wangzt@alibaba-inc.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-13net: sk_dst_cache RCUificationEric Dumazet
With latest CONFIG_PROVE_RCU stuff, I felt more comfortable to make this work. sk->sk_dst_cache is currently protected by a rwlock (sk_dst_lock) This rwlock is readlocked for a very small amount of time, and dst entries are already freed after RCU grace period. This calls for RCU again :) This patch converts sk_dst_lock to a spinlock, and use RCU for readers. __sk_dst_get() is supposed to be called with rcu_read_lock() or if socket locked by user, so use appropriate rcu_dereference_check() condition (rcu_read_lock_held() || sock_owned_by_user(sk)) This patch avoids two atomic ops per tx packet on UDP connected sockets, for example, and permits sk_dst_lock to be much less dirtied. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skbDavid S. Miller
Back in commit 04a0551c87363f100b04d28d7a15a632b70e18e7 ("loopback: Drop obsolete ip_summed setting") we stopped setting CHECKSUM_UNNECESSARY in the loopback xmit. This is because such a setting was a lie since it implies that the checksum field of the packet is properly filled in. Instead what happens normally is that CHECKSUM_PARTIAL is set and skb->csum is calculated as needed. But this was only happening for TCP data packets (via the skb->ip_summed assignment done in tcp_sendmsg()). It doesn't happen for non-data packets like ACKs etc. Fix this by setting skb->ip_summed in the common non-data packet constructor. It already is setting skb->csum to zero. But this reminds us that we still have things like ip_output.c's ip_dev_loopback_xmit() which sets skb->ip_summed to the value CHECKSUM_UNNECESSARY, which Herbert's patch teaches us is not valid. So we'll have to address that at some point too. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11inet: Remove unused send_check length argumentHerbert Xu
inet: Remove unused send_check length argument This patch removes the unused length argument from the send_check function in struct inet_connection_sock_af_ops. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Yinghai <yinghai.lu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv4Herbert Xu
tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv4 This patch moves the common code between tcp_v4_send_check and tcp_v4_gso_send_check into a new function __tcp_v4_send_check. It then uses the new function in tcp_v4_send_synack so that it handles CHECKSUM_PARTIAL properly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Yinghai <yinghai.lu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/stmmac/stmmac_main.c drivers/net/wireless/wl12xx/wl1271_cmd.c drivers/net/wireless/wl12xx/wl1271_main.c drivers/net/wireless/wl12xx/wl1271_spi.c net/core/ethtool.c net/mac80211/scan.c
2010-04-11Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller
2010-04-11Revert "tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb"David S. Miller
This reverts commit 2626419ad5be1a054d350786b684b41d23de1538. It causes regressions for people with IGB cards. Connection requests don't complete etc. The true cause of the issue is still not known, but we should sort this out in net-next-2.6 not net-2.6 Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-08tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skbDavid S. Miller
Back in commit 04a0551c87363f100b04d28d7a15a632b70e18e7 ("loopback: Drop obsolete ip_summed setting") we stopped setting CHECKSUM_UNNECESSARY in the loopback xmit. This is because such a setting was a lie since it implies that the checksum field of the packet is properly filled in. Instead what happens normally is that CHECKSUM_PARTIAL is set and skb->csum is calculated as needed. But this was only happening for TCP data packets (via the skb->ip_summed assignment done in tcp_sendmsg()). It doesn't happen for non-data packets like ACKs etc. Fix this by setting skb->ip_summed in the common non-data packet constructor. It already is setting skb->csum to zero. But this reminds us that we still have things like ip_output.c's ip_dev_loopback_xmit() which sets skb->ip_summed to the value CHECKSUM_UNNECESSARY, which Herbert's patch teaches us is not valid. So we'll have to address that at some point too. Signed-off-by: David S. Miller <davem@davemloft.net>