linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2007-04-25	[IPV4]: align inet_protos[] on SMP	Eric Dumazet
	As IPPROTO_TCP is 6, it makes sense to make sure inet_protos[] array is properly cache line aligned to avoid false sharing on SMP. c0680540 b peer_total c0680544 b inet_peer_unused_head c0680560 B inet_protos On i386 this example, we can see that inet_protos[IPPROTO_TCP] shares a potentially hot (and modified) cache line. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[TCP]: tcp_memory_pressure and tcp_socket are__read_mostly candidates	Eric Dumazet
	tcp_memory_pressure and tcp_socket currently share a cache line with tcp_memory_allocated, tcp_sockets_allocated. (Very hot cache line) It makes sense to declare these variables as __read_mostly, to avoid false sharing on SMP. ffffffff8081d9c0 B tcp_orphan_count ffffffff8081d9c4 B tcp_memory_allocated ffffffff8081d9c8 B tcp_sockets_allocated ffffffff8081d9cc B tcp_memory_pressure ffffffff8081d9d0 b tcp_md5sig_users ffffffff8081d9d8 b tcp_md5sig_pool ffffffff8081d9e0 b warntime.31570 ffffffff8081d9e8 b tcp_socket Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET] fib_rules: Flush route cache after rule modifications	Thomas Graf
	The results of FIB rules lookups are cached in the routing cache except for IPv6 as no such cache exists. So far, it was the responsibility of the user to flush the cache after modifying any rules. This lead to many false bug reports due to misunderstanding of this concept. This patch automatically flushes the route cache after inserting or deleting a rule. Thanks to Muli Ben-Yehuda <muli@il.ibm.com> for catching a bug in the previous patch. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET]: inet_ehash_secret should be __read_mostly and set only once	Eric Dumazet
	There is a very tiny probability that build_ehash_secret() is called at the same time by different CPUS. Also, using __read_mostly is a must for inet_ehash_secret Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET]: Allow forwarding of ip_summed except CHECKSUM_COMPLETE	Herbert Xu
	Right now Xen has a horrible hack that lets it forward packets with partial checksums. One of the reasons that CHECKSUM_PARTIAL and CHECKSUM_COMPLETE were added is so that we can get rid of this hack (where it creates two extra bits in the skbuff to essentially mirror ip_summed without being destroyed by the forwarding code). I had forgotten that I've already gone through all the deivce drivers last time around to make sure that they're looking at ip_summed == CHECKSUM_PARTIAL rather than ip_summed != 0 on transmit. In any case, I've now done that again so it should definitely be safe. Unfortunately nobody has yet added any code to update CHECKSUM_COMPLETE values on forward so we I'm setting that to CHECKSUM_NONE. This should be safe to remove for bridging but I'd like to check that code path first. So here is the patch that lets us get rid of the hack by preserving ip_summed (mostly) on forwarded packets. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are ↵	Janusz Krzysztofik
	removed this is a small patch by Janusz Krzysztofik to ip_route_output_slow() that allows VIP-less LVS linux director to generate packets originating >From VIP if sysctl_ip_nonlocal_bind is set. In a nutshell, the intention is for an LVS linux director to be able to send ICMP unreachable responses to end-users when real-servers are removed. http://archive.linuxvirtualserver.org/html/lvs-users/2007-01/msg00106.html Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET] fib_rules: Add no-operation action	Thomas Graf
	The use of nop rules simplifies the usage of goto rules and adds more flexibility as they allow targets to remain while the actual content of the branches can change easly. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET] fib_rules: Mark rules detached from the device	Thomas Graf
	Rules which match against device names in their selector can remain while the device itself disappears, in fact the device doesn't have to present when the rule is added in the first place. The device name is resolved by trying when the rule is added and later by listening to NETDEV_REGISTER/UNREGISTER notifications. This patch adds the flag FIB_RULE_DEV_DETACHED which is set towards userspace when a rule contains a device match which is unresolved at the moment. This eases spotting the reason why certain rules seem not to function properly. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET] fib_rules: goto rule action	Thomas Graf
	This patch adds a new rule action FR_ACT_GOTO which allows to skip a set of rules by jumping to another rule. The rule to jump to is specified via the FRA_GOTO attribute which carries a rule preference. Referring to a rule which doesn't exists is explicitely allowed. Such goto rules are marked with the flag FIB_RULE_UNRESOLVED and will act like a rule with a non-matching selector. The rule will become functional as soon as its target is present. The goto action enables performance optimizations by reducing the average number of rules that have to be passed per lookup. Example: 0: from all lookup local 40: not from all to 192.168.23.128 goto 32766 41: from all fwmark 0xa blackhole 42: from all fwmark 0xff blackhole 32766: from all lookup main Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[TCP] tcp_probe: improvements for net-2.6.22	Stephen Hemminger
	Change tcp_probe to use ktime (needed to add one export). Add option to only get events when cwnd changes - from Doug Leith Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[TCP]: cubic update for net-2.6.22	Stephen Hemminger
	The following update received from Injong updates TCP cubic to the latest version. I am running more complete tests and will have results after 4/1. According to Injong: the new version improves on its scalability, fairness and stability. So in all properties, we confirmed it shows better performance. NCSU results (for 2.6.18 and 2.6.20) available: http://netsrv.csc.ncsu.edu/wiki/index.php/TCP_Testing This version is described in a new Internet draft for CUBIC. http://www.ietf.org/internet-drafts/draft-rhee-tcp-cubic-00.txt Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET] Move DF check to ip_forward	John Heffner
	Do fragmentation check in ip_forward, similar to ipv6 forwarding. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[INET]: Use jhash + random secret for ehash.	David S. Miller
	The days are gone when this was not an issue, there are folks out there with huge bot networks that can be used to attack the established hash tables on remote systems. So just like the routing cache and connection tracking hash, use Jenkins hash with random secret input. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETLINK]: introduce NLA_BINARY type	Johannes Berg
	This patch introduces a new NLA_BINARY attribute policy type with the verification of simply checking the maximum length of the payload. It also fixes a small typo in the example. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[SCTP]: Implement SCTP_MAX_BURST socket option.	Vlad Yasevich
	Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[SCTP]: Implement sac_info field in SCTP_ASSOC_CHANGE notification.	Vlad Yasevich
	As stated in the sctp socket api draft: sac_info: variable If the sac_state is SCTP_COMM_LOST and an ABORT chunk was received for this association, sac_info[] contains the complete ABORT chunk as defined in the SCTP specification RFC2960 [RFC2960] section 3.3.7. We now save received ABORT chunks into the sac_info field and pass that to the user. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[SCTP]: Honor flags when setting peer address parameters	Vlad Yasevich
	Parameters only take effect when a corresponding flag bit is set and a value is specified. This means we need to check the flags in addition to checking for non-zero value. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[SCTP]: Implement SCTP_ADDR_CONFIRMED state for ADDR_CHNAGE event	Vlad Yasevich
	Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[SCTP]: Implement SCTP_PARTIAL_DELIVERY_POINT option.	Vlad Yasevich
	This option induces partial delivery to run as soon as the specified amount of data has been accumulated on the association. However, we give preference to fully reassembled messages over PD messages. In any case, window and buffer is freed up. Signed-off-by: Vlad Yasevich <vladislav.yasevich@.hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[SCTP]: Implement SCTP_FRAGMENT_INTERLEAVE socket option	Vlad Yasevich
	This option was introduced in draft-ietf-tsvwg-sctpsocket-13. It prevents head-of-line blocking in the case of one-to-many endpoint. Applications enabling this option really must enable SCTP_SNDRCV event so that they would know where the data belongs. Based on an earlier patch by Ivan Skytte Jørgensen. Additionally, this functionality now permits multiple associations on the same endpoint to enter Partial Delivery. Applications should be extra careful, when using this functionality, to track EOR indicators. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET_SCHED]: qdisc: remove unnecessary memory barriers	Patrick McHardy
	We're holding dev->queue_lock in qdisc_watchdog_schedule and qdisc_watchdog_cancel, no need for the barriers. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET_SCHED]: Unline tcf_destroy	Patrick McHardy
	Uninline tcf_destroy and add a helper function to destroy an entire filter chain. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET_SCHED]: turn PSCHED_GET_TIME into inline function	Patrick McHardy
	Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function	Patrick McHardy
	Also rename to psched_tdiff_bounded. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET_SCHED]: kill PSCHED_TDIFF	Patrick McHardy
	Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET_SCHED]: kill PSCHED_SET_PASTPERFECT/PSCHED_IS_PASTPERFECT	Patrick McHardy
	Use direct assignment and comparison instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET_SCHED]: kill PSCHED_TLESS	Patrick McHardy
	Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET_SCHED]: kill PSCHED_TADD/PSCHED_TADD2	Patrick McHardy
	Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET_SCHED]: kill PSCHED_AUDIT_TDIFF	Patrick McHardy
	Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NET_SCHED]: sch_netem: fix off-by-one in send time comparison	Patrick McHardy
	netem checks PSCHED_TLESS(cb->time_to_send, now) to find out whether it is allowed to send a packet, which is equivalent to cb->time_to_send < now. Use !PSCHED_TLESS(now, cb->time_to_send) instead to properly handle cb->time_to_send == now. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETFILTER] nfnetlink: netlink_run_queue() already checks for NLM_F_REQUEST	Thomas Graf
	Patrick has made use of netlink_run_queue() in nfnetlink while my patches have been waiting for net-2.6.22 to open. So this check for NLM_F_REQUEST can go as well. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETFILTER]: nf_conntrack: kill destroy() in struct nf_conntrack for diet	Yasuyuki Kozakai
	The destructor per conntrack is unnecessary, then this replaces it with system wide destructor. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETFILTER]: nf_conntrack: don't use nfct in skb if conntrack is disabled	Yasuyuki Kozakai
	Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETFILTER]: Use setup_timer	Patrick McHardy
	Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETFILTER]: nfnetlink_log: remove conditional locking	Patrick McHardy
	This is gross, have the wrapper function take the lock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETFILTER]: nfnetlink_log: micro-optimization: inst->skb != NULL in ↵	Michal Miroslaw
	__nfulnl_send() No other function calls __nfulnl_send() with inst->skb == NULL than nfulnl_timer(). Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETFILTER]: nfnetlink_log: iterator functions need iter_state * only	Michal Miroslaw
	get_*() don't need access to seq_file - iter_state is enough for them. Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETFILTER]: nfnetlink_log: micro-optimization: don't modify destroyed instance	Michal Miroslaw
	Simple micro-optimization: Don't change any options if the instance is being destroyed. Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETFILTER]: nfnetlink_log: micro-optimization for inst==NULL in ↵	Michal Miroslaw
	nfulnl_recv_config() Simple micro-optimization: don't call instance_put() on known NULL pointers. Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETFILTER]: nfnetlink_log: kill duplicate code	Michal Miroslaw
	Kill some duplicate code in nfulnl_log_packet(). Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETFILTER]: nfnetlink_log: don't count max(a,b) twice	Michal Miroslaw
	We don't need local nlbufsiz (skb size) as nfulnl_alloc_skb() takes the maximum anyway. Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETFILTER]: Remove changelogs and CVS IDs	Patrick McHardy
	Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETEM]: spelling errors	Stephen Hemminger
	Get rid of some of my creative spelling. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETLINK]: Directly return -EINTR from netlink_dump_start()	Thomas Graf
	Now that all users of netlink_dump_start() use netlink_run_queue() to process the receive queue, it is possible to return -EINTR from netlink_dump_start() directly, therefore simplying the callers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[IPv4] diag: Use netlink_run_queue() to process the receive queue	Thomas Graf
	Makes use of netlink_run_queue() to process the receive queue and converts inet_diag_rcv_msg() to use the type safe netlink interface. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETLINK]: Remove error pointer from netlink message handler	Thomas Graf
	The error pointer argument in netlink message handlers is used to signal the special case where processing has to be interrupted because a dump was started but no error happened. Instead it is simpler and more clear to return -EINTR and have netlink_run_queue() deal with getting the queue right. nfnetlink passed on this error pointer to its subsystem handlers but only uses it to signal the start of a netlink dump. Therefore it can be removed there as well. This patch also cleans up the error handling in the affected message handlers to be consistent since it had to be touched anyway. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETLINK]: Ignore control messages directly in netlink_run_queue()	Thomas Graf
	Changes netlink_rcv_skb() to skip netlink controll messages and don't pass them on to the message handler. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETLINK]: Ignore !NLM_F_REQUEST messages directly in netlink_run_queue()	Thomas Graf
	netlink_rcv_skb() is changed to skip messages which don't have the NLM_F_REQUEST bit to avoid every netlink family having to perform this check on their own. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[NETLINK]: Remove unused groups variable	Thomas Graf
	Leftover from dynamic multicast groups allocation work. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25	[TCP] westwood: Use type safe netlink interface	Thomas Graf
	Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>