aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2010-09-29Phonet: Correct header retrieval after pskb_may_pullKumar Sanghvi
Retrieve the header after doing pskb_may_pull since, pskb_may_pull could change the buffer structure. This is based on the comment given by Eric Dumazet on Phonet Pipe controller patch for a similar problem. Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com> Acked-by: Linus Walleij <linus.walleij@stericsson.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-28ip_gre: Fix dependencies wrt. ipv6.David S. Miller
The GRE tunnel driver needs to invoke icmpv6 helpers in the ipv6 stack when ipv6 support is enabled. Therefore if IPV6 is enabled, we have to enforce that GRE's enabling (modular or static) matches that of ipv6. Reported-by: Patrick McHardy <kaber@trash.net> Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-28net-2.6: SYN retransmits: Add new parameter to retransmits_timed_out()Damian Lukowski
Fixes kernel Bugzilla Bug 18952 This patch adds a syn_set parameter to the retransmits_timed_out() routine and updates its callers. If not set, TCP_RTO_MIN is taken as the calculation basis as before. If set, TCP_TIMEOUT_INIT is used instead, so that sysctl_syn_retries represents the actual amount of SYN retransmissions in case no SYNACKs are received when establishing a new connection. Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-27tcp: Fix >4GB writes on 64-bit.David S. Miller
Fixes kernel bugzilla #16603 tcp_sendmsg() truncates iov_len to an 'int' which a 4GB write to write zero bytes, for example. There is also the problem higher up of how verify_iovec() works. It wants to prevent the total length from looking like an error return value. However it does this using 'int', but syscalls return 'long' (and thus signed 64-bit on 64-bit machines). So it could trigger false-positives on 64-bit as written. So fix it to use 'long'. Reported-by: Olaf Bonorden <bono@onlinehome.de> Reported-by: Daniel Büse <dbuese@gmx.de> Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-27net/9p: Mount only matching virtio channelsSven Eckelmann
p9_virtio_create will only compare the the channel's tag characters against the device name till the end of the channel's tag but not till the end of the device name. This means that if a user defines channels with the tags foo and foobar then he would mount foo when he requested foonot and may mount foo when he requested foobar. Thus it is necessary to check both string lengths against each other in case of a successful partial string match. Signed-off-by: Sven Eckelmann <sven.eckelmann@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-27ipv6: add IPv6 to neighbour table overflow warningUlrich Weber
IPv4 and IPv6 have separate neighbour tables, so the warning messages should be distinguishable. [ Add a suitable message prefix on the ipv4 side as well -DaveM ] Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-27tcp: fix TSO FACK loss marking in tcp_mark_head_lostYuchung Cheng
When TCP uses FACK algorithm to mark lost packets in tcp_mark_head_lost(), if the number of packets in the (TSO) skb is greater than the number of packets that should be marked lost, TCP incorrectly exits the loop and marks no packets lost in the skb. This underestimates tp->lost_out and affects the recovery/retransmission. This patch fargments the skb and marks the correct amount of packets lost. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-26ipv6: add a missing unregister_pernet_subsys callNeil Horman
Clean up a missing exit path in the ipv6 module init routines. In addrconf_init we call ipv6_addr_label_init which calls register_pernet_subsys for the ipv6_addr_label_ops structure. But if module loading fails, or if the ipv6 module is removed, there is no corresponding unregister_pernet_subsys call, which leaves a now-bogus address on the pernet_list, leading to oopses in subsequent registrations. This patch cleans up both the failed load path and the unload path. Tested by myself with good results. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> include/net/addrconf.h | 1 + net/ipv6/addrconf.c | 11 ++++++++--- net/ipv6/addrlabel.c | 5 +++++ 3 files changed, 14 insertions(+), 3 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-26br2684: fix scheduling while atomicKarl Hiramoto
You can't call atomic_notifier_chain_unregister() while in atomic context. Fix, call un/register_atmdevice_notifier in module __init and __exit. Bug report: http://comments.gmane.org/gmane.linux.network/172603 Reported-by: Mikko Vinni <mmvinni@yahoo.com> Tested-by: Mikko Vinni <mmvinni@yahoo.com> Signed-off-by: Karl Hiramoto <karl@hiramoto.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-24net: fix a lockdep splatEric Dumazet
We have for each socket : One spinlock (sk_slock.slock) One rwlock (sk_callback_lock) Possible scenarios are : (A) (this is used in net/sunrpc/xprtsock.c) read_lock(&sk->sk_callback_lock) (without blocking BH) <BH> spin_lock(&sk->sk_slock.slock); ... read_lock(&sk->sk_callback_lock); ... (B) write_lock_bh(&sk->sk_callback_lock) stuff write_unlock_bh(&sk->sk_callback_lock) (C) spin_lock_bh(&sk->sk_slock) ... write_lock_bh(&sk->sk_callback_lock) stuff write_unlock_bh(&sk->sk_callback_lock) spin_unlock_bh(&sk->sk_slock) This (C) case conflicts with (A) : CPU1 [A] CPU2 [C] read_lock(callback_lock) <BH> spin_lock_bh(slock) <wait to spin_lock(slock)> <wait to write_lock_bh(callback_lock)> We have one problematic (C) use case in inet_csk_listen_stop() : local_bh_disable(); bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock) WARN_ON(sock_owned_by_user(child)); ... sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock) lockdep is not happy with this, as reported by Tetsuo Handa It seems only way to deal with this is to use read_lock_bh(callbacklock) everywhere. Thanks to Jarek for pointing a bug in my first attempt and suggesting this solution. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-22xfrm4: strip ECN bits from tos fieldUlrich Weber
otherwise ECT(1) bit will get interpreted as RTO_ONLINK and routing will fail with XfrmOutBundleGenError. Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-22netfilter: nf_conntrack_defrag: check socket type before touching nodefrag flagJiri Olsa
we need to check proper socket type within ipv4_conntrack_defrag function before referencing the nodefrag flag. For example the tun driver receive path produces skbs with AF_UNSPEC socket type, and so current code is causing unwanted fragmented packets going out. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-22netfilter: nf_nat_snmp: fix checksum calculation (v4)Patrick McHardy
Fix checksum calculation in nf_nat_snmp_basic. Based on patches by Clark Wang <wtweeker@163.com> and Stephen Hemminger <shemminger@vyatta.com>. https://bugzilla.kernel.org/show_bug.cgi?id=17622 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-22netfilter: fix a race in nf_ct_ext_create()Eric Dumazet
As soon as rcu_read_unlock() is called, there is no guarantee current thread can safely derefence t pointer, rcu protected. Fix is to copy t->alloc_size in a temporary variable. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-22netfilter: fix ipt_REJECT TCP RST routing for indev == outdevChangli Gao
ip_route_me_harder can't create the route cache when the outdev is the same with the indev for the skbs whichout a valid protocol set. __mkroute_input functions has this check: 1998 if (skb->protocol != htons(ETH_P_IP)) { 1999 /* Not IP (i.e. ARP). Do not create route, if it is 2000 * invalid for proxy arp. DNAT routes are always valid. 2001 * 2002 * Proxy arp feature have been extended to allow, ARP 2003 * replies back to the same interface, to support 2004 * Private VLAN switch technologies. See arp.c. 2005 */ 2006 if (out_dev == in_dev && 2007 IN_DEV_PROXY_ARP_PVLAN(in_dev) == 0) { 2008 err = -EINVAL; 2009 goto cleanup; 2010 } 2011 } This patch gives the new skb a valid protocol to bypass this check. In order to make ipt_REJECT work with bridges, you also need to enable ip_forward. This patch also fixes a regression. When we used skb_copy_expand(), we didn't have this issue stated above, as the protocol was properly set. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-22netfilter: nf_ct_sip: default to NF_ACCEPT in sip_help_tcp()Simon Horman
I initially noticed this because of the compiler warning below, but it does seem to be a valid concern in the case where ct_sip_get_header() returns 0 in the first iteration of the while loop. net/netfilter/nf_conntrack_sip.c: In function 'sip_help_tcp': net/netfilter/nf_conntrack_sip.c:1379: warning: 'ret' may be used uninitialized in this function Signed-off-by: Simon Horman <horms@verge.net.au> [Patrick: changed NF_DROP to NF_ACCEPT] Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-22netfilter: tproxy: nf_tproxy_assign_sock() can handle tw socketsEric Dumazet
transparent field of a socket is either inet_twsk(sk)->tw_transparent for timewait sockets, or inet_sk(sk)->transparent for other sockets (TCP/UDP). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-21ip: fix truesize mismatch in ip fragmentationEric Dumazet
Special care should be taken when slow path is hit in ip_fragment() : When walking through frags, we transfert truesize ownership from skb to frags. Then if we hit a slow_path condition, we must undo this or risk uncharging frags->truesize twice, and in the end, having negative socket sk_wmem_alloc counter, or even freeing socket sooner than expected. Many thanks to Nick Bowler, who provided a very clean bug report and test program. Thanks to Jarek for reviewing my first patch and providing a V2 While Nick bisection pointed to commit 2b85a34e911 (net: No more expensive sock_hold()/sock_put() on each tx), underlying bug is older (2.6.12-rc5) A side effect is to extend work done in commit b2722b1c3a893e (ip_fragment: also adjust skb->truesize for packets not owned by a socket) to ipv6 as well. Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com> Tested-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-21Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2010-09-20tcp: Fix race in tcp_pollTom Marshall
If a RST comes in immediately after checking sk->sk_err, tcp_poll will return POLLIN but not POLLOUT. Fix this by checking sk->sk_err at the end of tcp_poll. Additionally, ensure the correct order of operations on SMP machines with memory barriers. Signed-off-by: Tom Marshall <tdm.code@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-20rose: Fix signedness issues wrt. digi count.David S. Miller
Just use explicit casts, since we really can't change the types of structures exported to userspace which have been around for 15 years or so. Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-20xfrm: Allow different selector family in temporary stateThomas Egerer
The family parameter xfrm_state_find is used to find a state matching a certain policy. This value is set to the template's family (encap_family) right before xfrm_state_find is called. The family parameter is however also used to construct a temporary state in xfrm_state_find itself which is wrong for inter-family scenarios because it produces a selector for the wrong family. Since this selector is included in the xfrm_user_acquire structure, user space programs misinterpret IPv6 addresses as IPv4 and vice versa. This patch splits up the original init_tempsel function into a part that initializes the selector respectively the props and id of the temporary state, to allow for differing ip address families whithin the state. Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-20wext: fix potential private ioctl memory content leakJohannes Berg
When a driver doesn't fill the entire buffer, old heap contents may remain, and if it also doesn't update the length properly, this old heap content will be copied back to userspace. It is very unlikely that this happens in any of the drivers using private ioctls since it would show up as junk being reported by iwpriv, but it seems better to be safe here, so use kzalloc. Reported-by: Jeff Mahoney <jeffm@suse.com> Cc: stable@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-09-20ip_gre: CONFIG_IPV6_MODULE supportEric Dumazet
ipv6 can be a module, we should test CONFIG_IPV6 and CONFIG_IPV6_MODULE to enable ipv6 bits in ip_gre. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-17sctp: Do not reset the packet during sctp_packet_config().Vlad Yasevich
sctp_packet_config() is called when getting the packet ready for appending of chunks. The function should not touch the current state, since it's possible to ping-pong between two transports when sending, and that can result packet corruption followed by skb overlfow crash. Reported-by: Thomas Dreibholz <dreibh@iem.uni-due.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-16net/llc: storing negative error codes in unsigned shortDan Carpenter
If the alloc_skb() fails then we return 65431 instead of -ENOBUFS (-105). Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-16xfrm: dont assume rcu_read_lock in xfrm_output_one()Eric Dumazet
ip_local_out() is called with rcu_read_lock() held from ip_queue_xmit() but not from other call sites. Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-14net: RPS needs to depend upon USE_GENERIC_SMP_HELPERSDavid S. Miller
You cannot invoke __smp_call_function_single() unless the architecture sets this symbol. Reported-by: Daniel Hellstrom <daniel@gaisler.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-14net: use rcu_barrier() in rollback_registered_manyEric Dumazet
netdev_wait_allrefs() waits that all references to a device vanishes. It currently uses a _very_ pessimistic 250 ms delay between each probe. Some users reported that no more than 4 devices can be dismantled per second, this is a pretty serious problem for some setups. Most of the time, a refcount is about to be released by an RCU callback, that is still in flight because rollback_registered_many() uses a synchronize_rcu() call instead of rcu_barrier(). Problem is visible if number of online cpus is one, because synchronize_rcu() is then a no op. time to remove 50 ipip tunnels on a UP machine : before patch : real 11.910s after patch : real 1.250s Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: Octavian Purdila <opurdila@ixiacom.com> Reported-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-13ipv4: enable getsockopt() for IP_NODEFRAGMichael Kerrisk
While integrating your man-pages patch for IP_NODEFRAG, I noticed that this option is settable by setsockopt(), but not gettable by getsockopt(). I suppose this is not intended. The (untested, trivial) patch below adds getsockopt() support. Signed-off-by: Michael kerrisk <mtk.manpages@gmail.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-13ipv4: force_igmp_version ignored when a IGMPv3 query receivedBob Arendt
After all these years, it turns out that the /proc/sys/net/ipv4/conf/*/force_igmp_version parameter isn't fully implemented. *Symptom*: When set force_igmp_version to a value of 2, the kernel should only perform multicast IGMPv2 operations (IETF rfc2236). An host-initiated Join message will be sent as a IGMPv2 Join message. But if a IGMPv3 query message is received, the host responds with a IGMPv3 join message. Per rfc3376 and rfc2236, a IGMPv2 host should treat a IGMPv3 query as a IGMPv2 query and respond with an IGMPv2 Join message. *Consequences*: This is an issue when a IGMPv3 capable switch is the querier and will only issue IGMPv3 queries (which double as IGMPv2 querys) and there's an intermediate switch that is only IGMPv2 capable. The intermediate switch processes the initial v2 Join, but fails to recognize the IGMPv3 Join responses to the Query, resulting in a dropped connection when the intermediate v2-only switch times it out. *Identifying issue in the kernel source*: The issue is in this section of code (in net/ipv4/igmp.c), which is called when an IGMP query is received (from mainline 2.6.36-rc3 gitweb): ... A IGMPv3 query has a length >= 12 and no sources. This routine will exit after line 880, setting the general query timer (random timeout between 0 and query response time). This calls igmp_gq_timer_expire(): ... .. which only sends a v3 response. So if a v3 query is received, the kernel always sends a v3 response. IGMP queries happen once every 60 sec (per vlan), so the traffic is low. A IGMPv3 query *is* a strict superset of a IGMPv2 query, so this patch properly short circuit's the v3 behaviour. One issue is that this does not address force_igmp_version=1. Then again, I've never seen any IGMPv1 multicast equipment in the wild. However there is a lot of v2-only equipment. If it's necessary to support the IGMPv1 case as well: 837 if (len == 8 || IGMP_V2_SEEN(in_dev) || IGMP_V1_SEEN(in_dev)) { Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-13net/llc: make opt unsigned in llc_ui_setsockopt()Dan Carpenter
The members of struct llc_sock are unsigned so if we pass a negative value for "opt" it can cause a sign bug. Also it can cause an integer overflow when we multiply "opt * HZ". CC: stable@kernel.org Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-12sch_atm: Fix potential NULL deref.David S. Miller
The list_head conversion unearther an unnecessary flow check. Since flow is always NULL here we don't need to see if a matching flow exists already. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-09sctp: fix test for end of loopJoe Perches
Add a list_has_sctp_addr function to simplify loop Based on a patches by Dan Carpenter and David Miller Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
2010-09-08udp: add rehash on connect()Eric Dumazet
commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation) added a secondary hash on UDP, hashed on (local addr, local port). Problem is that following sequence : fd = socket(...) connect(fd, &remote, ...) not only selects remote end point (address and port), but also sets local address, while UDP stack stored in secondary hash table the socket while its local address was INADDR_ANY (or ipv6 equivalent) Sequence is : - autobind() : choose a random local port, insert socket in hash tables [while local address is INADDR_ANY] - connect() : set remote address and port, change local address to IP given by a route lookup. When an incoming UDP frame comes, if more than 10 sockets are found in primary hash table, we switch to secondary table, and fail to find socket because its local address changed. One solution to this problem is to rehash datagram socket if needed. We add a new rehash(struct socket *) method in "struct proto", and implement this method for UDP v4 & v6, using a common helper. This rehashing only takes care of secondary hash table, since primary hash (based on local port only) is not changed. Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08net: blackhole route should always be recalculatedJianzhao Wang
Blackhole routes are used when xfrm_lookup() returns -EREMOTE (error triggered by IKE for example), hence this kind of route is always temporary and so we should check if a better route exists for next packets. Bug has been introduced by commit d11a4dc18bf41719c9f0d7ed494d295dd2973b92. Signed-off-by: Jianzhao Wang <jianzhao.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08ipv4: Suppress lockdep-RCU false positive in FIB trie (3)Jarek Poplawski
Hi, Here is one more of these warnings and a patch below: Sep 5 23:52:33 del kernel: [46044.244833] =================================================== Sep 5 23:52:33 del kernel: [46044.269681] [ INFO: suspicious rcu_dereference_check() usage. ] Sep 5 23:52:33 del kernel: [46044.277000] --------------------------------------------------- Sep 5 23:52:33 del kernel: [46044.285185] net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection! Sep 5 23:52:33 del kernel: [46044.293627] Sep 5 23:52:33 del kernel: [46044.293632] other info that might help us debug this: Sep 5 23:52:33 del kernel: [46044.293634] Sep 5 23:52:33 del kernel: [46044.325333] Sep 5 23:52:33 del kernel: [46044.325335] rcu_scheduler_active = 1, debug_locks = 0 Sep 5 23:52:33 del kernel: [46044.348013] 1 lock held by pppd/1717: Sep 5 23:52:33 del kernel: [46044.357548] #0: (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20 Sep 5 23:52:33 del kernel: [46044.367647] Sep 5 23:52:33 del kernel: [46044.367652] stack backtrace: Sep 5 23:52:33 del kernel: [46044.387429] Pid: 1717, comm: pppd Not tainted 2.6.35.4.4a #3 Sep 5 23:52:33 del kernel: [46044.398764] Call Trace: Sep 5 23:52:33 del kernel: [46044.409596] [<c12f9aba>] ? printk+0x18/0x1e Sep 5 23:52:33 del kernel: [46044.420761] [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0 Sep 5 23:52:33 del kernel: [46044.432229] [<c12b7235>] trie_firstleaf+0x65/0x70 Sep 5 23:52:33 del kernel: [46044.443941] [<c12b74d4>] fib_table_flush+0x14/0x170 Sep 5 23:52:33 del kernel: [46044.455823] [<c1033e92>] ? local_bh_enable_ip+0x62/0xd0 Sep 5 23:52:33 del kernel: [46044.467995] [<c12fc39f>] ? _raw_spin_unlock_bh+0x2f/0x40 Sep 5 23:52:33 del kernel: [46044.480404] [<c12b24d0>] ? fib_sync_down_dev+0x120/0x180 Sep 5 23:52:33 del kernel: [46044.493025] [<c12b069d>] fib_flush+0x2d/0x60 Sep 5 23:52:33 del kernel: [46044.505796] [<c12b06f5>] fib_disable_ip+0x25/0x50 Sep 5 23:52:33 del kernel: [46044.518772] [<c12b10d3>] fib_netdev_event+0x73/0xd0 Sep 5 23:52:33 del kernel: [46044.531918] [<c1048dfd>] notifier_call_chain+0x2d/0x70 Sep 5 23:52:33 del kernel: [46044.545358] [<c1048f0a>] raw_notifier_call_chain+0x1a/0x20 Sep 5 23:52:33 del kernel: [46044.559092] [<c124f687>] call_netdevice_notifiers+0x27/0x60 Sep 5 23:52:33 del kernel: [46044.573037] [<c124faec>] __dev_notify_flags+0x5c/0x80 Sep 5 23:52:33 del kernel: [46044.586489] [<c124fb47>] dev_change_flags+0x37/0x60 Sep 5 23:52:33 del kernel: [46044.599394] [<c12a8a8d>] devinet_ioctl+0x54d/0x630 Sep 5 23:52:33 del kernel: [46044.612277] [<c12aabb7>] inet_ioctl+0x97/0xc0 Sep 5 23:52:34 del kernel: [46044.625208] [<c123f6af>] sock_ioctl+0x6f/0x270 Sep 5 23:52:34 del kernel: [46044.638046] [<c109d2b0>] ? handle_mm_fault+0x420/0x6c0 Sep 5 23:52:34 del kernel: [46044.650968] [<c123f640>] ? sock_ioctl+0x0/0x270 Sep 5 23:52:34 del kernel: [46044.663865] [<c10c3188>] vfs_ioctl+0x28/0xa0 Sep 5 23:52:34 del kernel: [46044.676556] [<c10c38fa>] do_vfs_ioctl+0x6a/0x5c0 Sep 5 23:52:34 del kernel: [46044.688989] [<c1048676>] ? up_read+0x16/0x30 Sep 5 23:52:34 del kernel: [46044.701411] [<c1021376>] ? do_page_fault+0x1d6/0x3a0 Sep 5 23:52:34 del kernel: [46044.714223] [<c10b6588>] ? fget_light+0xf8/0x2f0 Sep 5 23:52:34 del kernel: [46044.726601] [<c1241f98>] ? sys_socketcall+0x208/0x2c0 Sep 5 23:52:34 del kernel: [46044.739140] [<c10c3eb3>] sys_ioctl+0x63/0x70 Sep 5 23:52:34 del kernel: [46044.751967] [<c12fca3d>] syscall_call+0x7/0xb Sep 5 23:52:34 del kernel: [46044.764734] [<c12f0000>] ? cookie_v6_check+0x3d0/0x630 --------------> This patch fixes the warning: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by pppd/1717: #0: (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20 stack backtrace: Pid: 1717, comm: pppd Not tainted 2.6.35.4a #3 Call Trace: [<c12f9aba>] ? printk+0x18/0x1e [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0 [<c12b7235>] trie_firstleaf+0x65/0x70 [<c12b74d4>] fib_table_flush+0x14/0x170 ... Allow trie_firstleaf() to be called either under rcu_read_lock() protection or with RTNL held. The same annotation is added to node_parent_rcu() to prevent a similar warning a bit later. Followup of commits 634a4b20 and 4eaa0e3c. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08ipvs: fix active FTPJulian Anastasov
- Do not create expectation when forwarding the PORT command to avoid blocking the connection. The problem is that nf_conntrack_ftp.c:help() tries to create the same expectation later in POST_ROUTING and drops the packet with "dropping packet" message after failure in nf_ct_expect_related. - Change ip_vs_update_conntrack to alter the conntrack for related connections from real server. If we do not alter the reply in this direction the next packet from client sent to vport 20 comes as NEW connection. We alter it but may be some collision happens for both conntracks and the second conntrack gets destroyed immediately. The connection stucks too. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08gro: Re-fix different skb headroomsJarek Poplawski
The patch: "gro: fix different skb headrooms" in its part: "2) allocate a minimal skb for head of frag_list" is buggy. The copied skb has p->data set at the ip header at the moment, and skb_gro_offset is the length of ip + tcp headers. So, after the change the length of mac header is skipped. Later skb_set_mac_header() sets it into the NET_SKB_PAD area (if it's long enough) and ip header is misaligned at NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the original skb was wrongly allocated, so let's copy it as it was. bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626 fixes commit: 3d3be4333fdf6faa080947b331a6a19bce1a4f57 Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits) pkt_sched: Fix lockdep warning on est_tree_lock in gen_estimator ipvs: avoid oops for passive FTP Revert "sky2: don't do GRO on second port" gro: fix different skb headrooms bridge: Clear INET control block of SKBs passed into ip_fragment(). 3c59x: Remove incorrect locking; correct documented lock hierarchy sky2: don't do GRO on second port ipv4: minor fix about RPF in help of Kconfig xfrm_user: avoid a warning with some compiler net/sched/sch_hfsc.c: initialize parent's cl_cfmin properly in init_vf() pxa168_eth: fix a mdiobus leak net sched: fix kernel leak in act_police vhost: stop worker only if created MAINTAINERS: Add ehea driver as Supported ath9k_hw: fix parsing of HT40 5 GHz CTLs ath9k_hw: Fix EEPROM uncompress block reading on AR9003 wireless: register wiphy rfkill w/o holding cfg80211_mutex netlink: Make NETLINK_USERSOCK work again. irda: Correctly clean up self->ias_obj on irda_bind() failure. wireless extensions: fix kernel heap content leak ...
2010-09-07ipv4: Fix reverse path filtering with multipath routing.David S. Miller
Actually iterate over the next-hops to make sure we have a device match. Otherwise RP filtering is always elided when the route matched has multiple next-hops. Reported-by: Igor M Podlesny <for.poige@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-07UNIX: Do not loop forever at unix_autobind().Tetsuo Handa
We assumed that unix_autobind() never fails if kzalloc() succeeded. But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can consume all names using fork()/socket()/bind(). If all names are in use, those who call bind() with addr_len == sizeof(short) or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue while (1) yield(); loop at unix_autobind() till a name becomes available. This patch adds a loop counter in order to give up after 1048576 attempts. Calling yield() for once per 256 attempts may not be sufficient when many names are already in use, for __unix_find_socket_byname() can take long time under such circumstance. Therefore, this patch also adds cond_resched() call. Note that currently a local user can consume 2GB of kernel memory if the user is allowed to create and autobind 1048576 UNIX domain sockets. We should consider adding some restriction for autobind operation. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-07irda: off by oneDan Carpenter
This is an off by one. We would go past the end when we NUL terminate the "value" string at end of the function. The "value" buffer is allocated in irlan_client_parse_response() or irlan_provider_parse_command(). CC: stable@kernel.org Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-07netfilter: discard overlapping IPv6 fragmentNicolas Dichtel
RFC5722 prohibits reassembling IPv6 fragments when some data overlaps. Bug spotted by Zhang Zuotao <zuotao.zhang@6wind.com>. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-07ipv6: discard overlapping fragmentNicolas Dichtel
RFC5722 prohibits reassembling fragments when some data overlaps. Bug spotted by Zhang Zuotao <zuotao.zhang@6wind.com>. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-07net: fix tx queue selection for bridged devices implementing select_queueHelmut Schaa
When a net device is implementing the select_queue callback and is part of a bridge, frames coming from the bridge already have a tx queue associated to the socket (introduced in commit a4ee3ce3293dc931fab19beb472a8bde1295aebe, "net: Use sk_tx_queue_mapping for connected sockets"). The call to sk_tx_queue_get will then return the tx queue used by the bridge instead of calling the select_queue callback. In case of mac80211 this broke QoS which is implemented by using the select_queue callback. Furthermore it introduced problems with rt2x00 because frames with the same TID and RA sometimes appeared on different tx queues which the hw cannot handle correctly. Fix this by always calling select_queue first if it is available and only afterwards use the socket tx queue mapping. Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-02pkt_sched: Fix lockdep warning on est_tree_lock in gen_estimatorJarek Poplawski
This patch fixes a lockdep warning: [ 516.287584] ========================================================= [ 516.288386] [ INFO: possible irq lock inversion dependency detected ] [ 516.288386] 2.6.35b #7 [ 516.288386] --------------------------------------------------------- [ 516.288386] swapper/0 just changed the state of lock: [ 516.288386] (&qdisc_tx_lock){+.-...}, at: [<c12eacda>] est_timer+0x62/0x1b4 [ 516.288386] but this lock took another, SOFTIRQ-unsafe lock in the past: [ 516.288386] (est_tree_lock){+.+...} [ 516.288386] [ 516.288386] and interrupts could create inverse lock ordering between them. ... So, est_tree_lock needs BH protection because it's taken by qdisc_tx_lock, which is used both in BH and process contexts. (Full warning with this patch at netdev, 02 Sep 2010.) Fixes commit: ae638c47dc040b8def16d05dc6acdd527628f231 ("pkt_sched: gen_estimator: add a new lock") Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-02ipvs: avoid oops for passive FTPJulian Anastasov
Fix Passive FTP problem in ip_vs_ftp: - Do not oops in nf_nat_set_seq_adjust (adjust_tcp_sequence) when iptable_nat module is not loaded Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01gro: fix different skb headroomsEric Dumazet
Packets entering GRO might have different headrooms, even for a given flow (because of implementation details in drivers, like copybreak). We cant force drivers to deliver packets with a fixed headroom. 1) fix skb_segment() skb_segment() makes the false assumption headrooms of fragments are same than the head. When CHECKSUM_PARTIAL is used, this can give csum_start errors, and crash later in skb_copy_and_csum_dev() 2) allocate a minimal skb for head of frag_list skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to allocate a fresh skb. This adds NET_SKB_PAD to a padding already provided by netdevice, depending on various things, like copybreak. Use alloc_skb() to allocate an exact padding, to reduce cache line needs: NET_SKB_PAD + NET_IP_ALIGN bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626 Many thanks to Plamen Petrov, testing many debugging patches ! With help of Jarek Poplawski. Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>