path: root/net/ipv6
Age  Commit message  Author
2009-12-08  net: fix sk_forward_alloc corruption  Eric Dumazet
[ Upstream commit: 9d410c796067686b1e032d54ce475b7055537138 ] On UDP sockets, we must call skb_free_datagram() with socket locked, or risk sk_forward_alloc corruption. This requirement is not respected in SUNRPC. Add a convenient helper, skb_free_datagram_locked(), and use it in SUNRPC. Reported-by: Francis Moreau <francis.moro@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-12  sit: fix off-by-one in ipip6_tunnel_get_prl  Sascha Hlusiak
[ Upstream commit 298bf12ddb25841804f26234a43b89da1b1c0e21 ] When requesting all prl entries (kprl.addr == INADDR_ANY) and there are more prl entries than there is space passed from userspace, the existing code would always copy cmax+1 entries, which is more than can be handled. This patch makes the kernel copy only exactly cmax entries. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Acked-By: Fred L. Templin <Fred.L.Templin@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
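The bounded-copy pattern described above can be sketched as follows. This is an illustration only, not the sit.c code: `copy_prl_entries`, `struct prl_entry`, and the flat-array layout are hypothetical names chosen for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for a PRL entry; not the kernel's structure. */
struct prl_entry {
    unsigned int addr;
};

/*
 * Copy at most cmax entries from src (which holds n entries) into dst.
 * The bound is checked before every copy, so cmax + 1 entries can
 * never be written. Returns the number of entries copied.
 */
size_t copy_prl_entries(struct prl_entry *dst, size_t cmax,
                        const struct prl_entry *src, size_t n)
{
    size_t c = 0;

    while (c < n && c < cmax) {
        dst[c] = src[c];
        c++;
    }
    return c;
}
```

Checking the limit before the write, rather than after incrementing the counter, is what keeps the copy within the space the caller provided.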
2009-08-23  ipv6: Fix commit 63d9950b08184e6531adceb65f64b429909cc101 (ipv6: Make v4-mapped bindings consistent with IPv4)  Bruno Prémont
Commit 63d9950b08184e6531adceb65f64b429909cc101 (ipv6: Make v4-mapped bindings consistent with IPv4) changes the behavior of inet6_bind() for v4-mapped addresses so that it behaves the same way as inet_bind(). During this change, the setting of err to -EADDRNOTAVAIL got lost:

af_inet.c:469 inet_bind()

    err = -EADDRNOTAVAIL;
    if (!sysctl_ip_nonlocal_bind &&
        !(inet->freebind || inet->transparent) &&
        addr->sin_addr.s_addr != htonl(INADDR_ANY) &&
        chk_addr_ret != RTN_LOCAL &&
        chk_addr_ret != RTN_MULTICAST &&
        chk_addr_ret != RTN_BROADCAST)
            goto out;

af_inet6.c:463 inet6_bind()

    if (addr_type == IPV6_ADDR_MAPPED) {
            int chk_addr_ret;

            /* Binding to v4-mapped address on a v6-only socket
             * makes no sense
             */
            if (np->ipv6only) {
                    err = -EINVAL;
                    goto out;
            }

            /* Reproduce AF_INET checks to make the bindings consitant */
            v4addr = addr->sin6_addr.s6_addr32[3];
            chk_addr_ret = inet_addr_type(net, v4addr);
            if (!sysctl_ip_nonlocal_bind &&
                !(inet->freebind || inet->transparent) &&
                v4addr != htonl(INADDR_ANY) &&
                chk_addr_ret != RTN_LOCAL &&
                chk_addr_ret != RTN_MULTICAST &&
                chk_addr_ret != RTN_BROADCAST)
                    goto out;
    } else {

Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-20  tcp: Use correct peer adr when copying MD5 keys  John Dykstra
When the TCP connection handshake completes on the passive side, a variety of state must be set up in the "child" sock, including the key if MD5 authentication is being used. Fix TCP for both address families to label the key with the peer's destination address, rather than the address from the listening sock, which is usually the wildcard. Reported-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-20  tcp: Fix MD5 signature checking on IPv4 mapped sockets  John Dykstra
Fix MD5 signature checking so that an IPv4 active open to an IPv6 socket can succeed. In particular, use the correct address family's signature generation function for the SYN/ACK. Reported-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-11  sit: fix regression: do not release skb->dst before xmit  Sascha Hlusiak
The sit module makes use of skb->dst in its xmit function, so since 93f154b594fe47 ("net: release dst entry in dev_hard_start_xmit()") sit tunnels are broken, because the flag IFF_XMIT_DST_RELEASE is not unset. This patch unsets that flag for sit devices to fix this regression. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-11  net: ip_push_pending_frames() fix  Eric Dumazet
After commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) we do not take any more references on sk->sk_refcnt on outgoing packets. I forgot to delete two __sock_put() from ip_push_pending_frames() and ip6_push_pending_frames(). Reported-by: Emil S Tantilov <emils.tantilov@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Emil S Tantilov <emils.tantilov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-03  IPv6: preferred lifetime of address not getting updated  Brian Haley
There's a bug in addrconf_prefix_rcv() where it won't update the preferred lifetime of an IPv6 address if the current valid lifetime of the address is less than 2 hours (the minimum value in the RA).

For example, if I send a router advertisement with a prefix that has valid lifetime = preferred lifetime = 2 hours, we'll build this address:

    3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
        inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
           valid_lft 7175sec preferred_lft 7175sec

If I then send the same prefix with valid lifetime = preferred lifetime = 0, it will be ignored since the minimum valid lifetime is 2 hours:

    3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
        inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
           valid_lft 7161sec preferred_lft 7161sec

But according to RFC 4862 we should always reset the preferred lifetime even if the valid lifetime is invalid, which would cause the address to immediately get deprecated. So with this patch we'd see this:

    5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
        inet6 2001:1890:1109:a20:21f:29ff:fe5a:ef04/64 scope global deprecated dynamic
           valid_lft 7163sec preferred_lft 0sec

The comment winds up being 5x the size of the code to fix the problem.

Update the preferred lifetime of IPv6 addresses derived from a prefix info option in a router advertisement even if the valid lifetime in the option is invalid, as specified in RFC 4862 Section 5.5.3e. Fixes an issue where an address will not immediately become deprecated. Reported by Jens Rosenboom.

Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
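The lifetime rule referred to above can be sketched in a few lines. This is a simplified reading of RFC 4862 Section 5.5.3e for an unauthenticated router advertisement, not the addrconf_prefix_rcv() code; `struct addr_lifetimes` and `apply_prefix_lifetimes` are hypothetical names.

```c
#include <stdint.h>

#define MIN_VALID_LIFETIME (2 * 60 * 60)  /* two hours, in seconds */

/* Hypothetical container for the two lifetimes of one address. */
struct addr_lifetimes {
    uint32_t valid;
    uint32_t preferred;
};

/*
 * remaining_valid: seconds of valid lifetime the address still has.
 * ra_valid, ra_preferred: lifetimes carried in the prefix option.
 */
void apply_prefix_lifetimes(struct addr_lifetimes *a,
                            uint32_t remaining_valid,
                            uint32_t ra_valid, uint32_t ra_preferred)
{
    if (ra_valid > MIN_VALID_LIFETIME || ra_valid > remaining_valid)
        a->valid = ra_valid;            /* accept the advertised value */
    else if (remaining_valid > MIN_VALID_LIFETIME)
        a->valid = MIN_VALID_LIFETIME;  /* shorten to two hours at most */
    /* otherwise the valid lifetime is left unchanged */

    /* The preferred lifetime is reset regardless of the branch above. */
    a->preferred = ra_preferred;
}
```

With the numbers from the example (7175 seconds remaining, advertised lifetimes of 0), the valid lifetime stays untouched while the preferred lifetime drops to 0, which is what deprecates the address.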
2009-07-03  xfrm6: fix the proto and ports decode of sctp protocol  Wei Yongjun
The SCTP pushed the skb above the sctp chunk header, so the check of pskb_may_pull(skb, nh + offset + 1 - skb->data) in _decode_session6() will never return 0 and the ports decode of sctp will always fail. (nh + offset + 1 - skb->data < 0) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-26  inet: Call skb_orphan before tproxy activates  Herbert Xu
As transparent proxying looks up the socket early and assigns it to the skb for later processing, we must drop any existing socket ownership prior to that in order to distinguish between the case where tproxy is active and where it is not. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-26  ipv6: Use rcu_barrier() on module unload.  Jesper Dangaard Brouer
The ipv6 module uses call_rcu(), thus it should use rcu_barrier() on module unload. Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-25  ipv6: avoid wraparound for expired preferred lifetime  Jens Rosenboom
Avoid showing wrong high values when the preferred lifetime of an address is expired. Signed-off-by: Jens Rosenboom <me@jayr.de> Signed-off-by: David S. Miller <davem@davemloft.net>
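A minimal sketch of why an expired lifetime can show up as a very large number, and of one way to avoid it. The helper name `remaining_lifetime` is hypothetical and this is not the code from the patch.

```c
#include <stdint.h>

/*
 * Remaining lifetime in seconds, clamped at zero.
 * A plain unsigned subtraction (lifetime - elapsed) wraps around to a
 * huge value as soon as elapsed exceeds lifetime.
 */
uint32_t remaining_lifetime(uint32_t lifetime, uint32_t elapsed)
{
    return lifetime > elapsed ? lifetime - elapsed : 0;
}
```

For instance, `remaining_lifetime(100, 150)` yields 0, whereas the unclamped expression `100u - 150u` evaluates to a value close to the 32-bit maximum.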
2009-06-23  ipv6: Use correct data types for ICMPv6 type and code  Brian Haley
Change all the code that deals directly with ICMPv6 type and code values to use u8 instead of a signed int as that's the actual data type. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18  net: correct off-by-one write allocations reports  Eric Dumazet
Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) changed the initial sk_wmem_alloc value. We need to take this offset into account when reporting sk_wmem_alloc to the user, in PROC_FS files or various ioctls (SIOCOUTQ/TIOCOUTQ). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-15  Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6  David S. Miller
Conflicts: Documentation/feature-removal-schedule.txt drivers/scsi/fcoe/fcoe.c net/core/drop_monitor.c net/core/net-traces.c
2009-06-14  PIM-SM: namespace changes  Tom Goff
IPv4:
  - make PIM register vifs netns local
  - set the netns when a PIM register vif is created
  - make PIM available in all network namespaces (if CONFIG_IP_PIMSM_V2) by adding the protocol handler when multicast routing is initialized

IPv6:
  - make PIM register vifs netns local
  - make PIM available in all network namespaces (if CONFIG_IPV6_PIMSM_V2) by adding the protocol handler when multicast routing is initialized

Signed-off-by: Tom Goff <thomas.goff@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-12  trivial: Fix a typo in comment of addrconf_dad_start()  Masatake YAMATO
Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12  trivial: Kconfig: .ko is normally not included in module names  Pavel Machek
.ko is normally not included in Kconfig help, make it consistent. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-11  Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6  Patrick McHardy
2009-06-11  net: No more expensive sock_hold()/sock_put() on each tx  Eric Dumazet
One of the problems with sock memory accounting is that it uses a pair of sock_hold()/sock_put() for each transmitted packet.

This slows down bidirectional flows because the receive path also needs to take a refcount on the socket and might use a different cpu than the transmit path or transmit completion path. So these two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways for example), where sock_wfree() can be in the top five functions in profiles.

We use this sock_hold()/sock_put() so that sock freeing is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we could offset sk_wmem_alloc by one unit at init time, until sk_free() is called. Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc) to decrement the initial offset and atomically check if any packets are in flight.

skb_set_owner_w() doesn't call sock_hold() anymore.

sock_wfree() doesn't call sock_put() anymore, but checks if sk_wmem_alloc reached 0 to perform the final freeing.

The drawback is that a skb->truesize error could lead to unfreeable sockets, or even worse, prematurely calling __sk_free() on a live socket.

Nice speedups on SMP. tbench, for example, goes from 2691 MB/s to 2711 MB/s on my 8 cpu dev machine, even if tbench was not really hitting the sk_refcnt contention point.

5 % speedup on a UDP transmit workload (depends on number of flows), lowering TX completion cpu usage.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
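The counter-offset idea described above can be modelled in user space. The sketch below uses C11 atomics instead of the kernel's atomic_t, and every name in it (`struct mini_sock`, `mini_set_owner_w`, `mini_wfree`, `mini_sk_free`) is a hypothetical stand-in, not a kernel function.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical miniature socket: only the write-memory counter. */
struct mini_sock {
    atomic_uint wmem_alloc;  /* bytes in flight, plus 1 until the owner frees */
    bool freed;              /* set once the final free has happened */
};

void mini_sock_init(struct mini_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1);  /* initial offset of one unit */
    sk->freed = false;
}

/* Charge a transmitted packet to the socket; no extra refcount is taken. */
void mini_set_owner_w(struct mini_sock *sk, unsigned int truesize)
{
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* TX completion: uncharge the packet and free if the counter hits zero. */
void mini_wfree(struct mini_sock *sk, unsigned int truesize)
{
    /* fetch_sub returns the old value; old == truesize means new == 0 */
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        sk->freed = true;
}

/* The owner lets go: remove the initial offset, free if nothing in flight. */
void mini_sk_free(struct mini_sock *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        sk->freed = true;
}
```

Whichever of the two paths brings the counter to zero performs the final free, so a packet still in flight keeps the socket alive without a separate hold/put pair per packet. The sketch also makes the stated drawback visible: an uncharge with the wrong size would either never reach zero or reach it too early.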
2009-06-09  netfilter: Use frag list abstraction interfaces.  David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-09  ipv6: Use frag list abstraction interfaces.  David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-08  netfilter: nf_ct_icmp: keep the ICMP ct entries longer  Jan Kasprzak
Current conntrack code kills the ICMP conntrack entry as soon as the first reply is received. This is incorrect, as we then see only the first ICMP echo reply out of several possible duplicates as ESTABLISHED, while the rest will be INVALID. Also this unnecessarily increases the conntrackd traffic on H-A firewalls. Make all the ICMP conntrack entries (including the replied ones) last for the default of nf_conntrack_icmp{,v6}_timeout seconds. Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-04  netfilter: x_tables: added hook number into match extension parameter structure.  Evgeniy Polyakov
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-03  net: skb->dst accessors  Eric Dumazet
Define three accessors to get/set the dst attached to a skb:

    struct dst_entry *skb_dst(const struct sk_buff *skb)
    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
    void skb_dst_drop(struct sk_buff *skb)

The last one should replace occurrences of:

    dst_release(skb->dst)
    skb->dst = NULL;

Delete the skb->dst field.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
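The accessor pattern above can be illustrated with miniature types. Everything prefixed `mini_` is hypothetical, and the plain integer refcount stands in for the kernel's real reference counting; only the shape of the three accessors follows the commit message.

```c
#include <stddef.h>

/* Hypothetical miniature types; the kernel structures are far richer. */
struct mini_dst {
    int refcnt;
};

struct mini_skb {
    struct mini_dst *dst;
};

/* Drop one reference; tolerates a NULL pointer. */
static void mini_dst_release(struct mini_dst *dst)
{
    if (dst)
        dst->refcnt--;
}

/* Read the dst attached to the buffer. */
struct mini_dst *mini_skb_dst(const struct mini_skb *skb)
{
    return skb->dst;
}

/* Attach a dst to the buffer. */
void mini_skb_dst_set(struct mini_skb *skb, struct mini_dst *dst)
{
    skb->dst = dst;
}

/* Release the reference and clear the pointer in one step. */
void mini_skb_dst_drop(struct mini_skb *skb)
{
    mini_dst_release(skb->dst);
    skb->dst = NULL;
}
```

Routing all access through functions like these is what lets the underlying field be changed or removed later without touching every caller.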
2009-06-02  netfilter: conntrack: simplify event caching system  Pablo Neira Ayuso
This patch simplifies the conntrack event caching system by removing several events:

 * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO have been deleted since they have no clients.
 * IPCT_COUNTER_FILLING, which is a leftover of the 32-bit counter days.
 * IPCT_REFRESH, which is not of any use since we always include the timeout in the messages.

After this patch, the existing events are:

 * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, which are used to identify addition and deletion of entries.
 * IPCT_STATUS, which notes that the status bits have changed, e.g. IPS_SEEN_REPLY and IPS_ASSURED.
 * IPCT_PROTOINFO, which reports that internal protocol information has changed, e.g. the TCP, DCCP and SCTP protocol state.
 * IPCT_HELPER, which reports that a helper has been assigned to or unassigned from this entry.
 * IPCT_MARK and IPCT_SECMARK, which report that the mark has changed; this covers the case when a mark is set to zero.
 * IPCT_NATSEQADJ, which reports that there are updates in the NAT sequence adjustment.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02  Merge branch 'master' of git://dev.medozas.de/linux  Patrick McHardy
2009-06-02  IPv6: Print error value when skb allocation fails  Brian Haley
Print out the error value when sock_alloc_send_skb() fails in the IPv6 neighbor discovery code; this can be useful for debugging. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-01  IPv6: Add 'autoconf' and 'disable_ipv6' module parameters  Brian Haley
Add 'autoconf' and 'disable_ipv6' parameters to the IPv6 module. The first controls if IPv6 addresses are autoconfigured from prefixes received in Router Advertisements. The IPv6 loopback (::1) and link-local addresses are still configured. The second controls if IPv6 addresses are desired at all. No IPv6 addresses will be added to any interfaces. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27  gro: Avoid unnecessary comparison after skb_gro_header  Herbert Xu
For the overwhelming majority of cases, skb_gro_header's return value cannot be NULL. Yet we must check it because of its current form. This patch splits it up into multiple functions in order to avoid this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25  Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6  David S. Miller
Conflicts: drivers/net/wireless/ath/ath5k/phy.c drivers/net/wireless/iwlwifi/iwl-agn.c drivers/net/wireless/iwlwifi/iwl3945-base.c
2009-05-22  tcp: Unexport TCPv6 GRO functions  Herbert Xu
Since the TCPv6 GRO functions are used in the same file where they are defined, we do not need to export them. This was a cut-n-paste from the IPv4 code, which does need to export them. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20  IPv6: set RTPROT_KERNEL to initial route  Jean-Mickael Guerin
The use of an unspecified protocol in the IPv6 initial route prevents quagga from installing an IPv6 default route:

    # show ipv6 route
    S   ::/0 [1/0] via fe80::1, eth1_0
    K>* ::/0 is directly connected, lo, rej
    C>* ::1/128 is directly connected, lo
    C>* fe80::/64 is directly connected, eth1_0

    # ip -6 route
    fe80::/64 dev eth1_0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit -1
    ff00::/8 dev eth1_0 metric 256 mtu 1500 advmss 1440 hoplimit -1
    unreachable default dev lo proto none metric -1 error -101 hoplimit 255

The attached patch sets RTPROT_KERNEL on the default initial route and fixes the problem for quagga. This is similar to "ipv6: protocol for address routes" f410a1fba7afa79d2992620e874a343fdba28332.

    # show ipv6 route
    S>* ::/0 [1/0] via fe80::1, eth1_0
    C>* ::1/128 is directly connected, lo
    C>* fe80::/64 is directly connected, eth1_0

    # ip -6 route
    fe80::/64 dev eth1_0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit -1
    fe80::/64 dev eth1_0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit -1
    ff00::/8 dev eth1_0 metric 256 mtu 1500 advmss 1440 hoplimit -1
    default via fe80::1 dev eth1_0 proto zebra metric 1024 mtu 1500 advmss 1440 hoplimit -1
    unreachable default dev lo proto kernel metric -1 error -101 hoplimit 255

Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20  net: Remove unused parameter from fill method in fib_rules_ops.  Rami Rosen
The netlink message header (struct nlmsghdr) is an unused parameter in fill method of fib_rules_ops struct. This patch removes this parameter from this method and fixes the places where this method is called. (include/net/fib_rules.h) Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19  sit: stateless autoconf for isatap  Sascha Hlusiak
be sent periodically. The rs_delay can be specified when adding the PRL entry and defaults to 15 minutes. The RS is sent from every link-local address that's assigned to the tunnel interface. It's directed to the (guessed) link-local address of the router and is sent through the tunnel. Better: send to ff02::2 encapsulated in unicast directed to router-v4. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19  addrconf: refuse isatap eui64 for INADDR_ANY  Sascha Hlusiak
A tunnel with no local ipv4 endpoint would otherwise use the ISATAP linklocal address fe80::5efe:0:0, which is invalid. Rather not add a linklocal address at all. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Signed-off-by: David S. Miller <davem@davemloft.net>
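A minimal sketch of the check described above: building the ISATAP interface identifier (0000:5efe followed by the IPv4 address) and refusing the all-zero address. The function name `isatap_ifid` is hypothetical, and as a simplification the universal/local bit is always left at 0, which the real code may handle differently.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fill eui[] with the 8-byte ISATAP interface identifier for the IPv4
 * address v4[] (four bytes, network order).
 * Returns 0 on success, or -1 for 0.0.0.0 so that the caller adds no
 * link-local address at all.
 */
int isatap_ifid(uint8_t eui[8], const uint8_t v4[4])
{
    static const uint8_t any[4] = {0, 0, 0, 0};

    if (memcmp(v4, any, 4) == 0)
        return -1;              /* would yield the invalid fe80::5efe:0:0 */

    eui[0] = 0x00;              /* simplification: u/l bit left at 0 */
    eui[1] = 0x00;
    eui[2] = 0x5e;
    eui[3] = 0xfe;
    memcpy(eui + 4, v4, 4);
    return 0;
}
```

For 192.0.2.1 this produces the bytes 00 00 5e fe c0 00 02 01, which corresponds to the link-local address fe80::5efe:c000:201.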
2009-05-19  sit: ipip6_tunnel_del_prl: return err  Sascha Hlusiak
Typo. When deleting a PRL entry, return status to userspace instead of success. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19  sit: strictly restrict incoming traffic to tunnel link device  Sascha Hlusiak
Check the link device when looking up a tunnel. When a tunnel is linked to an interface, traffic from a different interface must not reach the tunnel. This also allows creating multiple tunnels with the same endpoints, if the link device differs. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19  sit: Fail to create tunnel, if it already exists  Sascha Hlusiak
When locating the tunnel, do not continue if it is found. Otherwise a different tunnel with similar configuration would be returned and parts could be overwritten. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18  net: FIX ipv6_forward sysctl restart  Eric W. Biederman
Just returning -ERESTARTSYS without a signal pending is not good; that will just leak it to userspace. We need to return -ERESTARTNOINTR so we always restart, and set signal pending so that we fall off the fast path of syscall return and set up the system call restart. So use restart_syscall(), which does all of this for us. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17  net: remove needless (now buggy) & from dev->dev_addr  Jiri Pirko
Patch fixes issues with dev->dev_addr changing from array to pointer. Hopefully there are no others. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
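Why the `&` turns from harmless to wrong when a member changes from array to pointer can be shown with two toy structures. `struct dev_old`, `struct dev_new`, and both helper functions are hypothetical and only mirror the shape of the change.

```c
#include <string.h>

struct dev_old { unsigned char dev_addr[6]; };  /* address stored inline */
struct dev_new { unsigned char *dev_addr; };    /* address stored elsewhere */

/*
 * Array member: &d->dev_addr and d->dev_addr refer to the same bytes,
 * so the needless & still copies the hardware address.
 */
void get_addr_old(unsigned char out[6], const struct dev_old *d)
{
    memcpy(out, &d->dev_addr, 6);
}

/*
 * Pointer member: &d->dev_addr would be the location of the pointer
 * itself, so the & has to be removed to reach the address bytes.
 */
void get_addr_new(unsigned char out[6], const struct dev_new *d)
{
    memcpy(out, d->dev_addr, 6);
}
```

Both helpers return the same six bytes, but only because the second one drops the `&`; keeping it there would copy part of the pointer value instead.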
2009-05-17  ipv4: remove an unused parameter from configure method of fib_rules_ops.  Rami Rosen
Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-08  Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6  David S. Miller
Conflicts: include/net/tcp.h
2009-05-08  netfilter: xtables: consolidate comefrom debug cast access  Jan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08  netfilter: xtables: remove another level of indent  Jan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08  netfilter: xtables: remove some goto  Jan Engelhardt
Combining two ifs, and goto is easily gone. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08  netfilter: xtables: reduce indent level by one  Jan Engelhardt
Cosmetic only. Transformation applied:

    -if (foo) { long block; } else { short block; }
    +if (!foo) { short block; continue; } long block;

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
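The transformation above can be tried on a small loop. The two functions below are hypothetical examples, unrelated to the xtables code; they compute the same result, with the second one handling the short case first and using `continue` so the long block loses one level of indentation.

```c
/* Before: the long block is nested inside the if. */
int tally_before(const int *v, int n)
{
    int total = 0;

    for (int i = 0; i < n; i++) {
        if (v[i] >= 0) {
            total += v[i];  /* stands in for the long block */
            total += 1;
        } else {
            total -= 1;     /* short block */
        }
    }
    return total;
}

/* After: short case first, then continue; the long block is unnested. */
int tally_after(const int *v, int n)
{
    int total = 0;

    for (int i = 0; i < n; i++) {
        if (v[i] < 0) {
            total -= 1;     /* short block */
            continue;
        }
        total += v[i];      /* long block, one indent level less */
        total += 1;
    }
    return total;
}
```

The rewrite only works inside a loop, or with an early return in a function, because the short branch needs a way to skip the code that follows it.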
2009-05-08  netfilter: xtables: consolidate open-coded logic  Jan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08  netfilter: xtables: fix const inconsistency  Jan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08  netfilter: xtables: remove redundant casts  Jan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>