linux - Linux kernel source tree

Age	Commit message	Author
2008-02-08	Netfilter: bridge-netfilter: fix net_device refcnt leaks	Patrick McHardy
	[NETFILTER]: bridge-netfilter: fix net_device refcnt leaks Upstream commit 2dc2f207fb251666d2396fe1a69272b307ecc333 When packets are flood-forwarded to multiple output devices, the bridge-netfilter code reuses skb->nf_bridge for each clone to store the bridge port. When queueing packets using NFQUEUE, netfilter takes a reference to skb->nf_bridge->physoutdev, which is overwritten when the packet is forwarded to the second port. This causes refcount underflows for the first device and refcount leaks for all others. Additionally this provides incorrect data to the iptables physdev match. Unshare skb->nf_bridge by copying it if it is shared before assigning the physoutdev device. Reported, tested and based on initial patch by Jan Christoph Nordholz <hesso@pool.math.tu-berlin.de>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-08	Netfilter: bridge: fix double POST_ROUTING invocation	Patrick McHardy
	[NETFILTER]: bridge: fix double POST_ROUTING invocation Upstream commit 2948d2ebbb98747b912ac6d0c864b4d02be8a6f5 The bridge code incorrectly causes two POST_ROUTING hook invocations for DNATed packets that end up on the same bridge device. This happens because packets with a changed destination address are passed to dst_output() to make them go through the neighbour output function again to build a new destination MAC address, before they will continue through the IP hooks simulated by bridge netfilter. The resulting hook order is: PREROUTING (bridge netfilter) POSTROUTING (dst_output -> ip_output) FORWARD (bridge netfilter) POSTROUTING (bridge netfilter) The deferred hooks used to abort the first POST_ROUTING invocation, but since the only thing bridge netfilter actually really wants is a new MAC address, we can avoid going through the IP stack completely by simply calling the neighbour output function directly. Tested, reported and lots of data provided by: Damien Thebault <damien.thebault@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-08	IPV4: ip_gre: set mac_header correctly in receive path	Timo Teras
	[IPV4] ip_gre: set mac_header correctly in receive path [ Upstream commit: 1d0691674764098304ae4c63c715f5883b4d3784 ] mac_header update in ipgre_recv() was incorrectly changed to skb_reset_mac_header() when it was introduced. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-08	NET: Correct two mistaken skb_reset_mac_header() conversions.	David Miller
	[NET]: Correct two mistaken skb_reset_mac_header() conversions. [ Upstream commit: c6e6ca712b5cc06a662f900c0484d49d7334af64 ] This operation helper abstracts: skb->mac_header = skb->data; but it was done in two more places which were actually: skb->mac_header = skb->network_header; and those are corrected here. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
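The difference between the two assignments named in the entry above can be shown with a toy structure. This is only a sketch: `struct toy_skb` and both helper names are hypothetical stand-ins, not the kernel's `struct sk_buff` or its accessors.

```c
/* Toy stand-in for the three fields discussed above (hypothetical layout). */
struct toy_skb {
    unsigned char *data;            /* current start of packet data */
    unsigned char *mac_header;      /* where the link-layer header starts */
    unsigned char *network_header;  /* where the network-layer header starts */
};

/* What the entry says the helper abstracts: mac_header = data. */
void toy_reset_mac_header(struct toy_skb *skb)
{
    skb->mac_header = skb->data;
}

/* What the two corrected call sites originally did: mac_header = network_header. */
void toy_set_mac_header_to_network(struct toy_skb *skb)
{
    skb->mac_header = skb->network_header;
}
```

The two helpers only agree when `data` and `network_header` point at the same byte, which is why a mechanical conversion from one form to the other changes behaviour.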
2008-02-08	IPSEC: Avoid undefined shift operation when testing algorithm ID	Herbert Xu
	[IPSEC]: Avoid undefined shift operation when testing algorithm ID [ Upstream commit: f398035f2dec0a6150833b0bc105057953594edb ] The aalgos/ealgos fields are only 32 bits wide. However, af_key tries to test them with the expression 1 << id where id can be as large as 253. This produces different behaviour on different architectures. The following patch explicitly checks whether ID is greater than 31 and fails the check if that's the case. We cannot easily extend the mask to be longer than 32 bits due to exposure to user-space. Besides, this whole interface is obsolete anyway in favour of the xfrm_user interface which doesn't use this bit mask in templates (well not within the kernel anyway). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
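The range check described in the entry above can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the af_key patch itself: the function name `algo_id_allowed` and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: report whether algorithm `id` is set in a 32-bit mask.
 * Shifting a 32-bit value by 32 or more bits is undefined behaviour in C,
 * so IDs above 31 fail the check before any shift is attempted. */
bool algo_id_allowed(uint32_t mask, unsigned int id)
{
    if (id > 31)
        return false;
    return ((mask >> id) & 1u) != 0;
}
```

With this shape, an ID such as 253 is rejected on every architecture instead of depending on how the hardware happens to treat an oversized shift count.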
2008-02-08	IPV4 ROUTE: ip_rt_dump() is unecessary slow	Eric Dumazet
	[IPV4] ROUTE: ip_rt_dump() is unecessary slow [ Upstream commit: d8c9283089287341c85a0a69de32c2287a990e71 ] I noticed "ip route list cache x.y.z.t" can be very slow. While strace-ing -T it I also noticed that first part of route cache is fetched quite fast : recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202 GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.000047> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\ 202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.000042> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\ 202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3740 <0.000055> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\ 202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.000043> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\ 202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3732 <0.000053> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202 GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3708 <0.000052> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202 GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3680 <0.000041> while the part at the end of the table is more expensive: recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3656 <0.003857> 
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.003891> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.003765> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3700 <0.003879> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3676 <0.003797> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3724 <0.003856> recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.003848> The following patch corrects this performance/latency problem, removing quadratic behavior. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-08	ATM: Check IP header validity in mpc_send_packet	Herbert Xu
	[ATM]: Check IP header validity in mpc_send_packet [ Upstream commit: 1c9b7aa1eb40ab708ef3242f74b9a61487623168 ] Al went through the ip_fast_csum callers and found this piece of code that did not validate the IP header. While root crashing the machine by sending bogus packets through raw or AF_PACKET sockets isn't that serious, it is still nice to react gracefully. This patch ensures that the skb has enough data for an IP header and that the header length field is valid. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-08	INET: Fix netdev renaming and inet address labels	Mark McLoughlin
	[INET]: Fix netdev renaming and inet address labels [ Upstream commit: 44344b2a85f03326c7047a8c861b0c625c674839 ] When re-naming an interface, the previous secondary address labels get lost e.g. $> brctl addbr foo $> ip addr add 192.168.0.1 dev foo $> ip addr add 192.168.0.2 dev foo label foo:00 $> ip addr show dev foo | grep inet inet 192.168.0.1/32 scope global foo inet 192.168.0.2/32 scope global foo:00 $> ip link set foo name bar $> ip addr show dev bar | grep inet inet 192.168.0.1/32 scope global bar inet 192.168.0.2/32 scope global bar:2 Turns out to be a simple thinko in inetdev_changename() - clearly we want to look at the address label, rather than the device name, for a suffix to retain. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-08	IRDA: irda_create() nuke user triggable printk	maximilian attems
	[IRDA]: irda_create() nuke user triggable printk [ Upstream commit: 9e8d6f8959c356d8294d45f11231331c3e1bcae6 ] easy to trigger as user with sfuzz. irda_create() is quiet on unknown sock->type, match this behaviour for SOCK_DGRAM unknown protocol Signed-off-by: maximilian attems <max@stro.at> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-08	X25: Add missing x25_neigh_put	Julia Lawall
	[X25]: Add missing x25_neigh_put [ Upstream commit: 76975f8a3186dae501584d0155ea410464f62815 ] The function x25_get_neigh increments a reference count. At the point of the second goto out, the result of calling x25_get_neigh is only stored in a local variable, and thus no one outside the function will be able to decrease the reference count. Thus, x25_neigh_put should be called before the return in this case. The problem was found using the following semantic match. (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ type T,T1,T2; identifier E; statement S; expression x1,x2,x3; int ret; @@ T E; ... * if ((E = x25_get_neigh(...)) == NULL) S ... when != x25_neigh_put(...,(T1)E,...) when != if (E != NULL) { ... x25_neigh_put(...,(T1)E,...); ...} when != x1 = (T1)E when != E = x3; when any if (...) { ... when != x25_neigh_put(...,(T2)E,...) when != if (E != NULL) { ... x25_neigh_put(...,(T2)E,...); ...} when != x2 = (T2)E ( * return; | * return ret; ) } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-08	IPV4 raw: Strengthen check on validity of iph->ihl	Herbert Xu
	[IPV4] raw: Strengthen check on validity of iph->ihl [ Upstream commit: f844c74fe07321953e2dd227fe35280075f18f60 ] We currently check that iph->ihl is bounded by the real length and that the real length is greater than the minimum IP header length. However, we did not check the case where iph->ihl is less than the minimum IP header length. This breaks because some ip_fast_csum implementations assume that, which is quite reasonable. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
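The three conditions named in the entry above (enough data for a minimal header, ihl not below the minimum, ihl not beyond the available data) might be combined as follows. This is a user-space sketch, not the kernel code: `ipv4_header_len_ok` is a hypothetical name, and the constants 20 and 5 are the minimum IPv4 header size in bytes and in 32-bit words.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check on a buffer that should start with an IPv4 header.
 * ihl is the header length in 32-bit words, stored in the low nibble of
 * the first header byte. */
bool ipv4_header_len_ok(const unsigned char *buf, size_t len)
{
    size_t ihl;

    if (len < 20)               /* not even a minimal header present */
        return false;

    ihl = buf[0] & 0x0f;
    if (ihl < 5)                /* below the minimum of five words */
        return false;

    return ihl * 4 <= len;      /* claimed header must fit in the data */
}
```

Only buffers that pass all three tests would be safe to hand to a checksum routine that assumes at least five header words.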
2008-02-08	VLAN: Lost rtnl_unlock() in vlan_ioctl()	Pavel Emelyanov
	[VLAN]: Lost rtnl_unlock() in vlan_ioctl() [ Upstream commit: e35de02615f97b785dc6f73cba421cea06bcbd10 ] The SET_VLAN_NAME_TYPE_CMD command w/o CAP_NET_ADMIN capability doesn't release the rtnl lock. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
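One common way to avoid this class of bug is to route every exit through a single unlock label. The sketch below is a toy model, not vlan_ioctl(): `lock_depth` stands in for the rtnl lock, and `set_name_type` with its parameters is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy lock counter standing in for rtnl_lock()/rtnl_unlock(). */
static int lock_depth;

/* Hypothetical handler: the permission failure leaves through the same
 * unlock path as the success case, so the lock is never left held. */
int set_name_type(bool has_cap, int *name_type, int new_type)
{
    int err = 0;

    lock_depth++;               /* take the lock */

    if (!has_cap) {
        err = -EPERM;
        goto out;               /* do not return directly here */
    }

    *name_type = new_type;

out:
    lock_depth--;               /* release on every path */
    return err;
}
```

After either outcome `lock_depth` is back at zero, which is the property the fix restores for the unprivileged case.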
2008-02-08	IPSEC: Fix potential dst leak in xfrm_lookup	Herbert Xu
	[IPSEC]: Fix potential dst leak in xfrm_lookup [ Upstream commit: 75b8c133267053c9986a7c8db5131f0e7349e806 ] If we get an error during the actual policy lookup we don't free the original dst while the caller expects us to always free the original dst in case of error. This patch fixes that. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	BRIDGE: Section fix.	Andrew Morton
	WARNING: vmlinux.o(.init.text+0x204e2): Section mismatch: reference to .exit.text:br_fdb_fini (between 'br_init' and 'br_fdb_init') Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	BRIDGE: Properly dereference the br_should_route_hook	Pavel Emelyanov
	[BRIDGE]: Properly dereference the br_should_route_hook [ Upstream commit: 82de382ce8e1c7645984616728dc7aaa057821e4 ] This hook is protected with the RCU, so simple if (br_should_route_hook) br_should_route_hook(...) is not enough on some architectures. Use the rcu_dereference/rcu_assign_pointer in this case. Fixed Stephen's comment concerning using the typeof(). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	NETFILTER: xt_TCPMSS: remove network triggerable WARN_ON	Patrick McHardy
	[NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON [ Upstream commit: 9dc0564e862b1b9a4677dec2c736b12169e03e99 ] ipv6_skip_exthdr() returns -1 for invalid packets. don't WARN_ON that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	XFRM: Fix leak of expired xfrm_states	Patrick McHardy
	[XFRM]: Fix leak of expired xfrm_states [ Upstream commit: 5dba4797115c8fa05c1a4d12927a6ae0b33ffc41 ] The xfrm_timer calls __xfrm_state_delete, which drops the final reference manually without triggering destruction of the state. Change it to use xfrm_state_put to add the state to the gc list when we're dropping the last reference. The timer function may still continue to use the state safely since the final destruction does a del_timer_sync(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	NETFILTER: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK	Jan Engelhardt
	[NETFILTER]: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK [ Upstream commit: 67b4af297033f5f65999885542f95ba7b562848a ] Fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK When xt_CONNMARK is used outside the mangle table and the user specified "--restore-mark", the connmark_tg_check() function will (correctly) error out, but (incorrectly) forgets to release the L3 conntrack module. Same for xt_CONNSECMARK. Fix is to move the call to acquire the L3 module after the basic constraint checks. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	UNIX: EOF on non-blocking SOCK_SEQPACKET	Florian Zumbiehl
	[UNIX]: EOF on non-blocking SOCK_SEQPACKET [ Upstream commit: 0a11225887fe6cbccd882404dc36ddc50f47daf9 ] I am not absolutely sure whether this actually is a bug (as in: I've got no clue what the standards say or what other implementations do), but at least I was pretty surprised when I noticed that a recv() on a non-blocking unix domain socket of type SOCK_SEQPACKET (which is connection oriented, after all) where the remote end has closed the connection returned -1 (EAGAIN) rather than 0 to indicate end of file. This is a test case: | #include <sys/types.h> | #include <unistd.h> | #include <sys/socket.h> | #include <sys/un.h> | #include <fcntl.h> | #include <string.h> | #include <stdlib.h> | | int main(){ | int sock; | struct sockaddr_un addr; | char buf[4096]; | int pfds[2]; | | pipe(pfds); | sock=socket(PF_UNIX,SOCK_SEQPACKET,0); | addr.sun_family=AF_UNIX; | strcpy(addr.sun_path,"/tmp/foobar_testsock"); | bind(sock,(struct sockaddr *)&addr,sizeof(addr)); | listen(sock,1); | if(fork()){ | close(sock); | sock=socket(PF_UNIX,SOCK_SEQPACKET,0); | connect(sock,(struct sockaddr *)&addr,sizeof(addr)); | fcntl(sock,F_SETFL,fcntl(sock,F_GETFL)|O_NONBLOCK); | close(pfds[1]); | read(pfds[0],buf,sizeof(buf)); | recv(sock,buf,sizeof(buf),0); // <-- this one | }else accept(sock,NULL,NULL); | exit(0); | } If you try it, make sure /tmp/foobar_testsock doesn't exist. The marked recv() returns -1 (EAGAIN) on 2.6.23.9. Below you find a patch that fixes that. Signed-off-by: Florian Zumbiehl <florz@florz.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	TCP: illinois: Incorrect beta usage	Stephen Hemminger
	[TCP] illinois: Incorrect beta usage [ Upstream commit: a357dde9df33f28611e6a3d4f88265e39bcc8880 ] Lachlan Andrew observed that my TCP-Illinois implementation uses the beta value incorrectly: The parameter beta in the paper specifies the amount to decrease by: that is, on loss, W <- W - beta*W but in tcp_illinois_ssthresh() uses beta as the amount to decrease to: W <- beta*W This bug makes the Linux TCP-Illinois get less-aggressive on uncongested network, hurting performance. Note: since the base beta value is .5, it has no impact on a congested network. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
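The two readings of beta in the entry above differ only by one subtraction, which a small sketch makes concrete. This is not tcp_illinois_ssthresh(): both function names are hypothetical, and beta is encoded here as a fraction `beta_num / beta_den` (with `beta_num <= beta_den` and `beta_den != 0` assumed) purely for illustration.

```c
#include <stdint.h>

/* "Decrease to" reading: W <- beta*W. */
uint32_t window_decrease_to(uint32_t w, uint32_t beta_num, uint32_t beta_den)
{
    /* 64-bit intermediate avoids overflow in the multiplication */
    return (uint32_t)(((uint64_t)w * beta_num) / beta_den);
}

/* "Decrease by" reading, as the entry says the paper intends: W <- W - beta*W. */
uint32_t window_decrease_by(uint32_t w, uint32_t beta_num, uint32_t beta_den)
{
    return w - window_decrease_to(w, beta_num, beta_den);
}
```

For a window of 80 and beta = 1/8, the first form yields 10 while the second yields 70. At beta = 1/2 both yield 40, which matches the entry's note that the base value of .5 hides the difference.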
2007-12-14	IPV6: Restore IPv6 when MTU is big enough	Evgeniy Polyakov
	[IPV6]: Restore IPv6 when MTU is big enough [ Upstream commit: d31c7b8fa303eb81311f27b80595b8d2cbeef950 ] Avaid provided test application, so bug got fixed. IPv6 addrconf removes ipv6 inner device from netdev each time MTU changes and new value is less than IPV6_MIN_MTU (1280 bytes). When MTU is changed and new value is greater than IPV6_MIN_MTU, it does not add ipv6 addresses and inner device back. This patch fixes that. Tested with Avaid's application, which works ok now. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	DECNET: dn_nl_deladdr() almost always returns no error	Pavel Emelyanov
	[DECNET]: dn_nl_deladdr() almost always returns no error [ Upstream commit: 3ccd86241b277249d5ac08e91eddfade47184520 ] As far as I see from the err variable initialization the dn_nl_deladdr() routine was designed to report errors like "EADDRNOTAVAIL" and probably "ENODEV". But the code sets this err to 0 after the first nlmsg_parse and goes on, returning this 0 in any case. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	VLAN: Fix nested VLAN transmit bug	Joonwoo Park
	[VLAN]: Fix nested VLAN transmit bug [ Upstream commit: 6ab3b487db77fa98a24560f11a5a8e744b98d877 ] Fix misbehavior of vlan_dev_hard_start_xmit() for recursive encapsulations. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	RXRPC: Add missing select on CRYPTO	David Howells
	[RXRPC]: Add missing select on CRYPTO [ Upstream commit: d5a784b3719ae364f49ecff12a0248f6e4252720 ] AF_RXRPC uses the crypto services, so should depend on or select CRYPTO. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	BRIDGE: Lost call to br_fdb_fini() in br_init() error path	Pavel Emelyanov
	[BRIDGE]: Lost call to br_fdb_fini() in br_init() error path [ Upstream commit: 17efdd45755c0eb8d1418a1368ef7c7ebbe98c6e ] In case the br_netfilter_init() (or any subsequent call) fails, the br_fdb_fini() must be called to free the allocated in br_fdb_init() br_fdb_cache kmem cache. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	PFKEY: Sending an SADB_GET responds with an SADB_GET	Charles Hardin
	[PFKEY]: Sending an SADB_GET responds with an SADB_GET [ Upstream commit: 435000bebd94aae3a7a50078d142d11683d3b193 ] Kernel needs to respond to an SADB_GET with the same message type to conform to the RFC 2367 Section 3.1.5 Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	TCP: MTUprobe: fix potential sk_send_head corruption	Ilpo Järvinen
	[TCP] MTUprobe: fix potential sk_send_head corruption [ Upstream commit: 6e42141009ff18297fe19d19296738b742f861db ] When the abstraction functions got added, conversion here was made incorrectly. As a result, the skb may end up pointing to skb which got included to the probe skb and then was freed. For it to trigger, however, skb_transmit must fail sending as well. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	TCP: Problem bug with sysctl_tcp_congestion_control function	Sam Jansen
	[TCP]: Problem bug with sysctl_tcp_congestion_control function [ Upstream commit: 5487796f0c9475586277a0a7a91211ce5746fa6a ] sysctl_tcp_congestion_control seems to have a bug that prevents it from actually calling the tcp_set_default_congestion_control function. This is not so apparent because it does not return an error and generally the /proc interface is used to configure the default TCP congestion control algorithm. This is present in 2.6.18 onwards and probably earlier, though I have not inspected 2.6.15--2.6.17. sysctl_tcp_congestion_control calls sysctl_string and expects a successful return code of 0. In such a case it actually sets the congestion control algorithm with tcp_set_default_congestion_control. Otherwise, it returns the value returned by sysctl_string. This was correct in 2.6.14, as sysctl_string returned 0 on success. However, sysctl_string was updated to return 1 on success around about 2.6.15 and sysctl_tcp_congestion_control was not updated. Even though sysctl_tcp_congestion_control returns 1, do_sysctl_strategy converts this return code to '0', so the caller never notices the error. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	IPV4: Remove bogus ifdef mess in arp_process	Adrian Bunk
	[IPV4]: Remove bogus ifdef mess in arp_process [ Upstream commit: 3660019e5f96fd9a8b7d4214a96523c0bf7b676d ] The #ifdef's in arp_process() were not only a mess, they were also wrong in the CONFIG_NET_ETHERNET=n and (CONFIG_NETDEV_1000=y or CONFIG_NETDEV_10000=y) cases. Since they are not required this patch removes them. Also removed are some #ifdef's around #include's that caused compile errors after this change. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	NET: Corrects a bug in ip_rt_acct_read()	Eric Dumazet
	[NET]: Corrects a bug in ip_rt_acct_read() [ Upstream commit: 483b23ffa3a5f44767038b0a676d757e0668437e ] It seems that stats of cpu 0 are counted twice, since for_each_possible_cpu() is looping on all possible cpus, including 0 Before percpu conversion of ip_rt_acct, we should also remove the assumption that CPU 0 is online (or even possible) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
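A double count of this kind typically appears when an accumulator is seeded with one CPU's values and the loop then visits every CPU, including that one. The sketch below assumes that shape and shows the corrected form: `sum_counters` and the flat `per_cpu` array are hypothetical, not the ip_rt_acct data structure.

```c
#include <stddef.h>

/* Hypothetical per-CPU sum. The total starts from zero rather than from a
 * copy of CPU 0's counter, so each CPU contributes exactly once and no
 * particular CPU is assumed to exist or be online. */
unsigned long sum_counters(const unsigned long *per_cpu, size_t ncpus)
{
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < ncpus; i++)
        total += per_cpu[i];

    return total;
}
```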
2007-12-14	netfilter: Fix kernel panic with REDIRECT target.	Evgeniy Polyakov
	This patch fixes a NAT regression in 2.6.23, resulting in a crash when a connection is NATed and matches a conntrack helper after NAT. Please apply, thanks. [NETFILTER]: Fix kernel panic with REDIRECT target. Upstream commit 1f305323ff5b9ddc1a4346d36072bcdb58f3f68a When connection tracking entry (nf_conn) is about to copy itself it can have some of its extension users (like nat) as being already freed and thus not required to be copied. Actually looking at this function I suspect it was copied from nf_nat_setup_info() and thus bug was introduced. Report and testing from David <david@unsolicited.net>. [ Patrick McHardy states: I now understand whats happening: - new connection is allocated without helper - connection is REDIRECTed to localhost - nf_nat_setup_info adds NAT extension, but doesn't initialize it yet - nf_conntrack_alter_reply performs a helper lookup based on the new tuple, finds the SIP helper and allocates a helper extension, causing reallocation because of too little space - nf_nat_move_storage is called with the uninitialized nat extension So your fix is entirely correct, thanks a lot :) ] Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	nf_nat: fix memset error	Li Zefan
	This patch fixes an incorrect memset in the NAT code, causing misbehaviour when unloading and reloading the NAT module. Applies to stable-2.6.22 and stable-2.6.23. Please apply, thanks. [NETFILTER]: nf_nat: fix memset error Upstream commit e0bf9cf15fc30d300b7fbd821c6bc975531fab44 The size passing to memset is the size of a pointer. Fixes misbehaviour when unloading and reloading the NAT module. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
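The class of mistake described above, passing the size of a pointer where the size of the pointed-to data was meant, can be illustrated in a few lines. This is a generic sketch rather than the nf_nat code: `struct entry` and `clear_table` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct entry {                  /* hypothetical table element */
    int a;
    int b;
};

/* Clear n entries. Writing sizeof(table) here would give the size of the
 * pointer itself (commonly 4 or 8 bytes), leaving most of the table
 * untouched; the element size times the count is what is wanted. */
void clear_table(struct entry *table, size_t n)
{
    memset(table, 0, n * sizeof(*table));
}
```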
2007-12-14	PKT_SCHED: Check subqueue status before calling hard_start_xmit	Peter P Waskiewicz Jr
	[PKT_SCHED]: Check subqueue status before calling hard_start_xmit [ Upstream commit: 5f1a485d5905aa641f33009019b3699076666a4c ] The only qdiscs that check subqueue state before dequeue'ing are PRIO and RR. The other qdiscs, including the default pfifo_fast qdisc, will allow traffic bound for subqueue 0 through to hard_start_xmit. The check for netif_queue_stopped() is done above in pkt_sched.h, so it is unnecessary for qdisc_restart(). However, if the underlying driver is multiqueue capable, and only sets queue states on subqueues, this will allow packets to enter the driver when it's currently unable to process packets, resulting in expensive requeues and driver entries. This patch re-adds the check for the subqueue status before calling hard_start_xmit, so we can try and avoid the driver entry when the queues are stopped. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14	NETFILTER: Fix NULL pointer dereference in nf_nat_move_storage()	Evgeniy Polyakov
	[NETFILTER]: Fix NULL pointer dereference in nf_nat_move_storage() [ Upstream commit: 7799652557d966e49512479f4d3b9079bbc01fff ] Reported by Chuck Ebbert as: https://bugzilla.redhat.com/show_bug.cgi?id=259501#c14 This routine is called each time hash should be replaced, nf_conn has extension list which contains pointers to connection tracking users (like nat, which is right now the only such user), so when replace takes place it should copy own extensions. Loop above checks for own extension, but tries to move higher-layer one, which can lead to above oops. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16	TCP: Make sure write_queue_from does not begin with NULL ptr (CVE-2007-5501)	Ilpo Järvinen
	patch 96a2d41a3e495734b63bff4e5dd0112741b93b38 in mainline. NULL ptr can be returned from tcp_write_queue_head to cached_skb and then assigned to skb if packets_out was zero. Without this, system is vulnerable to carefully crafted ACKs, which obviously is remotely triggerable. Besides, there's very little that needs to be done in sacktag if there weren't any packets outstanding, just skipping the rest doesn't hurt. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-16	mac80211: make ieee802_11_parse_elems return void	John W. Linville
	patch 67a4cce4a89718d252b61aaf58882c69c0e2f6e3 in mainline. Some APs send management frames with junk padding after the last IE. We already account for a similar problem with some Apple Airport devices, but at least one device is known to send more than a single extra byte. The device in question is the Draytek Vigor2900: http://www.draytek.com.au/products/Vigor2900.php The junk in question looks like an IE that runs off the end of the frame. This causes us to return ParseFailed. Since the frame in question is an association response, this causes us to fail to associate with this AP. The return code from ieee802_11_parse_elems is superfluous. All callers still check for the presence of the specific IEs that interest them anyway. So, remove the return code so the parse never "fails". Acked-by: Michael Wu <flamingice@sourmilk.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16	mac80211: only honor IW_SCAN_THIS_ESSID in STA, IBSS, and AP modes	John W. Linville
	patch d114f399b4da6fa7f9da3bbf1fb841370c11e788 in mainline. The previous IW_SCAN_THIS_ESSID patch left a hole allowing scan requests on interfaces in inappropriate modes. Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16	mac80211: honor IW_SCAN_THIS_ESSID in siwscan ioctl	Bill Moss
	patch 107acb23ba763197d390ae9ffd347f3e2a524d39 in mainline. This patch fixes the problem of associating with wpa_secured hidden AP. Please try out. The original author of this patch is Bill Moss <bmoss@clemson.edu> Signed-off-by: Abhijeet Kolekar <abhijeet.kolekar@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16	mac80211: store SSID in sta_bss_list	John W. Linville
	patch cffdd30d20d163343b1c6de25bcb0cc978a1ebf9 in mainline. Some AP equipment "in the wild" services multiple SSIDs using the same BSSID. This patch changes the key of sta_bss_list to include the SSID as well as the BSSID and the channel so as to prevent one SSID from eclipsing another SSID with the same BSSID. Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
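A lookup key made of BSSID, channel and SSID, as the entry above describes, might be compared like this. The sketch is illustrative only: `struct bss_key`, its field sizes, and `bss_key_equal` are assumptions and do not reflect the mac80211 structures. `ssid_len` is assumed never to exceed 32.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical composite key for a BSS list entry. */
struct bss_key {
    unsigned char bssid[6];
    int channel;
    unsigned char ssid[32];
    size_t ssid_len;            /* number of valid bytes in ssid */
};

/* Two entries match only if BSSID, channel and SSID all agree, so one SSID
 * cannot eclipse another that shares the same BSSID. */
bool bss_key_equal(const struct bss_key *a, const struct bss_key *b)
{
    return a->channel == b->channel &&
           a->ssid_len == b->ssid_len &&
           memcmp(a->bssid, b->bssid, sizeof(a->bssid)) == 0 &&
           memcmp(a->ssid, b->ssid, a->ssid_len) == 0;
}
```

Dropping the SSID comparison from this function gives the BSSID-plus-channel key of the earlier patch in the series.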
2007-11-16	mac80211: store channel info in sta_bss_list	John W. Linville
	patch 65c107ab3befc37b21d1c970a6159525bc0121b8 in mainline. Some AP equipment "in the wild" uses the same BSSID on multiple channels (particularly "a" vs. "b/g"). This patch changes the key of sta_bss_list to include both the BSSID and the channel so as to prevent a BSSID on one channel from eclipsing the same BSSID on another channel. Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16	mac80211: reorder association debug output	Johannes Berg
	patch 1dd84aa213d0f98a91a1ec9be2f750f5f48e75a0 in mainline. There's no reason to warn about an invalid AID field when the association was denied. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Michael Wu <flamingice@sourmilk.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16	ieee80211: fix TKIP QoS bug	Johannes Berg
	patch e797aa1b7da6bfcb2e19a10ae5ead9aa7aea732b in mainline. The commit 65b6a277 titled "ieee80211: Fix header->qos_ctl endian issue" introduced an endianness bug. Partially revert it. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16	NETFILTER: nf_conntrack_tcp: fix connection reopening	Jozsef Kadlecsik
	Upstream commits: 17311393 + bc34b841 merged together. Merge done by Patrick McHardy <kaber@trash.net> [NETFILTER]: nf_conntrack_tcp: fix connection reopening With your description I could reproduce the bug and actually you were completely right: the code above is incorrect. Somehow I was able to misread RFC1122 and mixed the roles :-(: When a connection is >>closed actively<<, it MUST linger in TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime). However, it MAY >>accept<< a new SYN from the remote TCP to reopen the connection directly from TIME-WAIT state, if it: [...] The fix is as follows: if the receiver initiated an active close, then the sender may reopen the connection - otherwise try to figure out if we hold a dead connection. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16	Fix netlink timeouts.	Patrick McHardy
	[NETLINK]: Fix unicast timeouts [ Upstream commit: c3d8d1e30cace31fed6186a4b8c6b1401836d89c ] Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts by moving the schedule_timeout() call to a new function that doesn't propagate the remaining timeout back to the caller. This means on each retry we start with the full timeout again. ipc/mqueue.c seems to actually want to wait indefinitely so this behaviour is retained. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
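The effect of not propagating the remaining timeout can be shown with a small userspace model. Everything here is a hypothetical stand-in: `fake_schedule_timeout()` only imitates a sleep that consumes a fixed number of ticks, and the retry loops are capped by `max_attempts` so that the buggy variant terminates.

```c
/* Stand-in for a sleep that returns the time left afterwards. */
static long fake_schedule_timeout(long timeout, long slept)
{
    return timeout > slept ? timeout - slept : 0;
}

/* Buggy shape: the helper gets the timeout by value, so the caller
 * never learns how much of it was used up. */
static void wait_by_value(long timeo, long slept)
{
    timeo = fake_schedule_timeout(timeo, slept);
    (void)timeo;
}

/* Fixed shape: the helper writes the remaining time back. */
static void wait_by_pointer(long *timeo, long slept)
{
    *timeo = fake_schedule_timeout(*timeo, slept);
}

/* Count retries until the timeout is exhausted (or the cap is hit). */
static int attempts_buggy(long timeo, long slept, int max_attempts)
{
    int n = 0;

    while (timeo > 0 && n < max_attempts) {
        wait_by_value(timeo, slept); /* timeo never shrinks */
        n++;
    }
    return n;
}

static int attempts_fixed(long timeo, long slept, int max_attempts)
{
    int n = 0;

    while (timeo > 0 && n < max_attempts) {
        wait_by_pointer(&timeo, slept);
        n++;
    }
    return n;
}
```

With a timeout of 100 ticks and 30 ticks consumed per wait, the fixed loop gives up after four attempts, while the by-value loop only stops at the artificial cap.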
2007-11-16	Fix crypto_alloc_comp() error checking.	Herbert Xu
	[IPSEC]: Fix crypto_alloc_comp error checking [ Upstream commit: 4999f3621f4da622e77931b3d33ada6c7083c705 ] The function crypto_alloc_comp returns an errno instead of NULL to indicate error. So it needs to be tested with IS_ERR. This is based on a patch by Vicenç Beltran Querol. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
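The difference between a NULL test and an IS_ERR test can be sketched in userspace. The three helpers below are simplified re-creations of the kernel's error-pointer convention written for this example, and `alloc_transform()` is a hypothetical allocator standing in for a function that reports failure through an encoded errno.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the error-pointer helpers (illustrative). */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: returns an encoded errno on failure,
 * never NULL. */
static void *alloc_transform(bool fail)
{
    static int dummy;

    return fail ? ERR_PTR(-ENOENT) : (void *)&dummy;
}

static int init_state(bool fail)
{
    void *tfm = alloc_transform(fail);

    /* A "!tfm" test would miss the failure: the error value is a
     * non-NULL pointer. */
    if (IS_ERR(tfm))
        return (int)PTR_ERR(tfm);
    return 0;
}
```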
2007-11-16	Fix SET_VLAN_INGRESS_PRIORITY_CMD error return.	Patrick McHardy
	patch fffe470a803e7f7b74c016291e542a0162761209 in mainline. [VLAN]: Fix SET_VLAN_INGRESS_PRIORITY_CMD ioctl Based on report and patch by Doug Kehn <rdkehn@yahoo.com>: vconfig returns the following error when attempting to execute the set_ingress_map command: vconfig: socket or ioctl error for set_ingress_map: Operation not permitted In vlan.c, vlan_ioctl_handler for SET_VLAN_INGRESS_PRIORITY_CMD sets err = -EPERM and calls vlan_dev_set_ingress_priority. vlan_dev_set_ingress_priority is a void function so err remains at -EPERM and results in the vconfig error (even though the ingress map was set). Fix by setting err = 0 after the vlan_dev_set_ingress_priority call. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
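The control flow behind this bug fits in a few lines. The sketch below is a userspace model, not the code from vlan.c: `set_ingress_priority()`, `ioctl_buggy()`, `ioctl_fixed()` and the `ingress_map` array are hypothetical names that only reproduce the pattern of a preset error code followed by a call to a void function.

```c
#include <errno.h>

static int ingress_map[8];

/* Void helper: it cannot report success, so the caller must. */
static void set_ingress_priority(int skb_prio, int vlan_prio)
{
    ingress_map[vlan_prio & 0x7] = skb_prio;
}

/* Buggy shape: err keeps its preset value although the map was set. */
static int ioctl_buggy(int skb_prio, int vlan_prio)
{
    int err = -EPERM;

    set_ingress_priority(skb_prio, vlan_prio);
    return err;
}

/* Fixed shape: clear err after the call. */
static int ioctl_fixed(int skb_prio, int vlan_prio)
{
    int err = -EPERM;

    set_ingress_priority(skb_prio, vlan_prio);
    err = 0;
    return err;
}
```

In both variants the mapping is applied; only the fixed one tells the caller so.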
2007-11-16	Fix VLAN address syncing.	Patrick McHardy
	patch d932e04a5e7b146c5f9bf517714b986a432a7594 in mainline. [PATCH] [VLAN]: Don't synchronize addresses while the vlan device is down While the VLAN device is down, the unicast addresses are not configured on the underlying device, so we shouldn't attempt to sync them. Noticed by Dmitry Butskoy <buc@odusz.so-cdu.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16	Fix endianness bug in U32 classifier.	Radu Rendec
	changeset 543821c6f5dea5221426eaf1eac98b100249c7ac in mainline. [PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks. While trying to implement u32 hashes in my shaping machine I ran into a possible bug in the u32 hash/bucket computing algorithm (net/sched/cls_u32.c). The problem occurs only with hash masks that extend over the octet boundary, on little endian machines (where htonl() actually does something). Let's say that I would like to use 0x3fc0 as the hash mask. This means 8 contiguous "1" bits starting at b6. With such a mask, the expected (and logical) behavior is to hash any address in, for instance, 192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in bucket 1, then 192.168.0.128/26 in bucket 2 and so on. This is exactly what would happen on a big endian machine, but on little endian machines, what would actually happen with the current implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl() in the userspace tool and then applied to 192.168.x.x in the u32 classifier. When shifting right by 16 bits (rank of first "1" bit in the reversed mask) and applying the divisor mask (0xff for divisor 256), what would actually remain is 0x3f applied on the "168" octet of the address. One could say this can be easily worked around by taking endianness into account in userspace and supplying an appropriate mask (0xfc03) that would be turned into contiguous "1" bits when reversed (0x03fc0000). But the actual problem is the network address (inside the packet) not being converted to host order, but used as a host-order value when computing the bucket. Let's say the network address is written as n31 n30 ... n0, with n0 being the least significant bit. When used directly (without any conversion) on a little endian machine, it becomes n7 ... n0 n8 ..n15 etc in the machine's registers. Thus bits n7 and n8 would no longer be adjacent and 192.168.64.0/26 and 192.168.128.0/26 would no longer be consecutive.
The fix is to apply ntohl() on the hmask before computing fshift, and in u32_hash_fold() convert the packet data to host order before shifting down by fshift. With helpful feedback from Jamal Hadi Salim and Jarek Poplawski. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
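The bucket computation described by the fix can be sketched as follows. This is an illustrative userspace version, not the code from net/sched/cls_u32.c: `lowest_set_bit()` and `bucket_fixed()` are hypothetical names, and both the key and the mask are assumed to arrive in network byte order, as they would from the packet and from the userspace tool.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Index of the lowest set bit; 0 for an empty mask. */
static unsigned lowest_set_bit(uint32_t v)
{
    unsigned n = 0;

    if (!v)
        return 0;
    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * key_be and hmask_be are in network byte order. The shift is derived
 * from the host-order mask, and the masked key is converted to host
 * order before shifting, so adjacent address bits stay adjacent on any
 * endianness.
 */
static unsigned bucket_fixed(uint32_t key_be, uint32_t hmask_be,
                             unsigned divisor_mask)
{
    unsigned fshift = lowest_set_bit(ntohl(hmask_be));

    return (ntohl(key_be & hmask_be) >> fshift) & divisor_mask;
}
```

With the 0x3fc0 mask from the message and a divisor mask of 0xff, 192.168.0.0 lands in bucket 0, 192.168.0.64 in bucket 1 and 192.168.0.128 in bucket 2, which is the ordering the message calls the expected behavior.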
2007-11-16	Fix TEQL oops.	Evgeniy Polyakov
	[PKT_SCHED]: Fix OOPS when removing devices from a teql queuing discipline [ Upstream commit: 4f9f8311a08c0d95c70261264a2b47f2ae99683a ] teql_reset() is called from deactivate and qdisc is set to noop already, but subsequent teql_xmit does not know about it and dereferences private data as teql qdisc and thus oopses. not catch it first :) Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16	Fix error returns in sys_socketpair()	David Miller
	patch bf3c23d171e35e6e168074a1514b0acd59cfd81a in mainline. [NET]: Fix error reporting in sys_socketpair(). If either of the two sock_alloc_fd() calls fails, we forget to update 'err' and thus we'll erroneously return zero in these cases. Based upon a report and patch from Rich Paul, and commentary from Chuck Ebbert. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
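The forgotten update of 'err' can be modelled in userspace. The sketch below is not the sys_socketpair() code: `alloc_fd_stub()` is a hypothetical stand-in for the fd allocation, assumed here to return a descriptor number on success and a negative errno on failure.

```c
#include <errno.h>

/* Hypothetical allocator: a descriptor on success, negative errno on
 * failure. */
static int alloc_fd_stub(int fail)
{
    return fail ? -EMFILE : 3;
}

/* Buggy shape: err still holds 0 from the earlier, successful steps. */
static int pair_buggy(int fail1, int fail2)
{
    int err = 0;
    int fd1, fd2;

    fd1 = alloc_fd_stub(fail1);
    if (fd1 < 0)
        goto out; /* bug: err is not updated */
    fd2 = alloc_fd_stub(fail2);
    if (fd2 < 0)
        goto out; /* same bug */
out:
    return err;
}

/* Fixed shape: copy the failure code into err before bailing out. */
static int pair_fixed(int fail1, int fail2)
{
    int err = 0;
    int fd1, fd2;

    fd1 = alloc_fd_stub(fail1);
    if (fd1 < 0) {
        err = fd1;
        goto out;
    }
    fd2 = alloc_fd_stub(fail2);
    if (fd2 < 0) {
        err = fd2;
        goto out;
    }
out:
    return err;
}
```

The buggy variant reports success to the caller even though no descriptor was allocated, which is the erroneous zero return the message describes.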