Age | Commit message (Collapse) | Author |
|
This fixes the RCU race on bridge delete interface. Basically,
the network device has to be detached from the bridge in the first
step (pre-RCU), rather than later. At that point, no more bridge traffic
will come in, and the other code will not think that network device
is part of a bridge.
This should also fix the XEN test problems. If there is another
2.6.13-stable, add it as well.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Please consider this change for 2.6.13-stable Since BIC is
the default congestion control algorithm, this fix is quite
important.
Missing parenthesis in causes BIC to be slow in increasing congestion
window.
Spotted by Injong Rhee.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Handle better the case where the sender sends full sized
frames initially, then moves to a mode where it trickles
out small amounts of data at a time.
This known problem is even mentioned in the comments
above tcp_grow_window() in tcp_input.c, specifically:
...
* The scheme does not work when sender sends good segments opening
* window and then starts to feed us spagetti. But it should work
* in common situations. Otherwise, we have to rely on queue collapsing.
...
When the sender gives full sized frames, the "struct sk_buff" overhead
from each packet is small. So we'll advertize a larger window.
If the sender moves to a mode where small segments are sent, this
ratio becomes tilted to the other extreme and we start overrunning
the socket buffer space.
tcp_clamp_window() tries to address this, but it's clamping of
tp->window_clamp is a wee bit too aggressive for this particular case.
Fix confirmed by Ion Badulescu.
Signed-off-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@osdl.org>
|
|
Patch from Joel Sing to fix the default congestion control algorithm for incoming connections. If a new congestion control handler is added (via module),
it should become the default for new connections. Instead, the incoming
connections use reno. The cause is incorrect
initialisation causes the tcp_init_congestion_control() function to return
after the initial if test fails.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@osdl.org>
|
|
ip_vs_ftp when loaded can create NAT connections with unknown
client port for passive FTP. For such expectations we lookup with
cport=0 on incoming packet but it matches the format of the persistence
templates causing packets to other persistent virtual servers to be
forwarded to real server without creating connection. Later the
reply packets are treated as foreign and not SNAT-ed.
If the IPVS box serves both FTP and other services (eg. HTTP)
for the time we wait for first packet for the FTP data connections with
unknown client port (there can be many), other HTTP connections
that have nothing common to the FTP conn break, i.e. HTTP client
sends SYN to the virtual IP but the SYN+ACK is not NAT-ed properly
in IPVS box and the client box returns RST to real server IP. I.e.
the result can be 10% broken HTTP traffic if 10% of the time
there are passive FTP connections in connecting state. It hurts
only IPVS connections.
This patch changes the connection lookup for packets from
clients:
* introduce IP_VS_CONN_F_TEMPLATE connection flag to mark the
connection as template
* create new connection lookup function just for templates - ip_vs_ct_in_get
* make sure ip_vs_conn_in_get hits only connections with
IP_VS_CONN_F_NO_CPORT flag set when s_port is 0. By this way
we avoid returning template when looking for cport=0 (ftp)
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Chris Wright <chrisw@osdl.org>
|
|
per-socket multicast filters were not being applied to all sockets
in the case of an exact-match bound address, due to an over-exuberant
"return" in the look-up code. Fix below. IPv4 does not have this problem.
Thanks to Hoerdt Mickael for reporting the bug.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: Chris Wright <chrisw@osdl.org>
|
|
In 2.6.13-rcX the MASQUERADE target was changed not to exclude local
packets for better source address consistency. This breaks DHCP clients
using UDP sockets when the DHCP requests are caught by a MASQUERADE rule
because the MASQUERADE target drops packets when no address is configured
on the outgoing interface. This patch makes it ignore packets with a
source address of 0.
Thanks to Rusty for this suggestion.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Chris Wright <chrisw@osdl.org>
|
|
Fix unchecked __get_user that could be tricked into generating a
memory read on an arbitrary address. The result of the read is not
returned directly but you may be able to divine some information about
it, or use the read to cause a crash on some architectures by reading
hardware state. CAN-2005-2492.
Fix from Al Viro, ack from Dave Miller.
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
When we copy 32bit ->msg_control contents to kernel, we walk the same
userland data twice without sanity checks on the second pass.
Second version of this patch: the original broke with 64-bit arches
running 32-bit-compat-mode executables doing sendmsg() syscalls with
unaligned CMSG data areas
Another thing is that we use kmalloc() to allocate and sock_kfree_s()
to free afterwards; less serious, but also needs fixing.
Patch by Al Viro, David Miller, David Woodhouse
(sparc64 clean compile fix from David Miller)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[IPV4]: Reassembly trim not clearing CHECKSUM_HW
This was found by inspection while looking for checksum problems
with the skge driver that sets CHECKSUM_HW. It did not fix the
problem, but it looks like it is needed.
If IP reassembly is trimming an overlapping fragment, it
should reset (or adjust) the hardware checksum flag on the skb.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
[NET]: 2.6.13 breaks libpcap (and tcpdump)
Patrick McHardy says:
Never mind, I got it, we never fall through to the second switch
statement anymore. I think we could simply break when load_pointer
returns NULL. The switch statement will fall through to the default
case and return 0 for all cases but 0 > k >= SKF_AD_OFF.
Here's a patch to do just that.
I left BPF_MSH alone because it's really a hack to calculate the IP
header length, which makes no sense when applied to the special data.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
When a semantic match occurs either success, not found or an error
(for matching unreachable routes/blackholes) is returned. fib_trie
ignores the errors and looks for a different matching route. Treat
results other than "no match" as success and end lookup.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Noticed by Coverity checker.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This trips up a lot of folks reading this code.
Put an unlikely() around the port-exhaustion test
for good measure.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Intention of this bit is to force pushing of the existing
send queue when TCP_CORK or TCP_NODELAY state changes via
setsockopt().
But it's easy to create a situation where the bit never
clears. For example, if the send queue starts empty:
1) set TCP_NODELAY
2) clear TCP_NODELAY
3) set TCP_CORK
4) do small write()
The current code will leave TCP_NAGLE_PUSH set after that
sequence. Unconditionally clearing the bit when new data
is added via skb_entail() solves the problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
qdisc_create_dflt() is missing to destroy the newly allocated
default qdisc if the initialization fails resulting in leaks
of all kinds. The only caller in mainline which may trigger
this bug is sch_tbf.c in tbf_create_dflt_qdisc().
Note: qdisc_create_dflt() doesn't fulfill the official locking
requirements of qdisc_destroy() but since the qdisc could
never be seen by the outside world this doesn't matter
and it can stay as-is until the locking of pkt_sched
is cleaned up.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add SNMP_MIB_SENTINEL to the definition of the sctp_snmp_list so that
the output routine in proc correctly terminates. This was causing some
problems running on ia64 systems.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
o Brown paperbag bug - ax25_findbyuid() was always returning a NULL pointer
as the result. Breaks ROSE completly and AX.25 if UID policy set to deny.
o While the list structure of AX.25's UID to callsign mapping table was
properly protected by a spinlock, it's elements were not refcounted
resulting in a race between removal and usage of an element.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The socket flag cleanups that went into 2.6.12-rc1 are basically oring
the flags of an old socket into the socket just being created.
Unfortunately that one was just initialized by sock_init_data(), so already
has SOCK_ZAPPED set. As the result zapped sockets are created and all
incoming connection will fail due to this bug which again was carefully
replicated to at least AX.25, NET/ROM or ROSE.
In order to keep the abstraction alive I've introduced sock_copy_flags()
to copy the socket flags from one sockets to another and used that
instead of the bitwise copy thing. Anyway, the idea here has probably
been to copy all flags, so sock_copy_flags() should be the right thing.
With this the ham radio protocols are usable again, so I hope this will
make it into 2.6.13.
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The checksum needs to be filled in on output, after mangling a packet
ip_summed needs to be reset.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>
Found this bug while doing some scaling testing that created 500K inet
peers.
peer_check_expire() in net/ipv4/inetpeer.c isn't using inet_peer_gc_mintime
correctly and will end up creating an expire timer with less than the
minimum duration, and even zero/negative if enough active peers are
present.
If >65K peers, the timer will be less than inet_peer_gc_mintime, and with
>70K peers, the timer duration will reach zero and go negative.
The timer handler will continue to schedule another zero/negative timer in
a loop until peers can be aged. This can continue for at least a few
minutes or even longer if the peers remain active due to arriving packets
while the loop is occurring.
Bug is present in both 2.4 and 2.6. Same patch will apply to both just
fine.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While I was going through the crypto users recently, I noticed this
bogus kmap in sunrpc. It's totally unnecessary since the crypto
layer will do its own kmap before touching the data. Besides, the
kmap is throwing the return value away.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the tail SKB fits into the window, it is still
benefitical to defer until the goal percentage of
the window is available. This give the application
time to feed more data into the send queue and thus
results in larger TSO frames going out.
Patch from Dmitry Yusupov <dima@neterion.com>.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Most importantly, remove bogus BUG() in receive path.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
An incorrect check made it bail out before doing anything.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes a false-positive from debug_smp_processor_id().
The processor ID is only used to look up crypto_tfm objects.
Any processor ID is acceptable here as long as it is one that is
iterated on by for_each_cpu().
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Based upon a bug report and initial patch by
Ollie Wild.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change operations on rif_lock from spin_{un}lock_bh to
spin_{un}lock_irq{save,restore} equivalents. Some of the
rif_lock critical sections are called from interrupt context via
tr_type_trans->tr_add_rif_info. The TR NIC drivers call tr_type_trans
from their packet receive handlers.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Changing it to how ip_input handles should fix it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
1) We send out a normal sized packet with TSO on to start off.
2) ICMP is received indicating a smaller MTU.
3) We send the current sk_send_head which needs to be fragmented
since it was created before the ICMP event. The first fragment
is then sent out.
At this point the remaining fragment is allocated by tcp_fragment.
However, its size is padded to fit the L1 cache-line size therefore
creating tail-room up to 124 bytes long.
This fragment will also be sitting at sk_send_head.
4) tcp_sendmsg is called again and it stores data in the tail-room of
of the fragment.
5) tcp_push_one is called by tcp_sendmsg which then calls tso_fragment
since the packet as a whole exceeds the MTU.
At this point we have a packet that has data in the head area being
fed to tso_fragment which bombs out.
My take on this is that we shouldn't ever call tcp_fragment on a TSO
socket for a packet that is yet to be transmitted since this creates
a packet on sk_send_head that cannot be extended.
So here is a patch to change it so that tso_fragment is always used
in this case.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When packets hit raw sockets the csum update isn't done yet, do it manually.
Packets can also reach rawv6_rcv on the output path through
ip6_call_ra_chain, in this case skb->ip_summed is CHECKSUM_NONE and this
codepath isn't executed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Remove unused variable
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This fixes a race during initialization with the NAPI softirq
processing by using an RCU approach.
This race was discovered when refill_skbs() was added to
the setup code.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
we could do one thing (see the patch below): i think it would be useful
to fill up the netlogging skb queue straight at initialization time.
Especially if netpoll is used for dumping alone, the system might not be
in a situation to fill up the queue at the point of crash, so better be
a bit more prepared and keep the pipeline filled.
[ I've modified this to be called earlier - mpm ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add limited retry logic to netpoll_send_skb
Each time we attempt to send, decrement our per-device retry counter.
On every successful send, we reset the counter.
We delay 50us between attempts with up to 20000 retries for a total of
1 second. After we've exhausted our retries, subsequent failed
attempts will try only once until reset by success.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Minor netpoll_send_skb restructuring
Restructure to avoid confusing goto and move some bits out of the
retry loop.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This fixes an obvious deadlock in the netpoll code. netpoll_rx takes the
npinfo->rx_lock. netpoll_rx is also the only caller of arp_reply (through
__netpoll_rx). As such, it is not necessary to take this lock.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Initialize npinfo->rx_flags. The way it stands now, this will have random
garbage, and so will incur a locking penalty even when an rx_hook isn't
registered and we are not active in the netpoll polling code.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Well I've only found one potential cause for the assertion
failure in tcp_mark_head_lost. First of all, this can only
occur if cnt > 1 since tp->packets_out is never zero here.
If it did hit zero we'd have much bigger problems.
So cnt is equal to fackets_out - reordering. Normally
fackets_out is less than packets_out. The only reason
I've found that might cause fackets_out to exceed packets_out
is if tcp_fragment is called from tcp_retransmit_skb with a
TSO skb and the current MSS is greater than the MSS stored
in the TSO skb. This might occur as the result of an expiring
dst entry.
In that case, packets_out may decrease (line 1380-1381 in
tcp_output.c). However, fackets_out is unchanged which means
that it may in fact exceed packets_out.
Previously tcp_retrans_try_collapse was the only place where
packets_out can go down and it takes care of this by decrementing
fackets_out.
So we should make sure that fackets_out is reduced by an appropriate
amount here as well.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>
sendmsg()/recvmsg() syscalls from o32/n32 apps to a 64bit kernel will
cause a kernel memory leak if iov_len > UIO_FASTIOV for each syscall!
This is because both sys_sendmsg() and verify_compat_iovec() kmalloc a
new iovec structure. Only the one from sys_sendmsg() is free'ed.
I wrote a simple test program to confirm this after identifying the
problem:
http://davej.org/programs/testsendmsg.c
Note that the below fix will break solaris_sendmsg()/solaris_recvmsg() as
it also calls verify_compat_iovec() but expects it to malloc internally.
[ I fixed that. -DaveM ]
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to divide, not multiply. While we're here,
use NSEC_PER_USEC instead of a magic constant.
Based upon a report from Josip Loncaric and a patch
by Andrew Morton.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
Here's a small patch to cleanup NETDEBUG() use in net/ipv4/ for Linux
kernel 2.6.13-rc5. Also weird use of indentation is changed in some
places.
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With the introduction of 'rustynat' in 2.6.11, the old tricks of preventing
NAT of 'untracked' connections (e.g. NOTRACK target in 'raw' table) are no
longer sufficient.
The ip_conntrack_untracked.status |= IPS_NAT_DONE_MASK effectively
prevents iteration of the 'nat' table, but doesn't prevent nat_packet()
to be executed. Since nr_manips is gone in 'rustynat', nat_packet() now
implicitly thinks that it has to do NAT on the packet.
This patch fixes that problem by explicitly checking for
ip_conntrack_untracked in ip_nat_fn().
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The interface needs much redesigning if we wish to allow
normal users to do this in some way.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|