Age | Commit message (Collapse) | Author |
|
When only GSO skb was partially ACKed, no hints are reset,
therefore fastpath_cnt_hint must be tweaked too or else it can
corrupt fackets_out. The corruption to occur, one must have
non-trivial ACK/SACK sequence, so this bug is not very often
that harmful. There's a fackets_out state reset in TCP because
fackets_out is known to be inaccurate and that fixes the issue
eventually anyway.
In case there was also at least one skb that got fully ACKed,
the fastpath_skb_hint is set to NULL which causes a recount for
fastpath_cnt_hint (the old value won't be accessed anymore),
thus it can safely be decremented without additional checking.
Reported by Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
|
|
This fixes kernel bugzilla #5731
It should generate an empty packet for datagram protocols when the
socket is connected, for one.
The check is doubly-wrong because all that a write() can be is a
sendmsg() call with a NULL msg_control and a single entry iovec. No
special semantics should be assigned to it, therefore the zero length
check should be removed entirely.
This matches the behavior of BSD and several other systems.
Alan Cox notes that SuSv3 says the behavior of a zero length write on
non-files is "unspecified", but that's kind of useless since BSD has
defined this behavior for a quarter century and BSD is essentially
what application folks code to.
Based upon a patch from Stephen Hemminger.
Adrian Bunk:
Backported to 2.6.16.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
|
|
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
|
|
This reverts commit 3198d0f16dec0c87071cf26f3f11af9c8f0a009b.
|
|
Static functions mustn't be exported.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
|
|
A static function mustn't be exported.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
|
|
In testing our ESP/AH offload hardware, I discovered an issue with how
AH handles mutable fields in IPv4. RFC 4302 (AH) states the following
on the subject:
For IPv4, the entire option is viewed as a unit; so even
though the type and length fields within most options are immutable
in transit, if an option is classified as mutable, the entire option
is zeroed for ICV computation purposes.
The current implementation does not zero the type and length fields,
resulting in authentication failures when communicating with hosts
that do (i.e. FreeBSD).
I have tested record route and timestamp options (ping -R and ping -T)
on a small network involving Windows XP, FreeBSD 6.2, and Linux hosts,
with one router. In the presence of these options, the FreeBSD and
Linux hosts (with the patch or with the hardware) can communicate.
The Windows XP host simply fails to accept these packets with or
without the patch.
I have also been trying to test source routing options (using
traceroute -g), but haven't had much luck getting this option to work
*without* AH, let alone with.
Signed-off-by: Nick Bowler <nbowler@ellipticsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
|
|
It's possible that new SACK blocks that should trigger new LOST
markings arrive with new data (which previously made is_dupack
false). In addition, I think this fixes a case where we get
a cumulative ACK with enough SACK blocks to trigger the fast
recovery (is_dupack would be false there too).
I'm not completely pleased with this solution because readability
of the code is somewhat questionable as 'is_dupack' in SACK case
is no longer about dupacks only but would mean something like
'lost_marker_work_todo' too... But because of Eifel stuff done
in CA_Recovery, the FLAG_DATA_SACKED check cannot be placed to
the if statement which seems attractive solution. Nevertheless,
I didn't like adding another variable just for that either... :-)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
|
|
Taken from http://bugzilla.kernel.org/show_bug.cgi?id=8747
Problem Description:
It is related to the possibility to obtain MSG_ERRQUEUE messages from the udp
and raw sockets, both connected and unconnected.
There is a little typo in net/ipv6/icmp.c code, which prevents such messages
to be delivered to the errqueue of the correspond raw socket, when the socket
is CONNECTED. The typo is due to swap of local/remote addresses.
Consider __raw_v6_lookup() function from net/ipv6/raw.c. When a raw socket is
looked up usual way, it is something like:
sk = __raw_v6_lookup(sk, nexthdr, daddr, saddr, IP6CB(skb)->iif);
where "daddr" is a destination address of the incoming packet (IOW our local
address), "saddr" is a source address of the incoming packet (the remote end).
But when the raw socket is looked up for some icmp error report, in
net/ipv6/icmp.c:icmpv6_notify() , daddr/saddr are obtained from the echoed
fragment of the "bad" packet, i.e. "daddr" is the original destination
address of that packet, "saddr" is our local address. Hence, for
icmpv6_notify() must use "saddr, daddr" in its arguments, not "daddr, saddr"
...
Steps to reproduce:
Create some raw socket, connect it to an address, and cause some error
situation: f.e. set ttl=1 where the remote address is more than 1 hop to reach.
Set IPV6_RECVERR .
Then send something and wait for the error (f.e. poll() with POLLERR|POLLIN).
You should receive "time exceeded" icmp message (because of "ttl=1"), but the
socket do not receive it.
If you do not connect your raw socket, you will receive MSG_ERRQUEUE
successfully. (The reason is that for unconnected socket there are no actual
checks for local/remote addresses).
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
As noticed by Jarek Poplawski <jarkao2@o2.pl>, the timer removal in
gen_kill_estimator races with the timer function rearming the timer.
Check whether the timer list is empty before rearming the timer
in the timer function to fix this.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
SCTP currently permits users to bind to link-local addresses,
but doesn't verify that the scope id specified at bind matches
the interface that the address is configured on. It was report
that this can hang a system.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
skb_clone_fraglist is static so it shouldn't be exported.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The conntrack assigned to locally generated ICMP error is usually the one
assigned to the original packet which has caused the error. But if
the original packet is handled as invalid by nf_conntrack, no conntrack
is assigned to the original packet. Then nf_ct_attach() cannot assign
any conntrack to the ICMP error packet. In that case the current
nf_conntrack_icmp assigns appropriate conntrack to it. But the current
code mistakes the direction of the packet. As a result, NAT code mistakes
the address to be mangled.
To fix the bug, this changes nf_conntrack_icmp not to assign conntrack
to such ICMP error. Actually no address is necessary to be mangled
in this case.
Spotted by Jordan Russell.
Upstream commit ID: 130e7a83d7ec8c5c673225e0fa8ea37b1ed507a5
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
This diff changes the default port range used for outgoing connections,
from "use 32768-61000 in most cases, but use N-4999 on small boxes
(where N is a multiple of 1024, depending on just *how* small the box
is)" to just "use 32768-61000 in all cases".
I don't believe there are any drawbacks to this change, and it keeps
outgoing connection ports farther away from the mess of
IANA-registered ports.
Signed-off-by: Mark Glines <mark@glines.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
sys_setsockopt() do not check properly timeout values for
SO_RCVTIMEO/SO_SNDTIMEO, for example it's possible to set negative timeout
values. POSIX do not defines behaviour for sys_setsockopt in case negative
timeouts, but requires that setsockopt() shall fail with -EDOM if the send and
receive timeout values are too big to fit into the timeout fields in the socket
structure.
In current implementation negative timeout can lead to error messages like
"schedule_timeout: wrong timeout value".
Proposed patch:
- checks tv_usec and returns -EDOM if it is wrong
- do not allows to set negative timeout values (sets 0 instead) and outputs
ratelimited information message about such attempts.
Signed-off-By: Vasily Averin <vvs@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
As mentioned in http://bugzilla.kernel.org/show_bug.cgi?id=5015
The helptext implies that this is on by default.
This may be true on some distros (Fedora/RHEL have it enabled
in /etc/sysctl.conf), but the kernel defaults to it off.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
dereference (CVE-2007-2876)
When creating a new connection by sending an unknown chunk type, we don't
transition to a valid state, causing a NULL pointer dereference in
sctp_packet when accessing sctp_timeouts[SCTP_CONNTRACK_NONE].
Fix by don't creating new conntrack entry if initial state is invalid.
Noticed by Vilmos Nebehaj <vilmos.nebehaj@ramsys.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
This fixes an out-of-boundary condition when the classified
band equals q->bands. Caught by Alexey
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Reverse the sense of the promiscuous-mode tests in ip6_mc_input().
Signed-off-by: Corey Mutter <crm-netdev@mutternet.com>
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
When an IPv6 router is forwarding a packet with a link-local scope source
address off-link, RFC 4007 requires it to send an ICMPv6 destination
unreachable with code 2 ("not neighbor"), but Linux doesn't. Fix below.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
When the server drops its connection, NFS client reconnects using the
same socket after disconnecting. If the new connection's SYN,ACK
doesn't contain the TCP timestamp option and the old connection's did,
tp->tcp_header_len is recomputed assuming no timestamp header but
tp->rx_opt.tstamp_ok remains set. Then tcp_build_and_update_options()
adds in a timestamp option past the end of the allocated TCP header,
overwriting TCP data, or when the data is in skb_shinfo(skb)->frags[],
overwriting skb_shinfo(skb) causing a crash soon after. (The issue was
debugged from such a crash.)
Similarly, wscale_ok and sack_ok also get set based on the SYN,ACK
packet but not reset on disconnect, since they are zeroed out at
initialization. The patch zeroes out the entire tp->rx_opt struct in
tcp_disconnect() to avoid this sort of problem.
Signed-off-by: Srinivas Aji <Aji_Srinivas@emc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
When network device's are renamed, the IPV6 snmp6 code
gets confused. It doesn't track name changes so it will OOPS
when network device's are removed.
The fix is trivial, just unregister/re-register in notify handler.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
__x25_find_socket does a sock_hold.
This adds a missing sock_put in x25_receive_data.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The clusterip_config_find_get() already increases entries reference
counter, so there is no reason to do it twice in checkentry() callback.
This causes the config to be freed before it is removed from the list,
resulting in a crash when adding the next rule.
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack
and nat modules to a 2.4.32 kernel I noticed that the gre_key function
returns a wrong pointer to the GRE key of a version 0 packet thus
corrupting the packet payload.
The intended behaviour for GREv0 packets is to act like
ip_conntrack_proto_generic/ip_nat_proto_unknown so I have ripped the
offending functions (not used anymore) and modified the
ip_nat_proto_gre modules to not touch version 0 (non PPTP) packets.
Signed-off-by: Jorge Boncompte <jorge@dti2.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
A security issue is emerging. Disallow Routing Header Type 0 by default
as we have been doing for IPv4.
This version already includes a fix for the original patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel,
which resulted in infinite recursion and stack overflow.
The bug is present in all kernel versions since the feature appeared.
The patch also makes some minimal cleanup:
1. Return something consistent (-ENOENT) when fib table is missing
2. Do not crash when queue is empty (does not happen, but yet)
3. Put result of lookup
Sergey Vlasov:
Oops fix
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Without this initialization one gets
kernel BUG at kernel/rtmutex_common.h:80!
Signed-off-by: G. Liakhovetski <gl@dsa-ac.de>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
We must reserve SAR + MAX_HEADER bytes for IrLMP to fit in.
This fixes an oops reported (and fixed) by Jeet Chaudhuri, when max_sdu_size
is greater than 0.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
This patch fixes an oops first reported in mid 2006 - see
http://lkml.org/lkml/2006/8/29/358 The cause of this bug report is that
when an error is signalled on the socket, irda_recvmsg_stream returns
without removing a local wait_queue variable from the socket's sk_sleep
queue. This causes havoc further down the road.
In response to this problem, a patch was made that invoked sock_orphan on
the socket when receiving a disconnect indication. This is not a good fix,
as this sets sk_sleep to NULL, causing applications sleeping in recvmsg
(and other places) to oops.
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
In net poll mode, the current checksum function doesn't consider the
kind of packet which is padded to reach a specific minimum length. I
believe that's the problem causing my test case failed. The following
patch fixed this issue.
Signed-off-by: Aubrey Li <aubreylee@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Userspace uses an integer for TCA_TCINDEX_SHIFT, the kernel was changed
to expect and use a u16 value in 2.6.11, which broke compatibility on
big endian machines. Change back to use int.
Reported by Ole Reinartz <ole.reinartz@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Up until this point we've accepted replay window settings greater than
32 but our bit mask can only accomodate 32 packets. Thus any packet
with a sequence number within the window but outside the bit mask would
be accepted.
This patch causes those packets to be rejected instead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
In article <20070329.142644.70222545.davem@davemloft.net> (at Thu, 29 Mar 2007 14:26:44 -0700 (PDT)), David Miller <davem@davemloft.net> says:
> From: Sridhar Samudrala <sri@us.ibm.com>
> Date: Thu, 29 Mar 2007 14:17:28 -0700
>
> > The check for length in rawv6_sendmsg() is incorrect.
> > As len is an unsigned int, (len < 0) will never be TRUE.
> > I think checking for IPV6_MAXPLEN(65535) is better.
> >
> > Is it possible to send ipv6 jumbo packets using raw
> > sockets? If so, we can remove this check.
>
> I don't see why such a limitation against jumbo would exist,
> does anyone else?
>
> Thanks for catching this Sridhar. A good compiler should simply
> fail to compile "if (x < 0)" when 'x' is an unsigned type, don't
> you think :-)
Dave, we use "int" for returning value,
so we should fix this anyway, IMHO;
we should not allow len > INT_MAX.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
tp->root is not freed on destruction.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
When we receive an AppleTalk frame shorter than what its header says,
we still attempt to verify its checksum, and trip on the BUG_ON() at
the end of function atalk_sum_skb() because of the length mismatch.
This has security implications because this can be triggered by simply
sending a specially crafted ethernet frame to a target victim,
effectively crashing that host. Thus this qualifies, I think, as a
remote DoS. Here is the frame I used to trigger the crash, in npg
format:
<Appletalk Killer>
{
# Ethernet header -----
XX XX XX XX XX XX # Destination MAC
00 00 00 00 00 00 # Source MAC
00 1D # Length
# LLC header -----
AA AA 03
08 00 07 80 9B # Appletalk
# Appletalk header -----
00 1B # Packet length (invalid)
00 01 # Fake checksum
00 00 00 00 # Destination and source networks
00 00 00 00 # Destination and source nodes and ports
# Payload -----
0C 0D 0E 0F 10 11 12 13
14
}
The destination MAC address must be set to those of the victim.
The severity is mitigated by two requirements:
* The target host must have the appletalk kernel module loaded. I
suspect this isn't so frequent.
* AppleTalk frames are non-IP, thus I guess they can only travel on
local networks. I am no network expert though, maybe it is possible
to somehow encapsulate AppleTalk packets over IP.
The bug has been reported back in June 2004:
http://bugzilla.kernel.org/show_bug.cgi?id=2979
But it wasn't investigated, and was closed in July 2006 as both
reporters had vanished meanwhile.
This code was new in kernel 2.6.0-test5:
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=7ab442d7e0a76402c12553ee256f756097cae2d2
And not modified since then, so we can assume that vanilla kernels
2.6.0-test5 and later, and distribution kernels based thereon, are
affected.
Note that I still do not know for sure what triggered the bug in the
real-world cases. The frame could have been corrupted by the kernel if
we have a bug hiding somewhere. But more likely, we are receiving the
faulty frame from the network.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Doug Leith observed a discrepancy between the version of CUBIC described
in the papers and the version in 2.6.18. A math error related to scaling
causes Cubic to grow too slowly.
Patch is from "Sangtae Ha" <sha2@ncsu.edu>. I validated that
it does fix the problems.
See the following to show behavior over 500ms 100 Mbit link.
Sender (2.6.19-rc3) --- Bridge (2.6.18-rt7) ------- Receiver (2.6.19-rc3)
1G [netem] 100M
http://developer.osdl.org/shemminger/tcp/2.6.19-rc3/cubic-orig.png
http://developer.osdl.org/shemminger/tcp/2.6.19-rc3/cubic-fix.png
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
The input_device pointer is not refcounted, which means the device may
disappear while packets are queued, causing a crash when ifb passes packets
with a stale skb->dev pointer to netif_rx().
Fix by storing the interface index instead and do a lookup where neccessary.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Tetsuo Handa <handat@pm.nttdata.co.jp> told me that connect(2) with TCPv6
socket almost always took a few minutes to return when we did not have any
ports available in the range of net.ipv4.ip_local_port_range.
The reason was that we used incorrect seed for calculating index of
hash when we check established sockets in __inet6_check_established().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Ingress queueing uses a seperate lock for serializing enqueue operations,
but fails to properly protect itself against concurrent changes to the
qdisc tree. Use queue_lock for now since the real fix it quite intrusive.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
cls_basic doesn't allocate tp->root before it is linked into the
active classifier list, resulting in a NULL pointer dereference
when packets hit the classifier before its ->change function is
called.
Reported by Chris Madden <chris@reflexsecurity.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
There are multiple problems related to qlen adjustment that can lead
to an upper qdisc getting out of sync with the real number of packets
queued, leading to endless dequeueing attempts by the upper layer code.
All qdiscs must maintain an accurate q.qlen counter. There are basically
two groups of operations affecting the qlen: operations that propagate
down the tree (enqueue, dequeue, requeue, drop, reset) beginning at the
root qdisc and operations only affecting a subtree or single qdisc
(change, graft, delete class). Since qlen changes during operations from
the second group don't propagate to ancestor qdiscs, their qlen values
become desynchronized.
This patch adds a function to propagate qlen changes up the qdisc tree,
optionally calling a callback function to perform qdisc-internal
maintenance when the child qdisc is deactivated, and converts all
qdiscs to use this where necessary.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Some stacks apparently send packets with SYN|URG set. Linux accepts
these packets, so TCP conntrack should to.
Pointed out by Martijn Posthuma <posthuma@sangine.com>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Paranoia: instance_put() might have freed the inst pointer when we
spin_unlock_bh().
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Stop reference leaking in nfulnl_log_packet(). If we start a timer we
are already taking another reference.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Eliminate possible NULL pointer dereference in nfulnl_recv_config().
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Fix the nasty NULL dereference on multiple packets per netlink message.
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
f8a4b3bf
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: nfnetlink_log ipt_ttl ipt_REDIRECT xt_tcpudp iptable_nat nf_nat nf_conntrack
_ipv4 xt_state ipt_ipp2p xt_NFLOG xt_hashlimit ip6_tables iptable_filter xt_multiport xt_mark i
pt_set iptable_raw xt_MARK iptable_mangle ip_tables cls_fw cls_u32 sch_esfq sch_htb ip_set_ipma
p ip_set ipt_ULOG x_tables dm_snapshot dm_mirror loop e1000 parport_pc parport e100 floppy ide_
cd cdrom
CPU: 0
EIP: 0060:[<f8a4b3bf>] Not tainted VLI
EFLAGS: 00010206 (2.6.20 #5)
EIP is at __nfulnl_send+0x24/0x51 [nfnetlink_log]
eax: 00000000 ebx: f2b5cbc0 ecx: c03f5f54 edx: c03f4000
esi: f2b5cbc8 edi: c03f5f54 ebp: f8a4b3ec esp: c03f5f30
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, ti=c03f4000 task=c03bece0 task.ti=c03f4000)
Stack: f2b5cbc0 f8a4b401 00000100 c0444080 c012af49 00000000 f6f19100 f6f19000
c1707800 c03f5f54 c03f5f54 00000123 00000021 c03e8d08 c0426380 00000009
c0126932 00000000 00000046 c03e9980 c03e6000 0047b007 c01269bd 00000000
Call Trace:
[<f8a4b401>] nfulnl_timer+0x15/0x25 [nfnetlink_log]
[<c012af49>] run_timer_softirq+0x10a/0x164
[<c0126932>] __do_softirq+0x60/0xba
[<c01269bd>] do_softirq+0x31/0x35
[<c0104f6e>] do_IRQ+0x62/0x74
[<c01036cb>] common_interrupt+0x23/0x28
[<c0101018>] default_idle+0x0/0x3f
[<c0101045>] default_idle+0x2d/0x3f
[<c01010fa>] cpu_idle+0xa0/0xb9
[<c03fb7f5>] start_kernel+0x1a8/0x1ac
[<c03fb293>] unknown_bootoption+0x0/0x181
=======================
Code: 5e 5f 5b 5e 5f 5d c3 53 89 c3 8d 40 1c 83 7b 1c 00 74 05 e8 2c ee 6d c7 83 7b 14 00 75 04
31 c0 eb 34 83 7b 10 01 76 09 8b 43 18 <66> c7 40 04 03 00 8b 53 34 8b 43 14 b9 40 00 00 00 e8
08 9a 84
EIP: [<f8a4b3bf>] __nfulnl_send+0x24/0x51 [nfnetlink_log] SS:ESP 0068:c03f5f30
<0>Kernel panic - not syncing: Fatal exception in interrupt
<0>Rebooting in 5 seconds..
Panic no more!
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
physoutdev is only set on purely bridged packet, when nfnetlink_log is used
in the OUTPUT/FORWARD/POSTROUTING hooks on packets forwarded from or to a
bridge it crashes when trying to dereference skb->nf_bridge->physoutdev.
Reported by Holger Eitzenberger <heitzenberger@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|