Age | Commit message (Collapse) | Author |
|
[ Upstream commit b14878ccb7fac0242db82720b784ab62c467c0dc ]
Currently, it is possible to create an SCTP socket, then switch
auth_enable via sysctl setting to 1 and crash the system on connect:
Oops[#1]:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
[...]
Call Trace:
[<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80
[<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4
[<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c
[<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8
[<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214
[<ffffffff8043af68>] sctp_rcv+0x588/0x630
[<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24
[<ffffffff803acb50>] ip6_input+0x2c0/0x440
[<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564
[<ffffffff80310650>] process_backlog+0xb4/0x18c
[<ffffffff80313cbc>] net_rx_action+0x12c/0x210
[<ffffffff80034254>] __do_softirq+0x17c/0x2ac
[<ffffffff800345e0>] irq_exit+0x54/0xb0
[<ffffffff800075a4>] ret_from_irq+0x0/0x4
[<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48
[<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148
[<ffffffff805a88b0>] start_kernel+0x37c/0x398
Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0
03e00008 00000000
---[ end trace b530b0551467f2fd ]---
Kernel panic - not syncing: Fatal exception in interrupt
What happens while auth_enable=0 in that case is, that
ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
when endpoint is being created.
After that point, if an admin switches over to auth_enable=1,
the machine can crash due to NULL pointer dereference during
reception of an INIT chunk. When we enter sctp_process_init()
via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
the INIT verification succeeds and while we walk and process
all INIT params via sctp_process_param() we find that
net->sctp.auth_enable is set, therefore do not fall through,
but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
dereference what we have set to NULL during endpoint
initialization phase.
The fix is to make auth_enable immutable by caching its value
during endpoint initialization, so that its original value is
being carried along until destruction. The bug seems to originate
from the very first days.
Fix in joint work with Daniel Borkmann.
Reported-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit dc8eaaa006350d24030502a4521542e74b5cb39f ]
When I open the LOCKDEP config and run these steps:
modprobe 8021q
vconfig add eth2 20
vconfig add eth2.20 30
ifconfig eth2 xx.xx.xx.xx
then the Call Trace happened:
[32524.386288] =============================================
[32524.386293] [ INFO: possible recursive locking detected ]
[32524.386298] 3.14.0-rc2-0.7-default+ #35 Tainted: G O
[32524.386302] ---------------------------------------------
[32524.386306] ifconfig/3103 is trying to acquire lock:
[32524.386310] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0
[32524.386326]
[32524.386326] but task is already holding lock:
[32524.386330] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40
[32524.386341]
[32524.386341] other info that might help us debug this:
[32524.386345] Possible unsafe locking scenario:
[32524.386345]
[32524.386350] CPU0
[32524.386352] ----
[32524.386354] lock(&vlan_netdev_addr_lock_key/1);
[32524.386359] lock(&vlan_netdev_addr_lock_key/1);
[32524.386364]
[32524.386364] *** DEADLOCK ***
[32524.386364]
[32524.386368] May be due to missing lock nesting notation
[32524.386368]
[32524.386373] 2 locks held by ifconfig/3103:
[32524.386376] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81431d42>] rtnl_lock+0x12/0x20
[32524.386387] #1: (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40
[32524.386398]
[32524.386398] stack backtrace:
[32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G O 3.14.0-rc2-0.7-default+ #35
[32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[32524.386414] ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8
[32524.386421] ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0
[32524.386428] 000003cc00000000 0000000000000002 ffff8800d8a8e5f8 0000000000000000
[32524.386435] Call Trace:
[32524.386441] [<ffffffff814f68a2>] dump_stack+0x6a/0x78
[32524.386448] [<ffffffff810a35fb>] __lock_acquire+0x7ab/0x1940
[32524.386454] [<ffffffff810a323a>] ? __lock_acquire+0x3ea/0x1940
[32524.386459] [<ffffffff810a4874>] lock_acquire+0xe4/0x110
[32524.386464] [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0
[32524.386471] [<ffffffff814fc07a>] _raw_spin_lock_nested+0x2a/0x40
[32524.386476] [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0
[32524.386481] [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0
[32524.386489] [<ffffffffa0500cab>] vlan_dev_set_rx_mode+0x2b/0x50 [8021q]
[32524.386495] [<ffffffff8141addf>] __dev_set_rx_mode+0x5f/0xb0
[32524.386500] [<ffffffff8141af8b>] dev_set_rx_mode+0x2b/0x40
[32524.386506] [<ffffffff8141b3cf>] __dev_open+0xef/0x150
[32524.386511] [<ffffffff8141b177>] __dev_change_flags+0xa7/0x190
[32524.386516] [<ffffffff8141b292>] dev_change_flags+0x32/0x80
[32524.386524] [<ffffffff8149ca56>] devinet_ioctl+0x7d6/0x830
[32524.386532] [<ffffffff81437b0b>] ? dev_ioctl+0x34b/0x660
[32524.386540] [<ffffffff814a05b0>] inet_ioctl+0x80/0xa0
[32524.386550] [<ffffffff8140199d>] sock_do_ioctl+0x2d/0x60
[32524.386558] [<ffffffff81401a52>] sock_ioctl+0x82/0x2a0
[32524.386568] [<ffffffff811a7123>] do_vfs_ioctl+0x93/0x590
[32524.386578] [<ffffffff811b2705>] ? rcu_read_lock_held+0x45/0x50
[32524.386586] [<ffffffff811b39e5>] ? __fget_light+0x105/0x110
[32524.386594] [<ffffffff811a76b1>] SyS_ioctl+0x91/0xb0
[32524.386604] [<ffffffff815057e2>] system_call_fastpath+0x16/0x1b
========================================================================
The reason is that all of the addr_lock_key for vlan dev have the same class,
so if we change the status for vlan dev, the vlan dev and its real dev will
hold the same class of addr_lock_key together, so the warning happened.
we should distinguish the lock depth for vlan dev and its real dev.
v1->v2: Convert the vlan_netdev_addr_lock_key to an array of eight elements, which
could support to add 8 vlan id on a same vlan dev, I think it is enough for current
scene, because a netdev's name is limited to IFNAMSIZ which could not hold 8 vlan id,
and the vlan dev would not meet the same class key with its real dev.
The new function vlan_dev_get_lockdep_subkey() will return the subkey and make the vlan
dev could get a suitable class key.
v2->v3: According David's suggestion, I use the subclass to distinguish the lock key for vlan dev
and its real dev, but it make no sense, because the difference for subclass in the
lock_class_key doesn't mean that the difference class for lock_key, so I use lock_depth
to distinguish the different depth for every vlan dev, the same depth of the vlan dev
could have the same lock_class_key, I import the MAX_LOCK_DEPTH from the include/linux/sched.h,
I think it is enough here, the lockdep should never exceed that value.
v3->v4: Add a huge array of locking keys will waste static kernel memory and is not a appropriate method,
we could use _nested() variants to fix the problem, calculate the depth for every vlan dev,
and use the depth as the subclass for addr_lock_key.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 54d63f787b652755e66eb4dd8892ee6d3f5197fc ]
It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but
this is unsafe, the module always supposes that this device exists. For example,
ip6gre_tunnel_lookup() may use it unconditionally.
Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we
let ip6gre_destroy_tunnels() do the job).
Introduced by commit c12b395a4664 ("gre: Support GRE over IPv6").
CC: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 05ab8f2647e4221cbdb3856dd7d32bd5407316b3 ]
The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check
for a minimal message length before testing the supplied offset to be
within the bounds of the message. This allows the subtraction of the nla
header to underflow and therefore -- as the data type is unsigned --
allowing far to big offset and length values for the search of the
netlink attribute.
The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is
also wrong. It has the minuend and subtrahend mixed up, therefore
calculates a huge length value, allowing to overrun the end of the
message while looking for the netlink attribute.
The following three BPF snippets will trigger the bugs when attached to
a UNIX datagram socket and parsing a message with length 1, 2 or 3.
,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]--
| ld #0x87654321
| ldx #42
| ld #nla
| ret a
`---
,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]--
| ld #0x87654321
| ldx #42
| ld #nlan
| ret a
`---
,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]--
| ; (needs a fake netlink header at offset 0)
| ld #0
| ldx #42
| ld #nlan
| ret a
`---
Fix the first issue by ensuring the message length fulfills the minimal
size constrains of a nla header. Fix the second bug by getting the math
for the remainder calculation right.
Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction")
Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..")
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 91146153da2feab18efab2e13b0945b6bb704ded ]
Extend commit 13378cad02afc2adc6c0e07fca03903c7ada0b37
("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid
RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0
which is displayed as 'iif *'.
inet_iif is not appropriate to use because skb_iif is not set.
Use the skb->dev->ifindex instead.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b04c46190219a4f845e46a459e3102137b7f6cac ]
Plug a group_info refcount leak in ping_init.
group_info is only needed during initialization and
the code failed to release the reference on exit.
While here move grabbing the reference to a place
where it is actually needed.
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Signed-off-by: Zhang Dongxing <dongxing.zhang@intel.com>
Signed-off-by: xiaoming wang <xiaoming.wang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8d89dcdf80d88007647945a753821a06eb6cc5a5 ]
Before the patch, it was possible to add two times the same tunnel:
ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41
ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41
It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
argument dev->type, which was set only later (when calling ndo_init handler
in register_netdevice()). Let's set this type in the setup handler, which is
called before newlink handler.
Introduced by commit b9959fd3b0fa ("vti: switch to new ip tunnel code").
CC: Cong Wang <amwang@redhat.com>
CC: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5a4552752d8f7f4cef1d98775ece7adb7616fde2 ]
Before the patch, it was possible to add two times the same tunnel:
ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249
ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249
It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
argument dev->type, which was set only later (when calling ndo_init handler
in register_netdevice()). Let's set this type in the setup handler, which is
called before newlink handler.
Introduced by commit c54419321455 ("GRE: Refactor GRE tunneling code.").
CC: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 30f78d8ebf7f514801e71b88a10c948275168518 ]
Francois reported that setting big mtu on loopback device could prevent
tcp sessions making progress.
We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets.
We must limit the IPv6 MTU to (65535 + 40) bytes in theory.
Tested:
ifconfig lo mtu 70000
netperf -H ::1
Before patch : Throughput : 0.05 Mbits
After patch : Throughput : 35484 Mbits
Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit eb7076182d1ae4bc4641534134ed707100d76acc ]
br_allowed_ingress() has two problems.
1. If br_allowed_ingress() is called by br_handle_frame_finish() and
vlan_untag() in br_allowed_ingress() fails, skb will be freed by both
vlan_untag() and br_handle_frame_finish().
2. If br_allowed_ingress() is called by br_dev_xmit() and
br_allowed_ingress() fails, the skb will not be freed.
Fix these two problems by freeing the skb in br_allowed_ingress()
if it fails.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6d39d589bb76ee8a1c6cde6822006ae0053decff ]
In case of tcp, gso_size contains the tcpmss.
For UFO (udp fragmentation offloading) skbs, gso_size is the fragment
payload size, i.e. we must not account for udp header size.
Otherwise, when using virtio drivers, a to-be-forwarded UFO GSO packet
will be needlessly fragmented in the forward path, because we think its
individual segments are too large for the outgoing link.
Fixes: fe6cc55f3a9a053 ("net: ip, ipv6: handle gso skbs in forwarding path")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f34c4a35d87949fbb0e0f31eba3c054e9f8199ba ]
When l2tp driver tries to get PMTU for the tunnel destination, it uses
the pointer to struct sock that represents PPPoX socket, while it
should use the pointer that represents UDP socket of the tunnel.
Signed-off-by: Dmitry Petukhov <dmgenp@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1e1cdf8ac78793e0875465e98a648df64694a8d0 ]
In function sctp_wake_up_waiters(), we need to involve a test
if the association is declared dead. If so, we don't have any
reference to a possible sibling association anymore and need
to invoke sctp_write_space() instead, and normally walk the
socket's associations and notify them of new wmem space. The
reason for special casing is that otherwise, we could run
into the following issue when a sctp_primitive_SEND() call
from sctp_sendmsg() fails, and tries to flush an association's
outq, i.e. in the following way:
sctp_association_free()
`-> list_del(&asoc->asocs) <-- poisons list pointer
asoc->base.dead = true
sctp_outq_free(&asoc->outqueue)
`-> __sctp_outq_teardown()
`-> sctp_chunk_free()
`-> consume_skb()
`-> sctp_wfree()
`-> sctp_wake_up_waiters() <-- dereferences poisoned pointers
if asoc->ep->sndbuf_policy=0
Therefore, only walk the list in an 'optimized' way if we find
that the current association is still active. We could also use
list_del_init() in addition when we call sctp_association_free(),
but as Vlad suggests, we want to trap such bugs and thus leave
it poisoned as is.
Why is it safe to resolve the issue by testing for asoc->base.dead?
Parallel calls to sctp_sendmsg() are protected under socket lock,
that is lock_sock()/release_sock(). Only within that path under
lock held, we're setting skb/chunk owner via sctp_set_owner_w().
Eventually, chunks are freed directly by an association still
under that lock. So when traversing association list on destruction
time from sctp_wake_up_waiters() via sctp_wfree(), a different
CPU can't be running sctp_wfree() while another one calls
sctp_association_free() as both happens under the same lock.
Therefore, this can also not race with setting/testing against
asoc->base.dead as we are guaranteed for this to happen in order,
under lock. Further, Vlad says: the times we check asoc->base.dead
is when we've cached an association pointer for later processing.
In between cache and processing, the association may have been
freed and is simply still around due to reference counts. We check
asoc->base.dead under a lock, so it should always be safe to check
and not race against sctp_association_free(). Stress-testing seems
fine now, too.
Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 52c35befb69b005c3fc5afdaae3a5717ad013411 ]
SCTP charges chunks for wmem accounting via skb->truesize in
sctp_set_owner_w(), and sctp_wfree() respectively as the
reverse operation. If a sender runs out of wmem, it needs to
wait via sctp_wait_for_sndbuf(), and gets woken up by a call
to __sctp_write_space() mostly via sctp_wfree().
__sctp_write_space() is being called per association. Although
we assign sk->sk_write_space() to sctp_write_space(), which
is then being done per socket, it is only used if send space
is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
is set and therefore not invoked in sock_wfree().
Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf
again when transmitting packet") fixed an issue where in case
sctp_packet_transmit() manages to queue up more than sndbuf
bytes, sctp_wait_for_sndbuf() will never be woken up again
unless it is interrupted by a signal. However, a still
remaining issue is that if net.sctp.sndbuf_policy=0, that is
accounting per socket, and one-to-many sockets are in use,
the reclaimed write space from sctp_wfree() is 'unfairly'
handed back on the server to the association that is the lucky
one to be woken up again via __sctp_write_space(), while
the remaining associations are never be woken up again
(unless by a signal).
The effect disappears with net.sctp.sndbuf_policy=1, that
is wmem accounting per association, as it guarantees a fair
share of wmem among associations.
Therefore, if we have reclaimed memory in case of per socket
accounting, wake all related associations to a socket in a
fair manner, that is, traverse the socket association list
starting from the current neighbour of the association and
issue a __sctp_write_space() to everyone until we end up
waking ourselves. This guarantees that no association is
preferred over another and even if more associations are
taken into the one-to-many session, all receivers will get
messages from the server and are not stalled forever on
high load. This setting still leaves the advantage of per
socket accounting in touch as an association can still use
up global limits if unused by others.
Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c58dd2dd443c26d856a168db108a0cd11c285bf3 upstream.
All xtables variants suffer from the defect that the copy_to_user()
to copy the counters to user memory may fail after the table has
already been exchanged and thus exposed. Return an error at this
point will result in freeing the already exposed table. Any
subsequent packet processing will result in a kernel panic.
We can't copy the counters before exposing the new tables as we
want provide the counter state after the old table has been
unhooked. Therefore convert this into a silent error.
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 764152ff66f4a8be1f9d7981e542ffdaa5bd7aff upstream.
Their power value is initialized to zero. This patch fixes an issue
where the configured power drops to the minimum value when AP_VLAN
interfaces are created/removed.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 115b943a6ea12656088fa1ff6634c0d30815e55b upstream.
Jouni reported that when doing off-channel transmissions mixed
with on-channel transmissions, the on-channel ones ended up on
the off-channel in some cases.
The reason for that is that during the refactoring of the off-
channel code, I lost the part that stopped all activity and as
a consequence the on-channel frames (including data frames)
were no longer queued but would be transmitted on the temporary
channel.
Fix this by simply restoring the lost activity stop call.
Fixes: 2eb278e083549 ("mac80211: unify SW/offload remain-on-channel")
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 112c44b2df0984121a52fbda89425843b8e1a457 upstream.
commit de74a1d9032f4d37ea453ad2a647e1aff4cd2591
"mac80211: fix WPA with VLAN on AP side with ps-sta"
fixed an issue where queued multicast packets would
be sent out encrypted with the key of an other bss.
commit "7cbf9d017dbb5e3276de7d527925d42d4c11e732"
"mac80211: fix oops on mesh PS broadcast forwarding"
essentially reverted it, because vif.type cannot be AP_VLAN
due to the check to vif.type in ieee80211_get_buffered_bc before.
As the later commit intended to fix the MESH case, fix it
by checking for IFTYPE_AP instead of IFTYPE_AP_VLAN.
Fixes: 7cbf9d017dbb ("mac80211: fix oops on mesh PS broadcast forwarding")
Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3064639423c48d6e0eb9ecc27c512a58e38c6c57 upstream.
There could be a case, when NFSd file system is mounted in network, different
to socket's one, like below:
"ip netns exec" creates new network and mount namespace, which duplicates NFSd
mount point, created in init_net context. And thus NFS server stop in nested
network context leads to RPCBIND client destruction in init_net.
Then, on NFSd start in nested network context, rpc.nfsd process creates socket
in nested net and passes it into "write_ports", which leads to RPCBIND sockets
creation in init_net context because of the same reason (NFSd monut point was
created in init_net context). An attempt to register passed socket in nested
net leads to panic, because no RPCBIND client present in nexted network
namespace.
This patch add check that passed socket's net matches NFSd superblock's one.
And returns -EINVAL error to user psace otherwise.
v2: Put socket on exit.
Reported-by: Weng Meiling <wengmeiling.weng@huawei.com>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5981a8821b774ada0be512fd9bad7c241e17657e upstream.
This patch fixes authentication failure on LE link re-connection when
BlueZ acts as slave (peripheral). LTK is removed from the internal list
after its first use causing PIN or Key missing reply when re-connecting
the link. The LE Long Term Key Request event indicates that the master
is attempting to encrypt or re-encrypt the link.
Pre-condition: BlueZ host paired and running as slave.
How to reproduce(master):
1) Establish an ACL LE encrypted link
2) Disconnect the link
3) Try to re-establish the ACL LE encrypted link (fails)
> HCI Event: LE Meta Event (0x3e) plen 19
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 64
Role: Slave (0x01)
...
@ Device Connected: 00:02:72:DC:29:C9 (1) flags 0x0000
> HCI Event: LE Meta Event (0x3e) plen 13
LE Long Term Key Request (0x05)
Handle: 64
Random number: 875be18439d9aa37
Encryption diversifier: 0x76ed
< HCI Command: LE Long Term Key Request Reply (0x08|0x001a) plen 18
Handle: 64
Long term key: 2aa531db2fce9f00a0569c7d23d17409
> HCI Event: Command Complete (0x0e) plen 6
LE Long Term Key Request Reply (0x08|0x001a) ncmd 1
Status: Success (0x00)
Handle: 64
> HCI Event: Encryption Change (0x08) plen 4
Status: Success (0x00)
Handle: 64
Encryption: Enabled with AES-CCM (0x01)
...
@ Device Disconnected: 00:02:72:DC:29:C9 (1) reason 3
< HCI Command: LE Set Advertise Enable (0x08|0x000a) plen 1
Advertising: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4
LE Set Advertise Enable (0x08|0x000a) ncmd 1
Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 19
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 64
Role: Slave (0x01)
...
@ Device Connected: 00:02:72:DC:29:C9 (1) flags 0x0000
> HCI Event: LE Meta Event (0x3e) plen 13
LE Long Term Key Request (0x05)
Handle: 64
Random number: 875be18439d9aa37
Encryption diversifier: 0x76ed
< HCI Command: LE Long Term Key Request Neg Reply (0x08|0x001b) plen 2
Handle: 64
> HCI Event: Command Complete (0x0e) plen 6
LE Long Term Key Request Neg Reply (0x08|0x001b) ncmd 1
Status: Success (0x00)
Handle: 64
> HCI Event: Disconnect Complete (0x05) plen 4
Status: Success (0x00)
Handle: 64
Reason: Authentication Failure (0x05)
@ Device Disconnected: 00:02:72:DC:29:C9 (1) reason 0
Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bf39b4247b8799935ea91d90db250ab608a58e50 ]
Binding might result in a NULL device which is later dereferenced
without checking.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 43a43b6040165f7b40b5b489fe61a4cb7f8c4980 ]
After commit c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify
processing to workqueue") some counters are now updated in process context
and thus need to disable bh before doing so, otherwise deadlocks can
happen on 32-bit archs. Fabio Estevam noticed this while while mounting
a NFS volume on an ARM board.
As a compensation for missing this I looked after the other *_STATS_BH
and found three other calls which need updating:
1) icmp6_send: ip6_fragment -> icmpv6_send -> icmp6_send (error handling)
2) ip6_push_pending_frames: rawv6_sendmsg -> rawv6_push_pending_frames -> ...
(only in case of icmp protocol with raw sockets in error handling)
3) ping6_v6_sendmsg (error handling)
Fixes: c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify processing to workqueue")
Reported-by: Fabio Estevam <festevam@gmail.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit fc0d48b8fb449ca007b2057328abf736cb516168 ]
Currently, if the card supports CTAG acceleration we do not
account for the vlan header even if we are configuring an
8021AD vlan. This may not be best since we'll do software
tagging for 8021AD which will cause data copy on skb head expansion
Configure the length based on available hw offload capabilities and
vlan protocol.
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Not applicable upstream commit, the code here has been removed
upstream. ]
Neighbor Solicitation is ipv6 protocol, so we should check
skb->protocol with ETH_P_IPV6
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Cc: WANG Cong <amwang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f518338b16038beeb73e155e60d0f70beb9379f4 ]
Commit 812e44dd1829 ("ip6mr: advertise new mfc entries via rtnl") reuses the
function ip6mr_fill_mroute() to notify mfc events.
But this function was used only for dump and thus was always setting the
flag NLM_F_MULTI, which is wrong in case of a single notification.
Libraries like libnl will wait forever for NLMSG_DONE.
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 65886f439ab0fdc2dff20d1fa87afb98c6717472 ]
Commit 8cd3ac9f9b7b ("ipmr: advertise new mfc entries via rtnl") reuses the
function ipmr_fill_mroute() to notify mfc events.
But this function was used only for dump and thus was always setting the
flag NLM_F_MULTI, which is wrong in case of a single notification.
Libraries like libnl will wait forever for NLMSG_DONE.
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1c104a6bebf3c16b6248408b84f91d09ac8a26b6 ]
Commit 3ff661c38c84 ("net: rtnetlink notify events for FDB NTF_SELF adds and
deletes") reuses the function nlmsg_populate_fdb_fill() to notify fdb events.
But this function was used only for dump and thus was always setting the
flag NLM_F_MULTI, which is wrong in case of a single notification.
Libraries like libnl will wait forever for NLMSG_DONE.
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e367c2d03dba4c9bcafad24688fadb79dd95b218 ]
In ip6_append_data_mtu(), when the xfrm mode is not tunnel(such as
transport),the ipsec header need to be added in the first fragment, so the mtu
will decrease to reserve space for it, then the second fragment come, the mtu
should be turn back, as the commit 0c1833797a5a6ec23ea9261d979aa18078720b74
said. however, in the commit a493e60ac4bbe2e977e7129d6d8cbb0dd236be, it use
*mtu = min(*mtu, ...) to change the mtu, which lead to the new mtu is alway
equal with the first fragment's. and cannot turn back.
when I test through ping6 -c1 -s5000 $ip (mtu=1280):
...frag (0|1232) ESP(spi=0x00002000,seq=0xb), length 1232
...frag (1232|1216)
...frag (2448|1216)
...frag (3664|1216)
...frag (4880|164)
which should be:
...frag (0|1232) ESP(spi=0x00001000,seq=0x1), length 1232
...frag (1232|1232)
...frag (2464|1232)
...frag (3696|1232)
...frag (4928|116)
so delete the min() when change back the mtu.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Fixes: 75a493e60ac4bb ("ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ecab67015ef6e3f3635551dcc9971cf363cc1cd5 ]
tmp_prefered_lft is an offset to ifp->tstamp, not now. Therefore
age needs to be added to the condition.
Age calculation in ipv6_create_tempaddr is different from the one
in addrconf_verify and doesn't consider ADDRCONF_TIMER_FUZZ_MINUS.
This can cause age in ipv6_create_tempaddr to be less than the one
in addrconf_verify and therefore unnecessary temporary address to
be generated.
Use age calculation as in addrconf_modify to avoid this.
Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit dbb490b96584d4e958533fb637f08b557f505657 ]
When copying in a struct msghdr from the user, if the user has set the
msg_namelen parameter to a negative value it gets clamped to a valid
size due to a comparison between signed and unsigned values.
Ensure the syscall errors when the user passes in a negative value.
Signed-off-by: Matthew Leach <matthew.leach@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c3f9b01849ef3bc69024990092b9f42e20df7797 ]
Lars Persson reported following deadlock :
-000 |M:0x0:0x802B6AF8(asm) <-- arch_spin_lock
-001 |tcp_v4_rcv(skb = 0x8BD527A0) <-- sk = 0x8BE6B2A0
-002 |ip_local_deliver_finish(skb = 0x8BD527A0)
-003 |__netif_receive_skb_core(skb = 0x8BD527A0, ?)
-004 |netif_receive_skb(skb = 0x8BD527A0)
-005 |elk_poll(napi = 0x8C770500, budget = 64)
-006 |net_rx_action(?)
-007 |__do_softirq()
-008 |do_softirq()
-009 |local_bh_enable()
-010 |tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th = 0x814EBE14, ?)
-011 |tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0)
-012 |tcp_delack_timer_handler(sk = 0x8BE6B2A0)
-013 |tcp_release_cb(sk = 0x8BE6B2A0)
-014 |release_sock(sk = 0x8BE6B2A0)
-015 |tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?)
-016 |sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096)
-017 |kernel_sendmsg(?, ?, ?, ?, size = 4096)
-018 |smb_send_kvec()
-019 |smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0)
-020 |cifs_call_async()
-021 |cifs_async_writev(wdata = 0x87FD6580)
-022 |cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88)
-023 |__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88)
-024 |writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work = 0x87D8DD88)
-025 |__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88)
-026 |wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88)
-027 |wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0)
-028 |bdi_writeback_workfn(work = 0x87E4A9CC)
-029 |process_one_work(worker = 0x8B045880, work = 0x87E4A9CC)
-030 |worker_thread(__worker = 0x8B045880)
-031 |kthread(_create = 0x87CADD90)
-032 |ret_from_kernel_thread(asm)
Bug occurs because __tcp_checksum_complete_user() enables BH, assuming
it is running from softirq context.
Lars trace involved a NIC without RX checksum support but other points
are problematic as well, like the prequeue stuff.
Problem is triggered by a timer, that found socket being owned by user.
tcp_release_cb() should call tcp_write_timer_handler() or
tcp_delack_timer_handler() in the appropriate context :
BH disabled and socket lock held, but 'owned' field cleared,
as if they were running from timer handlers.
Fixes: 6f458dfb4092 ("tcp: improve latencies of timer triggered events")
Reported-by: Lars Persson <lars.persson@axis.com>
Tested-by: Lars Persson <lars.persson@axis.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit dd38743b4cc2f86be250eaf156cf113ba3dd531a ]
With TX VLAN offload enabled the source MAC address for frames sent using the
VLAN interface is currently set to the address of the real interface. This is
wrong since the VLAN interface may be configured with a different address.
The bug was introduced in commit 2205369a314e12fcec4781cc73ac9c08fc2b47de
("vlan: Fix header ops passthru when doing TX VLAN offload.").
This patch sets the source address before calling the create function of the
real interface.
Signed-off-by: Peter Boström <peter.bostrom@netrounds.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c88507fbad8055297c1d1e21e599f46960cbee39 ]
DST_NOCOUNT should only be used if an authorized user adds routes
locally. In case of routes which are added on behalf of router
advertisments this flag must not get used as it allows an unlimited
number of routes getting added remotely.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d2d273ffabd315eecefce21a4391d44b6e156b73 ]
Without this fix, ipv6_exthdrs_offload_init doesn't register IPPROTO_DSTOPTS
offload, but returns 0 (as the IPPROTO_ROUTING registration actually succeeds).
This then causes the ipv6_gso_segment to drop IPv6 packets with IPPROTO_DSTOPTS
header.
The issue detected and the fix verified by running MS HCK Offload LSO test on
top of QEMU Windows guests, as this test sends IPv6 packets with
IPPROTO_DSTOPTS.
Signed-off-by: Anton Nayshtut <anton@swortex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit de1443916791d75fdd26becb116898277bb0273f ]
Some applications didn't expect recvmsg() on a non blocking socket
could return -EINTR. This possibility was added as a side effect
of commit b3ca9b02b00704 ("net: fix multithreaded signal handling in
unix recv routines").
To hit this bug, you need to be a bit unlucky, as the u->readlock
mutex is usually held for very small periods.
Fixes: b3ca9b02b00704 ("net: fix multithreaded signal handling in unix recv routines")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e588e2f286ed7da011ed357c24c5b9a554e26595 ]
Quoting Alexander Aring:
While fragmentation and unloading of 6lowpan module I got this kernel Oops
after few seconds:
BUG: unable to handle kernel paging request at f88bbc30
[..]
Modules linked in: ipv6 [last unloaded: 6lowpan]
Call Trace:
[<c012af4c>] ? call_timer_fn+0x54/0xb3
[<c012aef8>] ? process_timeout+0xa/0xa
[<c012b66b>] run_timer_softirq+0x140/0x15f
Problem is that incomplete frags are still around after unload; when
their frag expire timer fires, we get crash.
When a netns is removed (also done when unloading module), inet_frag
calls the evictor with 'force' argument to purge remaining frags.
The evictor loop terminates when accounted memory ('work') drops to 0
or the lru-list becomes empty. However, the mem accounting is done
via percpu counters and may not be accurate, i.e. loop may terminate
prematurely.
Alter evictor to only stop once the lru list is empty when force is
requested.
Reported-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Reported-by: Alexander Aring <alex.aring@gmail.com>
Tested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6565b9eeef194afbb3beec80d6dd2447f4091f8c ]
MLD queries are supposed to have an IPv6 link-local source address
according to RFC2710, section 4 and RFC3810, section 5.1.14. This patch
adds a sanity check to ignore such broken MLD queries.
Without this check, such malformed MLD queries can result in a
denial of service: The queries are ignored by any MLD listener
therefore they will not respond with an MLD report. However,
without this patch these malformed MLD queries would enable the
snooping part in the bridge code, potentially shutting down the
according ports towards these hosts for multicast traffic as the
bridge did not learn about these listeners.
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c485658bae87faccd7aed540fd2ca3ab37992310 ]
While working on ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to
verify if we/peer is AUTH capable"), we noticed that there's a skb
memory leakage in the error path.
Running the same reproducer as in ec0223ec48a9 and by unconditionally
jumping to the error label (to simulate an error condition) in
sctp_sf_do_5_1D_ce() receive path lets kmemleak detector bark about
the unfreed chunk->auth_chunk skb clone:
Unreferenced object 0xffff8800b8f3a000 (size 256):
comm "softirq", pid 0, jiffies 4294769856 (age 110.757s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
89 ab 75 5e d4 01 58 13 00 00 00 00 00 00 00 00 ..u^..X.........
backtrace:
[<ffffffff816660be>] kmemleak_alloc+0x4e/0xb0
[<ffffffff8119f328>] kmem_cache_alloc+0xc8/0x210
[<ffffffff81566929>] skb_clone+0x49/0xb0
[<ffffffffa0467459>] sctp_endpoint_bh_rcv+0x1d9/0x230 [sctp]
[<ffffffffa046fdbc>] sctp_inq_push+0x4c/0x70 [sctp]
[<ffffffffa047e8de>] sctp_rcv+0x82e/0x9a0 [sctp]
[<ffffffff815abd38>] ip_local_deliver_finish+0xa8/0x210
[<ffffffff815a64af>] nf_reinject+0xbf/0x180
[<ffffffffa04b4762>] nfqnl_recv_verdict+0x1d2/0x2b0 [nfnetlink_queue]
[<ffffffffa04aa40b>] nfnetlink_rcv_msg+0x14b/0x250 [nfnetlink]
[<ffffffff815a3269>] netlink_rcv_skb+0xa9/0xc0
[<ffffffffa04aa7cf>] nfnetlink_rcv+0x23f/0x408 [nfnetlink]
[<ffffffff815a2bd8>] netlink_unicast+0x168/0x250
[<ffffffff815a2fa1>] netlink_sendmsg+0x2e1/0x3f0
[<ffffffff8155cc6b>] sock_sendmsg+0x8b/0xc0
[<ffffffff8155d449>] ___sys_sendmsg+0x369/0x380
What happens is that commit bbd0d59809f9 clones the skb containing
the AUTH chunk in sctp_endpoint_bh_rcv() when having the edge case
that an endpoint requires COOKIE-ECHO chunks to be authenticated:
---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
<------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
------------------ AUTH; COOKIE-ECHO ---------------->
<-------------------- COOKIE-ACK ---------------------
When we enter sctp_sf_do_5_1D_ce() and before we actually get to
the point where we process (and subsequently free) a non-NULL
chunk->auth_chunk, we could hit the "goto nomem_init" path from
an error condition and thus leave the cloned skb around w/o
freeing it.
The fix is to centrally free such clones in sctp_chunk_destroy()
handler that is invoked from sctp_chunk_free() after all refs have
dropped; and also move both kfree_skb(chunk->auth_chunk) there,
so that chunk->auth_chunk is either NULL (since sctp_chunkify()
allocs new chunks through kmem_cache_zalloc()) or non-NULL with
a valid skb pointer. chunk->skb and chunk->auth_chunk are the
only skbs in the sctp_chunk structure that need to be handeled.
While at it, we should use consume_skb() for both. It is the same
as dev_kfree_skb() but more appropriately named as we are not
a device but a protocol. Also, this effectively replaces the
kfree_skb() from both invocations into consume_skb(). Functions
are the same only that kfree_skb() assumes that the frame was
being dropped after a failure (e.g. for tools like drop monitor),
usage of consume_skb() seems more appropriate in function
sctp_chunk_destroy() though.
Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <yasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 24b9bf43e93e0edd89072da51cf1fab95fc69dec ]
I stumbled upon this very serious bug while hunting for another one,
it's a very subtle race condition between inet_frag_evictor,
inet_frag_intern and the IPv4/6 frag_queue and expire functions
(basically the users of inet_frag_kill/inet_frag_put).
What happens is that after a fragment has been added to the hash chain
but before it's been added to the lru_list (inet_frag_lru_add) in
inet_frag_intern, it may get deleted (either by an expired timer if
the system load is high or the timer sufficiently low, or by the
fraq_queue function for different reasons) before it's added to the
lru_list, then after it gets added it's a matter of time for the
evictor to get to a piece o |