Age | Commit message (Collapse) | Author |
|
commit a8170c35e738d62e9919ce5b109cf4ed66e95bde upstream.
When calculating the INIT/INIT-ACK chunk length, we should not
only account the length of parameters, but also the parameters
zero padding length, such as AUTH HMACS parameter and CHUNKS
parameter. Without the parameters zero padding length we may get
following oops.
skb_over_panic: text:ce2068d2 len:130 put:6 head:cac3fe00 data:cac3fe00 tail:0xcac3fe82 end:0xcac3fe80 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:127!
invalid opcode: 0000 [#2] SMP
last sysfs file: /sys/module/aes_generic/initstate
Modules linked in: authenc ......
Pid: 4102, comm: sctp_darn Tainted: G D 2.6.34-rc2 #6
EIP: 0060:[<c0607630>] EFLAGS: 00010282 CPU: 0
EIP is at skb_over_panic+0x37/0x3e
EAX: 00000078 EBX: c07c024b ECX: c07c02b9 EDX: cb607b78
ESI: 00000000 EDI: cac3fe7a EBP: 00000002 ESP: cb607b74
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process sctp_darn (pid: 4102, ti=cb607000 task=cabdc990 task.ti=cb607000)
Stack:
c07c02b9 ce2068d2 00000082 00000006 cac3fe00 cac3fe00 cac3fe82 cac3fe80
<0> c07c024b cac3fe7c cac3fe7a c0608dec ca986e80 ce2068d2 00000006 0000007a
<0> cb8120ca ca986e80 cb812000 00000003 cb8120c4 ce208a25 cb8120ca cadd9400
Call Trace:
[<ce2068d2>] ? sctp_addto_chunk+0x45/0x85 [sctp]
[<c0608dec>] ? skb_put+0x2e/0x32
[<ce2068d2>] ? sctp_addto_chunk+0x45/0x85 [sctp]
[<ce208a25>] ? sctp_make_init+0x279/0x28c [sctp]
[<c0686a92>] ? apic_timer_interrupt+0x2a/0x30
[<ce1fdc0b>] ? sctp_sf_do_prm_asoc+0x2b/0x7b [sctp]
[<ce202823>] ? sctp_do_sm+0xa0/0x14a [sctp]
[<ce2133b9>] ? sctp_pname+0x0/0x14 [sctp]
[<ce211d72>] ? sctp_primitive_ASSOCIATE+0x2b/0x31 [sctp]
[<ce20f3cf>] ? sctp_sendmsg+0x7a0/0x9eb [sctp]
[<c064eb1e>] ? inet_sendmsg+0x3b/0x43
[<c04244b7>] ? task_tick_fair+0x2d/0xd9
[<c06031e1>] ? sock_sendmsg+0xa7/0xc1
[<c0416afe>] ? smp_apic_timer_interrupt+0x6b/0x75
[<c0425123>] ? dequeue_task_fair+0x34/0x19b
[<c0446abb>] ? sched_clock_local+0x17/0x11e
[<c052ea87>] ? _copy_from_user+0x2b/0x10c
[<c060ab3a>] ? verify_iovec+0x3c/0x6a
[<c06035ca>] ? sys_sendmsg+0x186/0x1e2
[<c042176b>] ? __wake_up_common+0x34/0x5b
[<c04240c2>] ? __wake_up+0x2c/0x3b
[<c057e35c>] ? tty_wakeup+0x43/0x47
[<c04430f2>] ? remove_wait_queue+0x16/0x24
[<c0580c94>] ? n_tty_read+0x5b8/0x65e
[<c042be02>] ? default_wake_function+0x0/0x8
[<c0604e0e>] ? sys_socketcall+0x17f/0x1cd
[<c040264c>] ? sysenter_do_call+0x12/0x22
Code: 0f 45 de 53 ff b0 98 00 00 00 ff b0 94 ......
EIP: [<c0607630>] skb_over_panic+0x37/0x3e SS:ESP 0068:cb607b74
To reproduce:
# modprobe sctp
# echo 1 > /proc/sys/net/sctp/addip_enable
# echo 1 > /proc/sys/net/sctp/auth_enable
# sctp_test -H 3ffe:501:ffff:100:20c:29ff:fe4d:f37e -P 800 -l
# sctp_darn -H 3ffe:501:ffff:100:20c:29ff:fe4d:f37e -P 900 -h 192.168.0.21 -p 800 -I -s -t
sctp_darn ready to send...
3ffe:501:ffff:100:20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> bindx-add=192.168.0.21
3ffe:501:ffff:100:20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> bindx-add=192.168.1.21
3ffe:501:ffff:100:20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> snd=10
------------------------------------------------------------------
eth0 has addresses: 3ffe:501:ffff:100:20c:29ff:fe4d:f37e and 192.168.0.21
eth1 has addresses: 192.168.1.21
------------------------------------------------------------------
Reported-by: George Cheimonidis <gchimon@gmail.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c0786693404cffd80ca3cb6e75ee7b35186b2825 upstream.
When we finish processing ASCONF_ACK chunk, we try to send
the next queued ASCONF. This action runs the sctp state
machine recursively and it's not prepared to do so.
kernel BUG at kernel/timer.c:790!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/module/ipv6/initstate
Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath
uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev
floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs
EIP: 0060:[<c044a2ef>] EFLAGS: 00010286 CPU: 0
EIP is at add_timer+0xd/0x1b
EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4
ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000)
Stack:
c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004
<0> 00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14
00000004
<0> c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14
000000d0
Call Trace:
[<d1851214>] ? sctp_side_effects+0x607/0xdfc [sctp]
[<d1851b11>] ? sctp_do_sm+0x108/0x159 [sctp]
[<d1863386>] ? sctp_pname+0x0/0x1d [sctp]
[<d1861a56>] ? sctp_primitive_ASCONF+0x36/0x3b [sctp]
[<d185657c>] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp]
[<d184e35c>] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp]
[<d1851ac1>] ? sctp_do_sm+0xb8/0x159 [sctp]
[<d1863334>] ? sctp_cname+0x0/0x52 [sctp]
[<d1854377>] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp]
[<d1858f0f>] ? sctp_inq_push+0x2d/0x30 [sctp]
[<d186329d>] ? sctp_rcv+0x797/0x82e [sctp]
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Yuansong Qiao <ysqiao@research.ait.ie>
Signed-off-by: Shuaijun Zhang <szhang@research.ait.ie>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 51e97a12bef19b7e43199fc153cf9bd5f2140362 upstream.
The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids
array and attempts to ensure that only a supported hmac entry is
returned. The current code fails to do this properly - if the last id
in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the
id integer remains set after exiting the loop, and the address of an
out-of-bounds entry will be returned and subsequently used in the parent
function, causing potentially ugly memory corruption. This patch resets
the id integer to 0 on encountering an invalid id so that NULL will be
returned after finishing the loop if no valid ids are found.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 50b5d6ad63821cea324a5a7a19854d4de1a0a819 upstream.
ICMP protocol unreachable handling completely disregarded
the fact that the user may have locked the socket. It proceeded
to destroy the association, even though the user may have
held the lock and had a ref on the association. This resulted
in the following:
Attempt to release alive inet socket f6afcc00
=========================
[ BUG: held lock freed! ]
-------------------------
somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held
there!
(sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
1 lock held by somenu/2672:
#0: (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
stack backtrace:
Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55
Call Trace:
[<c1232266>] ? printk+0xf/0x11
[<c1038553>] debug_check_no_locks_freed+0xce/0xff
[<c10620b4>] kmem_cache_free+0x21/0x66
[<c1185f25>] __sk_free+0x9d/0xab
[<c1185f9c>] sk_free+0x1c/0x1e
[<c1216e38>] sctp_association_put+0x32/0x89
[<c1220865>] __sctp_connect+0x36d/0x3f4
[<c122098a>] ? sctp_connect+0x13/0x4c
[<c102d073>] ? autoremove_wake_function+0x0/0x33
[<c12209a8>] sctp_connect+0x31/0x4c
[<c11d1e80>] inet_dgram_connect+0x4b/0x55
[<c11834fa>] sys_connect+0x54/0x71
[<c103a3a2>] ? lock_release_non_nested+0x88/0x239
[<c1054026>] ? might_fault+0x42/0x7c
[<c1054026>] ? might_fault+0x42/0x7c
[<c11847ab>] sys_socketcall+0x6d/0x178
[<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c1002959>] syscall_call+0x7/0xb
This was because the sctp_wait_for_connect() would aqcure the socket
lock and then proceed to release the last reference count on the
association, thus cause the fully destruction path to finish freeing
the socket.
The simplest solution is to start a very short timer in case the socket
is owned by user. When the timer expires, we can do some verification
and be able to do the release properly.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
overflows.
[ Upstream fixed this in a different way as parts of the commits:
8d987e5c7510 (net: avoid limits overflow)
a9febbb4bd13 (sysctl: min/max bounds are optional)
27b3d80a7b6a (sysctl: fix min/max handling in __do_proc_doulongvec_minmax())
-DaveM ]
On a 16TB x86_64 machine, sysctl_tcp_mem[2], sysctl_udp_mem[2], and
sysctl_sctp_mem[2] can integer overflow. Set limit such that they are
maximized without overflowing.
Signed-off-by: Robin Holt <holt@sgi.com>
To: "David S. Miller" <davem@davemloft.net>
Cc: Willy Tarreau <w@1wt.eu>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-sctp@vger.kernel.org
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 4bdab43323b459900578b200a4b8cf9713ac8fab upstream.
sctp_packet_config() is called when getting the packet ready
for appending of chunks. The function should not touch the
current state, since it's possible to ping-pong between two
transports when sending, and that can result packet corruption
followed by skb overlfow crash.
Reported-by: Thomas Dreibholz <dreibh@iem.uni-due.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 2e3219b5c8a2e44e0b83ae6e04f52f20a82ac0f2 upstream.
commit 5fa782c2f5ef6c2e4f04d3e228412c9b4a4c8809
sctp: Fix skb_over_panic resulting from multiple invalid \
parameter errors (CVE-2010-1173) (v4)
cause 'error cause' never be add the the ERROR chunk due to
some typo when check valid length in sctp_init_cause_fixed().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(CVE-2010-1173) (v4)
commit 5fa782c2f5ef6c2e4f04d3e228412c9b4a4c8809 upstream.
Ok, version 4
Change Notes:
1) Minor cleanups, from Vlads notes
Summary:
Hey-
Recently, it was reported to me that the kernel could oops in the
following way:
<5> kernel BUG at net/core/skbuff.c:91!
<5> invalid operand: 0000 [#1]
<5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
mptbase sd_mod scsi_mod
<5> CPU: 0
<5> EIP: 0060:[<c02bff27>] Not tainted VLI
<5> EFLAGS: 00010216 (2.6.9-89.0.25.EL)
<5> EIP is at skb_over_panic+0x1f/0x2d
<5> eax: 0000002c ebx: c033f461 ecx: c0357d96 edx: c040fd44
<5> esi: c033f461 edi: df653280 ebp: 00000000 esp: c040fd40
<5> ds: 007b es: 007b ss: 0068
<5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
<5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
e0c2947d
<5> 00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
df653490
<5> 00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
00000004
<5> Call Trace:
<5> [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
<5> [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
<5> [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
<5> [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
<5> [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
<5> [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
<5> [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
<5> [<c01555a4>] cache_grow+0x140/0x233
<5> [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
<5> [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
<5> [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
<5> [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
<5> [<c02d005e>] nf_iterate+0x40/0x81
<5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5> [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
<5> [<c02d0362>] nf_hook_slow+0x83/0xb5
<5> [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
<5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5> [<c02e103e>] ip_rcv+0x334/0x3b4
<5> [<c02c66fd>] netif_receive_skb+0x320/0x35b
<5> [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
<5> [<c02c67a4>] process_backlog+0x6c/0xd9
<5> [<c02c690f>] net_rx_action+0xfe/0x1f8
<5> [<c012a7b1>] __do_softirq+0x35/0x79
<5> [<c0107efb>] handle_IRQ_event+0x0/0x4f
<5> [<c01094de>] do_softirq+0x46/0x4d
Its an skb_over_panic BUG halt that results from processing an init chunk in
which too many of its variable length parameters are in some way malformed.
The problem is in sctp_process_unk_param:
if (NULL == *errp)
*errp = sctp_make_op_error_space(asoc, chunk,
ntohs(chunk->chunk_hdr->length));
if (*errp) {
sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
WORD_ROUND(ntohs(param.p->length)));
sctp_addto_chunk(*errp,
WORD_ROUND(ntohs(param.p->length)),
param.v);
When we allocate an error chunk, we assume that the worst case scenario requires
that we have chunk_hdr->length data allocated, which would be correct nominally,
given that we call sctp_addto_chunk for the violating parameter. Unfortunately,
we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
chunk, so the worst case situation in which all parameters are in violation
requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.
The result of this error is that a deliberately malformed packet sent to a
listening host can cause a remote DOS, described in CVE-2010-1173:
http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173
I've tested the below fix and confirmed that it fixes the issue. We move to a
strategy whereby we allocate a fixed size error chunk and ignore errors we don't
have space to report. Tested by me successfully
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
When retransmitting due to T3 timeout, retransmit all the
in-flight chunks for the corresponding transport/path, including
chunks sent less then 1 rto ago.
This is the correct behaviour according to rfc4960 section 6.3.3
E3 and
"Note: Any DATA chunks that were sent to the address for which the
T3-rtx timer expired but did not fit in one MTU (rule E3 above)
should be marked for retransmission and sent as soon as cwnd
allows (normally, when a SACK arrives). ".
This fixes problems when more then one path is present and the T3
retransmission of the first chunk that timeouts stops the T3 timer
for the initial active path, leaving all the other in-flight
chunks waiting forever or until a new chunk is transmitted on the
same path and timeouts (and this will happen only if the cwnd
allows sending new chunks, but since cwnd was dropped to MTU by
the timeout => it will wait until the first heartbeat).
Example: 10 packets in flight, sent at 0.1 s intervals on the
primary path. The primary path is down and the first packet
timeouts. The first packet is retransmitted on another path, the
T3 timer for the primary path is stopped and cwnd is set to MTU.
All the other 9 in-flight packets will not be retransmitted
(unless more new packets are sent on the primary path which depend
on cwnd allowing it, and even in this case the 9 packets will be
retransmitted only after a new packet timeouts which even in the
best case would be more then RTO).
This commit reverts d0ce92910bc04e107b2f3f2048f07e94f570035d and
also removes the now unused transport->last_rto, introduced in
b6157d8e03e1e780660a328f7183bcbfa4a93a19.
p.s The problem is not only when multiple paths are there. It
can happen in a single homed environment. If the application
stops sending data, it possible to have a hung association.
Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recent commits
sctp: Get rid of an extra routing lookup when adding a transport
and
sctp: Set source addresses on the association before adding transports
changed when routes are added to the sctp transports. As such,
we didn't set the socket source address correctly when adding the first
transport. The first transport is always the primary/active one, so
when adding it, set the socket source address. This was causing
regression failures in SCTP tests.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A new (unrealeased to the user) sctp_connectx api
c6ba68a26645dbc5029a9faa5687ebe6fcfc53e4
sctp: support non-blocking version of the new sctp_connectx() API
introduced a regression cought by the user regression test
suite. In particular, the API requires the user library to
re-allocate the buffer and could potentially trigger a SIGFAULT.
This change corrects that regression by passing the original
address buffer to the kernel unmodified, but still allows for
a returned association id.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recent commit 8da645e101a8c20c6073efda3c7cc74eec01b87f
sctp: Get rid of an extra routing lookup when adding a transport
introduced a regression in the connection setup. The behavior was
different between IPv4 and IPv6. IPv4 case ended up working because the
route lookup routing returned a NULL route, which triggered another
route lookup later in the output patch that succeeded. In the IPv6 case,
a valid route was returned for first call, but we could not find a valid
source address at the time since the source addresses were not set on the
association yet. Thus resulted in a hung connection.
The solution is to set the source addresses on the association prior to
adding peers.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.
Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sizing of memory allocations shouldn't depend on the number of physical
pages found in a system, as that generally includes (perhaps a huge amount
of) non-RAM pages. The amount of what actually is usable as storage
should instead be used as a basis here.
Some of the calculations (i.e. those not intending to use high memory)
should likely even use (totalram_pages - totalhigh_pages).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove long removed "inet_protocol_base" declaration.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since our TSN map is capable of holding at most a 4K chunk gap,
there is no way that during this gap, a stream sequence number
(unsigned short) can wrap such that the new number is smaller
then the next expected one. If such a case is encountered,
this is a protocol violation.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
Use sctp_packet_reset() instead of dup code.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
This patch introduces a new sysctl option to make IPv4 Address Scoping
configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>.
In networking environments where DNAT rules in iptables prerouting
chains convert destination IP's to link-local/private IP addresses,
SCTP connections fail to establish as the INIT chunk is dropped by the
kernel due to address scope match failure.
For example to support overlapping IP addresses (same IP address with
different vlan id) a Layer-5 application listens on link local IP's,
and there is a DNAT rule that maps the destination IP to a link local
IP. Such applications never get the SCTP INIT if the address-scoping
draft is strictly followed.
This sysctl configuration allows SCTP to function in such
unconventional networking environments.
Sysctl options:
0 - Disable IPv4 address scoping draft altogether
1 - Enable IPv4 address scoping (default, current behavior)
2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
3 - Enable address scoping but allow IPv4 link local address in init/init-ack
Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
We used to perform 2 routing lookups for a new transport: one
just for path mtu detection, and one to actually route to destination
and path mtu update when sending a packet. There is no point in doing
both of them, especially since the first one just for path mtu doesn't
take into account source address and sometimes gives the wrong route,
causing path mtu updates anyway.
We now do just the one call to do both route to destination and get
path mtu updates.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
We currently track if AUTH has been bundled using the 'auth'
pointer to the chunk. However, AUTH is disallowed after DATA
is already in the packet, so we need to instead use the
'has_auth' field.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
The packet information does not reset after packet transmit, this
may cause some problems such as following DATA chunk be sent without
AUTH chunk, even if the authentication of DATA chunk has been
requested by the peer.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
Add-IP feature allows users to delete an active transport. If that
transport has chunks in flight, those chunks need to be moved to another
transport or association may get into unrecoverable state.
Reported-by: Rafael Laufer <rlaufer@cisco.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
We had a bug that we never stored the user-defined value for
MAXSEG when setting the value on an association. Thus future
PMTU events ended up re-writing the frag point and increasing
it past user limit. Additionally, when setting the option on
the socket/endpoint, we effect all current associations, which
is against spec.
Now, we store the user 'maxseg' value along with the computed
'frag_point'. We inherit 'maxseg' from the socket at association
creation and use it as an upper limit for 'frag_point' when its
set.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
SCTP will delay the last part of a large write due to NAGLE, if that
part is smaller then MTU. Since we are doing large writes, we might
as well send the last portion now instead of waiting untill the next
large write happens. The small portion will be sent as is regardless,
so it's better to not delay it.
This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com>
and Doug Graham <dgraham@nortel.com>. Many thanks go out to them.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
The decision to delay due to Nagle should be based on the path mtu
and future packet size. We currently incorrectly base it on
'frag_point' which is the SCTP DATA segment size, and also we do
not count DATA chunk header overhead in the computation. This
actuall allows situations where a user can set low 'frag_point',
and then send small messages without delay.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN.
This results in an hung association if the remote only uses
SHUTDOWNs (which it's allowed to do) to acknowlege DATA when
closing. The reason for that is that we simply honor the a_rwnd
from the sack, but since we faked it to be 0, we enter 0-window
probing. The fix is to use the peers old rwnd and add our flight
size to it.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
SCTP has a problem that when small chunks are used, it is possible
to exhaust the receiver buffer without fully closing receive window.
This happens due to all overhead that we have account for with small
messages. To fix this, when receive buffer is exceeded, we'll drop
the window to 0 and save the 'drop' portion. When application starts
reading data and freeing up recevie buffer space, we'll wait until
we've reached the 'drop' window and then add back this 'drop' one
mtu at a time. This worked well in testing and under stress produced
rather even recovery.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
If T3 timer expires, we are retransmitting data due to timeout any
any fast recovery is null and void. We can clear the fast recovery
flag.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
SCTP RFC 4960 states that unacknowledged HEARTBEATS count as
errors agains a given transport or endpoint. As such, we
should increment the error counts for only for unacknowledged
HB, otherwise we detect failure too soon. This goes for both
the overall error count and the path error count.
Now, there is a difference in how the detection is done
between the two. The path error detection is done after
the increment, so to detect it properly, we actually need
to exceed the path threshold. The overall error detection
is done _BEFORE_ the increment. Thus to detect the failure,
it's enough for the error count to match the threshold.
This is why all the state functions use '>=' to detect failure,
while path detection uses '>'.
Thanks goes to Chunbo Luo <chunbo.luo@windriver.com> who first
proposed patches to fix this issue and made me re-read the spec
and the code to figure out how this cruft really works.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
create_proc_entry() is deprecated (not formally, though).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
The receiver of the HEARTBEAT should respond with a HEARTBEAT ACK
that contains the Heartbeat Information field copied from the
received HEARTBEAT chunk. So the received HEARTBEAT-ACK chunk
must have a length of:
sizeof(sctp_chunkhdr_t) + sizeof(sctp_sender_hb_info_t)
A badly formatted HB-ACK chunk, it is possible that we may access
invalid memory. We should really make sure that the chunk format
is what we expect, before attempting to touch the data.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
If Cumulative TSN Ack field of SHUTDOWN chunk is less than the
Cumulative TSN Ack Point then drop the SHUTDOWN chunk.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
Currenlty, sctp breaks up user messages into fragments and
sends each fragment to the lower layer by itself. This means
that for each fragment we go all the way down the stack
and back up. This also discourages bundling of multiple
fragments when they can fit into a sigle packet (ex: due
to user setting a low fragmentation threashold).
We introduce a new command SCTP_CMD_SND_MSG and hand the
whole message down state machine. The state machine and
the side-effect parser will cork the queue, add all chunks
from the message to the queue, and then un-cork the queue
thus causing the chunks to get transmitted.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
If the association has a SACK timer pending and now DATA queued
to be send, we'll try to bundle the SACK with the next application send.
As such, try encourage bundling by accounting for SACK in the size
of the first chunk fragment.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
We are now trying to bundle SACKs when we have outbound
DATA to send. However, there are situations where this
outbound DATA will not be sent (due to congestion or
available window). In such cases it's ok to wait for the
timer to expire. This patch refactors the sending code
so that betfore attempting to bundle the SACK we check
to see if the DATA will actually be transmitted.
Based on eirlier works for Doug Graham <dgraham@nortel.com> and
Wei Youngjun <yjwei@cn.fujitsu.com>.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
Since an application may specify the maximum SCTP fragment size
that all data should be fragmented to, we need to fix how
we do segmentation. Right now, if a user specifies a small
fragment size, the segment size can go negative in the presence
of AUTH or COOKIE_ECHO bundling.
What we need to do is track the largest possbile DATA chunk that
can fit into the mtu. Then if the fragment size specified is
bigger then this maximum length, we'll shrink it down. Otherwise,
we just use the smaller segment size without changing it further.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
If a socket has a lot of association that are in the process of
of being closed/aborted, it is possible for a remote to establish
new associations during the time period that the old ones are shutting
down. If this was a result of a close() call, there will be no socket
and will cause a memory leak. We'll prevent this by setting the
socket state to CLOSING and disallow new associations when in this state.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
This patch corrects the conditions under which a SACK will be piggybacked
on a DATA packet. The previous condition was incorrect due to a
misinterpretation of RFC 4960 and/or RFC 2960. Specifically, the
following paragraph from section 6.2 had not been implemented correctly:
Before an endpoint transmits a DATA chunk, if any received DATA
chunks have not been acknowledged (e.g., due to delayed ack), the
sender should create a SACK and bundle it with the outbound DATA
chunk, as long as the size of the final SCTP packet does not exceed
the current MTU. See Section 6.2.
When about to send a DATA chunk, the code now checks to see if the SACK
timer is running. If it is, we know we have a SACK to send to the
peer, so we append the SACK (assuming available space in the packet)
and turn off the timer. For a simple request-response scenario, this
will result in the SACK being bundled with the response, meaning the
the SACK is received quickly by the client, and also meaning that no
separate SACK packet needs to be sent by the server to acknowledge the
request. Prior to this patch, a separate SACK packet would have been
sent by the server SCTP only after its delayed-ACK timer had expired
(usually 200ms). This is wasteful of bandwidth, and can also have a
major negative impact on performance due the interaction of delayed ACKs
with the Nagle algorithm.
Signed-off-by: Doug Graham <dgraham@nortel.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
When the sctp transport is marked down, we can release the
cached route and force a new lookup when attempting to use
this transport for anything. This way, if a better route
or source address is available, we'll try to use it.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
Update the route and saddr entries for the non-active transports as some
of the added addresses can be used as better source addresses, or may
be there is a better route.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
This patch fix to check the unrecognized ASCONF parameter before
access it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
The return value of sctp_process_asconf_ack() may be
overwritten while process parameters with no error.
This patch fixed the problem.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
arch/microblaze/include/asm/socket.h
|
|
Commit 1748376b6626acf59c24e9592ac67b3fe2a0e026,
net: Use a percpu_counter for sockets_allocated
added percpu_counter function calls to sctp_proc_init code path, but
forgot to add them to sctp_proc_exit(). This resulted in a following
Ooops when performing this test
# modprobe sctp
# rmmod -f sctp
# modprobe sctp
[ 573.862512] BUG: unable to handle kernel paging request at f8214a24
[ 573.862518] IP: [<c0308b8f>] __percpu_counter_init+0x3f/0x70
[ 573.862530] *pde = 37010067 *pte = 00000000
[ 573.862534] Oops: 0002 [#1] SMP
[ 573.862537] last sysfs file: /sys/module/libcrc32c/initstate
[ 573.862540] Modules linked in: sctp(+) crc32c libcrc32c binfmt_misc bridge
stp bnep lp snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep
snd_pcm_oss snd_mixer_oss arc4 joydev snd_pcm ecb pcmcia snd_seq_dummy
snd_seq_oss iwlagn iwlcore snd_seq_midi snd_rawmidi snd_seq_midi_event
yenta_socket rsrc_nonstatic thinkpad_acpi snd_seq snd_timer snd_seq_device
mac80211 psmouse sdhci_pci sdhci nvidia(P) ppdev video snd soundcore serio_raw
pcspkr iTCO_wdt iTCO_vendor_support led_class ricoh_mmc pcmcia_core intel_agp
nvram agpgart usbhid parport_pc parport output snd_page_alloc cfg80211 btusb
ohci1394 ieee1394 e1000e [last unloaded: sctp]
[ 573.862589]
[ 573.862593] Pid: 5373, comm: modprobe Tainted: P R (2.6.31-rc3 #6)
7663B15
[ 573.862596] EIP: 0060:[<c0308b8f>] EFLAGS: 00010286 CPU: 1
[ 573.862599] EIP is at __percpu_counter_init+0x3f/0x70
[ 573.862602] EAX: f8214a20 EBX: f80faa14 ECX: c48c0000 EDX: f80faa20
[ 573.862604] ESI: f80a7000 EDI: 00000000 EBP: f69d5ef0 ESP: f69d5eec
[ 573.862606] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 573.862610] Process modprobe (pid: 5373, ti=f69d4000 task=c2130c70
task.ti=f69d4000)
[ 573.862612] Stack:
[ 573.862613] 00000000 f69d5f18 f80a70a8 f80fa9fc 00000000 fffffffc f69d5f30
c018e2d4
[ 573.862619] <0> 00000000 f80a7000 00000000 f69d5f88 c010112b 00000000
c07029c0 fffffffb
[ 573.862626] <0> 00000000 f69d5f38 c018f83f f69d5f54 c0557cad f80fa860
00000001 c07010c0
[ 573.862634] Call Trace:
[ 573.862644] [<f80a70a8>] ? sctp_init+0xa8/0x7d4 [sctp]
[ 573.862650] [<c018e2d4>] ? marker_update_probe_range+0x184/0x260
[ 573.862659] [<f80a7000>] ? sctp_init+0x0/0x7d4 [sctp]
[ 573.862662] [<c010112b>] ? do_one_initcall+0x2b/0x160
[ 573.862666] [<c018f83f>] ? tracepoint_module_notify+0x2f/0x40
[ 573.862671] [<c0557cad>] ? notifier_call_chain+0x2d/0x70
[ 573.862678] [<c01588fd>] ? __blocking_notifier_call_chain+0x4d/0x60
[ 573.862682] [<c016b2f1>] ? sys_init_module+0xb1/0x1f0
[ 573.862686] [<c0102ffc>] ? sysenter_do_call+0x12/0x28
[ 573.862688] Code: 89 48 08 b8 04 00 00 00 e8 df aa ec ff ba f4 ff ff ff 85
c0 89 43 14 74 31 b8 b0 18 71 c0 e8 19 b9 24 00 a1 c4 18 71 c0 8d 53 0c <89> 50
04 89 43 0c b8 b0 18 71 c0 c7 43 10 c4 18 71 c0 89 15 c4
[ 573.862725] EIP: [<c0308b8f>] __percpu_counter_init+0x3f/0x70 SS:ESP
0068:f69d5eec
[ 573.862730] CR2: 00000000f8214a24
[ 573.862734] ---[ end trace 39c4e0b55e7cf54d ]---
Signed-off-by: Rafael Laufer <rlaufer@cisco.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
String literals are constant, and usually, we can also tag the array
of pointers const too, moving it to the .rodata section.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 'net: Move rx skb_orphan call to where needed' broken sctp protocol
with warning at inet_sock_destruct(). Actually, sctp can do this right with
sctp_sock_rfree_frag() and sctp_skb_set_owner_r_frag() pair.
sctp_sock_rfree_frag(skb);
sctp_skb_set_owner_r_frag(skb, newsk);
This patch not revert the commit d55d87fdff8252d0e2f7c28c2d443aee17e9d70f,
instead remove the sctp_sock_rfree_frag() function.
------------[ cut here ]------------
WARNING: at net/ipv4/af_inet.c:151 inet_sock_destruct+0xe0/0x142()
Modules linked in: sctp ipv6 dm_mirror dm_region_hash dm_log dm_multipath
scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
Pid: 1808, comm: sctp_test Not tainted 2.6.31-rc2 #40
Call Trace:
[<c042dd06>] warn_slowpath_common+0x6a/0x81
[<c064a39a>] ? inet_sock_destruct+0xe0/0x142
[<c042dd2f>] warn_slowpath_null+0x12/0x15
[<c064a39a>] inet_sock_destruct+0xe0/0x142
[<c05fde44>] __sk_free+0x19/0xcc
[<c05fdf50>] sk_free+0x18/0x1a
[<ca0d14ad>] sctp_close+0x192/0x1a1 [sctp]
[<c0649f7f>] inet_release+0x47/0x4d
[<c05fba4d>] sock_release+0x19/0x5e
[<c05fbab3>] sock_close+0x21/0x25
[<c049c31b>] __fput+0xde/0x189
[<c049c3de>] fput+0x18/0x1a
[<c049988f>] filp_close+0x56/0x60
[<c042f422>] put_files_struct+0x5d/0xa1
[<c042f49f>] exit_files+0x39/0x3d
[<c043086a>] do_exit+0x1a5/0x5dd
[<c04a86c2>] ? d_kill+0x35/0x3b
[<c0438fa4>] ? dequeue_signal+0xa6/0x115
[<c0430d05>] do_group_exit+0x63/0x8a
[<c0439504>] get_signal_to_deliver+0x2e1/0x2f9
[<c0401d9e>] do_notify_resume+0x7c/0x6b5
[<c043f601>] ? autoremove_wake_function+0x0/0x34
[<c04a864e>] ? __d_free+0x3d/0x40
[<c04a867b>] ? d_free+0x2a/0x3c
[<c049ba7e>] ? vfs_write+0x103/0x117
[<c05fc8fa>] ? sys_socketcall+0x178/0x182
[<c0402a56>] work_notifysig+0x13/0x19
---[ end trace 9db92c463e789fba ]---
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 'net: skb->dst accessors'(adf30907d63893e4208dfe3f5c88ae12bc2f25d5)
broken the sctp protocol stack, the sctp packet can never be sent out after
Eric Dumazet's patch, which have typo in the sctp code.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Vlad Yasevich <vladisalv.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change all the code that deals directly with ICMPv6 type and code
values to use u8 instead of a signed int as that's the actual data
type.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.
We need to take into account this offset when reporting
sk_wmem_alloc to user, in PROC_FS files or various
ioctls (SIOCOUTQ/TIOCOUTQ)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|