<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4, branch v3.10.52</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/net/ipv4?h=v3.10.52</id>
<link rel='self' href='https://git.amat.us/linux/atom/net/ipv4?h=v3.10.52'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2014-07-28T15:00:06Z</updated>
<entry>
<title>ipv4: fix buffer overflow in ip_options_compile()</title>
<updated>2014-07-28T15:00:06Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-07-21T05:17:42Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=36b526620dcc8e01330964ef88c1ec5217027781'/>
<id>urn:sha1:36b526620dcc8e01330964ef88c1ec5217027781</id>
<content type='text'>
[ Upstream commit 10ec9472f05b45c94db3c854d22581a20b97db41 ]

There is a benign buffer overflow in ip_options_compile spotted by
AddressSanitizer[1] :

Its benign because we always can access one extra byte in skb-&gt;head
(because header is followed by struct skb_shared_info), and in this case
this byte is not even used.

[28504.910798] ==================================================================
[28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile
[28504.913170] Read of size 1 by thread T15843:
[28504.914026]  [&lt;ffffffff81802f91&gt;] ip_options_compile+0x121/0x9c0
[28504.915394]  [&lt;ffffffff81804a0d&gt;] ip_options_get_from_user+0xad/0x120
[28504.916843]  [&lt;ffffffff8180dedf&gt;] do_ip_setsockopt.isra.15+0x8df/0x1630
[28504.918175]  [&lt;ffffffff8180ec60&gt;] ip_setsockopt+0x30/0xa0
[28504.919490]  [&lt;ffffffff8181e59b&gt;] tcp_setsockopt+0x5b/0x90
[28504.920835]  [&lt;ffffffff8177462f&gt;] sock_common_setsockopt+0x5f/0x70
[28504.922208]  [&lt;ffffffff817729c2&gt;] SyS_setsockopt+0xa2/0x140
[28504.923459]  [&lt;ffffffff818cfb69&gt;] system_call_fastpath+0x16/0x1b
[28504.924722]
[28504.925106] Allocated by thread T15843:
[28504.925815]  [&lt;ffffffff81804995&gt;] ip_options_get_from_user+0x35/0x120
[28504.926884]  [&lt;ffffffff8180dedf&gt;] do_ip_setsockopt.isra.15+0x8df/0x1630
[28504.927975]  [&lt;ffffffff8180ec60&gt;] ip_setsockopt+0x30/0xa0
[28504.929175]  [&lt;ffffffff8181e59b&gt;] tcp_setsockopt+0x5b/0x90
[28504.930400]  [&lt;ffffffff8177462f&gt;] sock_common_setsockopt+0x5f/0x70
[28504.931677]  [&lt;ffffffff817729c2&gt;] SyS_setsockopt+0xa2/0x140
[28504.932851]  [&lt;ffffffff818cfb69&gt;] system_call_fastpath+0x16/0x1b
[28504.934018]
[28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right
[28504.934377]  of 40-byte region [ffff880026382800, ffff880026382828)
[28504.937144]
[28504.937474] Memory state around the buggy address:
[28504.938430]  ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr
[28504.939884]  ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.941294]  ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
[28504.942504]  ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.943483]  ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28504.944511] &gt;ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
[28504.945573]                         ^
[28504.946277]  ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.094949]  ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.096114]  ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.097116]  ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.098472]  ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
[28505.099804] Legend:
[28505.100269]  f - 8 freed bytes
[28505.100884]  r - 8 redzone bytes
[28505.101649]  . - 8 allocated bytes
[28505.102406]  x=1..7 - x allocated bytes + (8-x) redzone bytes
[28505.103637] ==================================================================

[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: fix false undo corner cases</title>
<updated>2014-07-28T15:00:05Z</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2014-07-02T19:07:16Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=76fd3c89bb35027abbd483929d92267479f7346a'/>
<id>urn:sha1:76fd3c89bb35027abbd483929d92267479f7346a</id>
<content type='text'>
[ Upstream commit 6e08d5e3c8236e7484229e46fdf92006e1dd4c49 ]

The undo code assumes that, upon entering loss recovery, TCP
1) always retransmit something
2) the retransmission never fails locally (e.g., qdisc drop)

so undo_marker is set in tcp_enter_recovery() and undo_retrans is
incremented only when tcp_retransmit_skb() is successful.

When the assumption is broken because TCP's cwnd is too small to
retransmit or the retransmit fails locally. The next (DUP)ACK
would incorrectly revert the cwnd and the congestion state in
tcp_try_undo_dsack() or tcp_may_undo(). Subsequent (DUP)ACKs
may enter the recovery state. The sender repeatedly enter and
(incorrectly) exit recovery states if the retransmits continue to
fail locally while receiving (DUP)ACKs.

The fix is to initialize undo_retrans to -1 and start counting on
the first retransmission. Always increment undo_retrans even if the
retransmissions fail locally because they couldn't cause DSACKs to
undo the cwnd reduction.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>igmp: fix the problem when mc leave group</title>
<updated>2014-07-28T15:00:05Z</updated>
<author>
<name>dingtianhong</name>
<email>dingtianhong@huawei.com</email>
</author>
<published>2014-07-02T05:50:48Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=48b60bb7b53285622a808193334a0d558ebbea87'/>
<id>urn:sha1:48b60bb7b53285622a808193334a0d558ebbea87</id>
<content type='text'>
[ Upstream commit 52ad353a5344f1f700c5b777175bdfa41d3cd65a ]

The problem was triggered by these steps:

1) create socket, bind and then setsockopt for add mc group.
   mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
   mreq.imr_interface.s_addr = inet_addr("192.168.1.2");
   setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &amp;mreq, sizeof(mreq));

2) drop the mc group for this socket.
   mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
   mreq.imr_interface.s_addr = inet_addr("0.0.0.0");
   setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &amp;mreq, sizeof(mreq));

3) and then drop the socket, I found the mc group was still used by the dev:

   netstat -g

   Interface       RefCnt Group
   --------------- ------ ---------------------
   eth2		   1	  255.0.0.37

Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need
to be released for the netdev when drop the socket, but this process was broken when
route default is NULL, the reason is that:

The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr
is NULL, the default route dev will be chosen, then the ifindex is got from the dev,
then polling the inet-&gt;mc_list and return -ENODEV, but if the default route dev is NULL,
the in_dev and ifIndex is both NULL, when polling the inet-&gt;mc_list, the mc group will be
released from the mc_list, but the dev didn't dec the refcnt for this mc group, so
when dropping the socket, the mc_list is NULL and the dev still keep this group.

v1-&gt;v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs,
	so I add the checking for the in_dev before polling the mc_list, make sure when
	we remove the mc group, dec the refcnt to the real dev which was using the mc address.
	The problem would never happened again.

Signed-off-by: Ding Tianhong &lt;dingtianhong@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ipv4: icmp: Fix pMTU handling for rare case</title>
<updated>2014-07-28T15:00:04Z</updated>
<author>
<name>Edward Allcutt</name>
<email>edward.allcutt@openmarket.com</email>
</author>
<published>2014-06-30T15:16:02Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=08d9137a5e01f7c977b81feefc480da6ce9d7a4b'/>
<id>urn:sha1:08d9137a5e01f7c977b81feefc480da6ce9d7a4b</id>
<content type='text'>
[ Upstream commit 68b7107b62983f2cff0948292429d5f5999df096 ]

Some older router implementations still send Fragmentation Needed
errors with the Next-Hop MTU field set to zero. This is explicitly
described as an eventuality that hosts must deal with by the
standard (RFC 1191) since older standards specified that those
bits must be zero.

Linux had a generic (for all of IPv4) implementation of the algorithm
described in the RFC for searching a list of MTU plateaus for a good
value. Commit 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
removed this as part of the changes to remove the routing cache.
Subsequently any Fragmentation Needed packet with a zero Next-Hop
MTU has been discarded without being passed to the per-protocol
handlers or notifying userspace for raw sockets.

When there is a router which does not implement RFC 1191 on an
MTU limited path then this results in stalled connections since
large packets are discarded and the local protocols are not
notified so they never attempt to lower the pMTU.

One example I have seen is an OpenBSD router terminating IPSec
tunnels. It's worth pointing out that this case is distinct from
the BSD 4.2 bug which incorrectly calculated the Next-Hop MTU
since the commit in question dismissed that as a valid concern.

All of the per-protocols handlers implement the simple approach from
RFC 1191 of immediately falling back to the minimum value. Although
this is sub-optimal it is vastly preferable to connections hanging
indefinitely.

Remove the Next-Hop MTU != 0 check and allow such packets
to follow the normal path.

Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
Signed-off-by: Edward Allcutt &lt;edward.allcutt@openmarket.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: Fix divide by zero when pushing during tcp-repair</title>
<updated>2014-07-28T15:00:04Z</updated>
<author>
<name>Christoph Paasch</name>
<email>christoph.paasch@uclouvain.be</email>
</author>
<published>2014-06-28T16:26:37Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=00be00119aa3e62aa234a9ccc03010b2504c1096'/>
<id>urn:sha1:00be00119aa3e62aa234a9ccc03010b2504c1096</id>
<content type='text'>
[ Upstream commit 5924f17a8a30c2ae18d034a86ee7581b34accef6 ]

When in repair-mode and TCP_RECV_QUEUE is set, we end up calling
tcp_push with mss_now being 0. If data is in the send-queue and
tcp_set_skb_tso_segs gets called, we crash because it will divide by
mss_now:

[  347.151939] divide error: 0000 [#1] SMP
[  347.152907] Modules linked in:
[  347.152907] CPU: 1 PID: 1123 Comm: packetdrill Not tainted 3.16.0-rc2 #4
[  347.152907] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  347.152907] task: f5b88540 ti: f3c82000 task.ti: f3c82000
[  347.152907] EIP: 0060:[&lt;c1601359&gt;] EFLAGS: 00210246 CPU: 1
[  347.152907] EIP is at tcp_set_skb_tso_segs+0x49/0xa0
[  347.152907] EAX: 00000b67 EBX: f5acd080 ECX: 00000000 EDX: 00000000
[  347.152907] ESI: f5a28f40 EDI: f3c88f00 EBP: f3c83d10 ESP: f3c83d00
[  347.152907]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  347.152907] CR0: 80050033 CR2: 083158b0 CR3: 35146000 CR4: 000006b0
[  347.152907] Stack:
[  347.152907]  c167f9d9 f5acd080 000005b4 00000002 f3c83d20 c16013e6 f3c88f00 f5acd080
[  347.152907]  f3c83da0 c1603b5a f3c83d38 c10a0188 00000000 00000000 f3c83d84 c10acc85
[  347.152907]  c1ad5ec0 00000000 00000000 c1ad679c 010003e0 00000000 00000000 f3c88fc8
[  347.152907] Call Trace:
[  347.152907]  [&lt;c167f9d9&gt;] ? apic_timer_interrupt+0x2d/0x34
[  347.152907]  [&lt;c16013e6&gt;] tcp_init_tso_segs+0x36/0x50
[  347.152907]  [&lt;c1603b5a&gt;] tcp_write_xmit+0x7a/0xbf0
[  347.152907]  [&lt;c10a0188&gt;] ? up+0x28/0x40
[  347.152907]  [&lt;c10acc85&gt;] ? console_unlock+0x295/0x480
[  347.152907]  [&lt;c10ad24f&gt;] ? vprintk_emit+0x1ef/0x4b0
[  347.152907]  [&lt;c1605716&gt;] __tcp_push_pending_frames+0x36/0xd0
[  347.152907]  [&lt;c15f4860&gt;] tcp_push+0xf0/0x120
[  347.152907]  [&lt;c15f7641&gt;] tcp_sendmsg+0xf1/0xbf0
[  347.152907]  [&lt;c116d920&gt;] ? kmem_cache_free+0xf0/0x120
[  347.152907]  [&lt;c106a682&gt;] ? __sigqueue_free+0x32/0x40
[  347.152907]  [&lt;c106a682&gt;] ? __sigqueue_free+0x32/0x40
[  347.152907]  [&lt;c114f0f0&gt;] ? do_wp_page+0x3e0/0x850
[  347.152907]  [&lt;c161c36a&gt;] inet_sendmsg+0x4a/0xb0
[  347.152907]  [&lt;c1150269&gt;] ? handle_mm_fault+0x709/0xfb0
[  347.152907]  [&lt;c15a006b&gt;] sock_aio_write+0xbb/0xd0
[  347.152907]  [&lt;c1180b79&gt;] do_sync_write+0x69/0xa0
[  347.152907]  [&lt;c1181023&gt;] vfs_write+0x123/0x160
[  347.152907]  [&lt;c1181d55&gt;] SyS_write+0x55/0xb0
[  347.152907]  [&lt;c167f0d8&gt;] sysenter_do_call+0x12/0x28

This can easily be reproduced with the following packetdrill-script (the
"magic" with netem, sk_pacing and limit_output_bytes is done to prevent
the kernel from pushing all segments, because hitting the limit without
doing this is not so easy with packetdrill):

0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0  bind(3, ..., ...) = 0
+0  listen(3, 1) = 0

+0  &lt; S 0:0(0) win 32792 &lt;mss 1460&gt;
+0  &gt; S. 0:0(0) ack 1 &lt;mss 1460&gt;
+0.1  &lt; . 1:1(0) ack 1 win 65000

+0  accept(3, ..., ...) = 4

// This forces that not all segments of the snd-queue will be pushed
+0 `tc qdisc add dev tun0 root netem delay 10ms`
+0 `sysctl -w net.ipv4.tcp_limit_output_bytes=2`
+0 setsockopt(4, SOL_SOCKET, 47, [2], 4) = 0

+0 write(4,...,10000) = 10000
+0 write(4,...,10000) = 10000

// Set tcp-repair stuff, particularly TCP_RECV_QUEUE
+0 setsockopt(4, SOL_TCP, 19, [1], 4) = 0
+0 setsockopt(4, SOL_TCP, 20, [1], 4) = 0

// This now will make the write push the remaining segments
+0 setsockopt(4, SOL_SOCKET, 47, [20000], 4) = 0
+0 `sysctl -w net.ipv4.tcp_limit_output_bytes=130000`

// Now we will crash
+0 write(4,...,1000) = 1000

This happens since ec3423257508 (tcp: fix retransmission in repair
mode). Prior to that, the call to tcp_push was prevented by a check for
tp-&gt;repair.

The patch fixes it, by adding the new goto-label out_nopush. When exiting
tcp_sendmsg and a push is not required, which is the case for tp-&gt;repair,
we go to this label.

When repairing and calling send() with TCP_RECV_QUEUE, the data is
actually put in the receive-queue. So, no push is required because no
data has been added to the send-queue.

Cc: Andrew Vagin &lt;avagin@openvz.org&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Fixes: ec3423257508 (tcp: fix retransmission in repair mode)
Signed-off-by: Christoph Paasch &lt;christoph.paasch@uclouvain.be&gt;
Acked-by: Andrew Vagin &lt;avagin@openvz.org&gt;
Acked-by: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix</title>
<updated>2014-07-28T15:00:04Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-06-30T08:26:23Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f1e1b06f19e1ddcebcee56ba33845ded7bf719ac'/>
<id>urn:sha1:f1e1b06f19e1ddcebcee56ba33845ded7bf719ac</id>
<content type='text'>
[ Upstream commit 7f502361531e9eecb396cf99bdc9e9a59f7ebd7f ]

We have two different ways to handle changes to sk-&gt;sk_dst

First way (used by TCP) assumes socket lock is owned by caller, and use
no extra lock : __sk_dst_set() &amp; __sk_dst_reset()

Another way (used by UDP) uses sk_dst_lock because socket lock is not
always taken. Note that sk_dst_lock is not softirq safe.

These ways are not inter changeable for a given socket type.

ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used
the socket lock as synchronization, but users might be UDP sockets.

Instead of converting sk_dst_lock to a softirq safe version, use xchg()
as we did for sk_rx_dst in commit e47eb5dfb296b ("udp: ipv4: do not use
sk_dst_lock from softirq context")

In a follow up patch, we probably can remove sk_dst_lock, as it is
only used in IPv6.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Fixes: 9cb3a50c5f63e ("ipv4: Invalidate the socket cached route on pmtu events if possible")
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb</title>
<updated>2014-07-28T15:00:04Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2014-06-19T01:15:03Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=856443cb555a75b9700d3fabf5965b46337de199'/>
<id>urn:sha1:856443cb555a75b9700d3fabf5965b46337de199</id>
<content type='text'>
[ Upstream commit 2cd0d743b05e87445c54ca124a9916f22f16742e ]

If there is an MSS change (or misbehaving receiver) that causes a SACK
to arrive that covers the end of an skb but is less than one MSS, then
tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
the skb ("Round if necessary..."), then chopping all bytes off the skb
and creating a zero-byte skb in the write queue.

This was visible now because the recently simplified TLP logic in
bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
skb at the end of the write queue, and now that we do not check that
skb's length we could send it as a TLP probe.

Consider the following example scenario:

 mss: 1000
 skb: seq: 0 end_seq: 4000  len: 4000
 SACK: start_seq: 3999 end_seq: 4000

The tcp_match_skb_to_sack() code will compute:

 in_sack = false
 pkt_len = start_seq - TCP_SKB_CB(skb)-&gt;seq = 3999 - 0 = 3999
 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
 new_len += mss = 4000

Previously we would find the new_len &gt; skb-&gt;len check failing, so we
would fall through and set pkt_len = new_len = 4000 and chop off
pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
afterward in the write queue.

With this new commit, we notice that the new new_len &gt;= skb-&gt;len check
succeeds, so that we return without trying to fragment.

Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
Reported-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Ilpo Jarvinen &lt;ilpo.jarvinen@helsinki.fi&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ip_tunnel: fix ip_tunnel_lookup</title>
<updated>2014-07-28T15:00:04Z</updated>
<author>
<name>Dmitry Popov</name>
<email>ixaphire@qrator.net</email>
</author>
<published>2014-07-04T22:26:37Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=296692cab2e13d7bae70dd9ebfaf32eb36ec2793'/>
<id>urn:sha1:296692cab2e13d7bae70dd9ebfaf32eb36ec2793</id>
<content type='text'>
[ Upstream commit e0056593b61253f1a8a9941dacda22e73b963cdc ]

This patch fixes 3 similar bugs where incoming packets might be routed into
wrong non-wildcard tunnels:

1) Consider the following setup:
    ip address add 1.1.1.1/24 dev eth0
    ip address add 1.1.1.2/24 dev eth0
    ip tunnel add ipip1 remote 2.2.2.2 local 1.1.1.1 mode ipip dev eth0
    ip link set ipip1 up

Incoming ipip packets from 2.2.2.2 were routed into ipip1 even if it has dst =
1.1.1.2. Moreover even if there was wildcard tunnel like
   ip tunnel add ipip0 remote 2.2.2.2 local any mode ipip dev eth0
but it was created before explicit one (with local 1.1.1.1), incoming ipip
packets with src = 2.2.2.2 and dst = 1.1.1.2 were still routed into ipip1.

Same issue existed with all tunnels that use ip_tunnel_lookup (gre, vti)

2)  ip address add 1.1.1.1/24 dev eth0
    ip tunnel add ipip1 remote 2.2.146.85 local 1.1.1.1 mode ipip dev eth0
    ip link set ipip1 up

Incoming ipip packets with dst = 1.1.1.1 were routed into ipip1, no matter what
src address is. Any remote ip address which has ip_tunnel_hash = 0 raised this
issue, 2.2.146.85 is just an example, there are more than 4 million of them.
And again, wildcard tunnel like
   ip tunnel add ipip0 remote any local 1.1.1.1 mode ipip dev eth0
wouldn't be ever matched if it was created before explicit tunnel like above.

Gre &amp; vti tunnels had the same issue.

3)  ip address add 1.1.1.1/24 dev eth0
    ip tunnel add gre1 remote 2.2.146.84 local 1.1.1.1 key 1 mode gre dev eth0
    ip link set gre1 up

Any incoming gre packet with key = 1 were routed into gre1, no matter what
src/dst addresses are. Any remote ip address which has ip_tunnel_hash = 0 raised
the issue, 2.2.146.84 is just an example, there are more than 4 million of them.
Wildcard tunnel like
   ip tunnel add gre2 remote any local any key 1 mode gre dev eth0
wouldn't be ever matched if it was created before explicit tunnel like above.

All this stuff happened because while looking for a wildcard tunnel we didn't
check that matched tunnel is a wildcard one. Fixed.

Signed-off-by: Dmitry Popov &lt;ixaphire@qrator.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>netfilter: ipt_ULOG: fix info leaks</title>
<updated>2014-07-07T01:54:15Z</updated>
<author>
<name>Mathias Krause</name>
<email>minipli@googlemail.com</email>
</author>
<published>2013-09-30T20:05:08Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3ca696839223ed38faeaf9b215b35c91ef6a5811'/>
<id>urn:sha1:3ca696839223ed38faeaf9b215b35c91ef6a5811</id>
<content type='text'>
commit 278f2b3e2af5f32ea1afe34fa12a2518153e6e49 upstream.

The ulog messages leak heap bytes by the means of padding bytes and
incompletely filled string arrays. Fix those by memset(0)'ing the
whole struct before filling it.

Signed-off-by: Mathias Krause &lt;minipli@googlemail.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Cc: Jan Tore Morken &lt;jantore@morken.priv.no&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ipv4: fix a race in ip4_datagram_release_cb()</title>
<updated>2014-06-26T19:12:38Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-06-10T13:43:01Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9138f241fc79c4124b6ce46ff08a6ec8c07c5ed7'/>
<id>urn:sha1:9138f241fc79c4124b6ce46ff08a6ec8c07c5ed7</id>
<content type='text'>
[ Upstream commit 9709674e68646cee5a24e3000b3558d25412203a ]

Alexey gave a AddressSanitizer[1] report that finally gave a good hint
at where was the origin of various problems already reported by Dormando
in the past [2]

Problem comes from the fact that UDP can have a lockless TX path, and
concurrent threads can manipulate sk_dst_cache, while another thread,
is holding socket lock and calls __sk_dst_set() in
ip4_datagram_release_cb() (this was added in linux-3.8)

It seems that all we need to do is to use sk_dst_check() and
sk_dst_set() so that all the writers hold same spinlock
(sk-&gt;sk_dst_lock) to prevent corruptions.

TCP stack do not need this protection, as all sk_dst_cache writers hold
the socket lock.

[1]
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

AddressSanitizer: heap-use-after-free in ipv4_dst_check
Read of size 2 by thread T15453:
 [&lt;ffffffff817daa3a&gt;] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
 [&lt;ffffffff8175b789&gt;] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
 [&lt;ffffffff81830a36&gt;] ip4_datagram_release_cb+0x46/0x390 ??:0
 [&lt;ffffffff8175eaea&gt;] release_sock+0x17a/0x230 ./net/core/sock.c:2413
 [&lt;ffffffff81830882&gt;] ip4_datagram_connect+0x462/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Freed by thread T15455:
 [&lt;ffffffff8178d9b8&gt;] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
 [&lt;ffffffff8178de25&gt;] dst_release+0x45/0x80 ./net/core/dst.c:280
 [&lt;ffffffff818304c1&gt;] ip4_datagram_connect+0xa1/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Allocated by thread T15453:
 [&lt;ffffffff8178d291&gt;] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
 [&lt;ffffffff817db3b7&gt;] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
 [&lt;     inlined    &gt;] __ip_route_output_key+0x3e8/0xf70
__mkroute_output ./net/ipv4/route.c:1939
 [&lt;ffffffff817dde08&gt;] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
 [&lt;ffffffff817deb34&gt;] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
 [&lt;ffffffff81830737&gt;] ip4_datagram_connect+0x317/0x5d0 ??:0
 [&lt;ffffffff81846d06&gt;] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [&lt;ffffffff817580ac&gt;] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [&lt;ffffffff817596ce&gt;] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [&lt;ffffffff818b0a29&gt;] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

[2]
&lt;4&gt;[196727.311203] general protection fault: 0000 [#1] SMP
&lt;4&gt;[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
&lt;4&gt;[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
&lt;4&gt;[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
&lt;4&gt;[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
&lt;4&gt;[196727.311377] RIP: 0010:[&lt;ffffffff815f8c7f&gt;]  [&lt;ffffffff815f8c7f&gt;] ipv4_dst_destroy+0x4f/0x80
&lt;4&gt;[196727.311399] RSP: 0018:ffff885effd23a70  EFLAGS: 00010282
&lt;4&gt;[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
&lt;4&gt;[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
&lt;4&gt;[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
&lt;4&gt;[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
&lt;4&gt;[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
&lt;4&gt;[196727.311510] FS:  0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
&lt;4&gt;[196727.311554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
&lt;4&gt;[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
&lt;4&gt;[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
&lt;4&gt;[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
&lt;4&gt;[196727.311713] Stack:
&lt;4&gt;[196727.311733]  ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
&lt;4&gt;[196727.311784]  ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
&lt;4&gt;[196727.311834]  ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
&lt;4&gt;[196727.311885] Call Trace:
&lt;4&gt;[196727.311907]  &lt;IRQ&gt;
&lt;4&gt;[196727.311912]  [&lt;ffffffff815b7f42&gt;] dst_destroy+0x32/0xe0
&lt;4&gt;[196727.311959]  [&lt;ffffffff815b86c6&gt;] dst_release+0x56/0x80
&lt;4&gt;[196727.311986]  [&lt;ffffffff81620bd5&gt;] tcp_v4_do_rcv+0x2a5/0x4a0
&lt;4&gt;[196727.312013]  [&lt;ffffffff81622b5a&gt;] tcp_v4_rcv+0x7da/0x820
&lt;4&gt;[196727.312041]  [&lt;ffffffff815fd9e0&gt;] ? ip_rcv_finish+0x360/0x360
&lt;4&gt;[196727.312070]  [&lt;ffffffff815de02d&gt;] ? nf_hook_slow+0x7d/0x150
&lt;4&gt;[196727.312097]  [&lt;ffffffff815fd9e0&gt;] ? ip_rcv_finish+0x360/0x360
&lt;4&gt;[196727.312125]  [&lt;ffffffff815fda92&gt;] ip_local_deliver_finish+0xb2/0x230
&lt;4&gt;[196727.312154]  [&lt;ffffffff815fdd9a&gt;] ip_local_deliver+0x4a/0x90
&lt;4&gt;[196727.312183]  [&lt;ffffffff815fd799&gt;] ip_rcv_finish+0x119/0x360
&lt;4&gt;[196727.312212]  [&lt;ffffffff815fe00b&gt;] ip_rcv+0x22b/0x340
&lt;4&gt;[196727.312242]  [&lt;ffffffffa0339680&gt;] ? macvlan_broadcast+0x160/0x160 [macvlan]
&lt;4&gt;[196727.312275]  [&lt;ffffffff815b0c62&gt;] __netif_receive_skb_core+0x512/0x640
&lt;4&gt;[196727.312308]  [&lt;ffffffff811427fb&gt;] ? kmem_cache_alloc+0x13b/0x150
&lt;4&gt;[196727.312338]  [&lt;ffffffff815b0db1&gt;] __netif_receive_skb+0x21/0x70
&lt;4&gt;[196727.312368]  [&lt;ffffffff815b0fa1&gt;] netif_receive_skb+0x31/0xa0
&lt;4&gt;[196727.312397]  [&lt;ffffffff815b1ae8&gt;] napi_gro_receive+0xe8/0x140
&lt;4&gt;[196727.312433]  [&lt;ffffffffa00274f1&gt;] ixgbe_poll+0x551/0x11f0 [ixgbe]
&lt;4&gt;[196727.312463]  [&lt;ffffffff815fe00b&gt;] ? ip_rcv+0x22b/0x340
&lt;4&gt;[196727.312491]  [&lt;ffffffff815b1691&gt;] net_rx_action+0x111/0x210
&lt;4&gt;[196727.312521]  [&lt;ffffffff815b0db1&gt;] ? __netif_receive_skb+0x21/0x70
&lt;4&gt;[196727.312552]  [&lt;ffffffff810519d0&gt;] __do_softirq+0xd0/0x270
&lt;4&gt;[196727.312583]  [&lt;ffffffff816cef3c&gt;] call_softirq+0x1c/0x30
&lt;4&gt;[196727.312613]  [&lt;ffffffff81004205&gt;] do_softirq+0x55/0x90
&lt;4&gt;[196727.312640]  [&lt;ffffffff81051c85&gt;] irq_exit+0x55/0x60
&lt;4&gt;[196727.312668]  [&lt;ffffffff816cf5c3&gt;] do_IRQ+0x63/0xe0
&lt;4&gt;[196727.312696]  [&lt;ffffffff816c5aaa&gt;] common_interrupt+0x6a/0x6a
&lt;4&gt;[196727.312722]  &lt;EOI&gt;
&lt;1&gt;[196727.313071] RIP  [&lt;ffffffff815f8c7f&gt;] ipv4_dst_destroy+0x4f/0x80
&lt;4&gt;[196727.313100]  RSP &lt;ffff885effd23a70&gt;
&lt;4&gt;[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
&lt;0&gt;[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt

Reported-by: Alexey Preobrazhensky &lt;preobr@google.com&gt;
Reported-by: dormando &lt;dormando@rydia.ne&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: 8141ed9fcedb2 ("ipv4: Add a socket release callback for datagram sockets")
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
