<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4/inetpeer.c, branch v3.2.38</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/net/ipv4/inetpeer.c?h=v3.2.38</id>
<link rel='self' href='https://git.amat.us/linux/atom/net/ipv4/inetpeer.c?h=v3.2.38'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2011-08-07T01:33:19Z</updated>
<entry>
<title>net: Compute protocol sequence numbers and fragment IDs using MD5.</title>
<updated>2011-08-07T01:33:19Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-08-04T03:50:44Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=6e5714eaf77d79ae1c8b47e3e040ff5411b717ec'/>
<id>urn:sha1:6e5714eaf77d79ae1c8b47e3e040ff5411b717ec</id>
<content type='text'>
Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.

MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)

Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation.  So the periodic
regeneration and 8-bit counter have been removed.  We compute and
use a full 32-bit sequence number.

For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.

Reported-by: Dan Kaminsky &lt;dan@doxpara.com&gt;
Tested-by: Willy Tarreau &lt;w@1wt.eu&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: make fragment identifications less predictable</title>
<updated>2011-07-22T04:25:58Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-07-22T04:25:58Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=87c48fa3b4630905f98268dde838ee43626a060c'/>
<id>urn:sha1:87c48fa3b4630905f98268dde838ee43626a060c</id>
<content type='text'>
IPv6 fragment identification generation is way beyond what we use for
IPv4 : It uses a single generator. Its not scalable and allows DOS
attacks.

Now inetpeer is IPv6 aware, we can use it to provide a more secure and
scalable frag ident generator (per destination, instead of system wide)

This patch :
1) defines a new secure_ipv6_id() helper
2) extends inet_getid() to provide 32bit results
3) extends ipv6_select_ident() with a new dest parameter

Reported-by: Fernando Gont &lt;fernando@gont.com.ar&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inetpeer: kill inet_putpeer race</title>
<updated>2011-07-12T03:25:04Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-07-11T02:49:52Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=6d1a3e042f55861a785527a35a6f1ab4217ee810'/>
<id>urn:sha1:6d1a3e042f55861a785527a35a6f1ab4217ee810</id>
<content type='text'>
We currently can free inetpeer entries too early :

[  782.636674] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f130f44c)
[  782.636677] 1f7b13c100000000000000000000000002000000000000000000000000000000
[  782.636686]  i i i i u u u u i i i i u u u u i i i i u u u u u u u u u u u u
[  782.636694]                          ^
[  782.636696]
[  782.636698] Pid: 4638, comm: ssh Not tainted 3.0.0-rc5+ #270 Hewlett-Packard HP Compaq 6005 Pro SFF PC/3047h
[  782.636702] EIP: 0060:[&lt;c13fefbb&gt;] EFLAGS: 00010286 CPU: 0
[  782.636707] EIP is at inet_getpeer+0x25b/0x5a0
[  782.636709] EAX: 00000002 EBX: 00010080 ECX: f130f3c0 EDX: f0209d30
[  782.636711] ESI: 0000bc87 EDI: 0000ea60 EBP: f0209ddc ESP: c173134c
[  782.636712]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  782.636714] CR0: 8005003b CR2: f0beca80 CR3: 30246000 CR4: 000006d0
[  782.636716] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  782.636717] DR6: ffff4ff0 DR7: 00000400
[  782.636718]  [&lt;c13fbf76&gt;] rt_set_nexthop.clone.45+0x56/0x220
[  782.636722]  [&lt;c13fc449&gt;] __ip_route_output_key+0x309/0x860
[  782.636724]  [&lt;c141dc54&gt;] tcp_v4_connect+0x124/0x450
[  782.636728]  [&lt;c142ce43&gt;] inet_stream_connect+0xa3/0x270
[  782.636731]  [&lt;c13a8da1&gt;] sys_connect+0xa1/0xb0
[  782.636733]  [&lt;c13a99dd&gt;] sys_socketcall+0x25d/0x2a0
[  782.636736]  [&lt;c149deb8&gt;] sysenter_do_call+0x12/0x28
[  782.636738]  [&lt;ffffffff&gt;] 0xffffffff

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inetpeer: remove unused list</title>
<updated>2011-06-09T00:05:30Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-06-08T13:35:34Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4b9d9be839fdb7dcd7ce7619a623fd9015a50cda'/>
<id>urn:sha1:4b9d9be839fdb7dcd7ce7619a623fd9015a50cda</id>
<content type='text'>
Andi Kleen and Tim Chen reported huge contention on inetpeer
unused_peers.lock, on memcached workload on a 40 core machine, with
disabled route cache.

It appears we constantly flip peers refcnt between 0 and 1 values, and
we must insert/remove peers from unused_peers.list, holding a contended
spinlock.

Remove this list completely and perform a garbage collection on-the-fly,
at lookup time, using the expired nodes we met during the tree
traversal.

This removes a lot of code, makes locking more standard, and obsoletes
two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
removes two pointers in inet_peer structure.

There is still a false sharing effect because refcnt is in first cache
line of object [were the links and keys used by lookups are located], we
might move it at the end of inet_peer structure to let this first cache
line mostly read by cpus.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
CC: Andi Kleen &lt;andi@firstfloor.org&gt;
CC: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inetpeer: fix race in unused_list manipulations</title>
<updated>2011-05-27T17:39:11Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-05-26T17:27:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=686a7e32ca7fdd819eb9606abd3db52b77d1479f'/>
<id>urn:sha1:686a7e32ca7fdd819eb9606abd3db52b77d1479f</id>
<content type='text'>
Several crashes in cleanup_once() were reported in recent kernels.

Commit d6cc1d642de9 (inetpeer: various changes) added a race in
unlink_from_unused().

One way to avoid taking unused_peers.lock before doing the list_empty()
test is to catch 0-&gt;1 refcnt transitions, using full barrier atomic
operations variants (atomic_cmpxchg() and atomic_inc_return()) instead
of previous atomic_inc() and atomic_add_unless() variants.

We then call unlink_from_unused() only for the owner of the 0-&gt;1
transition.

Add a new atomic_add_unless_return() static helper

With help from Arun Sharma.

Refs: https://bugzilla.kernel.org/show_bug.cgi?id=32772

Reported-by: Arun Sharma &lt;asharma@fb.com&gt;
Reported-by: Maximilian Engelhardt &lt;maxi@daemonizer.de&gt;
Reported-by: Yann Dupont &lt;Yann.Dupont@univ-nantes.fr&gt;
Reported-by: Denys Fedoryshchenko &lt;denys@visp.net.lb&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inetpeer: reduce stack usage</title>
<updated>2011-04-12T20:58:33Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-04-11T22:39:40Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=66944e1c5797562cebe2d1857d46dff60bf9a69e'/>
<id>urn:sha1:66944e1c5797562cebe2d1857d46dff60bf9a69e</id>
<content type='text'>
On 64bit arches, we use 752 bytes of stack when cleanup_once() is called
from inet_getpeer().

Lets share the avl stack to save ~376 bytes.

Before patch :

# objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl

0x000006c3 unlink_from_pool [inetpeer.o]:		376
0x00000721 unlink_from_pool [inetpeer.o]:		376
0x00000cb1 inet_getpeer [inetpeer.o]:			376
0x00000e6d inet_getpeer [inetpeer.o]:			376
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5320	    432	     21	   5773	   168d	net/ipv4/inetpeer.o

After patch :

objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl
0x00000c11 inet_getpeer [inetpeer.o]:			376
0x00000dcd inet_getpeer [inetpeer.o]:			376
0x00000ab9 peer_check_expire [inetpeer.o]:		328
0x00000b7f peer_check_expire [inetpeer.o]:		328
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5163	    432	     21	   5616	   15f0	net/ipv4/inetpeer.o

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Scot Doyle &lt;lkml@scotdoyle.com&gt;
Cc: Stephen Hemminger &lt;shemminger@vyatta.com&gt;
Cc: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Reviewed-by: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inetpeer: should use call_rcu() variant</title>
<updated>2011-03-14T06:22:23Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-03-14T06:22:23Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4e75db2e8ff2c97762e87f61f54d7cdeaab1a6b0'/>
<id>urn:sha1:4e75db2e8ff2c97762e87f61f54d7cdeaab1a6b0</id>
<content type='text'>
After commit 7b46ac4e77f3224a (inetpeer: Don't disable BH for initial
fast RCU lookup.), we should use call_rcu() to wait proper RCU grace
period.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Fix PMTU update.</title>
<updated>2011-03-14T01:37:49Z</updated>
<author>
<name>Hiroaki SHIMODA</name>
<email>shimoda.hiroaki@gmail.com</email>
</author>
<published>2011-03-09T20:09:58Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=46af31800b6916c92fffa529dc3c357008da957d'/>
<id>urn:sha1:46af31800b6916c92fffa529dc3c357008da957d</id>
<content type='text'>
On current net-next-2.6, when Linux receives ICMP Type: 3, Code: 4
(Destination unreachable (Fragmentation needed)),

  icmp_unreach
    -&gt; ip_rt_frag_needed
         (peer-&gt;pmtu_expires is set here)
    -&gt; tcp_v4_err
         -&gt; do_pmtu_discovery
              -&gt; ip_rt_update_pmtu
                   (peer-&gt;pmtu_expires is already set,
                    so check_peer_pmtu is skipped.)
                   -&gt; check_peer_pmtu

check_peer_pmtu is skipped and MTU is not updated.

To fix this, let check_peer_pmtu execute unconditionally.
And some minor fixes
1) Avoid potential peer-&gt;pmtu_expires set to be zero.
2) In check_peer_pmtu, argument of time_before is reversed.
3) check_peer_pmtu expects peer-&gt;pmtu_orig is initialized as zero,
   but not initialized.

Signed-off-by: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inetpeer: Don't disable BH for initial fast RCU lookup.</title>
<updated>2011-03-08T22:59:28Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-03-08T22:59:28Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=7b46ac4e77f3224a1befe032c77f1df31d1b42c4'/>
<id>urn:sha1:7b46ac4e77f3224a1befe032c77f1df31d1b42c4</id>
<content type='text'>
If modifications on other cpus are ok, then modifications to
the tree during lookup done by the local cpu are ok too.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inetpeer: seqlock optimization</title>
<updated>2011-03-04T22:33:59Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-03-04T22:33:59Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=65e8354ec13a45414045084166cb340c0d7ffe8a'/>
<id>urn:sha1:65e8354ec13a45414045084166cb340c0d7ffe8a</id>
<content type='text'>
David noticed :

------------------
Eric, I was profiling the non-routing-cache case and something that
stuck out is the case of calling inet_getpeer() with create==0.

If an entry is not found, we have to redo the lookup under a spinlock
to make certain that a concurrent writer rebalancing the tree does
not "hide" an existing entry from us.

This makes the case of a create==0 lookup for a not-present entry
really expensive.  It is on the order of 600 cpu cycles on my
Niagara2.

I added a hack to not do the relookup under the lock when create==0
and it now costs less than 300 cycles.

This is now a pretty common operation with the way we handle COW'd
metrics, so I think it's definitely worth optimizing.
-----------------

One solution is to use a seqlock instead of a spinlock to protect struct
inet_peer_base.

After a failed avl tree lookup, we can easily detect if a writer did
some changes during our lookup. Taking the lock and redo the lookup is
only necessary in this case.

Note: Add one private rcu_deref_locked() macro to place in one spot the
access to spinlock included in seqlock.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
