<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/core/sysctl_net_core.c, branch v3.0.82</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/net/core/sysctl_net_core.c?h=v3.0.82</id>
<link rel='self' href='https://git.amat.us/linux/atom/net/core/sysctl_net_core.c?h=v3.0.82'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2011-05-27T17:41:33Z</updated>
<entry>
<title>net: Kill ratelimit.h dependency in linux/net.h</title>
<updated>2011-05-27T17:41:33Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-05-27T17:41:33Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c5c177b4aca83338781e72be2e6dd1601c560cb3'/>
<id>urn:sha1:c5c177b4aca83338781e72be2e6dd1601c560cb3</id>
<content type='text'>
Ingo Molnar noticed that we have this unnecessary ratelimit.h
dependency in linux/net.h, which hid compilation problems from
people doing builds only with CONFIG_NET enabled.

Move this stuff out to a seperate net/net_ratelimit.h file and
include that in the only two places where this thing is needed.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>net: filter: Just In Time compiler for x86-64</title>
<updated>2011-04-28T06:05:08Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-04-20T09:27:32Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=0a14842f5a3c0e88a1e59fac5c3025db39721f74'/>
<id>urn:sha1:0a14842f5a3c0e88a1e59fac5c3025db39721f74</id>
<content type='text'>
In order to speedup packet filtering, here is an implementation of a
JIT compiler for x86_64

It is disabled by default, and must be enabled by the admin.

echo 1 &gt;/proc/sys/net/core/bpf_jit_enable

It uses module_alloc() and module_free() to get memory in the 2GB text
kernel range since we call helpers functions from the generated code.

EAX : BPF A accumulator
EBX : BPF X accumulator
RDI : pointer to skb   (first argument given to JIT function)
RBP : frame pointer (even if CONFIG_FRAME_POINTER=n)
r9d : skb-&gt;len - skb-&gt;data_len (headlen)
r8  : skb-&gt;data

To get a trace of generated code, use :

echo 2 &gt;/proc/sys/net/core/bpf_jit_enable

Example of generated code :

# tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24

flen=18 proglen=147 pass=3 image=ffffffffa00b5000
JIT code: ffffffffa00b5000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60
JIT code: ffffffffa00b5010: 44 2b 4f 64 4c 8b 87 b8 00 00 00 be 0c 00 00 00
JIT code: ffffffffa00b5020: e8 24 7b f7 e0 3d 00 08 00 00 75 28 be 1a 00 00
JIT code: ffffffffa00b5030: 00 e8 fe 7a f7 e0 24 00 3d 00 14 a8 c0 74 49 be
JIT code: ffffffffa00b5040: 1e 00 00 00 e8 eb 7a f7 e0 24 00 3d 00 14 a8 c0
JIT code: ffffffffa00b5050: 74 36 eb 3b 3d 06 08 00 00 74 07 3d 35 80 00 00
JIT code: ffffffffa00b5060: 75 2d be 1c 00 00 00 e8 c8 7a f7 e0 24 00 3d 00
JIT code: ffffffffa00b5070: 14 a8 c0 74 13 be 26 00 00 00 e8 b5 7a f7 e0 24
JIT code: ffffffffa00b5080: 00 3d 00 14 a8 c0 75 07 b8 ff ff 00 00 eb 02 31
JIT code: ffffffffa00b5090: c0 c9 c3

BPF program is 144 bytes long, so native program is almost same size ;)

(000) ldh      [12]
(001) jeq      #0x800           jt 2    jf 8
(002) ld       [26]
(003) and      #0xffffff00
(004) jeq      #0xc0a81400      jt 16   jf 5
(005) ld       [30]
(006) and      #0xffffff00
(007) jeq      #0xc0a81400      jt 16   jf 17
(008) jeq      #0x806           jt 10   jf 9
(009) jeq      #0x8035          jt 10   jf 17
(010) ld       [28]
(011) and      #0xffffff00
(012) jeq      #0xc0a81400      jt 16   jf 13
(013) ld       [38]
(014) and      #0xffffff00
(015) jeq      #0xc0a81400      jt 16   jf 17
(016) ret      #65535
(017) ret      #0

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@infradead.org&gt;
Cc: Ben Hutchings &lt;bhutchings@solarflare.com&gt;
Cc: Hagen Paul Pfeifer &lt;hagen@jauu.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>rps: add __rcu annotations</title>
<updated>2010-10-25T21:18:27Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-10-25T03:02:02Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=6e3f7faf3e8a3e226b1a6b955aac12abf2f2e1b6'/>
<id>urn:sha1:6e3f7faf3e8a3e226b1a6b955aac12abf2f2e1b6</id>
<content type='text'>
Add __rcu annotations to :
	(struct netdev_rx_queue)-&gt;rps_map
	(struct netdev_rx_queue)-&gt;rps_flow_table
	struct rps_sock_flow_table *rps_sock_flow_table;

And use appropriate rcu primitives.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Consistent skb timestamping</title>
<updated>2010-05-16T06:57:10Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-05-16T06:57:10Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3b098e2d7c693796cc4dffb07caa249fc0f70771'/>
<id>urn:sha1:3b098e2d7c693796cc4dffb07caa249fc0f70771</id>
<content type='text'>
With RPS inclusion, skb timestamping is not consistent in RX path.

If netif_receive_skb() is used, its deferred after RPS dispatch.

If netif_rx() is used, its done before RPS dispatch.

This can give strange tcpdump timestamps results.

I think timestamping should be done as soon as possible in the receive
path, to get meaningful values (ie timestamps taken at the time packet
was delivered by NIC driver to our stack), even if NAPI already can
defer timestamping a bit (RPS can help to reduce the gap)

Tom Herbert prefer to sample timestamps after RPS dispatch. In case
sampling is expensive (HPET/acpi_pm on x86), this makes sense.

Let admins switch from one mode to another, using a new
sysctl, /proc/sys/net/core/netdev_tstamp_prequeue

Its default value (1), means timestamps are taken as soon as possible,
before backlog queueing, giving accurate timestamps.

Setting a 0 value permits to sample timestamps when processing backlog,
after RPS dispatch, to lower the load of the pre-RPS cpu.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>rfs: Receive Flow Steering</title>
<updated>2010-04-16T23:01:27Z</updated>
<author>
<name>Tom Herbert</name>
<email>therbert@google.com</email>
</author>
<published>2010-04-16T23:01:27Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=fec5e652e58fa6017b2c9e06466cb2a6538de5b4'/>
<id>urn:sha1:fec5e652e58fa6017b2c9e06466cb2a6538de5b4</id>
<content type='text'>
This patch implements receive flow steering (RFS).  RFS steers
received packets for layer 3 and 4 processing to the CPU where
the application for the corresponding flow is running.  RFS is an
extension of Receive Packet Steering (RPS).

The basic idea of RFS is that when an application calls recvmsg
(or sendmsg) the application's running CPU is stored in a hash
table that is indexed by the connection's rxhash which is stored in
the socket structure.  The rxhash is passed in skb's received on
the connection from netif_receive_skb.  For each received packet,
the associated rxhash is used to look up the CPU in the hash table,
if a valid CPU is set then the packet is steered to that CPU using
the RPS mechanisms.

The convolution of the simple approach is that it would potentially
allow OOO packets.  If threads are thrashing around CPUs or multiple
threads are trying to read from the same sockets, a quickly changing
CPU value in the hash table could cause rampant OOO packets--
we consider this a non-starter.

To avoid OOO packets, this solution implements two types of hash
tables: rps_sock_flow_table and rps_dev_flow_table.

rps_sock_table is a global hash table.  Each entry is just a CPU
number and it is populated in recvmsg and sendmsg as described above.
This table contains the "desired" CPUs for flows.

rps_dev_flow_table is specific to each device queue.  Each entry
contains a CPU and a tail queue counter.  The CPU is the "current"
CPU for a matching flow.  The tail queue counter holds the value
of a tail queue counter for the associated CPU's backlog queue at
the time of last enqueue for a flow matching the entry.

Each backlog queue has a queue head counter which is incremented
on dequeue, and so a queue tail counter is computed as queue head
count + queue length.  When a packet is enqueued on a backlog queue,
the current value of the queue tail counter is saved in the hash
entry of the rps_dev_flow_table.

And now the trick: when selecting the CPU for RPS (get_rps_cpu)
the rps_sock_flow table and the rps_dev_flow table for the RX queue
are consulted.  When the desired CPU for the flow (found in the
rps_sock_flow table) does not match the current CPU (found in the
rps_dev_flow table), the current CPU is changed to the desired CPU
if one of the following is true:

- The current CPU is unset (equal to RPS_NO_CPU)
- Current CPU is offline
- The current CPU's queue head counter &gt;= queue tail counter in the
rps_dev_flow table.  This checks if the queue tail has advanced
beyond the last packet that was enqueued using this table entry.
This guarantees that all packets queued using this entry have been
dequeued, thus preserving in order delivery.

Making each queue have its own rps_dev_flow table has two advantages:
1) the tail queue counters will be written on each receive, so
keeping the table local to interrupting CPU s good for locality.  2)
this allows lockless access to the table-- the CPU number and queue
tail counter need to be accessed together under mutual exclusion
from netif_receive_skb, we assume that this is only called from
device napi_poll which is non-reentrant.

This patch implements RFS for TCP and connected UDP sockets.
It should be usable for other flow oriented protocols.

There are two configuration parameters for RFS.  The
"rps_flow_entries" kernel init parameter sets the number of
entries in the rps_sock_flow_table, the per rxqueue sysfs entry
"rps_flow_cnt" contains the number of entries in the rps_dev_flow
table for the rxqueue.  Both are rounded to power of two.

The obvious benefit of RFS (over just RPS) is that it achieves
CPU locality between the receive processing for a flow and the
applications processing; this can result in increased performance
(higher pps, lower latency).

The benefits of RFS are dependent on cache hierarchy, application
load, and other factors.  On simple benchmarks, we don't necessarily
see improvement and sometimes see degradation.  However, for more
complex benchmarks and for applications where cache pressure is
much higher this technique seems to perform very well.

Below are some benchmark results which show the potential benfit of
this patch.  The netperf test has 500 instances of netperf TCP_RR
test with 1 byte req. and resp.  The RPC test is an request/response
test similar in structure to netperf RR test ith 100 threads on
each host, but does more work in userspace that netperf.

e1000e on 8 core Intel
   No RFS or RPS		104K tps at 30% CPU
   No RFS (best RPS config):    290K tps at 63% CPU
   RFS				303K tps at 61% CPU

RPC test	tps	CPU%	50/90/99% usec latency	Latency StdDev
  No RFS/RPS	103K	48%	757/900/3185		4472.35
  RPS only:	174K	73%	415/993/2468		491.66
  RFS		223K	73%	379/651/1382		315.61

Signed-off-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h</title>
<updated>2010-03-30T13:02:32Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2010-03-24T08:04:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=5a0e3ad6af8660be21ca98a971cd00f331318c05'/>
<id>urn:sha1:5a0e3ad6af8660be21ca98a971cd00f331318c05</id>
<content type='text'>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -&gt; slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Guess-its-ok-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Lee Schermerhorn &lt;Lee.Schermerhorn@hp.com&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6</title>
<updated>2009-12-08T15:55:01Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-12-08T15:55:01Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d7fc02c7bae7b1cf69269992cf880a43a350cdaa'/>
<id>urn:sha1:d7fc02c7bae7b1cf69269992cf880a43a350cdaa</id>
<content type='text'>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
  mac80211: fix reorder buffer release
  iwmc3200wifi: Enable wimax core through module parameter
  iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
  iwmc3200wifi: Coex table command does not expect a response
  iwmc3200wifi: Update wiwi priority table
  iwlwifi: driver version track kernel version
  iwlwifi: indicate uCode type when fail dump error/event log
  iwl3945: remove duplicated event logging code
  b43: fix two warnings
  ipw2100: fix rebooting hang with driver loaded
  cfg80211: indent regulatory messages with spaces
  iwmc3200wifi: fix NULL pointer dereference in pmkid update
  mac80211: Fix TX status reporting for injected data frames
  ath9k: enable 2GHz band only if the device supports it
  airo: Fix integer overflow warning
  rt2x00: Fix padding bug on L2PAD devices.
  WE: Fix set events not propagated
  b43legacy: avoid PPC fault during resume
  b43: avoid PPC fault during resume
  tcp: fix a timewait refcnt race
  ...

Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
	kernel/sysctl_check.c
	net/ipv4/sysctl_net_ipv4.c
	net/ipv6/addrconf.c
	net/sctp/sysctl.c
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6</title>
<updated>2009-12-08T15:38:50Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-12-08T15:38:50Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1557d33007f63dd96e5d15f33af389378e5f2e54'/>
<id>urn:sha1:1557d33007f63dd96e5d15f33af389378e5f2e54</id>
<content type='text'>
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
  security/tomoyo: Remove now unnecessary handling of security_sysctl.
  security/tomoyo: Add a special case to handle accesses through the internal proc mount.
  sysctl: Drop &amp; in front of every proc_handler.
  sysctl: Remove CTL_NONE and CTL_UNNUMBERED
  sysctl: kill dead ctl_handler definitions.
  sysctl: Remove the last of the generic binary sysctl support
  sysctl net: Remove unused binary sysctl code
  sysctl security/tomoyo: Don't look at ctl_name
  sysctl arm: Remove binary sysctl support
  sysctl x86: Remove dead binary sysctl support
  sysctl sh: Remove dead binary sysctl support
  sysctl powerpc: Remove dead binary sysctl support
  sysctl ia64: Remove dead binary sysctl support
  sysctl s390: Remove dead sysctl binary support
  sysctl frv: Remove dead binary sysctl support
  sysctl mips/lasat: Remove dead binary sysctl support
  sysctl drivers: Remove dead binary sysctl support
  sysctl crypto: Remove dead binary sysctl support
  sysctl security/keys: Remove dead binary sysctl support
  sysctl kernel: Remove binary sysctl logic
  ...
</content>
</entry>
<entry>
<title>Merge branch 'master' of /home/davem/src/GIT/linux-2.6/</title>
<updated>2009-12-05T23:22:26Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2009-12-05T23:22:26Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=28b4d5cc17c20786848cdc07b7ea237a309776bb'/>
<id>urn:sha1:28b4d5cc17c20786848cdc07b7ea237a309776bb</id>
<content type='text'>
Conflicts:
	drivers/net/pcmcia/fmvj18x_cs.c
	drivers/net/pcmcia/nmclan_cs.c
	drivers/net/pcmcia/xirc2ps_cs.c
	drivers/net/wireless/ray_cs.c
</content>
</entry>
<entry>
<title>net: use net_eq to compare nets</title>
<updated>2009-11-25T23:14:13Z</updated>
<author>
<name>Octavian Purdila</name>
<email>opurdila@ixiacom.com</email>
</author>
<published>2009-11-25T23:14:13Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=09ad9bc752519cc167d0a573e1acf69b5c707c67'/>
<id>urn:sha1:09ad9bc752519cc167d0a573e1acf69b5c707c67</id>
<content type='text'>
Generated with the following semantic patch

@@
struct net *n1;
struct net *n2;
@@
- n1 == n2
+ net_eq(n1, n2)

@@
struct net *n1;
struct net *n2;
@@
- n1 != n2
+ !net_eq(n1, n2)

applied over {include,net,drivers/net}.

Signed-off-by: Octavian Purdila &lt;opurdila@ixiacom.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
