<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/netns, branch v2.6.33.10</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/include/net/netns?h=v2.6.33.10</id>
<link rel='self' href='https://git.amat.us/linux/atom/include/net/netns?h=v2.6.33.10'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2010-02-08T19:18:07Z</updated>
<entry>
<title>netfilter: nf_conntrack: fix hash resizing with namespaces</title>
<updated>2010-02-08T19:18:07Z</updated>
<author>
<name>Patrick McHardy</name>
<email>kaber@trash.net</email>
</author>
<published>2010-02-08T19:18:07Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d696c7bdaa55e2208e56c6f98e6bc1599f34286d'/>
<id>urn:sha1:d696c7bdaa55e2208e56c6f98e6bc1599f34286d</id>
<content type='text'>
As noticed by Jon Masters &lt;jonathan@jonmasters.org&gt;, the conntrack hash
size is global and not per namespace, but modifiable at runtime through
/sys/module/nf_conntrack/hashsize. Changing the hash size will only
resize the hash in the current namespace however, so other namespaces
will use an invalid hash size. This can cause crashes when enlarging
the hashsize, or false negative lookups when shrinking it.

Move the hash size into the per-namespace data and only use the global
hash size to initialize the per-namespace value when instanciating a
new namespace. Additionally restrict hash resizing to init_net for
now as other namespaces are not handled currently.

Cc: stable@kernel.org
Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_conntrack: per netns nf_conntrack_cachep</title>
<updated>2010-02-08T19:16:56Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-02-08T19:16:56Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=5b3501faa8741d50617ce4191c20061c6ef36cb3'/>
<id>urn:sha1:5b3501faa8741d50617ce4191c20061c6ef36cb3</id>
<content type='text'>
nf_conntrack_cachep is currently shared by all netns instances, but
because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.

If we use a shared slab cache, one object can instantly flight between
one hash table (netns ONE) to another one (netns TWO), and concurrent
reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
can be fooled without notice, because no RCU grace period has to be
observed between object freeing and its reuse.

We dont have this problem with UDP/TCP slab caches because TCP/UDP
hashtables are global to the machine (and each object has a pointer to
its netns).

If we use per netns conntrack hash tables, we also *must* use per netns
conntrack slab caches, to guarantee an object can not escape from one
namespace to another one.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
[Patrick: added unique slab name allocation]
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
</content>
</entry>
<entry>
<title>netns xfrm: deal with dst entries in netns</title>
<updated>2010-01-25T06:47:53Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2010-01-25T06:47:53Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d7c7544c3d5f59033d1bf3236bc7b289f5f26b75'/>
<id>urn:sha1:d7c7544c3d5f59033d1bf3236bc7b289f5f26b75</id>
<content type='text'>
GC is non-existent in netns, so after you hit GC threshold, no new
dst entries will be created until someone triggers cleanup in init_net.

Make xfrm4_dst_ops and xfrm6_dst_ops per-netns.
This is not done in a generic way, because it woule waste
(AF_MAX - 2) * sizeof(struct dst_ops) bytes per-netns.

Reorder GC threshold initialization so it'd be done before registering
XFRM policies.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Allow xfrm_user_net_exit to batch efficiently.</title>
<updated>2009-12-03T20:22:03Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2009-12-03T02:29:05Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d79d792ef9f99cca463b6619a93e860d1c833a6e'/>
<id>urn:sha1:d79d792ef9f99cca463b6619a93e860d1c833a6e</id>
<content type='text'>
xfrm.nlsk is provided by the xfrm_user module and is access via rcu from
other parts of the xfrm code.  Add xfrm.nlsk_stash a copy of xfrm.nlsk that
will never be set to NULL.  This allows the synchronize_net and
netlink_kernel_release to be deferred until a whole batch of xfrm.nlsk sockets
have been set to NULL.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: remove [un]register_pernet_gen_... and update the docs.</title>
<updated>2009-12-02T00:16:00Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2009-11-29T15:46:17Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=65c0cfafce9575319fb6f70080fbe226e5617e3b'/>
<id>urn:sha1:65c0cfafce9575319fb6f70080fbe226e5617e3b</id>
<content type='text'>
No that all of the callers have been updated to set fields in
struct pernet_operations, and simplified to let the network
namespace core handle the allocation and freeing of the storage
for them, remove the surpurpflous methods and update the docs
to the new style.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netns: embed ip6_dst_ops directly</title>
<updated>2009-09-02T00:40:31Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2009-08-29T01:34:49Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=86393e52c3f1e2f6be18383f6ecdbcdc5727d545'/>
<id>urn:sha1:86393e52c3f1e2f6be18383f6ecdbcdc5727d545</id>
<content type='text'>
struct net::ipv6.ip6_dst_ops is separatedly dynamically allocated,
but there is no fundamental reason for it. Embed it directly into
struct netns_ipv6.

For that:
* move struct dst_ops into separate header to fix circular dependencies
	I honestly tried not to, it's pretty impossible to do other way
* drop dynamical allocation, allocate together with netns

For a change, remove struct dst_ops::dst_net, it's deducible
by using container_of() given dst_ops pointer.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net, netns_xt: shrink netns_xt members</title>
<updated>2009-07-06T02:16:18Z</updated>
<author>
<name>Cyrill Gorcunov</name>
<email>gorcunov@openvz.org</email>
</author>
<published>2009-07-03T20:11:58Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=e04af024b2e74249990587e76ec98220028c01c3'/>
<id>urn:sha1:e04af024b2e74249990587e76ec98220028c01c3</id>
<content type='text'>
In case if kernel was compiled without ebtables support
there is no need to keep ebt_table pointers in netns_xt
structure.

Make it config dependent.

Signed-off-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netfilter: conntrack: optional reliable conntrack event delivery</title>
<updated>2009-06-13T10:30:52Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2009-06-13T10:30:52Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=dd7669a92c6066b2b31bae7e04cd787092920883'/>
<id>urn:sha1:dd7669a92c6066b2b31bae7e04cd787092920883</id>
<content type='text'>
This patch improves ctnetlink event reliability if one broadcast
listener has set the NETLINK_BROADCAST_ERROR socket option.

The logic is the following: if an event delivery fails, we keep
the undelivered events in the missed event cache. Once the next
packet arrives, we add the new events (if any) to the missed
events in the cache and we try a new delivery, and so on. Thus,
if ctnetlink fails to deliver an event, we try to deliver them
once we see a new packet. Therefore, we may lose state
transitions but the userspace process gets in sync at some point.

At worst case, if no events were delivered to userspace, we make
sure that destroy events are successfully delivered. Basically,
if ctnetlink fails to deliver the destroy event, we remove the
conntrack entry from the hashes and we insert them in the dying
list, which contains inactive entries. Then, the conntrack timer
is added with an extra grace timeout of random32() % 15 seconds
to trigger the event again (this grace timeout is tunable via
/proc). The use of a limited random timeout value allows
distributing the "destroy" resends, thus, avoiding accumulating
lots "destroy" events at the same time. Event delivery may
re-order but we can identify them by means of the tuple plus
the conntrack ID.

The maximum number of conntrack entries (active or inactive) is
still handled by nf_conntrack_max. Thus, we may start dropping
packets at some point if we accumulate a lot of inactive conntrack
entries that did not successfully report the destroy event to
userspace.

During my stress tests consisting of setting a very small buffer
of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
flag, and generating lots of very small connections, I noticed
very few destroy entries on the fly waiting to be resend.

A simple way to test this patch consist of creating a lot of
entries, set a very small Netlink buffer in conntrackd (+ a patch
which is not in the git tree to set the BROADCAST_ERROR flag)
and invoke `conntrack -F'.

For expectations, no changes are introduced in this patch.
Currently, event delivery is only done for new expectations (no
events from expectation expiration, removal and confirmation).
In that case, they need a per-expectation event cache to implement
the same idea that is exposed in this patch.

This patch can be useful to provide reliable flow-accouting. We
still have to add a new conntrack extension to store the creation
and destroy time.

Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
</content>
</entry>
<entry>
<title>netfilter: conntrack: move event caching to conntrack extension infrastructure</title>
<updated>2009-06-13T10:26:29Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2009-06-13T10:26:29Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a0891aa6a635f658f29bb061a00d6d3486941519'/>
<id>urn:sha1:a0891aa6a635f658f29bb061a00d6d3486941519</id>
<content type='text'>
This patch reworks the per-cpu event caching to use the conntrack
extension infrastructure.

The main drawback is that we consume more memory per conntrack
if event delivery is enabled. This patch is required by the
reliable event delivery that follows to this patch.

BTW, this patch allows you to enable/disable event delivery via
/proc/sys/net/netfilter/nf_conntrack_events in runtime, although
you can still disable event caching as compilation option.

Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu()</title>
<updated>2009-03-25T20:05:46Z</updated>
<author>
<name>Eric Dumazet</name>
<email>dada1@cosmosbay.com</email>
</author>
<published>2009-03-25T20:05:46Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ea781f197d6a835cbb93a0bf88ee1696296ed8aa'/>
<id>urn:sha1:ea781f197d6a835cbb93a0bf88ee1696296ed8aa</id>
<content type='text'>
Use "hlist_nulls" infrastructure we added in 2.6.29 for RCUification of UDP &amp; TCP.

This permits an easy conversion from call_rcu() based hash lists to a
SLAB_DESTROY_BY_RCU one.

Avoiding call_rcu() delay at nf_conn freeing time has numerous gains.

First, it doesnt fill RCU queues (up to 10000 elements per cpu).
This reduces OOM possibility, if queued elements are not taken into account
This reduces latency problems when RCU queue size hits hilimit and triggers
emergency mode.

- It allows fast reuse of just freed elements, permitting better use of
CPU cache.

- We delete rcu_head from "struct nf_conn", shrinking size of this structure
by 8 or 16 bytes.

This patch only takes care of "struct nf_conn".
call_rcu() is still used for less critical conntrack parts, that may
be converted later if necessary.

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
</content>
</entry>
</feed>
