<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/proc/proc_sysctl.c, branch v3.0.56</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/fs/proc/proc_sysctl.c?h=v3.0.56</id>
<link rel='self' href='https://git.amat.us/linux/atom/fs/proc/proc_sysctl.c?h=v3.0.56'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2011-06-20T14:45:25Z</updated>
<entry>
<title>proc_sys_permission() is OK in RCU mode</title>
<updated>2011-06-20T14:45:25Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2011-06-19T00:42:00Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1aec7036d0c2996c86ce483ca0a28f3b20807b43'/>
<id>urn:sha1:1aec7036d0c2996c86ce483ca0a28f3b20807b43</id>
<content type='text'>
nothing blocking there, since all instances of sysctl
-&gt;permissions() method are non-blocking - both of them,
that is.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'next' into for-linus</title>
<updated>2011-03-15T22:41:17Z</updated>
<author>
<name>James Morris</name>
<email>jmorris@namei.org</email>
</author>
<published>2011-03-15T22:41:17Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a002951c97ff8da49938c982a4c236bf2fafdc9f'/>
<id>urn:sha1:a002951c97ff8da49938c982a4c236bf2fafdc9f</id>
<content type='text'>
</content>
</entry>
<entry>
<title>unfuck proc_sysctl -&gt;d_compare()</title>
<updated>2011-03-08T07:22:27Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2011-03-08T06:25:28Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=dfef6dcd35cb4a251f6322ca9b2c06f0bb1aa1f4'/>
<id>urn:sha1:dfef6dcd35cb4a251f6322ca9b2c06f0bb1aa1f4</id>
<content type='text'>
a) struct inode is not going to be freed under -&gt;d_compare();
however, the thing PROC_I(inode)-&gt;sysctl points to just might.
Fortunately, it's enough to make freeing that sucker delayed,
provided that we don't step on its -&gt;unregistering, clear
the pointer to it in PROC_I(inode) before dropping the reference
and check if it's NULL in -&gt;d_compare().

b) I'm not sure that we *can* walk into NULL inode here (we recheck
dentry-&gt;seq between verifying that it's still hashed / fetching
dentry-&gt;d_inode and passing it to -&gt;d_compare() and there are no
negative hashed dentries in /proc/sys/*), but if we can walk into
that, we really should not have -&gt;d_compare() return 0 on it!
That said, I really suspect that this check can be simply killed.
Nick?
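
For illustration only, the checks described in (a) and (b) can be modelled
in a few lines of Python. This is a sketch, not the kernel code: ToyInode,
toy_compare and the is_seen callback are hypothetical names, and the only
convention taken from the text is that -&gt;d_compare() returns 0 on a match.

```python
class ToyInode:
    """Hypothetical stand-in for the proc inode and its sysctl pointer."""

    def __init__(self, sysctl_head):
        # Cleared to None before the reference to the header is dropped.
        self.sysctl = sysctl_head


def toy_compare(inode, dentry_name, lookup_name, is_seen=lambda head: True):
    """Return 0 on a match and 1 on a mismatch (the d_compare convention)."""
    if inode is None:
        return 1  # never report a match on a missing inode
    if dentry_name != lookup_name:
        return 1
    head = inode.sysctl  # read the pointer once, then test the local copy
    if head is None:
        return 1  # header is already being torn down
    return 0 if is_seen(head) else 1
```

The point of the sketch is only the ordering: the pointer is read once, and
a cleared pointer is treated as a mismatch rather than dereferenced.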

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>security/selinux: fix /proc/sys/ labeling</title>
<updated>2011-02-01T16:53:54Z</updated>
<author>
<name>Lucian Adrian Grijincu</name>
<email>lucian.grijincu@gmail.com</email>
</author>
<published>2011-02-01T16:42:22Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=8e6c96935fcc1ed3dbebc96fddfef3f2f2395afc'/>
<id>urn:sha1:8e6c96935fcc1ed3dbebc96fddfef3f2f2395afc</id>
<content type='text'>
This fixes an old (2007) selinux regression: filesystem labeling for
/proc/sys returned
     -r--r--r-- unknown                          /proc/sys/fs/file-nr
instead of
     -r--r--r-- system_u:object_r:sysctl_fs_t:s0 /proc/sys/fs/file-nr

Events that lead to breaking of /proc/sys/ selinux labeling:

1) sysctl was reimplemented to route all calls through /proc/sys/

    commit 77b14db502cb85a031fe8fde6c85d52f3e0acb63
    [PATCH] sysctl: reimplement the sysctl proc support

2) proc_dir_entry was removed from ctl_table:

    commit 3fbfa98112fc3962c416452a0baf2214381030e6
    [PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables

3) selinux still walked the proc_dir_entry tree to apply
   labeling. Because ctl_tables don't have a proc_dir_entry, we did
   not label /proc/sys/ inodes any more. To achieve this the /proc/sys/
   inodes were marked private and private inodes were ignored by
   selinux.

    commit bbaca6c2e7ef0f663bc31be4dad7cf530f6c4962
    [PATCH] selinux: enhance selinux to always ignore private inodes

    commit 86a71dbd3e81e8870d0f0e56b87875f57e58222b
    [PATCH] sysctl: hide the sysctl proc inodes from selinux

Access control checks have been done by means of a special sysctl hook
that was called for read/write accesses to any /proc/sys/ entry.

We don't have to do this because, instead of walking the
proc_dir_entry tree, we can walk the dentry tree (as done in this
patch). With this patch:
* we don't mark /proc/sys/ inodes as private
* we don't need the sysctl security hook
* we walk the dentry tree to find the path to the inode.

We have to strip the PID in /proc/PID/ entries that have a
proc_dir_entry because selinux does not know how to label paths like
'/1/net/rpc/nfsd.fh' (and defaults to 'proc_t' labeling). Selinux does
know of '/net/rpc/nfsd.fh' (and applies the 'sysctl_rpc_t' label).

PID stripping from the path was done implicitly in the previous code
because the proc_dir_entry tree had the root in '/net' in the example
from above. The dentry tree has the root in '/1'.
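
As a rough illustration of the PID stripping described above (a Python
sketch, not the selinux code; strip_pid_prefix is a hypothetical helper
name), dropping a leading all-digit path component could look like this:

```python
def strip_pid_prefix(path):
    """Drop a leading numeric component, e.g. the '1' in '/1/net/rpc/nfsd.fh'."""
    if not path.startswith("/"):
        return path
    # Split the first component from the remainder of the path.
    head, sep, rest = path[1:].partition("/")
    if sep and head.isdigit():
        return "/" + rest
    return path
```

Paths that do not start with a numeric component are returned unchanged,
so '/net/rpc/nfsd.fh' keeps mapping to the label selinux already knows.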

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Lucian Adrian Grijincu &lt;lucian.grijincu@gmail.com&gt;
Signed-off-by: Eric Paris &lt;eparis@redhat.com&gt;
</content>
</entry>
<entry>
<title>fs: provide rcu-walk aware permission i_ops</title>
<updated>2011-01-07T06:50:29Z</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@kernel.dk</email>
</author>
<published>2011-01-07T06:49:58Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b74c79e99389cd79b31fcc08f82c24e492e63c7e'/>
<id>urn:sha1:b74c79e99389cd79b31fcc08f82c24e492e63c7e</id>
<content type='text'>
Signed-off-by: Nick Piggin &lt;npiggin@kernel.dk&gt;
</content>
</entry>
<entry>
<title>fs: rcu-walk aware d_revalidate method</title>
<updated>2011-01-07T06:50:29Z</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@kernel.dk</email>
</author>
<published>2011-01-07T06:49:57Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=34286d6662308d82aed891852d04c7c3a2649b16'/>
<id>urn:sha1:34286d6662308d82aed891852d04c7c3a2649b16</id>
<content type='text'>
Require filesystems be aware of .d_revalidate being called in rcu-walk
mode (nd-&gt;flags &amp; LOOKUP_RCU). For now do a simple push down, returning
-ECHILD from all implementations.
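
A toy model of this push down, written in Python for brevity (the kernel
code is C). Everything here is an assumption made for illustration: the
function names are hypothetical, and the flags are modelled as a set of
names instead of a bit mask.

```python
import errno

LOOKUP_RCU = "LOOKUP_RCU"  # stand-in for the kernel's bit flag


def toy_d_revalidate(name, flags):
    # Simple push down: refuse to do any work in rcu-walk mode.
    if LOOKUP_RCU in flags:
        return -errno.ECHILD
    return 1  # dentry considered valid in ref-walk mode


def walk(components, flags, revalidate):
    """Revalidate every component; stop at the first -ECHILD."""
    for name in components:
        status = revalidate(name, flags)
        if status == -errno.ECHILD:
            return status
    return 0


def lookup(components, revalidate=toy_d_revalidate):
    """Try rcu-walk first; on -ECHILD redo the whole lookup as ref-walk."""
    modes = ["rcu-walk"]
    status = walk(components, frozenset({LOOKUP_RCU}), revalidate)
    if status == -errno.ECHILD:
        modes.append("ref-walk")
        status = walk(components, frozenset(), revalidate)
    return status, modes
```

With the push-down implementation every lookup that hits such a dentry is
done twice; an implementation that can answer in rcu-walk mode avoids the
second pass.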

Signed-off-by: Nick Piggin &lt;npiggin@kernel.dk&gt;
</content>
</entry>
<entry>
<title>fs: dcache reduce branches in lookup path</title>
<updated>2011-01-07T06:50:28Z</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@kernel.dk</email>
</author>
<published>2011-01-07T06:49:55Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=fb045adb99d9b7c562dc7fef834857f78249daa1'/>
<id>urn:sha1:fb045adb99d9b7c562dc7fef834857f78249daa1</id>
<content type='text'>
Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry-&gt;d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.&gt;]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)-&gt;d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&amp;\1, \2);/' -i

Signed-off-by: Nick Piggin &lt;npiggin@kernel.dk&gt;
</content>
</entry>
<entry>
<title>fs: rcu-walk for path lookup</title>
<updated>2011-01-07T06:50:27Z</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@kernel.dk</email>
</author>
<published>2011-01-07T06:49:52Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=31e6b01f4183ff419a6d1f86177cbf4662347cec'/>
<id>urn:sha1:31e6b01f4183ff419a6d1f86177cbf4662347cec</id>
<content type='text'>
Perform common cases of path lookups without any stores or locking in the
ancestor dentry elements. This is called rcu-walk, as opposed to the current
algorithm which is a refcount based walk, or ref-walk.

This results in far fewer atomic operations on every path element,
significantly improving path lookup performance. It also avoids cacheline
bouncing on common dentries, significantly improving scalability.

The overall design is like this:
* LOOKUP_RCU is set in nd-&gt;flags, which distinguishes rcu-walk from ref-walk.
* Take the RCU lock for the entire path walk, starting with the acquiring
  of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are
  not required for dentry persistence.
* synchronize_rcu is called when unregistering a filesystem, so we can
  access d_ops and i_ops during rcu-walk.
* Similarly take the vfsmount lock for the entire path walk. So now mnt
  refcounts are not required for persistence. Also we are free to perform mount
  lookups, and to assume dentry mount points and mount roots are stable up and
  down the path.
* Have a per-dentry seqlock to protect the dentry name, parent, and inode,
  so we can load this tuple atomically, and also check whether any of its
  members have changed.
* Dentry lookups (based on parent, candidate string tuple) recheck the parent
  sequence after the child is found in case anything changed in the parent
  during the path walk.
* inode is also RCU protected so we can load d_inode and use the inode for
  limited things.
* i_mode, i_uid, i_gid can be tested for exec permissions during path walk.
* i_op can be loaded.

When we reach the destination dentry, we lock it, recheck lookup sequence,
and increment its refcount and mountpoint refcount. RCU and vfsmount locks
are dropped. This is termed "dropping rcu-walk". If the dentry refcount does
not match, we cannot drop rcu-walk gracefully at the current point in the
lookup, so instead return -ECHILD (for want of a better errno). This signals the
path walking code to re-do the entire lookup with a ref-walk.

Aside from the final dentry, there are other situations that may be encountered
where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take
a reference on the last good dentry) and continue with a ref-walk. Again, if
we cannot drop rcu-walk gracefully, we return -ECHILD and do the whole lookup
using ref-walk. But it is very important that we can continue with ref-walk
for most cases, particularly to avoid the overhead of double lookups, and to
gain the scalability advantages on common path elements (like cwd and root).

The cases where rcu-walk cannot continue are:
* NULL dentry (ie. any uncached path element)
* parent with d_inode-&gt;i_op-&gt;permission or ACLs
* dentries with d_revalidate
* Following links

In future patches, permission checks and d_revalidate become rcu-walk aware. It
may be possible eventually to make following links rcu-walk aware.

Uncached path elements will always require dropping to ref-walk mode, at the
very least because i_mutex needs to be grabbed, and objects allocated.
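
The per-dentry sequence check in the design list above can be sketched as
follows. This is a single-threaded Python model for illustration only:
SeqCount, ToyDentry and snapshot are hypothetical names, and real seqlocks
also need memory barriers that this toy ignores.

```python
class SeqCount:
    """Toy sequence counter: odd while a write is in progress."""

    def __init__(self):
        self.sequence = 0

    def write_begin(self):
        self.sequence += 1  # becomes odd

    def write_end(self):
        self.sequence += 1  # becomes even again

    def read_begin(self):
        return self.sequence

    def read_retry(self, start):
        # Retry if a write was in progress at the start or happened since.
        return start % 2 == 1 or self.sequence != start


class ToyDentry:
    def __init__(self, name, parent, inode):
        self.seq = SeqCount()
        self.name, self.parent, self.inode = name, parent, inode

    def rename(self, new_name, new_parent):
        self.seq.write_begin()
        self.name, self.parent = new_name, new_parent
        self.seq.write_end()


def snapshot(dentry, max_tries=3):
    """Return (name, parent, inode) read consistently, or None to fall back."""
    for _ in range(max_tries):
        start = dentry.seq.read_begin()
        fields = (dentry.name, dentry.parent, dentry.inode)
        if not dentry.seq.read_retry(start):
            return fields
    return None
```

A reader never takes a lock or writes to the dentry; it only rereads the
counter. Returning None plays the role of giving up on rcu-walk.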

Signed-off-by: Nick Piggin &lt;npiggin@kernel.dk&gt;
</content>
</entry>
<entry>
<title>fs: change d_compare for rcu-walk</title>
<updated>2011-01-07T06:50:19Z</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@kernel.dk</email>
</author>
<published>2011-01-07T06:49:27Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=621e155a3591962420eacdd39f6f0aa29ceb221e'/>
<id>urn:sha1:621e155a3591962420eacdd39f6f0aa29ceb221e</id>
<content type='text'>
Change d_compare so it may be called from lock-free RCU lookups. This
does put significant restrictions on what may be done from the callback;
however, there don't seem to have been any problems with in-tree fses.
If some strange use case pops up that _really_ cannot cope with the
rcu-walk rules, we can just add new rcu-unaware callbacks, which would
cause name lookup to drop out of rcu-walk mode.

For in-tree filesystems, this is just a mechanical change.

Signed-off-by: Nick Piggin &lt;npiggin@kernel.dk&gt;
</content>
</entry>
<entry>
<title>fs: change d_delete semantics</title>
<updated>2011-01-07T06:50:18Z</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@kernel.dk</email>
</author>
<published>2011-01-07T06:49:23Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=fe15ce446beb3a33583af81ffe6c9d01a75314ed'/>
<id>urn:sha1:fe15ce446beb3a33583af81ffe6c9d01a75314ed</id>
<content type='text'>
Change d_delete from a dentry deletion notification to a dentry caching
advice, more like -&gt;drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.

Signed-off-by: Nick Piggin &lt;npiggin@kernel.dk&gt;
</content>
</entry>
</feed>
