commit b66c5984017533316fd1951770302649baf1aa33 upstream.
If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.
Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching
binfmt modules. Unfortunately, the logic in binfmt_script and
binfmt_misc does not expect to get restarted. They leave bprm->interp
pointing to their local stack. This means on restart bprm->interp is
left pointing into unused stack memory which can then be copied into the
userspace argv areas.
After additional study, it seems that both recursion and restart remain
the desirable way to handle exec with scripts, misc, and modules. As
such, we need to protect the changes to interp.
This changes the logic to require allocation for any changes to the
bprm->interp. To avoid adding a new kmalloc to every exec, the default
value is left as-is. Only when passing through binfmt_script or
binfmt_misc does an allocation take place.
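A minimal sketch of that allocation helper (details in older trees may
differ slightly):

  /*
   * Any change to bprm->interp goes through a helper that duplicates
   * the new string on the heap, so a restarted exec can never be left
   * holding a pointer into a binfmt handler's stack frame.
   */
  int bprm_change_interp(char *interp, struct linux_binprm *bprm)
  {
          /* The default (bprm->filename) is never reallocated, so the
           * common exec path pays for no extra kmalloc. */
          if (bprm->interp != bprm->filename)
                  kfree(bprm->interp);
          bprm->interp = kstrdup(interp, GFP_KERNEL);
          if (!bprm->interp)
                  return -ENOMEM;
          return 0;
  }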
For a proof of concept, see DoTest.sh from:
http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: halfdog <me@halfdog.net>
Cc: P J P <ppandit@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit b911a6bdeef5848c468597d040e3407e0aee04ce upstream.
NFS appears to use d_obtain_alias() to create the root dentry rather than
d_make_root. This can cause 'prepend_path()' to complain that the root
has a weird name if an NFS filesystem is lazily unmounted. e.g. if
"/mnt" is an NFS mount then
{ cd /mnt; umount -l /mnt ; ls -l /proc/self/cwd; }
will cause a WARN message like
WARNING: at /home/git/linux/fs/dcache.c:2624 prepend_path+0x1d7/0x1e0()
...
Root dentry has weird name <>
to appear in kernel logs.
So change d_obtain_alias() to use "/" rather than "" as the anonymous
name.
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: use named initialisers instead of QSTR_INIT()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit d5f50b0c290431c65377c4afa1c764e2c3fe5305 upstream.
If the argument and reply together exceed the maximum payload size, then
a reply with a read-like operation can overflow the rq_pages array.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 8d238027b87e654be552eabdf492042a34c5c300 upstream.
We display a list of supplementary group for each process in
/proc/<pid>/status. However, we show only the first 32 groups, not all of
them.
Although this is rare, processes sometimes do have more than 32
supplementary groups, and this kernel limitation breaks user-space apps
that rely on the group list in /proc/<pid>/status.
Number 32 comes from the internal NGROUPS_SMALL macro which defines the
length for the internal kernel "small" groups buffer. There is no
apparent reason to limit to this value.
This patch removes the 32 groups printing limit.
The Linux kernel limits the number of supplementary groups to NGROUPS_MAX,
which is currently set to 65536, and this is the maximum count of groups
we may possibly print.
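The fix essentially drops the NGROUPS_SMALL clamp from the group loop in
task_state(); with the GROUP_AT() accessor of that era, roughly:

  /* print every supplementary group, not just the first 32 */
  for (g = 0; g < group_info->ngroups; g++)
          seq_printf(m, "%d ", GROUP_AT(group_info, g));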
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit f259613a1e4b44a0cf85a5dafd931be96ee7c9e5 upstream.
In rare circumstances, nfs_clone_server() of a v2 or v3 server can get
an error between setting server->destroy (to nfs_destroy_server), and
calling nfs_start_lockd (which will set server->nlm_host).
If this happens, nfs_clone_server will call nfs_free_server which
will call nfs_destroy_server and thence nlmclnt_done(NULL). This
causes the NULL to be dereferenced.
So add a guard to only call nlmclnt_done() if ->nlm_host is not NULL.
The other guards there are irrelevant, as nlm_host can only be non-NULL
if one of these flags is set - so remove those tests. (Thanks to Trond
for this suggestion).
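With the flag tests gone, the destroy path reduces to something like
this sketch:

  static void nfs_destroy_server(struct nfs_server *server)
  {
          /* nlm_host is only ever set when lockd was started, so this
           * single guard replaces the old flag checks */
          if (server->nlm_host)
                  nlmclnt_done(server->nlm_host);
  }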
This is suitable for any stable kernel since 2.6.25.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 7007c90fb9fef593b4aeaeee57e6a6754276c97c upstream.
With NFSv4, if we create a file then open it we explicitly avoid checking
the permissions on the file during the open because the fact that we
created it ensures we should be allowed to open it (the create and the
open should appear to be a single operation).
However if the reply to an EXCLUSIVE create gets lost and the client
resends the create, the current code will perform the permission check -
because it doesn't realise that it did the open already.
This patch should fix this.
Note that I haven't actually seen this cause a problem. I was just
looking at the code trying to figure out a different EXCLUSIVE open
related issue, and this looked wrong.
(Fix confirmed with pynfs 4.0 test OPEN4--bfields)
Signed-off-by: NeilBrown <neilb@suse.de>
[bfields: use OWNER_OVERRIDE and update for 4.1]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- Use current_fh as file handle in do_open_lookup()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 57d276d71aef7d8305ff002a070cb98deb2edced upstream.
Very embarrassing: 1091006c5eb15cba56785bd5b498a8d0b9546903 "nfsd: turn
on reply cache for NFSv4" missed a line, effectively leaving the reply
cache off in the v4 case. I thought I'd tested that, but I guess not.
This time, I wrote a pynfs test to confirm it works.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 6bdb5f213c4344324f600dde885f25768fbd14db upstream.
If I mount an NFS v4.1 server to a single client multiple times and then
run xfstests over each mountpoint I usually get the client into a state
where recovery deadlocks. The server informs the client of a
cb_path_down sequence error, the client then does a
bind_connection_to_session and checks the status of the lease.
I found that bind_connection_to_session sets the NFS4_SESSION_DRAINING
flag on the client, but this flag is never unset before
nfs4_check_lease() reaches nfs4_proc_sequence(). This causes the client
to deadlock, halting all NFS activity to the server. nfs4_proc_sequence()
is only called by the state manager, so I can change it to run in privileged
mode to bypass the NFS4_SESSION_DRAINING check and avoid the deadlock.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 66bea92c69477a75a5d37b9bfed5773c92a3c4b4 upstream.
ext4_da_block_invalidatepages is missing a pagevec_init(), which means
that pvec->cold contains random garbage. This affects whether the page
goes to the front or back of the LRU when ->cold makes it to
free_hot_cold_page().
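The fix is the missing initialisation itself; with the pagevec API of
that era:

  struct pagevec pvec;

  pagevec_init(&pvec, 0);         /* second argument is 'cold': 0 keeps
                                     released pages on the hot end of
                                     the LRU */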
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 24ec19b0ae83a385ad9c55520716da671274b96c upstream.
In ext4_xattr_set_acl(), if ext4_journal_start() returns an error,
posix_acl_release() will not be called for 'acl' which may result in a
memory leak.
This patch fixes that.
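The corrected tail of ext4_xattr_set_acl() then looks roughly like this
(sketch; label and retry details follow the usual ext4 pattern):

  retry:
          handle = ext4_journal_start(inode, EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
          if (IS_ERR(handle)) {
                  error = PTR_ERR(handle);
                  goto release_and_out;   /* was: return PTR_ERR(handle); */
          }
          error = ext4_set_acl(handle, inode, type, acl);
          ext4_journal_stop(handle);
          if (error == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
                  goto retry;

  release_and_out:
          posix_acl_release(acl);
          return error;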
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 3c40794b2dd0f355ef4e6bf8d85af5dcd7da7ece upstream.
The object type in the lockowner_slab cache is wrong, and it is better
to fix it.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

This patch is only for pre-v3.3 stable trees which backported
b40a7959 "freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD".
v3.3+ doesn't need this fix.
b40a7959 is a trivial bugfix, but unfortunately I forgot that
until 34b087e4 "freezer: kill unused set_freezable_with_signal()"
there was another only-for-kernel-threads flag, PF_FREEZER_NOSIG,
which should be cleared as well.
See https://bugs.launchpad.net/ubuntu/+source/v86d/+bug/1080530
The freezer fails because it expects that a PF_FREEZER_NOSIG task
doesn't need a signal. Before b40a7959 it wrongly succeeded, leaving
the PF_NOFREEZE | PF_FREEZER_NOSIG task unfrozen.
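For these older trees the fix amounts to clearing both freezer-only
flags when exec turns a kernel thread into a user process, along the
lines of:

  /* in flush_old_exec(): the task stops being a kernel thread, so the
   * kernel-thread-only freezer flags must be cleared together */
  current->flags &= ~(PF_KTHREAD | PF_NOFREEZE | PF_FREEZER_NOSIG);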
Reported-and-tested-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[bwh: Don't touch PF_FORKNOEXEC; it's cleared elsewhere]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 25389bb207987b5774182f763b9fb65ff08761c8 upstream.
Commit 09e05d48 introduced a wait for transaction commit into
journal_unmap_buffer() in the case we are truncating a buffer undergoing commit
in the page straddling i_size on a filesystem with blocksize < pagesize. Sadly
we forgot to drop buffer lock before waiting for transaction commit and thus
deadlock is possible when kjournald wants to lock the buffer.
Fix the problem by dropping the buffer lock before waiting for transaction
commit. Since we are still holding page lock (and that is OK), buffer cannot
disappear under us.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 7af11686933726e99af22901d622f9e161404e6b upstream.
Calls into high-level quota code cannot happen under the write lock. These
calls take dqio_mutex which ranks above write lock. So drop write lock
before calling back into quota code.
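The pattern is the same at each call site - drop and retake the lock
around the quota call, e.g.:

  reiserfs_write_unlock(sb);
  ret = dquot_commit(dquot);      /* may take dqio_mutex */
  reiserfs_write_lock(sb);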
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 361d94a338a3fd0cee6a4ea32bbc427ba228e628 upstream.
Calls into reiserfs journalling code and reiserfs_get_block() need to
be protected with write lock. We remove write lock around calls to high
level quota code in the next patch so these paths would suddenly become
unprotected.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit b9e06ef2e8706fe669b51f4364e3aeed58639eb2 upstream.
In reiserfs_quota_on() we do quite some work - for example unpacking
tail of a quota file. Thus we have to hold the write lock until the
moment we call back into the quota code.
Signed-off-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.2: there is no distinction between USRQUOTA and
GRPQUOTA mount options here]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 3bb3e1fc47aca554e7e2cc4deeddc24750987ac2 upstream.
When remounting reiserfs, dquot_suspend() or dquot_resume() can be called.
These functions take dqonoff_mutex which ranks above write lock so we have
to drop it before calling into quota code.
Signed-off-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit d69043c42d8c6414fa28ad18d99973aa6c1c2e24 upstream.
Error handling in xfs_buf_ioapply_map() does not handle IO reference
counts correctly. We increment the b_io_remaining count before
building the bio, but then fail to decrement it in the failure case.
This leads to the buffer never running IO completion and releasing
the reference that the IO holds, so at unmount we can leak the
buffer. This leak is captured by this assert failure during unmount:
XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273
This is not a new bug - the b_io_remaining accounting has had this
problem for a long, long time - it's just very hard to get a
zero-length bio built by this code...
Further, the buffer IO error can be overwritten on a multi-segment
buffer by subsequent bio completions for partial sections of the
buffer. Hence we should only set the buffer error status if the
buffer is not already carrying an error status. This ensures that a
partial IO error on a multi-segment buffer will not be lost. This
part of the problem is a regression, however.
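Both halves of the fix are small; schematically (error sign conventions
vary between kernel versions):

  /* 1) error path in xfs_buf_ioapply_map(): undo the IO reference the
   *    bio would have owned, so completion can still run */
  atomic_dec(&bp->b_io_remaining);
  xfs_buf_ioerror(bp, EIO);
  bio_put(bio);

  /* 2) bio completion: record only the first error seen, so a partial
   *    IO error on a multi-segment buffer is not lost */
  if (!bp->b_error)
          xfs_buf_ioerror(bp, error);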
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 5ffd3412ae5536a4c57469cb8ea31887121dcb2e upstream.
jffs2_write_begin() first acquires the page lock, then f->sem. This
causes an AB-BA deadlock with jffs2_garbage_collect_live(), which first
acquires f->sem, then the page lock:
jffs2_garbage_collect_live
  mutex_lock(&f->sem)                 (A)
  jffs2_garbage_collect_dnode
    jffs2_gc_fetch_page
      read_cache_page_async
        do_read_cache_page
          lock_page(page)             (B)

jffs2_write_begin
  grab_cache_page_write_begin
    find_lock_page
      lock_page(page)                 (B)
  mutex_lock(&f->sem)                 (A)
We fix this by restructuring jffs2_write_begin() to take f->sem before
the page lock. However, we make sure that f->sem is not held when
calling jffs2_reserve_space(), as this is not permitted by the locking
rules.
The deadlock above was observed multiple times on an SoC with a dual
ARMv7 (Cortex-A9), running the long-term 3.4.11 kernel; it occurred
when using scp to copy files from a host system to the ARM target
system. The fix was heavily tested on the same target system.
Signed-off-by: Thomas Betker <thomas.betker@rohde-schwarz.com>
Acked-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[bwh: Backported to 3.2:
- Adjust context
- Use D1(printk(KERN_DEBUG ...)) instead of jffs2_dbg(1, ...)]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 96e5d1d3adf56f1c7eeb07258f6a1a0a7ae9c489 upstream.
In gfs2_trans_add_bh(), gfs2 was testing if there was a bd attached to the
buffer without having the gfs2_log_lock held. It was then assuming it would
stay attached for the rest of the function. However, without either the log
lock being held or the buffer locked, __gfs2_ail_flush() could detach bd at any
time. This patch moves the locking before the test. If there isn't a bd
already attached, gfs2 can safely allocate one and attach it before locking.
There is no way that the newly allocated bd could be on the ail list,
and thus no way for __gfs2_ail_flush() to detach it.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit a28ad42a4a0c6f302f488f26488b8b37c9b30024 upstream.
This is a bugfix for a problem with the following symptoms:
1. A power cut happens
2. After reboot, we try to mount UBIFS
3. Mount fails with "No space left on device" error message
UBIFS complains like this:
UBIFS error (pid 28225): grab_empty_leb: could not find an empty LEB
The root cause of this problem is that when we mount, not all LEBs are
categorized. Only those which were read are. However, the
'ubifs_find_free_leb_for_idx()' function assumes that all LEBs were
categorized and 'c->freeable_cnt' is valid, which is a false assumption.
This patch fixes the problem by teaching 'ubifs_find_free_leb_for_idx()'
to always fall back to LPT scanning if no freeable LEBs were found.
This problem was reported by a few people in the past, but Brent Taylor
was able to reproduce it and sent me a flash image which cannot be mounted,
which made it easy to hunt the bug. Kudos to Brent.
Reported-by: Brent Taylor <motobud@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 98a1eebda3cb2a84ecf1f219bb3a95769033d1bf upstream.
This commit is a preparation for a subsequent bugfix. We introduce a
counter for categorized lprops.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 5f5b331d5c21228a6519dcb793fc1629646c51a6 upstream.
The issue occurs when eCryptfs is mounted with a cipher supported by
the crypto subsystem but not by eCryptfs. The mount succeeds and an
error does not occur until a write. This change checks for eCryptfs
cipher support at mount time.
Resolves Launchpad issue #338914, reported by Tyler Hicks in 03/2009.
https://bugs.launchpad.net/ecryptfs/+bug/338914
Signed-off-by: Tim Sally <tsally@atomicpeace.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 069ddcda37b2cf5bb4b6031a944c0e9359213262 upstream.
When the eCryptfs mount options do not include '-o acl', but the lower
filesystem's mount options do include 'acl', the MS_POSIXACL flag is not
flipped on in the eCryptfs super block flags. This flag is what the VFS
checks in do_last() when deciding if the current umask should be applied
to a newly created inode's mode or not. When a default POSIX ACL mask is
set on a directory, the current umask is incorrectly applied to new
inodes created in the directory. This patch ignores the MS_POSIXACL flag
passed into ecryptfs_mount() and sets the flag on the eCryptfs super
block depending on the flag's presence on the lower super block.
Additionally, it is incorrect to allow a writeable eCryptfs mount on top
of a read-only lower mount. The missing check did not actually let writes
reach the read-only lower mount, because permission checks are still
performed on the lower filesystem's objects, but it is best to simply not
allow an rw mount on top of an ro mount. However, an ro eCryptfs mount on
top of an rw mount is valid and still allowed.
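Schematically, in ecryptfs_mount() (names per the upstream fix; the 3.2
code may differ slightly):

  /* inherit MS_POSIXACL from the lower filesystem, not the caller */
  s->s_flags = flags & ~MS_POSIXACL;
  s->s_flags |= path.dentry->d_sb->s_flags & MS_POSIXACL;

  /* refuse a read-write eCryptfs mount over a read-only lower mount */
  if (!(flags & MS_RDONLY) && (path.dentry->d_sb->s_flags & MS_RDONLY)) {
          rc = -EINVAL;
          goto out_free;
  }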
https://launchpad.net/bugs/1009207
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Stefan Beller <stefanbeller@googlemail.com>
Cc: John Johansen <john.johansen@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 848561d368751a1c0f679b9f045a02944506a801 upstream.
Anders Blomdell noted in 2010 that Fanotify lost events and provided a
test case. Eric Paris confirmed it was a bug and posted a fix to the
list
https://groups.google.com/forum/?fromgroups=#!topic/linux.kernel/RrJfTfyW2BE
but never applied it. Repeated attempts over time to actually get him
to apply it have never had a reply from anyone who has raised it.
So apply it anyway.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Reported-by: Anders Blomdell <anders.blomdell@control.lth.se>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 6ce377afd1755eae5c93410ca9a1121dfead7b87 upstream.
Commit 4439647 ("xfs: reset buffer pointers before freeing them") in
3.0-rc1 introduced a regression when recovering log buffers that
wrapped around the end of log. The second part of the log buffer at
the start of the physical log was being read into the header buffer
rather than the data buffer, and hence recovery was seeing garbage
in the data buffer when it got to the region of the log buffer that
was incorrectly read.
Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 36960e440ccf94349c09fb944930d3bfe4bc473f upstream.
The userspace cifs.idmap program generally works with the wbclient libs
to generate binary SIDs in userspace. That program defines the struct
that holds these values as having a max of 15 subauthorities. The kernel
idmapping code however limits that value to 5.
When the kernel copies those values around though, it doesn't sanity
check the num_subauths value handed back from userspace or from the
server. It's possible therefore for userspace to hand us back a bogus
num_subauths value (or one that's valid, but greater than 5) that could
cause the kernel to walk off the end of the cifs_sid->sub_auths array.
Fix this by defining a new routine for copying sids and using that in
all of the places that copy it. If we end up with a sid that's longer
than expected then this approach will just lop off the "extra" subauths,
but that's basically what the code does today already. Better approaches
might be to fix this code to reject SIDs with >5 subauths, or fix it
to handle the subauths array dynamically.
At the same time, change the kernel to check the length of the data
returned by userspace. If it's shorter than struct cifs_sid, reject it
and return -EIO; otherwise we would end up with fields that are
basically uninitialized.
Long term, it might make sense to redefine cifs_sid using a flexarray at
the end, to allow for variable-length subauth lists, and teach the code
to handle the case where the subauths array being passed in from
userspace is shorter than 5 elements.
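A sketch of such a copy helper (field names follow the cifs_sid
definition; the cap of 5 matches the kernel-side limit described above):

  static void cifs_copy_sid(struct cifs_sid *dst, const struct cifs_sid *src)
  {
          int i;

          dst->revision = src->revision;
          /* clamp to the kernel's limit: extra subauths are lopped off */
          dst->num_subauth = min_t(u8, src->num_subauth, 5);
          for (i = 0; i < 6; i++)         /* 6-byte identifier authority */
                  dst->authority[i] = src->authority[i];
          for (i = 0; i < dst->num_subauth; i++)
                  dst->sub_auth[i] = src->sub_auth[i];
  }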
Note too, that I don't consider this a security issue since you'd need
a compromised cifs.idmap program. If you have that, you can do all sorts
of nefarious stuff. Still, this is probably reasonable for stable.
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 97a54868262da1629a3e65121e65b8e8c4419d9f upstream.
Since commit c7f404b ('vfs: new superblock methods to override
/proc/*/mount{s,info}'), nfs_path() is used to generate the mounted
device name reported back to userland.
nfs_path() always generates a trailing slash when the given dentry is
the root of an NFS mount, but userland may expect the original device
name to be returned verbatim (as it used to be). Make this
canonicalisation optional and change the callers accordingly.
[jrnieder@gmail.com: use flag instead of bool argument]
Reported-and-tested-by: Chris Hiestand <chiestand@salk.edu>
Reference: http://bugs.debian.org/669314
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[bwh: Backported to 3.2:
- Adjust context
- nfs_show_devname() still takes a pointer to struct vfsmount]

commit acce94e68a0f346115fd41cdc298197d2d5a59ad upstream.
In a very busy v3 environment, rpc.mountd can respond to the NULL
procedure but not the MNT procedure in a timely manner causing
the MNT procedure to time out. The problem is the mount system
call returns EIO which causes the mount to fail, instead of
ETIMEDOUT, which would cause the mount to be retried.
This patch passes the RPC_TASK_SOFT|RPC_TASK_TIMEOUT flags to
the rpc_call_sync() call in nfs_mount(), which causes
ETIMEDOUT to be returned on timed out connections.
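The change is confined to the rpc_call_sync() invocation in nfs_mount():

  status = rpc_call_sync(mnt_clnt, &msg,
                         RPC_TASK_SOFT | RPC_TASK_TIMEOUT);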
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 8d96b10639fb402357b75b055b1e82a65ff95050 upstream.
The DNS resolver's use of the sunrpc cache involves a 'ttl' number
(relative) rather than a timeout (absolute). This confused me when
I wrote
commit c5b29f885afe890f953f7f23424045cdad31d3e4
"sunrpc: use seconds since boot in expiry cache"
and I managed to break it. The effect is that any TTL is interpreted
as 0, and nothing useful gets into the cache.
This patch removes the use of get_expiry() - which really expects an
expiry time - and uses get_uint() instead, treating the int correctly
as a ttl.
This fixes a regression that has been present since 2.6.37, causing
certain NFS accesses in certain environments to incorrectly fail.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 2b1bc308f492589f7d49012ed24561534ea2be8c upstream.
If the state recovery machinery is triggered by the call to
nfs4_async_handle_error() then we can deadlock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 2240a9e2d013d8269ea425b73e1d7a54c7bc141f upstream.
If we do not release the sequence id in cases where we fail to get a
session slot, then we can deadlock if we hit a recovery scenario.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[bwh: Backported to 3.2:
- Adjust context
- nfs4_setup_sequence() has an additional 'cache_reply' parameter]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 399f11c3d872bd748e1575574de265a6304c7c43 upstream.
Currently, we will schedule session recovery and then return to the
caller of nfs4_handle_exception. This works for most cases, but causes
a hang on the following test case:
Client                          Server
------                          ------
Open file over NFS v4.1
Write to file
                                Expire client
Try to lock file
The server will return NFS4ERR_BADSESSION, prompting the client to
schedule recovery. However, the client will continue placing lock
attempts and the open recovery never seems to be scheduled. The
simplest solution is to wait for session recovery to run before retrying
the lock.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit a007c4c3e943ecc054a806c259d95420a188754b upstream.
I don't think there's a practical difference for the range of values
these interfaces should see, but it would be safer to be unambiguous.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

This reverts 5ff39e971c87ea9f4c4c7b253898abafa960e32b which was commit
303a7ce92064c285a04c870f2dc0192fdb2968cb upstream.
It is not necessary for kernel versions without per-netns RPC clients.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 12176503366885edd542389eed3aaf94be163fdb upstream.
The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check
while converting ioctl arguments. This could lead to leaking kernel
stack contents into userspace.
Patch extracted from existing fix in grsecurity.
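The missing check is the standard compat-ioctl pattern:

  err  = get_user(palp, &up->palette);
  err |= get_user(length, &up->length);
  if (err)                /* previously missing: on fault, 'palp' and
                             'length' held uninitialised stack data */
          return -EFAULT;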
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit b40a79591ca918e7b91b0d9b6abd5d00f2e88c19 upstream.
flush_old_exec() clears PF_KTHREAD but forgets about PF_NOFREEZE.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2: PF_FORKNOEXEC is cleared elsewhere]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 66081a72517a131430dcf986775f3268aafcb546 upstream.
The warning check for duplicate sysfs entries can cause a buffer overflow
when printing the warning, as strcat() doesn't check buffer sizes.
Use strlcat() instead.
Since strlcat() doesn't return a pointer to the passed buffer, unlike
strcat(), I had to convert the nested concatenation in sysfs_add_one() to
an admittedly more obscure comma operator construct, to avoid emitting code
for the concatenation if CONFIG_BUG is disabled.
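The comma-operator construct mentioned above looks roughly like this,
so that no concatenation code is emitted when CONFIG_BUG is disabled:

  WARN(1, KERN_WARNING "sysfs: cannot create duplicate filename '%s'\n",
       (path == NULL) ? sd->s_name
                      : (sysfs_pathname(acxt->parent_sd, path),
                         strlcat(path, "/", PATH_MAX),
                         strlcat(path, sd->s_name, PATH_MAX),
                         path));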
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit cd0b16c1c3cda12dbed1f8de8f1a9b0591990724 upstream.
If the filehandle is stale, or open access is denied for some reason,
nlm_fopen() may return one of the NLMv4-specific error codes nlm4_stale_fh
or nlm4_failed. These get passed right through nlm_lookup_file(),
and so when nlmsvc_retrieve_args() calls the latter, it needs to filter
the result through the cast_status() machinery.
Failure to do so will trigger the BUG_ON() in encode_nlm_stat...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reported-by: Larry McVoy <lm@bitmover.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 7386cdbf2f57ea8cff3c9fde93f206e58b9fe13f upstream.
Git commit 09a1d34f8535ecf9 "nohz: Make idle/iowait counter update
conditional" introduced a bug in regard to cpu hotplug. The effect is
that the number of idle ticks in the cpu summary line in /proc/stat is
still counting ticks for offline cpus.
Reproduction is easy, just start a workload that keeps all cpus busy,
switch off one or more cpus and then watch the idle field in top.
On a dual-core with one cpu 100% busy and one offline cpu you will get
something like this:
%Cpu(s): 48.7 us, 1.3 sy, 0.0 ni, 50.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
The problem is that an offline cpu still has ts->idle_active == 1.
To fix this we should make sure that the cpu is online when calling
get_cpu_idle_time_us and get_cpu_iowait_time_us.
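With the check in place, the /proc/stat helper falls back to the
tick-based counter for offline cpus; roughly, with 3.2-era helpers:

  static cputime64_t get_idle_time(int cpu)
  {
          u64 idle_time = -1ULL;
          cputime64_t idle;

          if (cpu_online(cpu))
                  idle_time = get_cpu_idle_time_us(cpu, NULL);

          if (idle_time == -1ULL)
                  /* !NO_HZ or offline cpu: rely on cpustat.idle */
                  idle = kstat_cpu(cpu).cpustat.idle;
          else
                  idle = usecs_to_cputime(idle_time);

          return idle;
  }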
[Srivatsa: Rebased to current mainline]
Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20121010061820.8999.57245.stgit@srivatsabhat.in.ibm.com
Cc: deepthi@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit dee1f973ca341c266229faa5a1a5bb268bed3531 upstream.
We assumed that at the time we call ext4_convert_unwritten_extents_endio()
the extent in question is fully inside [map.m_lblk, map->m_len] because
it was already split during submission. But this may not be true due to
a race between writeback and fallocate.
If the extent in question is larger than requested, we will split it again.
Special precautions should be taken if zeroout is required, because
[map.m_lblk, map->m_len] already contains valid data.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 35c2a7f4908d404c9124c2efc6ada4640ca4d5d5 upstream.
Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
u64 inum = fid->raw[2];
which is unhelpfully reported as at the end of shmem_alloc_inode():
BUG: unable to handle kernel paging request at ffff880061cd3000
IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Call Trace:
[<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0
[<ffffffff812d77c3>] do_handle_open+0x163/0x2c0
[<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10
[<ffffffff83a5f3f8>] tracesys+0xe1/0xe6
Right, tmpfs is being stupid to access fid->raw[2] before validating that
fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
fall at the end of a page, and the next page not be present.
But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
could oops in the same way: add the missing fh_len checks to those.
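The tmpfs half of the fix is representative of the checks added to the
other filesystems:

  /* shmem_fh_to_dentry(): don't trust fh_len from an untrusted handle;
   * three 32-bit words are needed before fid->raw[2] may be read */
  if (fh_len < 3)
          return NULL;

  inum = fid->raw[2];     /* now known to be within the handle buffer */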
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Sage Weil <sage@inktank.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit cf9182e90b2af04245ac4fae497fe73fc71285b4 upstream.
Processes that open and close multiple files may end up setting this
oo_last_closed_stid without freeing what was previously pointed to.
This can result in a major leak, visible for example by watching the
nfsd4_stateids line of /proc/slabinfo.
Reported-by: Cyril B. <cbay@excellency.fr>
Tested-by: Cyril B. <cbay@excellency.fr>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 09e05d4805e6c524c1af74e524e5d0528bb3fef3 upstream.
ext3 users of data=journal mode with blocksize < pagesize were occasionally
hitting an assertion failure in journal_commit_transaction() checking whether the
transaction has at least as many credits reserved as buffers attached. The
core of the problem is that when a file gets truncated, buffers that still need
checkpointing or that are attached to the committing transaction are left with
buffer_mapped set. When this happens to buffers beyond i_size attached to a
page straddling i_size, a subsequent write extending the file will see these
buffers and as they are mapped (but underlying blocks were freed) things go
awry from here.
The assertion failure triggers only coincidentally (and in this case
luckily, as we would otherwise start corrupting the filesystem), due to a
journal_head not being properly cleaned up as well.
Under some rare circumstances this bug could even hit data=ordered mode users.
There the assertion won't trigger and we would end up corrupting the
filesystem.
We fix the problem by unmapping buffers if possible (in lots of cases we just
need a buffer attached to a transaction as a place holder but it must not be
written out anyway). And in one case, we just have to bite the bullet and wait
for transaction commit to finish.
Reviewed-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 64e6651dcc10e9d2cc6230208a8e6c2cfd19ae18 upstream.
Since eCryptfs only calls fput() on the lower file in
ecryptfs_release(), eCryptfs should call the lower filesystem's
->flush() from ecryptfs_flush().
If the lower filesystem implements ->flush(), then eCryptfs should try
to flush out any dirty pages prior to calling the lower ->flush(). If
the lower filesystem does not implement ->flush(), then eCryptfs has no
need to do anything in ecryptfs_flush() since dirty pages are now
written out to the lower filesystem in ecryptfs_release().
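A sketch of the resulting ecryptfs_flush():

  static int ecryptfs_flush(struct file *file, fl_owner_t td)
  {
          struct file *lower_file = ecryptfs_file_to_lower(file);

          if (lower_file->f_op && lower_file->f_op->flush) {
                  /* push our dirty pages down first, then let the lower
                   * filesystem do its own flush work */
                  filemap_write_and_wait(file->f_mapping);
                  return lower_file->f_op->flush(lower_file, td);
          }
          return 0;
  }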
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 7149f2558d5b5b988726662fe58b1c388337805b upstream.
Fixes a regression caused by:
821f749 eCryptfs: Revert to a writethrough cache model
That patch reverted some code (specifically, 32001d6f) that was
necessary to properly handle open() -> mmap() -> close() -> dirty pages
-> munmap(), because the lower file could be closed before the dirty
pages are written out.
Rather than reapplying 32001d6f, this approach is a better way of
ensuring that the lower file is still open in order to handle writing
out the dirty pages. It is called from ecryptfs_release(), while we have
a lock on the lower file pointer, just before the lower file gets the
final fput() and we overwrite the pointer.
https://launchpad.net/bugs/1047261
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Artemy Tregubenko <me@arty.name>
Tested-by: Artemy Tregubenko <me@arty.name>
Tested-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 821f7494a77627fb1ab539591c57b22cdca702d6 upstream.
A change was made about a year ago to get eCryptfs to better utilize its
page cache during writes. The idea was to do the page encryption
operations during page writeback, rather than doing them when initially
writing into the page cache, to reduce the number of page encryption
operations during sequential writes. This meant that the encrypted page
would only be written to the lower filesystem during page writeback,
which was a change from how eCryptfs had previously written to the lower
filesystem in ecryptfs_write_end().
The change caused a few eCryptfs-internal bugs that were shaken out.
Unfortunately, more grave side effects have been identified that will
force changes outside of eCryptfs. Because the lower filesystem isn't
consulted until page writeback, eCryptfs has no way to pass lower write
errors (ENOSPC, mainly) back to userspace. Additionally, it was reported
that quotas could be bypassed because of the way eCryptfs may sometimes
open the lower filesystem using a privileged kthread.
It would be nice to resolve the latest issues, but it is best if the
eCryptfs commits be reverted to the old behavior in the meantime.
This reverts:
32001d6f "eCryptfs: Flush file in vma close"
5be79de2 "eCryptfs: Flush dirty pages in setattr"
57db4e8d "ecryptfs: modify write path to encrypt page in writepage"
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Tested-by: Colin King <colin.king@canonical.com>
Cc: Colin King <colin.king@canonical.com>
Cc: Thieu Le <thieule@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit e3ccaa9761200952cc269b1f4b7d7bb77a5e071b upstream.
Historically, eCryptfs has only initialized lower files in the
ecryptfs_create() path. Lower file initialization is the act of writing
the cryptographic metadata from the inode's crypt_stat to the header of
the file. The ecryptfs_open() path already expects that metadata to be
in the header of the file.
A number of users have reported empty lower files beneath their
eCryptfs mounts. Most of the causes for those empty files being left
around have been addressed, but the presence of empty files causes
problems due to the lack of proper cryptographic metadata.
To transparently solve this problem, this patch initializes empty lower
files in the ecryptfs_open() error path. If the metadata is unreadable
due to the lower inode size being 0, plaintext passthrough support is
not in use, and the metadata is stored in the header of the file (as
opposed to the user.ecryptfs extended attribute), the lower file will be
initialized.
The number of nested conditionals in ecryptfs_open() was getting out of
hand, so a helper function was created. To avoid the same nested
conditional problem, the conditional logic was reversed inside of the
helper function.
https://launchpad.net/bugs/911507
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 8bc2d3cf612994a960c2e8eaea37f6676f67082a upstream.
ecryptfs_create() creates a lower inode, allocates an eCryptfs inode,
initializes the eCryptfs inode and cryptographic metadata attached to
the inode, and then writes the metadata to the header of the file.
If an error was to occur after the lower inode was created, an empty
lower file would be left in the lower filesystem. This is a problem
because ecryptfs_open() refuses to open any lower files which do not
have the appropriate metadata in the file header.
This patch properly unlinks the lower inode when an error occurs in the
later stages of ecryptfs_create(), reducing the chance that an empty
lower file will be left in the lower filesystem.
https://launchpad.net/bugs/872905
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 68766a2edcd5cd744262a70a2f67a320ac944760 upstream.
In case we detect a problem and bail out, we fail to set "ret" to a
nonzero value, and udf_load_logicalvol will mistakenly report success.
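The fix is mechanical - each early-exit path gets an explicit error
value before jumping out (check and label names here are illustrative):

  if (bad_partition_map) {        /* hypothetical validation failure */
          ret = 1;                /* was left at 0, i.e. bogus success */
          goto out_bh;
  }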
Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>