aboutsummaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)Author
2014-01-13nfs: page cache invalidation for dioChristoph Hellwig
Make sure to properly invalidate the pagecache before performing direct I/O, so that no stale pages are left around. This matches what the generic direct I/O code does. Also take the i_mutex over the direct write submission to avoid the lifelock vs truncate waiting for i_dio_count to decrease, and to avoid having the pagecache easily repopulated while direct I/O is in progrss. Again matching the generic direct I/O code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-13nfs: take i_mutex during direct I/O readsChristoph Hellwig
We'll need the i_mutex to prevent i_dio_count from incrementing while truncate is waiting for it to reach zero, and protects against having the pagecache repopulated after we flushed it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-13nfs: merge nfs_direct_write into nfs_file_direct_writeChristoph Hellwig
Simple code cleanup to prepare for later fixes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-13nfs: merge nfs_direct_read into nfs_file_direct_readChristoph Hellwig
Simple code cleanup to prepare for later fixes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-13nfs: increment i_dio_count for reads, tooChristoph Hellwig
i_dio_count is used to protect dio access against truncate. We want to make sure there are no dio reads pending either when doing a truncate. I suspect on plain NFS things might work even without this, but once we use a pnfs layout driver that access backing devices directly things will go bad without the proper synchronization. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-13nfs: defer inode_dio_done call until size update is doneChristoph Hellwig
We need to have the I/O fully finished before telling the truncate code that we are done. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-13nfs: fix size updates for aio writesChristoph Hellwig
nfs_file_direct_write only updates the inode size if it succeeded and returned the number of bytes written. But in the AIO case nfs_direct_wait turns the return value into -EIOCBQUEUED and we skip the size update. Instead the aio completion path should updated it, which this patch does. The implementation is a little hacky because there is no obvious way to find out we are called for a write in nfs_direct_complete. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-13nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAMEWeston Andros Adamson
Don't check for -NFS4ERR_NOTSUPP, it's already been mapped to -ENOTSUPP by nfs4_stat_to_errno. This allows the client to mount v4.1 servers that don't support SECINFO_NO_NAME by falling back to the "guess and check" method of nfs4_find_root_sec. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Cc: stable@vger.kernel.org # 3.1+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-13NFSv4.1: Fix a race in nfs4_write_inodeTrond Myklebust
nfs4_write_inode() must not be allowed to exit until the layoutcommit is done. That means that both NFS_INO_LAYOUTCOMMIT and NFS_INO_LAYOUTCOMMITTING have to be cleared. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-13NFSv4.1: Don't trust attributes if a pNFS LAYOUTCOMMIT is outstandingTrond Myklebust
If a LAYOUTCOMMIT is outstanding, then chances are that the metadata server may still be returning incorrect values for the change attribute, ctime, mtime and/or size. Just ignore those attributes for now, and wait for the LAYOUTCOMMIT rpc call to finish. Reported-by: shaobingqing <shaobingqing@bwstor.com.cn> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-05point to the right include file in a comment (left over from a9004abc3)Toralf Förster
Signed-off-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-05NFS: dprintk() should not print negative fileids and inode numbersNiels de Vos
A fileid in NFS is a uint64. There are some occurrences where dprintk() outputs a signed fileid. This leads to confusion and more difficult to read debugging (negative fileids matching positive inode numbers). Signed-off-by: Niels de Vos <ndevos@redhat.com> CC: Santosh Pradhan <spradhan@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-05nfs: fix dead code of ipv6_addr_scopeAlexander Aring
The correct way to check on IPV6_ADDR_SCOPE_LINKLOCAL is to check with the ipv6_addr_src_scope function. Currently this can't be work, because ipv6_addr_scope returns a int with a mask of IPV6_ADDR_SCOPE_MASK (0x00f0U) and IPV6_ADDR_SCOPE_LINKLOCAL is 0x02. So the condition is always false. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2013-12-31Merge tag 'v3.13-rc6' into for-3.14/coreJens Axboe
Needed to bring blk-mq uptodate, since changes have been going in since for-3.14/core was established. Fixup merge issues related to the immutable biovec changes. Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-flush.c fs/btrfs/check-integrity.c fs/btrfs/extent_io.c fs/btrfs/scrub.c fs/logfs/dev_bdev.c
2013-12-06nfs: check if gssd is running before attempting to use krb5i auth in ↵Jeff Layton
SETCLIENTID call Currently, the client will attempt to use krb5i in the SETCLIENTID call even if rpc.gssd isn't running. When that fails, it'll then fall back to RPC_AUTH_UNIX. This introduced a delay when mounting if rpc.gssd isn't running, and causes warning messages to pop up in the ring buffer. Check to see if rpc.gssd is running before even attempting to use krb5i auth, and just silently skip trying to do so if it isn't. In the event that the admin is actually trying to mount with krb5*, it will still fail at a later stage of the mount attempt. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-12-06NFSv4: OPEN must handle the NFS4ERR_IO return code correctlyTrond Myklebust
decode_op_hdr() cannot distinguish between an XDR decoding error and the perfectly valid errorcode NFS4ERR_IO. This is normally not a problem, but for the particular case of OPEN, we need to be able to increment the NFSv4 open sequence id when the server returns a valid response. Reported-by: J Bruce Fields <bfields@fieldses.org> Link: http://lkml.kernel.org/r/20131204210356.GA19452@fieldses.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org
2013-12-05Merge tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - Stable fix for a NFSv4.1 delegation and state recovery deadlock - Stable fix for a loop on irrecoverable errors when returning delegations - Fix a 3-way deadlock between layoutreturn, open, and state recovery - Update the MAINTAINERS file with contact information for Trond Myklebust - Close needs to handle NFS4ERR_ADMIN_REVOKED - Enabling v4.2 should not recompile nfsd and lockd - Fix a couple of compile warnings * tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: fix do_div() warning by instead using sector_div() MAINTAINERS: Update contact information for Trond Myklebust NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery SUNRPC: do not fail gss proc NULL calls with EACCES NFSv4: close needs to handle NFS4ERR_ADMIN_REVOKED NFSv4: Update list of irrecoverable errors on DELEGRETURN NFSv4 wait on recovery for async session errors NFS: Fix a warning in nfs_setsecurity NFS: Enabling v4.2 should not recompile nfsd and lockd
2013-12-04nfs: fix do_div() warning by instead using sector_div()Helge Deller
When compiling a 32bit kernel with CONFIG_LBDAF=n the compiler complains like shown below. Fix this warning by instead using sector_div() which is provided by the kernel.h header file. fs/nfs/blocklayout/extents.c: In function ‘normalize’: include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default] fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’ nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default] fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default] include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’ extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor); Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-12-04NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recoveryTrond Myklebust
Andy Adamson reports: The state manager is recovering expired state and recovery OPENs are being processed. If kswapd is pruning inodes at the same time, a deadlock can occur when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the resultant layoutreturn gets an error that the state mangager is to handle, causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. At the same time an open is waiting for the inode deletion to complete in __wait_on_freeing_inode. If the open is either the open called by the state manager, or an open from the same open owner that is holding the NFSv4 sequence id which causes the OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, then the state is deadlocked with kswapd. The fix is simply to have layoutreturn ignore all errors except NFS4ERR_DELAY. We already know that layouts are dropped on all server reboots, and that it has to be coded to deal with the "forgetful client model" that doesn't send layoutreturns. Reported-by: Andy Adamson <andros@netapp.com> Link: http://lkml.kernel.org/r/1385402270-14284-1-git-send-email-andros@netapp.com Signed-off-by: Trond Myklebust <Trond.Myklebust@primarydata.com>
2013-11-23block: Abstract out bvec iteratorKent Overstreet
Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
2013-11-23block: Convert various code to bio_for_each_segment()Kent Overstreet
With immutable biovecs we don't want code accessing bi_io_vec directly - the uses this patch changes weren't incorrect since they all own the bio, but it makes the code harder to audit for no good reason - also, this will help with multipage bvecs later. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-20NFSv4: close needs to handle NFS4ERR_ADMIN_REVOKEDTrond Myklebust
Also ensure that we zero out the stateid mode when exiting Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-20NFSv4: Update list of irrecoverable errors on DELEGRETURNTrond Myklebust
If the DELEGRETURN errors out with something like NFS4ERR_BAD_STATEID then there is no recovery possible. Just quit without returning an error. Also, note that the client must not assume that the NFSv4 lease has been renewed when it sees an error on DELEGRETURN. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2013-11-20NFSv4 wait on recovery for async session errorsAndy Adamson
When the state manager is processing the NFS4CLNT_DELEGRETURN flag, session draining is off, but DELEGRETURN can still get a session error. The async handler calls nfs4_schedule_session_recovery returns -EAGAIN, and the DELEGRETURN done then restarts the RPC task in the prepare state. With the state manager still processing the NFS4CLNT_DELEGRETURN flag with session draining off, these DELEGRETURNs will cycle with errors filling up the session slots. This prevents OPEN reclaims (from nfs_delegation_claim_opens) required by the NFS4CLNT_DELEGRETURN state manager processing from completing, hanging the state manager in the __rpc_wait_for_completion_task in nfs4_run_open_task as seen in this kernel thread dump: kernel: 4.12.32.53-ma D 0000000000000000 0 3393 2 0x00000000 kernel: ffff88013995fb60 0000000000000046 ffff880138cc5400 ffff88013a9df140 kernel: ffff8800000265c0 ffffffff8116eef0 ffff88013fc10080 0000000300000001 kernel: ffff88013a4ad058 ffff88013995ffd8 000000000000fbc8 ffff88013a4ad058 kernel: Call Trace: kernel: [<ffffffff8116eef0>] ? cache_alloc_refill+0x1c0/0x240 kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc] kernel: [<ffffffffa0358152>] rpc_wait_bit_killable+0x42/0xa0 [sunrpc] kernel: [<ffffffff8152914f>] __wait_on_bit+0x5f/0x90 kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc] kernel: [<ffffffff815291f8>] out_of_line_wait_on_bit+0x78/0x90 kernel: [<ffffffff8109b520>] ? wake_bit_function+0x0/0x50 kernel: [<ffffffffa035810d>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc] kernel: [<ffffffffa040d44c>] nfs4_run_open_task+0x11c/0x160 [nfs] kernel: [<ffffffffa04114e7>] nfs4_open_recover_helper+0x87/0x120 [nfs] kernel: [<ffffffffa0411646>] nfs4_open_recover+0xc6/0x150 [nfs] kernel: [<ffffffffa040cc6f>] ? nfs4_open_recoverdata_alloc+0x2f/0x60 [nfs] kernel: [<ffffffffa0414e1a>] nfs4_open_delegation_recall+0x6a/0xa0 [nfs] kernel: [<ffffffffa0424020>] nfs_end_delegation_return+0x120/0x2e0 [nfs] kernel: [<ffffffff8109580f>] ? queue_work+0x1f/0x30 kernel: [<ffffffffa0424347>] nfs_client_return_marked_delegations+0xd7/0x110 [nfs] kernel: [<ffffffffa04225d8>] nfs4_run_state_manager+0x548/0x620 [nfs] kernel: [<ffffffffa0422090>] ? nfs4_run_state_manager+0x0/0x620 [nfs] kernel: [<ffffffff8109b0f6>] kthread+0x96/0xa0 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20 kernel: [<ffffffff8109b060>] ? kthread+0x0/0xa0 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20 The state manager can not therefore process the DELEGRETURN session errors. Change the async handler to wait for recovery on session errors. Signed-off-by: Andy Adamson <andros@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-19NFS: Fix a warning in nfs_setsecurityTrond Myklebust
Fix the following warning: linux-nfs/fs/nfs/inode.c:315:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-19NFS: Enabling v4.2 should not recompile nfsd and lockdAnna Schumaker
When CONFIG_NFS_V4_2 is toggled nfsd and lockd will be recompiled, instead of only the nfs client. This patch moves a small amount of code into the client directory to avoid unnecessary recompiles. Signed-off-by: Anna Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-16Merge tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes: - Stable fix for data corruption when retransmitting O_DIRECT writes - Stable fix for a deep recursion/stack overflow bug in rpc_release_client - Stable fix for infinite looping when mounting a NFSv4.x volume - Fix a typo in the nfs mount option parser - Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is * tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: fix pnfs Kconfig defaults NFS: correctly report misuse of "migration" mount option. nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once SUNRPC: Avoid deep recursion in rpc_release_client SUNRPC: Fix a data corruption issue when retransmitting RPC calls
2013-11-15nfs: fix pnfs Kconfig defaultsChristoph Hellwig
Defaulting to m seem to prevent building the pnfs layout modules into the kernel. Default to the value of CONFIG_NFS_V4 make sure they are built in for built-in NFSv4 support and modular for a modular NFSv4. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-15NFS: correctly report misuse of "migration" mount option.NeilBrown
The current test on valid use of the "migration" mount option can never report an error as it will only do so if mnt->version !=4 && mnt->minor_version != 0 (and some other condition), but if that test would succeed, then the previous test has already gone-to out_minorversion_mismatch. So change the && to an || to get correct semantics. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-15tree-wide: use reinit_completion instead of INIT_COMPLETIONWolfram Sang
Use this new function to make code more comprehensible, since we are reinitialzing the completion, not initializing. [akpm@linux-foundation.org: linux-next resyncs] Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13) Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than onceJeff Layton
Currently, when we try to mount and get back NFS4ERR_CLID_IN_USE or NFS4ERR_WRONGSEC, we create a new rpc_clnt and then try the call again. There is no guarantee that doing so will work however, so we can end up retrying the call in an infinite loop. Worse yet, we create the new client using rpc_clone_client_set_auth, which creates the new client as a child of the old one. Thus, we can end up with a *very* long lineage of rpc_clnts. When we go to put all of the references to them, we can end up with a long call chain that can smash the stack as each rpc_free_client() call can recurse back into itself. This patch fixes this by simply ensuring that the SETCLIENTID call will only be retried in this situation if the last attempt did not use RPC_AUTH_UNIX. Note too that with this change, we don't need the (i > 2) check in the -EACCES case since we now have a more reliable test as to whether we should reattempt. Cc: stable@vger.kernel.org # v3.10+ Cc: Chuck Lever <chuck.lever@oracle.com> Tested-by/Acked-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "All kinds of stuff this time around; some more notable parts: - RCU'd vfsmounts handling - new primitives for coredump handling - files_lock is gone - Bruce's delegations handling series - exportfs fixes plus misc stuff all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits) ecryptfs: ->f_op is never NULL locks: break delegations on any attribute modification locks: break delegations on link locks: break delegations on rename locks: helper functions for delegation breaking locks: break delegations on unlink namei: minor vfs_unlink cleanup locks: implement delegations locks: introduce new FL_DELEG lock flag vfs: take i_mutex on renamed file vfs: rename I_MUTEX_QUOTA now that it's not used for quotas vfs: don't use PARENT/CHILD lock classes for non-directories vfs: pull ext4's double-i_mutex-locking into common code exportfs: fix quadratic behavior in filehandle lookup exportfs: better variable name exportfs: move most of reconnect_path to helper function exportfs: eliminate unused "noprogress" counter exportfs: stop retrying once we race with rename/remove exportfs: clear DISCONNECTED on all parents sooner exportfs: more detailed comment for path_reconnect ...
2013-11-04NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_securityTrond Myklebust
We already check for nfs_server_capable(inode, NFS_CAP_SECURITY_LABEL) in nfs4_label_alloc() We check the minor version in _nfs4_server_capabilities before setting NFS_CAP_SECURITY_LABEL. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-04NFSv4: Sanity check the server reply in _nfs4_server_capabilitiesTrond Myklebust
We don't want to be setting capabilities and/or requesting attributes that are not appropriate for the NFSv4 minor version. - Ensure that we clear the NFS_CAP_SECURITY_LABEL capability when appropriate - Ensure that we limit the attribute bitmasks to the mounted_on_fileid attribute and less for NFSv4.0 - Ensure that we limit the attribute bitmasks to suppattr_exclcreat and less for NFSv4.1 - Ensure that we limit it to change_sec_label or less for NFSv4.2 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-04NFSv4.2: encode_readdir - only ask for labels when doing readdirplusTrond Myklebust
Currently, if the server is doing NFSv4.2 and supports labeled NFS, then our on-the-wire READDIR request ends up asking for the label information, which is then ignored unless we're doing readdirplus. This patch ensures that READDIR doesn't ask the server for label information at all unless the readdir->bitmask contains the FATTR4_WORD2_SECURITY_LABEL attribute, and the readdir->plus flag is set. While we're at it, optimise away the 3rd bitmap field if it is zero. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-04nfs: set security label when revalidating inodeJeff Layton
Currently, we fetch the security label when revalidating an inode's attributes, but don't apply it. This is in contrast to the readdir() codepath where we do apply label changes. Cc: Dave Quigley <dpquigl@davequigley.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-01NFS: Fix a missing initialisation when reading the SELinux labelTrond Myklebust
Ensure that _nfs4_do_get_security_label() also initialises the SEQUENCE call correctly, by having it call into nfs4_call_sync(). Reported-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-01nfs: fix oops when trying to set SELinux labelJeff Layton
Chao reported the following oops when testing labeled NFS: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4] PGD 277bbd067 PUD 2777ea067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache sg coretemp kvm_intel kvm crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul iTCO_wdt glue_helper ablk_helper cryptd iTCO_vendor_support bnx2 pcspkr serio_raw i7core_edac cdc_ether microcode usbnet edac_core mii lpc_ich i2c_i801 mfd_core shpchp ioatdma dca acpi_cpufreq mperf nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ata_generic ttm pata_acpi drm ata_piix libata megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 4 PID: 25657 Comm: chcon Not tainted 3.10.0-33.el7.x86_64 #1 Hardware name: IBM System x3550 M3 -[7944OEJ]-/90Y4784 , BIOS -[D6E150CUS-1.11]- 02/08/2011 task: ffff880178397220 ti: ffff8801595d2000 task.ti: ffff8801595d2000 RIP: 0010:[<ffffffffa0568703>] [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4] RSP: 0018:ffff8801595d3888 EFLAGS: 00010296 RAX: 0000000000000000 RBX: ffff8801595d3b30 RCX: 0000000000000b4c RDX: ffff8801595d3b30 RSI: ffff8801595d38e0 RDI: ffff880278b6ec00 RBP: ffff8801595d38c8 R08: ffff8801595d3b30 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801595d38e0 R13: ffff880277a4a780 R14: ffffffffa05686c0 R15: ffff8802765f206c FS: 00007f2c68486800(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000027651a000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ffff880277865800 ffff880278b6ec00 ffff880277a4a780 ffff8801595d3948 ffffffffa02ad926 ffff8801595d3b30 ffff8802765f206c Call Trace: [<ffffffffa02ad926>] rpcauth_wrap_req+0x86/0xd0 [sunrpc] [<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc] [<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc] [<ffffffffa02a1ecb>] call_transmit+0x18b/0x290 [sunrpc] [<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc] [<ffffffffa02aae14>] __rpc_execute+0x84/0x400 [sunrpc] [<ffffffffa02ac40e>] rpc_execute+0x5e/0xa0 [sunrpc] [<ffffffffa02a2ea0>] rpc_run_task+0x70/0x90 [sunrpc] [<ffffffffa02a2f03>] rpc_call_sync+0x43/0xa0 [sunrpc] [<ffffffffa055284d>] _nfs4_do_set_security_label+0x11d/0x170 [nfsv4] [<ffffffffa0558861>] nfs4_set_security_label.isra.69+0xf1/0x1d0 [nfsv4] [<ffffffff815fca8b>] ? avc_alloc_node+0x24/0x125 [<ffffffff815fcd2f>] ? avc_compute_av+0x1a3/0x1b5 [<ffffffffa055897b>] nfs4_xattr_set_nfs4_label+0x3b/0x50 [nfsv4] [<ffffffff811bc772>] generic_setxattr+0x62/0x80 [<ffffffff811bcfc3>] __vfs_setxattr_noperm+0x63/0x1b0 [<ffffffff811bd1c5>] vfs_setxattr+0xb5/0xc0 [<ffffffff811bd2fe>] setxattr+0x12e/0x1c0 [<ffffffff811a4d22>] ? final_putname+0x22/0x50 [<ffffffff811a4f2b>] ? putname+0x2b/0x40 [<ffffffff811aa1cf>] ? user_path_at_empty+0x5f/0x90 [<ffffffff8119bc29>] ? __sb_start_write+0x49/0x100 [<ffffffff811bd66f>] SyS_lsetxattr+0x8f/0xd0 [<ffffffff8160cf99>] system_call_fastpath+0x16/0x1b Code: 48 8b 02 48 c7 45 c0 00 00 00 00 48 c7 45 c8 00 00 00 00 48 c7 45 d0 00 00 00 00 48 c7 45 d8 00 00 00 00 48 c7 45 e0 00 00 00 00 <48> 8b 00 48 8b 00 48 85 c0 0f 84 ae 00 00 00 48 8b 80 b8 03 00 RIP [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4] RSP <ffff8801595d3888> CR2: 0000000000000000 The problem is that _nfs4_do_set_security_label calls rpc_call_sync() directly which fails to do any setup of the SEQUENCE call. Have it use nfs4_call_sync() instead which does the right thing. While we're at it change the name of "args" to "arg" to better match the pattern in _nfs4_do_setattr. Reported-by: Chao Ye <cye@redhat.com> Cc: David Quigley <dpquigl@davequigley.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-31nfs: fix inverted test for delegation in nfs4_reclaim_open_stateJeff Layton
commit 6686390bab6a0e0 (NFS: remove incorrect "Lock reclaim failed!" warning.) added a test for a delegation before checking to see if any reclaimed locks failed. The test however is backward and is only doing that check when a delegation is held instead of when one isn't. Cc: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Fixes: 6686390bab6a: NFS: remove incorrect "Lock reclaim failed!" warning. Cc: stable@vger.kernel.org # 3.12 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28Merge branch 'fscache' of ↵Trond Myklebust
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into linux-next Pull fs-cache fixes from David Howells: Can you pull these commits to fix an issue with NFS whereby caching can be enabled on a file that is open for writing by subsequently opening it for reading. This can be made to crash by opening it for writing again if you're quick enough. The gist of the patchset is that the cookie should be acquired at inode creation only and subsequently enabled and disabled as appropriate (which dispenses with the backing objects when they're not needed). The extra synchronisation that NFS does can then be dispensed with as it is thenceforth managed by FS-Cache. Could you send these on to Linus? This likely will need fixing also in CIFS and 9P also once the FS-Cache changes are upstream. AFS and Ceph are probably safe. * 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open() FS-Cache: Provide the ability to enable/disable cookies FS-Cache: Add use/unuse/wake cookie wrappers
2013-10-28nfs: use IS_ROOT not DCACHE_DISCONNECTEDJ. Bruce Fields
This check was added by Al Viro with d9e80b7de91db05c1c4d2e5ebbfd70b3b3ba0e0f "nfs d_revalidate() is too trigger-happy with d_drop()", with the explanation that we don't want to remove the root of a disconnected tree, which will still be included on the s_anon list. But DCACHE_DISCONNECTED does *not* actually identify dentries that are disconnected from the dentry tree or hashed on s_anon. IS_ROOT() is the way to do that. Also add a comment from Al's commit to remind us why this check is there. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28nfs: Use PTR_ERR_OR_ZERO in 'nfs/nfs4super.c'Geyslan G. Bem
Use 'PTR_ERR_OR_ZERO()' rather than 'IS_ERR(...) ? PTR_ERR(...) : 0'. Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28nfs: Use PTR_ERR_OR_ZERO in 'nfs41_callback_up' functionGeyslan G. Bem
Use 'PTR_ERR_OR_ZERO()' rather than 'IS_ERR(...) ? PTR_ERR(...) : 0'. Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28nfs: Remove useless 'error' assignmentGeyslan G. Bem
the 'error' variable was been assigned twice in vain. Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28NFS: add support for multiple sec= mount optionsWeston Andros Adamson
This patch adds support for multiple security options which can be specified using a colon-delimited list of security flavors (the same syntax as nfsd's exports file). This is useful, for instance, when NFSv4.x mounts cross SECINFO boundaries. With this patch a user can use "sec=krb5i,krb5p" to mount a remote filesystem using krb5i, but can still cross into krb5p-only exports. New mounts will try all security options before failing. NFSv4.x SECINFO results will be compared against the sec= flavors to find the first flavor in both lists or if no match is found will return -EPERM. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28NFS: stop using NFS_MOUNT_SECFLAVOUR server flagWeston Andros Adamson
Since the parsed sec= flavor is now stored in nfs_server->auth_info, we no longer need an nfs_server flag to determine if a sec= option was used. This flag has not been completely removed because it is still needed for the (old but still supported) non-text parsed mount options ABI compatability. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28NFS: cache parsed auth_info in nfs_serverWeston Andros Adamson
Cache the auth_info structure in nfs_server and pass these values to submounts. This lays the groundwork for supporting multiple sec= options. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28NFS: separate passed security flavs from selectedWeston Andros Adamson
When filling parsed_mount_data, store the parsed sec= mount option in the new struct nfs_auth_info and the chosen flavor in selected_flavor. This patch lays the groundwork for supporting multiple sec= options. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28NFSv4: make nfs_find_best_sec staticWeston Andros Adamson
It's not used outside of nfs4namespace.c anymore. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28NFS: Fix possible endless state recovery waitChuck Lever
In nfs4_wait_clnt_recover(), hold a reference to the clp being waited on. The state manager can reduce clp->cl_count to 1, in which case the nfs_put_client() in nfs4_run_state_manager() can free *clp before wait_on_bit() returns and allows nfs4_wait_clnt_recover() to run again. The behavior at that point is non-deterministic. If the waited-on bit still happens to be zero, wait_on_bit() will wake the waiter as expected. If the bit is set again (say, if the memory was poisoned when freed) wait_on_bit() can leave the waiter asleep. This is a narrow fix which ensures the safety of accessing *clp in nfs4_wait_clnt_recover(), but does not address the continued use of a possibly freed *clp after nfs4_wait_clnt_recover() returns (see nfs_end_delegation_return(), for example). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>