Age | Commit message (Collapse) | Author |
|
commit f7b43d0c992c3ec3e8d9285c3fb5e1e0eb0d031a upstream.
As of 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low
on space", we permit the server to process a LOCK operation even if
there might not be space to return the conflicting lockowner, because
we've made returning the conflicting lockowner optional.
However, the rpc server still wants to know the most we might possibly
return, so we need to take into account the possible conflicting
lockowner in the svc_reserve_space() call here.
Symptoms were log messages like "RPC request reserved 88 but used 108".
Fixes: 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low on space"
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 15b23ef5d348ea51c5e7573e2ef4116fbc7cb099 upstream.
The calculation of page_ptr here is wrong in the case the read doesn't
start at an offset that is a multiple of a page.
The result is that nfs4svc_encode_compoundres sets rq_next_page to a
value one too small, and then the loop in svc_free_res_pages may
incorrectly fail to clear a page pointer in rq_respages[].
Pages left in rq_respages[] are available for the next rpc request to
use, so xdr data may be written to that page, which may hold data still
waiting to be transmitted to the client or data in the page cache.
The observed result was silent data corruption seen on an NFSv4 client.
We tag this as "fixing" 05638dc73af2 because that commit exposed this
bug, though the incorrect calculation predates it.
Particular thanks to Andrea Arcangeli and David Gilbert for analysis and
testing.
Fixes: 05638dc73af2 "nfsd4: simplify server xdr->next_page use"
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit aee3776441461c14ba6d8ed9e2149933e65abb6e upstream.
Commit 3b299709091b "nfsd4: enforce rd_dircount" totally misunderstood
rd_dircount; it refers to total non-attribute bytes returned, not number
of directory entries returned.
Bring the code into agreement with RFC 3530 section 14.2.24.
Fixes: 3b299709091b "nfsd4: enforce rd_dircount"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3c45ddf823d679a820adddd53b52c6699c9a05ac upstream.
The current code always selects XPRT_TRANSPORT_BC_TCP for the back
channel, even when the forward channel was not TCP (eg, RDMA). When
a 4.1 mount is attempted with RDMA, the server panics in the TCP BC
code when trying to send CB_NULL.
Instead, construct the transport protocol number from the forward
channel transport or'd with XPRT_TRANSPORT_BC. Transports that do
not support bi-directional RPC will not have registered a "BC"
transport, causing create_backchannel_client() to fail immediately.
Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d9499a95716db0d4bc9b67e88fd162133e7d6b08 upstream.
A memory allocation failure could cause nfsd_startup_generic to fail, in
which case nfsd_users wouldn't be incorrectly left elevated.
After nfsd restarts nfsd_startup_generic will then succeed without doing
anything--the first consequence is likely nfs4_start_net finding a bad
laundry_wq and crashing.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 4539f14981ce "nfsd: replace boolean nfsd_up flag by users counter"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commit 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low
on space" forgot to free conf->data in nfsd4_encode_lockt and before
sign conf->data to NULL in nfsd4_encode_lock_denied, causing a leak.
Worse, kfree() can be called on an uninitialized pointer in the case of
a succesful lock (or one that fails for a reason other than a conflict).
(Note that lock->lk_denied.ld_owner.data appears it should be zero here,
until you notice that it's one arm of a union the other arm of which is
written to in the succesful case by the
memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid,
sizeof(stateid_t));
in nfsd4_lock(). In the 32-bit case this overwrites ld_owner.data.)
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 8c7424cff6 ""nfsd4: don't try to encode conflicting owner if low on space"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Introduced by commit 561f0ed498 (nfsd4: allow large readdirs).
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
XDR requires 4-byte alignment; nfs4d READLINK reply writes out the padding,
but truncates the packet to the padding-less size.
Fix by taking the padding into consideration when truncating the packet.
Symptoms:
# ll /mnt/
ls: cannot read symbolic link /mnt/test: Input/output error
total 4
-rw-r--r--. 1 root root 0 Jun 14 01:21 123456
lrwxrwxrwx. 1 root root 6 Jul 2 03:33 test
drwxr-xr-x. 1 root root 0 Jul 2 23:50 tmp
drwxr-xr-x. 1 root root 60 Jul 2 23:44 tree
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Fixes: 476a7b1f4b2c (nfsd4: don't treat readlink like a zero-copy operation)
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
An NFS operation that creates a new symlink includes the symlink data,
which is xdr-encoded as a length followed by the data plus 0 to 3 bytes
of zero-padding as required to reach a 4-byte boundary.
The vfs, on the other hand, wants null-terminated data.
The simple way to handle this would be by copying the data into a newly
allocated buffer with space for the final null.
The current nfsd_symlink code tries to be more clever by skipping that
step in the (likely) case where the byte following the string is already
0.
But that assumes that the byte following the string is ours to look at.
In fact, it might be the first byte of a page that we can't read, or of
some object that another task might modify.
Worse, the NFSv4 code tries to fix the problem by actually writing to
that byte.
In the NFSv2/v3 cases this actually appears to be safe:
- nfs3svc_decode_symlinkargs explicitly null-terminates the data
(after first checking its length and copying it to a new
page).
- NFSv2 limits symlinks to 1k. The buffer holding the rpc
request is always at least a page, and the link data (and
previous fields) have maximum lengths that prevent the request
from reaching the end of a page.
In the NFSv4 case the CREATE op is potentially just one part of a long
compound so can end up on the end of a page if you're unlucky.
The minimal fix here is to copy and null-terminate in the NFSv4 case.
The nfsd_symlink() interface here seems too fragile, though. It should
really either do the copy itself every time or just require a
null-terminated string.
Reported-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Commit 561f0ed498ca (nfsd4: allow large readdirs) introduces a bug
about readdir the root of pseudofs.
Call xdr_truncate_encode() revert encoded name when skipping.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
If nfsd needs to recall a delegation for some reason it implies that there is
contention on the file, so further delegations should not be handed out.
The current code fails to do so, and the result is effectively a
live-lock under some workloads: a client attempting a conflicting
operation on a read-delegated file receives NFS4ERR_DELAY and retries
the operation, but by the time it retries the server may already have
given out another delegation.
We could simply avoid delegations for (say) 30 seconds after any recall, but
this is probably too heavy handed.
We could keep a list of inodes (or inode numbers or filehandles) for recalled
delegations, but that requires memory allocation and searching.
The approach taken here is to use a bloom filter to record the filehandles
which are currently blocked from delegation, and to accept the cost of a few
false positives.
We have 2 bloom filters, each of which is valid for 30 seconds. When a
delegation is recalled the filehandle is added to one filter and will remain
disabled for between 30 and 60 seconds.
We keep a count of the number of filehandles that have been added, so when
that count is zero we can bypass all other tests.
The bloom filters have 256 bits and 3 hash functions. This should allow a
couple of dozen blocked filehandles with minimal false positives. If many
more filehandles are all blocked at once, behaviour will degrade towards
rejecting all delegations for between 30 and 60 seconds, then resetting and
allowing new delegations.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
27b11428b7de ("nfsd4: remove lockowner when removing lock stateid")
introduced a memory leak.
Cc: stable@vger.kernel.org
Reported-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Currently, the DRC cache pruner will stop scanning the list when it
hits an entry that is RC_INPROG. It's possible however for a call to
take a *very* long time. In that case, we don't want it to block other
entries from being pruned if they are expired or we need to trim the
cache to get back under the limit.
Fix the DRC cache pruner to just ignore RC_INPROG entries.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
While we're here, let's kill off a couple of the read-side macros.
Leaving the more complicated ones alone for now.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The rpc code makes available to the NFS server an array of pages to
encod into. The server represents its reply as an xdr buf, with the
head pointing into the first page in that array, the pages ** array
starting just after that, and the tail (if any) sharing any leftover
space in the page used by the head.
While encoding, we use xdr_stream->page_ptr to keep track of which page
we're currently using.
Currently we set xdr_stream->page_ptr to buf->pages, which makes the
head a weird exception to the rule that page_ptr always points to the
page we're currently encoding into. So, instead set it to buf->pages -
1 (the page actually containing the head), and remove the need for a
little unintuitive logic in xdr_get_next_encode_buffer() and
xdr_truncate_encode.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
We don't want the stateid to be found in the hash table before the delegation
is granted.
Currently this is protected by the client_mutex, but we want to break that
up and this is a necessary step toward that goal.
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
...as the name is a bit more descriptive and we've started using it for
other purposes.
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The memset of resp in svc_process_common should ensure that these are
already zeroed by the time they get here.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
In the NFS4_OPEN_CLAIM_PREVIOUS case, we should only mark it confirmed
if the nfs4_check_open_reclaim check succeeds.
In the NFS4_OPEN_CLAIM_DELEG_PREV_FH and NFS4_OPEN_CLAIM_DELEGATE_PREV
cases, I see no point in declaring the openowner confirmed when the
operation is going to fail anyway, and doing so might allow the client
to game things such that it wouldn't need to confirm a subsequent open
with the same owner.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
This fixes a bug in the handling of the fi_delegations list.
nfs4_setlease does not hold the recall_lock when adding to it. The
client_mutex is held, which prevents against concurrent list changes,
but nfsd_break_deleg_cb does not hold while walking it. New delegations
could theoretically creep onto the list while we're walking it there.
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The laundromat uses two variables to calculate when it should next run,
but one is completely ignored at the end of the run. Merge the two and
rename the variable to be more descriptive of what it does.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
sparse says:
CHECK fs/nfsd/nfs4xdr.c
fs/nfsd/nfs4xdr.c:2043:1: warning: symbol 'nfsd4_encode_fattr' was not declared. Should it be static?
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
When debugging, rpc prints messages from dprintk(KERN_WARNING ...)
with "^A4" prefixed,
[ 2780.339988] ^A4nfsd: connect from unprivileged port: 127.0.0.1, port=35316
Trond tells,
> dprintk != printk. We have NEVER supported dprintk(KERN_WARNING...)
This patch removes using of dprintk with KERN_WARNING.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Note nobody's ever noticed because the typical client probably never
requests FILES_AVAIL without also requesting something else on the list.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
ex_nflavors can't be negative number, just defined by uint32_t.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
No need for a typedef wrapper for svc_export or svc_client, remove them.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Commit 49b28684fdba ("nfsd: Remove deprecated nfsctl system call and
related code") removed the only use of ipv6_addr_set_v4mapped(), so
net/ipv6.h is unneeded now.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Commit 8f6c5ffc8987 ("kernel/groups.c: remove return value of
set_groups") removed the last use of "ret".
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
After commit 4c1e1b34d5c8 ("nfsd: Store ex_anon_uid and ex_anon_gid as
kuids and kgids") using kuid/kgid for ex_anon_uid/ex_anon_gid,
user_namespace.h is not needed.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
If fsloc_parse() failed at kzalloc(), fs/nfsd/export.c
411
412 fsloc->locations = kzalloc(fsloc->locations_count
413 * sizeof(struct nfsd4_fs_location), GFP_KERNEL);
414 if (!fsloc->locations)
415 return -ENOMEM;
svc_export_parse() will call nfsd4_fslocs_free() with fsloc->locations = NULL,
so that, "kfree(fsloc->locations[i].path);" will cause a crash.
If fsloc_parse() failed after that, fsloc_parse() will call nfsd4_fslocs_free(),
and svc_export_parse() will call it again, so that, a double free is caused.
This patch checks the fsloc->locations, and set to NULL after it be freed.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
RPC_MAX_AUTH_SIZE is scattered around several places. Better to set it
once in the auth code, where this kind of estimate should be made. And
while we're at it we can leave it zero when we're not using krb5i or
krb5p.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
And switch a couple other functions from the encode(&p,...) convention
to the p = encode(p,...) convention mostly used elsewhere.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
These macros just obscure what's going on. Adopt the convention of the
client-side code.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
encode_getattr, for example, can return nfserr_resource to indicate it
ran out of buffer space. That's not a legal error in the 4.1 case.
And in the 4.1 case, if we ran out of buffer space, we should have
exceeded a session limit too.
(Note in 1bc49d83c37cfaf46be357757e592711e67f9809 "nfsd4: fix
nfs4err_resource in 4.1 case" we originally tried fixing this error
return before fixing the problem that we could error out while we still
had lots of available space. The result was to trade one illegal error
for another in those cases. We decided that was helpful, so reverted
the change in fc208d026be0c7d60db9118583fc62f6ca97743d, and are only
reinstating it now that we've elimited almost all of those cases.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
I'm not sure why a client would want to stuff multiple reads in a
single compound rpc, but it's legal for them to do it, and we should
really support it.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
More cleanup, no change in functionality.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Trivial cleanup, no change in functionality.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The splice and readv cases are actually quite different--for example the
former case ignores the array of vectors we build up for the latter.
It is probably clearer to separate the two cases entirely.
There's some code duplication between the split out encoders, but this
is only temporary and will be fixed by a later patch.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
We currently allow only one read per compound, with operations before
and after whose responses will require no more than about a page to
encode.
While we don't expect clients to violate those limits any time soon,
this limitation isn't really condoned by the spec, so to future proof
the server we should lift the limitation.
At the same time we'd like to continue to support zero-copy reads.
Supporting multiple zero-copy-reads per compound would require a new
data structure to replace struct xdr_buf, which can represent only one
set of included pages.
So for now we plan to modify encode_read() to support either zero-copy
or non-zero-copy reads, and use some heuristics at the start of the
compound processing to decide whether a zero-copy read will work.
This will allow us to support more exotic compounds without introducing
a performance regression in the normal case.
Later patches handle those "exotic compounds", this one just makes sure
zero-copy is turned off in those cases.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Otherwise a following patch would turn off all 4.1 zero-copy reads.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
We plan to use this estimate to decide whether or not to allow zero-copy
reads. Currently we're assuming all getattr's are a page, which can be
both too small (ACLs e.g. may be arbitrarily long) and too large (after
an upcoming read patch this will unnecessarily prevent zero copy reads
in any read compound also containing a getattr).
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|