Age | Commit message (Collapse) | Author |
|
commit d8bdc59f215e62098bc5b4256fd9928bf27053a1 upstream.
Rather than pass in some random truncated offset to the pid-related
functions, check that the offset is in range up-front.
This is just cleanup, the previous commit fixed the real problem.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 4471a675dfc7ca676c165079e91c712b09dc9ce4 upstream.
5520e89 ("brk: fix min_brk lower bound computation for COMPAT_BRK")
tried to get the whole logic of brk randomization for legacy
(libc5-based) applications finally right.
It turns out that the way to detect whether brk has actually been
randomized in the end or not introduced by that patch still doesn't work
for those binaries, as reported by Geert:
: /sbin/init from my old m68k ramdisk exists prematurely.
:
: Before the patch:
:
: | brk(0x80005c8e) = 0x80006000
:
: After the patch:
:
: | brk(0x80005c8e) = 0x80005c8e
:
: Old libc5 considers brk() to have failed if the return value is not
: identical to the requested value.
I don't like it, but currently see no better option than a bit flag in
task_struct to catch the CONFIG_COMPAT_BRK && randomize_va_space == 2
case.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c1530019e311c91d14b24d8e74d233152d806e45 upstream.
During RCU walk in path_lookupat and path_openat, the rcu lookup
frequently failed if looking up an absolute path, because when root
directory was looked up, seq number was not properly set in nameidata.
We dropped out of RCU walk in nameidata_drop_rcu due to mismatch in
directory entry's seq number. We reverted to slow path walk that need
to take references.
With the following patch, I saw a 50% increase in an exim mail server
benchmark throughput on a 4-socket Nehalem-EX system.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 78530bf7f2559b317c04991b52217c1608d5a58d upstream.
This patch fixes severe UBIFS bug: UBIFS oopses when we 'fsync()' an
file on R/O-mounter file-system. We (the UBIFS authors) incorrectly
thought that VFS would not propagate 'fsync()' down to the file-system
if it is read-only, but this is not the case.
It is easy to exploit this bug using the following simple perl script:
use strict;
use File::Sync qw(fsync sync);
die "File path is not specified" if not defined $ARGV[0];
my $path = $ARGV[0];
open FILE, "<", "$path" or die "Cannot open $path: $!";
fsync(\*FILE) or die "cannot fsync $path: $!";
close FILE or die "Cannot close $path: $!";
Thanks to Reuben Dowle <Reuben.Dowle@navico.com> for reporting about this
issue.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reported-by: Reuben Dowle <Reuben.Dowle@navico.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b836aec53e2bce71de1d5415313380688c851477 upstream.
On no-mmu arch, there is a memleak during shmem test. The cause of this
memleak is ramfs_nommu_expand_for_mapping() added page refcount to 2
which makes iput() can't free that pages.
The simple test file is like this:
int main(void)
{
int i;
key_t k = ftok("/etc", 42);
for ( i=0; i<100; ++i) {
int id = shmget(k, 10000, 0644|IPC_CREAT);
if (id == -1) {
printf("shmget error\n");
}
if(shmctl(id, IPC_RMID, NULL ) == -1) {
printf("shm rm error\n");
return -1;
}
}
printf("run ok...\n");
return 0;
}
And the result:
root:/> free
total used free shared buffers
Mem: 60320 17912 42408 0 0
-/+ buffers: 17912 42408
root:/> shmem
run ok...
root:/> free
total used free shared buffers
Mem: 60320 19096 41224 0 0
-/+ buffers: 19096 41224
root:/> shmem
run ok...
root:/> free
total used free shared buffers
Mem: 60320 20296 40024 0 0
-/+ buffers: 20296 40024
...
After this patch the test result is:(no memleak anymore)
root:/> free
total used free shared buffers
Mem: 60320 16668 43652 0 0
-/+ buffers: 16668 43652
root:/> shmem
run ok...
root:/> free
total used free shared buffers
Mem: 60320 16668 43652 0 0
-/+ buffers: 16668 43652
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c88ac00c5af70c2a0741da14b22cdcf8507ddd92 upstream.
This patch fixes UBIFS assertion warnings like:
UBIFS assert failed in ubifs_leb_unmap at 135 (pid 29365)
Pid: 29365, comm: integck Tainted: G I 2.6.37-ubi-2.6+ #34
Call Trace:
[<ffffffffa047c663>] ubifs_lpt_init+0x95e/0x9ee [ubifs]
[<ffffffffa04623a7>] ubifs_remount_fs+0x2c7/0x762 [ubifs]
[<ffffffff810f066e>] do_remount_sb+0xb6/0x101
[<ffffffff81106ff4>] ? do_mount+0x191/0x78e
[<ffffffff811070bb>] do_mount+0x258/0x78e
[<ffffffff810da1e8>] ? alloc_pages_current+0xa2/0xc5
[<ffffffff81107674>] sys_mount+0x83/0xbd
[<ffffffff81009a12>] system_call_fastpath+0x16/0x1b
They happen when we re-mount from R/O mode to R/W mode. While
re-mounting, we write to the media, but we still have the c->ro_mount
flag set. The fix is very simple - just clear the flag before
starting re-mounting R/W.
These warnings are caused by the following commit:
2ef13294d29bcfb306e0d360f1b97f37b647b0c0
For -stable guys: this bug was introduced in 2.6.38, this is materieal
for 2.6.38-stable.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 8c559d30b4e59cf6994215ada1fe744928f494bf upstream.
Don't allow everybody to dump sensitive information about filesystems.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 157c249114508aa71daa308a426e15d81a4eed00 upstream.
While testing my patchset to fix asynchronous writes, I hit a bunch
of signature problems when testing with signing on. The problem seems
to be that signature checks on receive can be running at the same
time as a process that is sending, or even that multiple receives can
be checking signatures at the same time, clobbering the same data
structures.
While we're at it, clean up the comments over cifs_calculate_signature
and add a note that the srv_mutex should be held when calling this
function.
This patch seems to fix the problems for me, but I'm not clear on
whether it's the best approach. If it is, then this should probably
go to stable too.
Cc: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 2b6c26a0a62cc0bab0ad487533d5581d7c293fef upstream.
Commit 522440ed made cifs set backing_dev_info on the mapping attached
to new inodes. This change caused a fairly significant read performance
regression, as cifs started doing page-sized reads exclusively.
By virtue of the fact that they're allocated as part of cifs_sb_info by
kzalloc, the ra_pages on cifs BDIs get set to 0, which prevents any
readahead. This forces the normal read codepaths to use readpage instead
of readpages causing a four-fold increase in the number of read calls
with the default rsize.
Fix it by setting ra_pages in the BDI to the same value as that in the
default_backing_dev_info.
Fixes https://bugzilla.kernel.org/show_bug.cgi?id=31662
Reported-and-Tested-by: Till <till2.schaefer@uni-dortmund.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 7797069305d13252fd66cf722aa8f2cbeb3c95cd upstream.
cifs_close doesn't check that the filp->private_data is non-NULL before
trying to put it. That can cause an oops in certain error conditions
that can occur on open or lookup before the private_data is set.
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 70945643722ffeac779d2529a348f99567fa5c33 upstream.
Currently, we skip doing the is_path_accessible check in cifs_mount if
there is no prefixpath. I have a report of at least one server however
that allows a TREE_CONNECT to a share that has a DFS referral at its
root. The reporter in this case was using a UNC that had no prefixpath,
so the is_path_accessible check was not triggered and the box later hit
a BUG() because we were chasing a DFS referral on the root dentry for
the mount.
This patch fixes this by removing the check for a zero-length
prefixpath. That should make the is_path_accessible check be done in
this situation and should allow the client to chase the DFS referral at
mount time instead.
Reported-and-Tested-by: Yogesh Sharma <ysharma@cymer.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 23fcf2ec93fb8573a653408316af599939ff9a8e upstream.
Lock stateid's can have access_bmap 0 if they were only partially
initialized (due to a failed lock request); handle that case in
free_generic_stateid.
------------[ cut here ]------------
kernel BUG at fs/nfsd/nfs4state.c:380!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
Modules linked in: nfs fscache md4 nls_utf8 cifs ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nfsd lockd nfs_acl auth_rpcgss sunrpc ipv6 ppdev parport_pc parport pcnet32 mii pcspkr microcode i2c_piix4 BusLogic floppy [last unloaded: mperf]
Pid: 1468, comm: nfsd Not tainted 2.6.38+ #120 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
EIP: 0060:[<e24f180d>] EFLAGS: 00010297 CPU: 0
EIP is at nfs4_access_to_omode+0x1c/0x29 [nfsd]
EAX: ffffffff EBX: dd758120 ECX: 00000000 EDX: 00000004
ESI: dd758120 EDI: ddfe657c EBP: dd54dde0 ESP: dd54dde0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process nfsd (pid: 1468, ti=dd54c000 task=ddc92580 task.ti=dd54c000)
Stack:
dd54ddf0 e24f19ca 00000000 ddfe6560 dd54de08 e24f1a5d dd758130 deee3a20
ddfe6560 31270000 dd54df1c e24f52fd 0000000f dd758090 e2505dd0 0be304cf
dbb51d68 0000000e ddfe657c ddcd8020 dd758130 dd758128 dd7580d8 dd54de68
Call Trace:
[<e24f19ca>] free_generic_stateid+0x1c/0x3e [nfsd]
[<e24f1a5d>] release_lockowner+0x71/0x8a [nfsd]
[<e24f52fd>] nfsd4_lock+0x617/0x66c [nfsd]
[<e24e57b6>] ? nfsd_setuser+0x199/0x1bb [nfsd]
[<e24e056c>] ? nfsd_setuser_and_check_port+0x65/0x81 [nfsd]
[<c07a0052>] ? _cond_resched+0x8/0x1c
[<c04ca61f>] ? slab_pre_alloc_hook.clone.33+0x23/0x27
[<c04cac01>] ? kmem_cache_alloc+0x1a/0xd2
[<c04835a0>] ? __call_rcu+0xd7/0xdd
[<e24e0dfb>] ? fh_verify+0x401/0x452 [nfsd]
[<e24f0b61>] ? nfsd4_encode_operation+0x52/0x117 [nfsd]
[<e24ea0d7>] ? nfsd4_putfh+0x33/0x3b [nfsd]
[<e24f4ce6>] ? nfsd4_delegreturn+0xd4/0xd4 [nfsd]
[<e24ea2c9>] nfsd4_proc_compound+0x1ea/0x33e [nfsd]
[<e24de6ee>] nfsd_dispatch+0xd1/0x1a5 [nfsd]
[<e1d6e1c7>] svc_process_common+0x282/0x46f [sunrpc]
[<e1d6e578>] svc_process+0xdc/0xfa [sunrpc]
[<e24de0fa>] nfsd+0xd6/0x115 [nfsd]
[<e24de024>] ? nfsd_shutdown+0x24/0x24 [nfsd]
[<c0454322>] kthread+0x62/0x67
[<c04542c0>] ? kthread_worker_fn+0x114/0x114
[<c07a6ebe>] kernel_thread_helper+0x6/0x10
Code: eb 05 b8 00 00 27 4f 8d 65 f4 5b 5e 5f 5d c3 83 e0 03 55 83 f8 02 89 e5 74 17 83 f8 03 74 05 48 75 09 eb 09 b8 02 00 00 00 eb 0b <0f> 0b 31 c0 eb 05 b8 01 00 00 00 5d c3 55 89 e5 57 56 89 d6 8d
EIP: [<e24f180d>] nfs4_access_to_omode+0x1c/0x29 [nfsd] SS:ESP 0068:dd54dde0
---[ end trace 2b0bf6c6557cb284 ]---
The trace route is:
-> nfsd4_lock()
-> if (lock->lk_is_new) {
-> alloc_init_lock_stateid()
3739: stp->st_access_bmap = 0;
->if (status && lock->lk_is_new && lock_sop)
-> release_lockowner()
-> free_generic_stateid()
-> nfs4_access_bmap_to_omode()
-> nfs4_access_to_omode()
380: BUG(); *****
This problem was introduced by 0997b173609b9229ece28941c118a2a9b278796e.
Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Tested-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 954032d2527f2fce7355ba70709b5e143d6b686f upstream.
This was noticed by users who performed more than 2^32 lock operations
and hence made this counter overflow (eventually leading to
use-after-free's). Setting rq_client to NULL here means that it won't
later get auth_domain_put() when it should be.
Appears to have been introduced in 2.5.42 by "[PATCH] kNFSd: Move auth
domain lookup into svcauth" which moved most of the rq_client handling
to common svcauth code, but left behind this one line.
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 5b41395fcc0265fc9f193aef9df39ce49d64677c upstream.
When writing a contiguous set of blocks, two indirect blocks could be
needed depending on how the blocks are aligned, so we need to increase
the number of credits needed by one.
[ Also fixed a another bug which could further underestimate the
number of journal credits needed by 1; the code was using integer
division instead of DIV_ROUND_UP() -- tytso]
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 46e4690bbd9a4f8d9e7c4f34e34b48f703ad47e0 upstream.
In ext4_register_li_request, we malloc a ext4_li_request and
inserts it into ext4_li_info->li_request_list. In case of any
error later, we free it in the end. But if we have some error
in ext4_run_lazyinit_thread, the whole li_request_list will be
dropped and freed in it. So we will double free this ext4_li_request.
This patch just sets elr to NULL after it is inserted to the list
so that the latter kfree won't double free it.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 44cff8a9ee8a974f9e931df910688e7fc1f0b0f9 upstream.
Handle the rare case where a directory metadata block is uncompressed and
corrupted, leading to a kernel oops in directory scanning (memcpy).
Normally corruption is detected at the decompression stage and dealt with
then, however, this will not happen if:
- metadata isn't compressed (users can optionally request no metadata
compression), or
- the compressed metadata block was larger than the original, in which
case the uncompressed version was used, or
- the data was corrupt after decompression
This patch fixes this by adding some sanity checks against known maximum
values.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 117a91e0f25fd7698e20ac3dfa62086be3dc82a3 upstream.
Bugzilla bug 31422 reports occasional "page allocation failure. order:4"
at Squashfs mount time. Fix this by making zlib workspace allocation
use vmalloc rather than kmalloc.
Reported-by: Mehmet Giritli <mehmet@giritli.eu>
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 272b62c1f0f6f742046e45b50b6fec98860208a0 upstream.
When a hole spans across page boundaries, the next write forces
a read of the block. This could end up reading existing garbage
data from the disk in ocfs2_map_page_blocks. This leads to
non-zero holes. In order to avoid this, mark the writes as new
when the holes span across page boundaries.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
Signed-off-by: jlbec <jlbec@evilplan.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b03f24567ce7caf2420b8be4c6eb74c191d59a91 upstream.
There's no reason to write quota info in dquot_commit(). The writing is a
relict from the old days when we didn't have dquot_acquire() and
dquot_release() and thus dquot_commit() could have created / removed quota
structures from the file. These days dquot_commit() only updates usage counters
/ limits in quota structure and thus there's no need to write quota info.
This also fixes an issue with journaling filesystem which didn't reserve
enough space in the transaction for write of quota info (it could have been
dirty at the time of dquot_commit() because of a race with other operation
changing it).
Reported-and-tested-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 7da6443aca9be29c6948dcbd636ad50154d0bc0c upstream.
This patch fixes a debugging failure with which looks like this:
UBIFS error (pid 32313): dbg_check_space_info: free space changed from 6019344 to 6022654
The reason for this failure is described in the comment this patch adds
to the code. But in short - 'c->freeable_cnt' may be different before
and after re-mounting, and this is normal. So the debugging code should
make sure that free space calculations do not depend on 'c->freeable_cnt'.
A similar issue has been reported here:
http://lists.infradead.org/pipermail/linux-mtd/2011-April/034647.html
This patch should fix it.
For the -stable guys: this patch is only relevant for kernels 2.6.30
onwards.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 54acbaaa523ca0bd284a18f67ad213c379679e86 upstream.
Thanks to coverity which spotted that UBIFS will oops if 'kmalloc()'
in 'read_pnode()' fails and we dereference a NULL 'pnode' pointer
when we 'goto out'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 8b229c76765816796eec7ccd428f03bd8de8b525 upstream.
This fix makes the 'dbg_check_old_index()' function return
immediately if debugging is disabled, instead of executing
incorrect 'goto out' which causes UBIFS to:
1. Allocate memory
2. Read the flash
On every commit. OK, we do not commit that often, but it is
still silly to do unneeded I/O anyway.
Credits to coverity for spotting this silly issue.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 08fe4db170b4193603d9d31f40ebaf652d07ac9c upstream.
root_item->flags and root_item->byte_limit are not initialized when
a subvolume is created. This bug is not revealed until we added
readonly snapshot support - now you mount a btrfs filesystem and you
may find the subvolumes in it are readonly.
To work around this problem, we steal a bit from root_item->inode_item->flags,
and use it to indicate if those fields have been properly initialized.
When we read a tree root from disk, we check if the bit is set, and if
not we'll set the flag and initialize the two fields of the root item.
Reported-by: Andreas Philipp <philipp.andreas@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Andreas Philipp <philipp.andreas@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d0de4dc584ec6aa3b26fffea320a8457827768fc upstream.
On an error path in inotify_init1 a normal user can trigger a double
free of struct user. This is a regression introduced by a2ae4cc9a16e
("inotify: stop kernel memory leak on file creation failure").
We fix this by making sure that if a group exists the user reference is
dropped when the group is cleaned up. We should not explictly drop the
reference on error and also drop the reference when the group is cleaned
up.
The new lifetime rules are that an inotify group lives from
inotify_new_group to the last fsnotify_put_group. Since the struct user
and inotify_devs are directly tied to this lifetime they are only
changed/updated in those two locations. We get rid of all special
casing of struct user or user->inotify_devs.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 34094537943113467faee98fe67c8a3d3f9a0a8b upstream.
From the result of a function test of mmap, mmap write to shared pages
turned out to be broken for hole blocks. It doesn't write out filled
blocks and the data will be lost after umount. This is due to a bug
that the target file is not queued for log writer when filling hole
blocks.
Also, nilfs_page_mkwrite function exits normal code path even after
successfully filled hole blocks due to a change of block_page_mkwrite
function; just after nilfs was merged into the mainline,
block_page_mkwrite() started to return VM_FAULT_LOCKED instead of zero
by the patch "mm: close page_mkwrite races" (commit:
b827e496c893de0c). The current nilfs_page_mkwrite() is not handling
this value properly.
This corrects nilfs_page_mkwrite() and will resolve the data loss
problem in mmap write.
[This should be applied to every kernel since 2.6.30 but a fix is
needed for 2.6.37 and prior kernels]
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 704b2907c2d47ceb187c0e25a6bbc2174b198f2f upstream.
During mount, we can do a quotacheck that involves a bulkstat pass
on all inodes. If there are more inodes in the filesystem than can
be held in memory, we require the inode cache shrinker to run to
ensure that we don't run out of memory.
Unfortunately, the inode cache shrinker is not registered until we
get to the end of the superblock setup process, which is after a
quotacheck is run if it is needed. Hence we need to register the
inode cache shrinker earlier in the mount process so that we don't
OOM during mount. This requires that we also initialise the syncd
work before we register the shrinker, so we nee dto juggle that
around as well.
While there, make sure that we have set up the block sizes in the
VFS superblock correctly before the quotacheck is run so that any
inodes that are cached as a result of the quotacheck have their
block size fields set up correctly.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 1821df040ac3cd6a57518739f345da6d50ea9d3f upstream.
The pointer '(*auth_tok_key)' is set to NULL in case request_key()
fails, in order to prevent its use by functions calling
ecryptfs_keyring_auth_tok_for_sig().
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 50f198ae16ac66508d4b8d5a40967a8507ad19ee upstream.
Unlock the page in error path of ecryptfs_write_begin(). This may
happen, for example, if decryption fails while bring the page
up-to-date.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d39195c33bb1b5fdcb0f416e8a0b34bfdb07a027 upstream.
Orphan cleanup is currently executed even if the file system has some
number of unknown ROCOMPAT features, which deletes inodes and frees
blocks, which could be very bad for some RO_COMPAT features,
especially the SNAPSHOT feature.
This patch skips the orphan cleanup if it contains readonly compatible
features not known by this ext4 implementation, which would prevent
the fs from being mounted (or remounted) readwrite.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 24ff6663ccfdaf088dfa7acae489cb11ed4f43c4 upstream.
While trying to track down some NFS problems with BTRFS, I kept noticing I was
getting -EACCESS for no apparent reason. Eric Paris and printk() helped me
figure out that it was SELinux that was giving me grief, with the following
denial
type=AVC msg=audit(1290013638.413:95): avc: denied { 0x800000 } for pid=1772
comm="nfsd" name="" dev=sda1 ino=256 scontext=system_u:system_r:kernel_t:s0
tcontext=system_u:object_r:unlabeled_t:s0 tclass=file
Turns out this is because in d_obtain_alias if we can't find an alias we create
one and do all the normal instantiation stuff, but we don't do the
security_d_instantiate.
Usually we are protected from getting a hashed dentry that hasn't yet run
security_d_instantiate() by the parent's i_mutex, but obviously this isn't an
option there, so in order to deal with the case that a second thread comes in
and finds our new dentry before we get to run security_d_instantiate(), we go
ahead and call it if we find a dentry already. Eric assures me that this is ok
as the code checks to see if the dentry has been initialized already so calling
security_d_instantiate() against the same dentry multiple times is ok. With
this patch I'm no longer getting errant -EACCESS values.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b8413f98f997bb3ed7327e6d7117e7e91ce010c3 upstream.
When one of the two waits in nfs_commit_inode() is interrupted, it
returns a non-negative value, which causes nfs_wb_page() to think
that the operation was successful causing it to busy-loop rather
than exiting.
It also causes nfs_file_fsync() to incorrectly report the file as
being successfully committed to disk.
This patch fixes both problems by ensuring that we return an error
if the attempts to wait fail.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 95f28604a65b1c40b6c6cd95e58439cd7ded3add upstream.
We don't have proper reference counting for this yet, so we run into
cases where the device is pulled and we OOPS on flushing the fs data.
This happens even though the dirty inodes have already been
migrated to the default_backing_dev_info.
Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 27cb1572e3e6bb1f8cf6bb3d74c914a87b131792 upstream.
Don't hold vfsmount_lock over the loop traversing ->mnt_parent;
do check_mnt(new.mnt) under namespace_sem instead; combined with
namespace_sem held over all that code it'll guarantee the stability
of ->mnt_parent chain all the way to the root.
Doing check_mnt() outside of namespace_sem in case of pivot_root()
is wrong anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 5a02ab7c3c4580f94d13c683721039855b67cda6 upstream.
We must not use dummy for index.
After the first index, READ32(dummy) will change dummy!!!!
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
[bfields@redhat.com: Trond points out READ_BUF alone is sufficient.]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0997b173609b9229ece28941c118a2a9b278796e upstream.
Make sure we properly reference count the struct files that a lock
depends on, and release them when the lock stateid is released.
This fixes a major leak of struct files when using locking over nfsv4.
Reported-by: Rick Koshi <nfs-bug-report@more-right-rudder.com>
Tested-by: Ivo Přikryl <prikryl@eurosat.cz>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 529d7b2a7fa31e9f7d08bc790d232c3cbe64fa24 upstream.
Minor cleanup in preparation for a bugfix--moving some code to avoid
forward references, etc. No change in functionality.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 5ece3cafbd88d4da5c734e1810c4a2e6474b57b2 upstream.
The members of nfsd4_op_flags, (ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS)
equals to ALLOWED_AS_FIRST_OP, maybe that's not what we want.
OP_PUTROOTFH with op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS,
can't appears as the first operation with out SEQUENCE ops.
This patch modify the wrong value of ALLOWED_WITHOUT_FH etc which
was introduced by f9bb94c4.
Reviewed-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 5883f57ca0008ffc93e09cbb9847a1928e50c6f3 upstream.
While mm->start_stack was protected from cross-uid viewing (commit
f83ce3e6b02d5 ("proc: avoid information leaks to non-privileged
processes")), the start_code and end_code values were not. This would
allow the text location of a PIE binary to leak, defeating ASLR.
Note that the value "1" is used instead of "0" for a protected value since
"ps", "killall", and likely other readers of /proc/pid/stat, take
start_code of "0" to mean a kernel thread and will misbehave. Thanks to
Brad Spengler for pointing this out.
Addresses CVE-2011-0726
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Eugene Teo <eugeneteo@kernel.sg>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0db0c01b53a1a421513f91573241aabafb87802a upstream.
The current code fails to print the "[heap]" marking if the heap is split
into multiple mappings.
Fix the check so that the marking is displayed in all possible cases:
1. vma matches exactly the heap
2. the heap vma is merged e.g. with bss
3. the heap vma is splitted e.g. due to locked pages
Test cases. In all cases, the process should have mapping(s) with
[heap] marking:
(1) vma matches exactly the heap
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
int main (void)
{
if (sbrk(4096) != (void *)-1) {
printf("check /proc/%d/maps\n", (int)getpid());
while (1)
sleep(1);
}
return 0;
}
# ./test1
check /proc/553/maps
[1] + Stopped ./test1
# cat /proc/553/maps | head -4
00008000-00009000 r-xp 00000000 01:00 3113640 /test1
00010000-00011000 rw-p 00000000 01:00 3113640 /test1
00011000-00012000 rw-p 00000000 00:00 0 [heap]
4006f000-40070000 rw-p 00000000 00:00 0
(2) the heap vma is merged
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
char foo[4096] = "foo";
char bar[4096];
int main (void)
{
if (sbrk(4096) != (void *)-1) {
printf("check /proc/%d/maps\n", (int)getpid());
while (1)
sleep(1);
}
return 0;
}
# ./test2
check /proc/556/maps
[2] + Stopped ./test2
# cat /proc/556/maps | head -4
00008000-00009000 r-xp 00000000 01:00 3116312 /test2
00010000-00012000 rw-p 00000000 01:00 3116312 /test2
00012000-00014000 rw-p 00000000 00:00 0 [heap]
4004a000-4004b000 rw-p 00000000 00:00 0
(3) the heap vma is splitted (this fails without the patch)
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
int main (void)
{
if ((sbrk(4096) != (void *)-1) && !mlockall(MCL_FUTURE) &&
(sbrk(4096) != (void *)-1)) {
printf("check /proc/%d/maps\n", (int)getpid());
while (1)
sleep(1);
}
return 0;
}
# ./test3
check /proc/559/maps
[1] + Stopped ./test3
# cat /proc/559/maps|head -4
00008000-00009000 r-xp 00000000 01:00 3119108 /test3
00010000-00011000 rw-p 00000000 01:00 3119108 /test3
00011000-00012000 rw-p 00000000 00:00 0 [heap]
00012000-00013000 rw-p 00000000 00:00 0 [heap]
It looks like the bug has been there forever, and since it only results in
some information missing from a procfile, it does not fulfil the -stable
"critical issue" criteria.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit ce654b37f87980d95f339080e4c3bdb2370bdf22 upstream.
Orphan cleanup is currently executed even if the file system has some
number of unknown ROCOMPAT features, which deletes inodes and frees
blocks, which could be very bad for some RO_COMPAT features.
This patch skips the orphan cleanup if it contains readonly compatible
features not known by this ext3 implementation, which would prevent
the fs from being mounted (or remounted) readwrite.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e91f90bb0bb10be9cc8efd09a3cf4ecffcad0db1 upstream.
The test program below will hang because io_getevents() uses
add_wait_queue_exclusive(), which means the wake_up() in io_destroy() only
wakes up one of the threads. Fix this by using wake_up_all() in the aio
code paths where we want to make sure no one gets stuck.
// t.c -- compile with gcc -lpthread -laio t.c
#include <libaio.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
static const int nthr = 2;
void *getev(void *ctx)
{
struct io_event ev;
io_getevents(ctx, 1, 1, &ev, NULL);
printf("io_getevents returned\n");
return NULL;
}
int main(int argc, char *argv[])
{
io_context_t ctx = 0;
pthread_t thread[nthr];
int i;
io_setup(1024, &ctx);
for (i = 0; i < nthr; ++i)
pthread_create(&thread[i], NULL, getev, ctx);
sleep(1);
io_destroy(ctx);
for (i = 0; i < nthr; ++i)
pthread_join(thread[i], NULL);
return 0;
}
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d7433142b63d727b5a217c37b1a1468b116a9771 upstream.
(crossport of 1f7bebb9e911d870fa8f997ddff838e82b5715ea
by Andreas Schlick <schlick@lavabit.com>)
When ext3_dx_add_entry() has to split an index node, it has to ensure that
name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck
won't recognise it as an intermediate htree node and consider the htree to
be corrupted.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 34d211a2d5df4984a35b18d8ccacbe1d10abb067 upstream.
It turns out that while a maximum of 8 partitions may be what people
"should" have had, you can actually fit up to 18 entries(*) in a sector.
And some people clearly were taking advantage of that, like Michael
Cree, who had ten partitions on one of his OSF disks.
(*) The OSF partition data starts at byte offset 64 in the first sector,
and the array of 16-byte partition entries start at offset 148 in
the on-disk partition structure.
Reported-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c83ce989cb5ff86575821992ea82c4df5c388ebc upstream.
The new vfs locking scheme introduced in 2.6.38 breaks NFS sillyrename
because the latter relies on being able to determine the parent
directory of the dentry in the ->iput() callback in order to send the
appropriate unlink rpc call.
Looking at the code that cares about races with dput(), there doesn't
seem to be anything that specifically uses d_parent as a test for
whether or not there is a race:
- __d_lookup_rcu(), __d_lookup() all test for d_hashed() after d_parent
- shrink_dcache_for_umount() is safe since nothing else can rearrange
the dentries in that super block.
- have_submount(), select_parent() and d_genocide() can test for a
deletion if we set the DCACHE_DISCONNECTED flag when the dentry
is removed from the parent's d_subdirs list.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c826cb7dfce80512c26c984350077a25046bd215 upstream.
This creates a helper function for he "try to ascend into the parent
directory" case, which was written out in triplicate before. With all
the locking and subtle sequence number stuff, we really don't want to
duplicate that kind of code.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: NFSROOT should default to "proto=udp"
nfs4: remove duplicated #include
NFSv4: nfs4_state_mark_reclaim_nograce() should be static
NFSv4: Fix the setlk error handler
NFSv4.1: Fix the handling of the SEQUENCE status bits
NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses
NFSv4.1 reclaim complete must wait for completion
NFSv4: remove duplicate clientid in struct nfs_client
NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY
sunrpc: Propagate errors from xs_bind() through xs_create_sock()
(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid
nfs: fix compilation warning
nfs: add kmalloc return value check in decode_and_add_ds
SUNRPC: Remove resource leak in svc_rdma_send_error()
nfs: close NFSv4 COMMIT vs. CLOSE race
SUNRPC: Close a race in __rpc_wait_for_completion_task()
|
|
The kernel automatically evaluates partition tables of storage devices.
The code for evaluating OSF partitions contains a bug that leaks data
from kernel heap memory to userspace for certain corrupted OSF
partitions.
In more detail:
for (i = 0 ; i < le16_to_cpu(label->d_npartitions); i++, partition++) {
iterates from 0 to d_npartitions - 1, where d_npartitions is read from
the partition table without validation and partition is a pointer to an
array of at most 8 d_partitions.
Add the proper and obvious validation.
Signed-off-by: Timo Warns <warns@pre-sense.de>
Cc: stable@kernel.org
[ Changed the patch trivially to not repeat the whole le16_to_cpu()
thing, and to use an explicit constant for the magic value '8' ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix for a dumb preadv()/pwritev() compat bug - unlike the native
variants, the compat_... ones forget to check FMODE_P{READ,WRITE}, so
e.g. on pipe the native preadv() will fail with -ESPIPE and compat one
will act as readv() and succeed.
Not critical, but it's a clear bug with trivial fix, so IMO it's OK for
-final.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: break out of shrink_delalloc earlier
btrfs: fix not enough reserved space
btrfs: fix dip leak
Btrfs: make sure not to return overlapping extents to fiemap
Btrfs: deal with short returns from copy_from_user
Btrfs: fix regressions in copy_from_user handling
|
|
Josef had changed shrink_delalloc to exit after three shrink
attempts, which wasn't quite enough because new writers could
race in and steal free space.
But it also fixed deadlocks and stalls as we tried to recover
delalloc reservations. The code was tweaked to loop 1024
times, and would reset the counter any time a small amount
of progress was made. This was too drastic, and with a
lot of writers we can end up stuck in shrink_delalloc forever.
The shrink_delalloc loop is fairly complex because the caller is looping
too, and the caller will go ahead and force a transaction commit to make
sure we reclaim space.
This reworks things to exit shrink_delalloc when we've forced some
writeback and the delalloc reservations have gone down. This means
the writeback has not just started but has also finished at
least some of the metadata changes required to reclaim delalloc
space.
If we've got this wrong, we're returning ENOSPC too early, which
is a big improvement over the current behavior of hanging the machine.
Test 224 in xfstests hammers on this nicely, and with 1000 writers
trying to fill a 1GB drive we get our first ENOSPC at 93% full. The
other writers are able to continue until we get 100%.
This is a worst case test for btrfs because the 1000 writers are doing
small IO, and the small FS size means we don't have a lot of room
for metadata chunks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|