Age | Commit message (Collapse) | Author |
|
commit 55f9cf6bbaa682958a7dd2755f883b768270c3ce upstream.
The lower filesystem may do some type of inode revalidation during a
getattr call. eCryptfs should take advantage of that by copying the
lower inode attributes to the eCryptfs inode after a call to
vfs_getattr() on the lower inode.
I originally wrote this fix while working on eCryptfs on nfsv3 support,
but discovered it also fixed an eCryptfs on ext4 nanosecond timestamp
bug that was reported.
https://bugs.launchpad.net/bugs/613873
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 5e640927a597a7c3e72b61e8bce74c22e906de65 upstream.
LANMAN response length was changed to 16 bytes instead of 24 bytes.
Revert it back to 24 bytes.
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 9616125611ee47693186533d76e403856a36b3c8 upstream.
The code finds, the '%' sign in an ipv6 address and copies that to a
buffer allocated on the stack. It then ignores that buffer, and passes
'pct' to simple_strtoul(), which doesn't work right because we're
comparing 'endp' against a completely different string.
Fix it by passing the correct pointer. While we're at it, this is a
good candidate for conversion to strict_strtoul as well.
Cc: David Howells <dhowells@redhat.com>
Reported-by: Björn JACKE <bj@sernet.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit fa7ea87a057958a8b7926c1a60a3ca6d696328ed upstream.
Validate number of blocks in map and remove redundant variable.
Signed-off-by: Timo Warns <warns@pre-sense.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 261cd298a8c363d7985e3482946edb4bfedacf98 upstream.
task_show_regs used to be a debugging aid in the early bringup days
of Linux on s390. /proc/<pid>/status is a world readable file, it
is not a good idea to show the registers of a process. The only
correct fix is to remove task_show_regs.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 47c85291d3dd1a51501555000b90f8e281a0458e upstream.
These functions return an nfs status, not a host_err. So don't
try to convert before returning.
This is a regression introduced by
3c726023402a2f3b28f49b9d90ebf9e71151157d; I fixed up two of the callers,
but missed these two.
Reported-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 51788b1bdd0d68345bab0af4301e7fa429277228 upstream.
Commit bf5fc093c5b625e4259203f1cee7ca73488a5620 refactored
btrfs_ioctl_space_info() and introduced several security issues.
space_args.space_slots is an unsigned 64-bit type controlled by a
possibly unprivileged caller. The comparison as a signed int type
allows providing values that are treated as negative and cause the
subsequent allocation size calculation to wrap, or be truncated to 0.
By providing a size that's truncated to 0, kmalloc() will return
ZERO_SIZE_PTR. It's also possible to provide a value smaller than the
slot count. The subsequent loop ignores the allocation size when
copying data in, resulting in a heap overflow or write to ZERO_SIZE_PTR.
The fix changes the slot count type and comparison typecast to u64,
which prevents truncation or signedness errors, and also ensures that we
don't copy more data than we've allocated in the subsequent loop. Note
that zero-size allocations are no longer possible since there is already
an explicit check for space_args.space_slots being 0 and truncation of
this value is no longer an issue.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 78d2978874e4e10e97dfd4fd79db45bdc0748550 upstream.
In get_empty_filp() since 2.6.29, file_free(f) is called with f->f_cred == NULL
when security_file_alloc() returned an error. As a result, kernel will panic()
due to put_cred(NULL) call within RCU callback.
Fix this bug by assigning f->f_cred before calling security_file_alloc().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0fbca4d1c3932c27c4794bf5c2b5fc961cf5a54f upstream.
Commit 368e136 ("xfs: remove duplicate code from dquot reclaim") fails
to unlock the dquot freelist when the number of loop restarts is
exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory
reclaim.
Rework the loop control logic into an unwind stack that all the
different cases jump into. This means there is only one set of code
that processes the loop exit criteria, and simplifies the unlocking
of all the items from different points in the loop. It also fixes a
double increment of the restart counter from the qi_dqlist_lock
case.
Reported-by: Malcolm Scott <lkml@malc.org.uk>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Cc: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 3aa6e0aa8ab3e64bbfba092c64d42fd1d006b124 upstream.
If nfsd fails to find an exported via NFS file in the readahead cache, it
should increment corresponding nfsdstats counter (ra_depth[10]), but due to a
bug it may instead write to ra_depth[11], corrupting the following field.
In a kernel with NFSDv4 compiled in the corruption takes the form of an
increment of a counter of the number of NFSv4 operation 0's received; since
there is no operation 0, this is harmless.
In a kernel with NFSDv4 disabled it corrupts whatever happens to be in the
memory beyond nfsdstats.
Signed-off-by: Konstantin Khorenko <khorenko@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0ca7a5b9ac5d301845dd6382ff25a699b6263a81 upstream.
Fixes the following kernel oops in nilfs_setup_super() which could
arise if one of two super-blocks is unavailable.
> BUG: unable to handle kernel NULL pointer dereference at (null)
> Pid: 3529, comm: mount.nilfs2 Not tainted 2.6.37 #1 /
> EIP: 0060:[<c03196bc>] EFLAGS: 00010202 CPU: 3
> EIP is at memcpy+0xc/0x1b
> Call Trace:
> [<f953720e>] ? nilfs_setup_super+0x6c/0xa5 [nilfs2]
> [<f95369e9>] ? nilfs_get_root_dentry+0x81/0xcb [nilfs2]
> [<f9537a08>] ? nilfs_mount+0x4f9/0x62c [nilfs2]
> [<c02745cf>] ? kstrdup+0x36/0x3f
> [<f953750f>] ? nilfs_mount+0x0/0x62c [nilfs2]
> [<c0293940>] ? vfs_kern_mount+0x4d/0x12c
> [<c02a5100>] ? get_fs_type+0x76/0x8f
> [<c0293a68>] ? do_kern_mount+0x33/0xbf
> [<c02a784a>] ? do_mount+0x2ed/0x714
> [<c02a6171>] ? copy_mount_options+0x28/0xfc
> [<c02a7ce3>] ? sys_mount+0x72/0xaf
> [<c0473085>] ? syscall_call+0x7/0xb
Reported-by: Wakko Warner <wakko@animx.eu.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Wakko Warner <wakko@animx.eu.org>
LKML-Reference: <20110121024918.GA29598@animx.eu.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 20d9600cb407b0b55fef6ee814b60345c6f58264 upstream.
When using devices that support max_segments > BIO_MAX_PAGES (256), direct
IO tries to allocate a bio with more pages than allowed, which leads to an
oops in dio_bio_alloc(). Clamp the request to the supported maximum, and
change dio_bio_alloc() to reflect that bio_alloc() will always return a
bio when called with __GFP_WAIT and a valid number of vectors.
[akpm@linux-foundation.org: remove redundant BUG_ON()]
Signed-off-by: David Dillow <dillowda@ornl.gov>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 540b2e377797d8715469d408b887baa9310c5f3e upstream.
NTLM response length was changed to 16 bytes instead of 24 bytes
that are sent in Tree Connection Request during share-level security
share mounts. Revert it back to 24 bytes.
Reported-and-Tested-by: Grzegorz Ozanski <grzegorz.ozanski@intel.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Acked-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 12fed00de963433128b5366a21a55808fab2f756 upstream.
When we get oplock break notification we should set the appropriate
value of OplockLevel field in oplock break acknowledge according to
the oplock level held by the client in this time. As we only can have
level II oplock or no oplock in the case of oplock break, we should be
aware only about clientCanCacheRead field in cifsInodeInfo structure.
Also fix bug connected with wrong interpretation of OplockLevel field
during oplock break notification processing.
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0781b909b5586f4db720b5d1838b78f9d8e42f14 upstream.
commit 95aac7b1cd224f ("epoll: make epoll_wait() use the hrtimer range
feature") added a performance regression because it uses timespec_add_ns()
with potential very large 'ns' values.
[akpm@linux-foundation.org: s/epoll_set_mstimeout/ep_set_mstimeout/, per Davide]
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Shawn Bohrer <shawn.bohrer@gmail.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0b0abeaf3d30cec03ac6497fe978b8f7edecc5ae upstream.
This reverts commit 115e19c53501edc11f730191f7f047736815ae3d.
Apparently setting inode->bdi to one's own sb->s_bdi stops VFS from
sending *read-aheads*. This problem was bisected to this commit. A
revert fixes it. I'll investigate farther why is this happening for the
next Kernel, but for now a revert.
I'm sending to stable@kernel.org as well, since it exists also in
2.6.37. 2.6.36 is good and does not have this patch.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 2892c15ddda6a76dc10b7499e56c0f3b892e5a69 upstream.
In 2.6.37 I was running into oopses with repeated module
loads & unloads. I tracked this down to:
fb1813f4 ext4: use dedicated slab caches for group_info structures
(this was in addition to the features advert unload problem)
The kstrdup & subsequent kfree of the cache name was causing
a double free. In slub, at least, if I read it right it allocates
& frees the name itself, slab seems to do something different...
so in slub I think we were leaking -our- cachep->name, and double
freeing the one allocated by slub.
After getting lost in slab/slub/slob a bit, I just looked at other
sized-caches that get allocated. jbd2, biovec, sgpool all do it
more or less the way jbd2 does. Below patch follows the jbd2
method of dynamically allocating a cache at mount time from
a list of static names.
(This might also possibly fix a race creating the caches with
parallel mounts running).
[Folded in a fix from Dan Carpenter which fixed an off-by-one error in
the original patch]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d50bdd5aa55127635fd8a5c74bd2abb256bd34e3 upstream.
This fixes a corruption problem with the multi-block
writepages submittal change for ext4, from commit
bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc ("ext4: use bio
layer instead of buffer layer in mpage_da_submit_io").
(Note that this corruption is not present in 2.6.37 on
ext4, because the corruption was detected after the
feature was merged in 2.6.37-rc1, and so it was turned
off by adding a non-default mount option,
mblk_io_submit. With this commit, which hopefully
fixes the last of the bugs with this feature, we'll be
able to turn on this performance feature by default in
2.6.38, and remove the mblk_io_submit option.)
The ext4 code path to bundle multiple pages for
writeback in ext4_bio_write_page() had a bug: we should
be clearing buffer head dirty flags *before* we submit
the bio, not in the completion routine.
The patch below was tested on 2.6.37 under KVM with the
postgresql script which was submitted by Jon Nelson as
documented in commit 1449032be1.
Without the patch, I'd hit the corruption problem about
50-70% of the time. With the patch, I executed the
script > 100 times with no corruption seen.
I also fixed a bug to make sure ext4_end_bio() doesn't
dereference the bio after the bio_put() call.
Reported-by: Jon Nelson <jnelson@jamponi.net>
Reported-by: Matthias Bayer <jackdachef@gmail.com>
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 8f021222c1e2756ea4c9dde93b23e1d2a0a4ec37 upstream.
Ext4 features interface was not properly unregistered which led to
problems while unloading/reloading ext4 module. This commit fixes that by
adding proper kobject unregistration code into ext4_exit_fs() as well as
fail-path of ext4_init_fs()
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 8f1f745331c1b560f53c0d60e55a4f4f43f7cce5 upstream.
https://bugzilla.kernel.org/show_bug.cgi?id=27652
If the lazyinit thread is running, the teardown function
ext4_destroy_lazyinit_thread() has problems:
ext4_clear_request_list();
while (ext4_li_info->li_task) {
wake_up(&ext4_li_info->li_wait_daemon);
wait_event(ext4_li_info->li_wait_task,
ext4_li_info->li_task == NULL);
}
Clearing the request list will cause the thread to exit and free
ext4_li_info, so then we're waiting on something which is getting
freed.
Fix this up by making the thread respond to kthread_stop, and exit,
without the need to wait for that exit in some other homegrown way.
Reported-and-Tested-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 1c5b9e9065567876c2d4a7a16d78f0fed154a5bf upstream.
Commit 40389687 moved a call to ext4_forget() out of
ext4_free_branches and let ext4_free_blocks() handle calling
bforget(). But that change unfortunately did not replace the call to
ext4_forget() with brelse(), which was needed to drop the in-use count
of the indirect block's buffer head, which lead to a memory leak when
deleting files that used indirect blocks. Fix this.
Thanks to Hugh Dickins for pointing this out.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit ca6e909f9bebe709bc65a3ee605ce32969db0452 upstream.
When ext4_trim_fs() is called to trim a part of a single group, the
logic will wrongly set last block of the interval to 'len' instead
of 'first_block + len'. Thus a shorter interval is possibly trimmed.
Fix it.
CC: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 6c5a6cb998854f3c579ecb2bc1423d302bcb1b76 upstream.
fs/ext4/super.c: In function 'ext4_register_li_request':
fs/ext4/super.c:2936: warning: 'ret' may be used uninitialized in this function
It looks buggy to me, too.
Cc: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b9543dac5bbc4aef0a598965b6b34f6259ab9a9b upstream.
When wb_writeback() is called in WB_SYNC_ALL mode, work->nr_to_write is
usually set to LONG_MAX. The logic in wb_writeback() then calls
__writeback_inodes_sb() with nr_to_write == MAX_WRITEBACK_PAGES and we
easily end up with non-positive nr_to_write after the function returns, if
the inode has more than MAX_WRITEBACK_PAGES dirty pages at the moment.
When nr_to_write is <= 0 wb_writeback() decides we need another round of
writeback but this is wrong in some cases! For example when a single
large file is continuously dirtied, we would never finish syncing it
because each pass would be able to write MAX_WRITEBACK_PAGES and inode
dirty timestamp never gets updated (as inode is never completely clean).
Thus __writeback_inodes_sb() would write the redirtied inode again and
again.
Fix the issue by setting nr_to_write to LONG_MAX in WB_SYNC_ALL mode. We
do not need nr_to_write in WB_SYNC_ALL mode anyway since
write_cache_pages() does livelock avoidance using page tagging in
WB_SYNC_ALL mode.
This makes wb_writeback() call __writeback_inodes_sb() only once on
WB_SYNC_ALL. The latter function won't livelock because it works on
- a finite set of files by doing queue_io() once at the beginning
- a finite set of pages by PAGECACHE_TAG_TOWRITE page tagging
After this patch, program from http://lkml.org/lkml/2010/10/24/154 is no
longer able to stall sync forever.
[fengguang.wu@intel.com: fix locking comment]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit aa373cf550994623efb5d49a4d8775bafd10bbc1 upstream.
Background writeback is easily livelockable in a loop in wb_writeback() by
a process continuously re-dirtying pages (or continuously appending to a
file). This is in fact intended as the target of background writeback is
to write dirty pages it can find as long as we are over
dirty_background_threshold.
But the above behavior gets inconvenient at times because no other work
queued in the flusher thread's queue gets processed. In particular, since
e.g. sync(1) relies on flusher thread to do all the IO for it, sync(1)
can hang forever waiting for flusher thread to do the work.
Generally, when a flusher thread has some work queued, someone submitted
the work to achieve a goal more specific than what background writeback
does. Moreover by working on the specific work, we also reduce amount of
dirty pages which is exactly the target of background writeout. So it
makes sense to give specific work a priority over a generic page cleaning.
Thus we interrupt background writeback if there is some other work to do.
We return to the background writeback after completing all the queued
work.
This may delay the writeback of expired inodes for a while, however the
expired inodes will eventually be flushed to disk as long as the other
works won't livelock.
[fengguang.wu@intel.com: update comment]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 6585027a5e8cb490e3a761b2f3f3c3acf722aff2 upstream.
Check whether background writeback is needed after finishing each work.
When bdi flusher thread finishes doing some work check whether any kind of
background writeback needs to be done (either because
dirty_background_ratio is exceeded or because we need to start flushing
old inodes). If so, just do background write back.
This way, bdi_start_background_writeback() just needs to wake up the
flusher thread. It will do background writeback as soon as there is no
other work.
This is a preparatory patch for the next patch which stops background
writeback as soon as there is other work to do.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 09e099d4bafea3b15be003d548bdf94b4b6e0e17 upstream.
/proc/diskstats would display a strange output as follows.
$ cat /proc/diskstats |grep sda
8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
~~~~~~~~~~
8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92
8 4 sda4 4 0 8 0 0 0 0 0 0 0 0
8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137
Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is
merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.
The detailed root cause is as follows.
Assuming that there are two partition, sda1 and sda2.
1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
is 0 and sda2's one is 1.
| hd_struct->in_flight
---------------------------
sda1 | 0
sda2 | 1
---------------------------
2. A bio belongs to sda1 is issued and is merged into the request mentioned on
step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed
from sda2 region to sda1 region. However the two partition's
hd_struct->in_flight are not changed.
| hd_struct->in_flight
---------------------------
sda1 | 0
sda2 | 1
---------------------------
3. The request is finished and blk_account_io_done() is called. In this case,
sda2's hd_struct->in_flight, not a sda1's one, is decremented.
| hd_struct->in_flight
---------------------------
sda1 | -1
sda2 | 1
---------------------------
The patch fixes the problem by caching the partition lookup
inside the request structure, hence making sure that the increment
and decrement will always happen on the same partition struct. This
also speeds up IO with accounting enabled, since it cuts down on
the number of lookups we have to do.
Also add a refcount to struct hd_struct to keep the partition in
memory as long as users exist. We use kref_test_and_get() to ensure
we don't add a reference to a partition which is going away.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit ceff1a770933e2ca2bf995b453dade4ec47a9878 upstream.
Commit 34aacb2920 ("procfs: Use generic_file_llseek in /proc/kcore") broke
seeking on /proc/kcore. This changes it back to use default_llseek in
order to restore the original behavior.
The problem with generic_file_llseek is that it only allows seeks up to
inode->i_sb->s_maxbytes, which is 2GB-1 on procfs, where the memory file
offset values in the /proc/kcore PT_LOAD segments may exceed or start
beyond that offset value.
A similar revert was made for /proc/vmcore.
Signed-off-by: Dave Anderson <anderson@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit f6af99ec1b261e21219d5eba99e3af48fc6c32d4 upstream.
According to rfc 3530 BADNAME is for strings that represent paths;
BADOWNER is for user/group names that don't map.
And the too-long name should probably be BADOWNER as well; it's
effectively the same as if we couldn't map it.
Reported-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 839f7ad6932d95f4d5ae7267b95c574714ff3d5b upstream.
Nick Piggin reports:
> I'm getting use after frees in aio code in NFS
>
> [ 2703.396766] Call Trace:
> [ 2703.396858] [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80
> [ 2703.396959] [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40
> [ 2703.397058] [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140
> [ 2703.397159] [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0
> [ 2703.397260] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> [ 2703.397361] [<ffffffff81039701>] ? get_parent_ip+0x11/0x50
> [ 2703.397464] [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80
> [ 2703.397564] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> [ 2703.397662] [<ffffffff811627db>] aio_put_req+0x2b/0x60
> [ 2703.397761] [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0
> [ 2703.397895] [<ffffffff81164d0b>] sys_io_submit+0xb/0x10
> [ 2703.397995] [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
>
> Adding some tracing, it is due to nfs completing the request then
> returning something other than -EIOCBQUEUED, so aio.c
> also completes the request.
To address this, prevent the NFS direct I/O engine from completing
async iocbs when the forward path returns an error without starting
any I/O.
This fix appears to survive ^C during both "xfstest no. 208" and "fsx
-Z."
It's likely this bug has existed for a very long while, as we are seeing
very similar symptoms in OEL 5. Copying stable.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e00b8a24041f37e56b4b8415ce4eba1cbc238065 upstream.
There is no reason to be freeing the delegation cred in the rcu callback,
and doing so is resulting in a lockdep complaint that rpc_credcache_lock
is being called from both softirq and non-softirq contexts.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 8a0eebf66e3b1deae036553ba641a9c2bdbae678 upstream.
Commit c0204fd2b8fe047b18b67e07e1bf2a03691240cd (NFS: Clean up
nfs4_proc_create()) broke NFSv3 exclusive open by removing the code
that passes the O_EXCL flag down to nfs3_proc_create(). This patch
reverts that offending hunk from the original commit.
Reported-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 6650239a4b01077e80d5a4468562756d77afaa59 upstream.
vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherrent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.
The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2: Fix system inodes cache overflow.
ocfs2: Hold ip_lock when set/clear flags for indexed dir.
ocfs2: Adjust masklog flag values
Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem.
ocfs2/dlm: Migrate lockres with no locks if it has a reference
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'linus-hot-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix on-line resizing regression
|
|
https://bugzilla.kernel.org/show_bug.cgi?id=25352
This regression was caused by commit a31437b85: "ext4: use
sb_issue_zeroout in setup_new_group_blocks", by accidentally dropping
the code which reserved the block group descriptor and inode table
blocks.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
This happens when __logfs_create() tries to write a new inode to the disk
which is full.
__logfs_create() associates the transaction pointer with inode. During
the logfs_write_inode() function call chain this transaction pointer is
moved from inode to page->private using function move_inode_to_page
(do_write_inode() -> inode_to_page() -> move_inode_to_page)
When the write inode fails, the transaction is aborted and iput is called
on the failed inode. During delete_inode the same transaction pointer
associated with the page is getting used. Thus causing kernel BUG.
The patch checks for error in write_inode() and restores the page->private
to NULL.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=20162
Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Florian Mickler <florian@mickler.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
do_logfs_journal_wl_pass() should use GFP_NOFS for memory allocation GC
code calls btree_insert32 with GFP_KERNEL while holding a mutex
super->s_write_mutex.
The same mutex is used in address_space_operations->writepage(), and a
call to writepage() could be triggered as a result of memory allocation
in btree_insert32, causing a deadlock.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=20342
Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Florian Mickler <florian@mickler.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When we store system inodes cache in ocfs2_super,
we use a array for global system inodes. But unfortunately,
the range is calculated wrongly which makes it overflow and
pollute ocfs2_super->local_system_inodes.
This patch fix it by setting the range properly.
The corresponding bug is ossbug1303.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1303
Cc: stable@kernel.org
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: handle partial result from get_user_pages
ceph: mark user pages dirty on direct-io reads
ceph: fix null pointer dereference in ceph_init_dentry for nfs reexport
ceph: fix direct-io on non-page-aligned buffers
ceph: fix msgr_init error path
|
|
Buggered-in: 76dda93c6ae2 ("Btrfs: add snapshot/subvolume destroy
ioctl")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
For read operation, we have to set the argument _write_ of get_user_pages
to 1 since we will write data to pages. Also, we need to SetPageDirty before
releasing these pages.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
The fh_to_dentry etc. methods use ceph_init_dentry(), which assumes that
d_parent is defined. It isn't for those callers, so check!
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
fanotify: fill in the metadata_len field on struct fanotify_event_metadata
fanotify: split version into version and metadata_len
fanotify: Dont try to open a file descriptor for the overflow event
fanotify: Introduce FAN_NOFD
fanotify: do not leak user reference on allocation failure
inotify: stop kernel memory leak on file creation failure
fanotify: on group destroy allow all waiters to bypass permission check
fanotify: Dont allow a mask of 0 if setting or removing a mark
fanotify: correct broken ref counting in case adding a mark failed
fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is called
fanotify: remove packed from access response message
fanotify: deny permissions when no event was sent
|
|
When we set/clear the dyn_features for an inode we hold the ip_lock.
So do it when we set/clear OCFS2_INDEXED_DIR_FL also.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
Two masklogs had the same flag value.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
On 2.6.37-rc1, garbage collection ioctl of nilfs was broken due to the
commit 263d90cefc7d82a0 ("nilfs2: remove own inode hash used for GC"),
and leading to filesystem corruption.
The patch doesn't queue gc-inodes for log writer if they are reused
through the vfs inode cache. Here, gc-inode is the inode which
buffers blocks to be relocated on GC. That patch queues gc-inodes in
nilfs_init_gcinode() function, but this function is not called when
they don't have I_NEW flag. Thus, some of live blocks are wrongly
overrode without being moved to new logs.
This resolves the problem by moving the gc-inode queueing to an outer
function to ensure it's done right.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
The user buffer may be 512-byte aligned, not page-aligned. We were
assuming the buffer was page-aligned and only accounting for
non-page-aligned io offsets.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix typo which broke '..' detection in ext4_find_entry()
ext4: Turn off multiple page-io submission by default
|
|
The install_special_mapping routine (used, for example, to setup the
vdso) skips the security check before insert_vm_struct, allowing a local
attacker to bypass the mmap_min_addr security restriction by limiting
the available pages for special mappings.
bprm_mm_init() also skips the check, and although I don't think this can
be used to bypass any restrictions, I don't see any reason not to have
the security check.
$ uname -m
x86_64
$ cat /proc/sys/vm/mmap_min_addr
65536
$ cat install_special_mapping.s
section .bss
resb BSS_SIZE
section .text
global _start
_start:
mov eax, __NR_pause
int 0x80
$ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s
$ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o
$ ./install_special_mapping &
[1] 14303
$ cat /proc/14303/maps
0000f000-00010000 r-xp 00000000 00:00 0 [vdso]
00010000-00011000 r-xp 00001000 00:19 2453665 /home/taviso/install_special_mapping
00011000-ffffe000 rwxp 00000000 00:00 0 [stack]
It's worth noting that Red Hat are shipping with mmap_min_addr set to
4096.
Signed-off-by: Tavis Ormandy <taviso@google.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Acked-by: Robert Swiecki <swiecki@google.com>
[ Changed to not drop the error code - akpm ]
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|