Age | Commit message (Collapse) | Author |
|
commit 9d0be50230b333005635967f7ecd4897dbfd181b upstream (as of v2.6.33-rc3)
In the past, ext4_calc_metadata_amount(), and its sub-functions
ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount()
badly over-estimated the number of metadata blocks that might be
required for delayed allocation blocks. This didn't matter as much
when functions which managed the reserved metadata blocks were more
aggressive about dropping reserved metadata blocks as delayed
allocation blocks were written, but unfortunately they were too
aggressive. This was fixed in commit 0637c6f, but as a result the
over-estimation by ext4_calc_metadata_amount() would lead to reserving
2-3 times the number of pending delayed allocation blocks as
potentially required metadata blocks. So if there are 1 megabytes of
blocks which have been not yet been allocation, up to 3 megabytes of
space would get reserved out of the user's quota and from the file
system free space pool until all of the inode's data blocks have been
allocated.
This commit addresses this problem by much more accurately estimating
the number of metadata blocks that will be required. It will still
somewhat over-estimate the number of blocks needed, since it must make
a worst case estimate not knowing which physical blocks will be
needed, but it is much more accurate than before.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit ee5f4d9cdf32fd99172d11665c592a288c2b1ff4 upstream (as of v2.6.33-rc3)
Commit 0637c6f had a typo which caused the reserved metadata blocks to
not be released correctly. Fix this.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0637c6f4135f592f094207c7c21e7c0fc5557834 upstream (as of v2.6.33-rc3)
As reported in Kernel Bugzilla #14936, commit d21cd8f triggered a BUG
in the function ext4_da_update_reserve_space() found in
fs/ext4/inode.c. The root cause of this BUG() was caused by the fact
that ext4_calc_metadata_amount() can severely over-estimate how many
metadata blocks will be needed, especially when using direct
block-mapped files.
In addition, it can also badly *under* estimate how much space is
needed, since ext4_calc_metadata_amount() assumes that the blocks are
contiguous, and this is not always true. If the application is
writing blocks to a sparse file, the number of metadata blocks
necessary can be severly underestimated by the functions
ext4_da_reserve_space(), ext4_da_update_reserve_space() and
ext4_da_release_space(). This was the cause of the dq_claim_space
reports found on kerneloops.org.
Unfortunately, doing this right means that we need to massively
over-estimate the amount of free space needed. So in some cases we
may need to force the inode to be written to disk asynchronously in
to avoid spurious quota failures.
http://bugzilla.kernel.org/show_bug.cgi?id=14936
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 515f41c33a9d44a964264c9511ad2c869af1fac3 upstream (as of v2.6.33-rc3)
This fixes a bug (found by Curt Wohlgemuth) in which new blocks
returned from an extent created with ext4_ext_zeroout() can have dirty
metadata still associated with them.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 2faf2e19dd0e060eeb32442858ef495ac3083277 upstream (as of v2.6.33-rc3)
When ext4_da_writepages increases the nr_to_write in writeback_control
then it must always re-base the return value. Originally there was a
(misguided) attempt prevent wbc.nr_to_write from going negative. In
fact, it's necessary to allow nr_to_write to be negative so that
wb_writeback() can correctly calculate how many pages were actually
written.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d3533d72e7478a61a3e1936956fc825289a2acf4 upstream (as of v2.6.33-rc3)
b_entry_name and buffer are initially NULL, are initialized within a loop
to the result of calling kmalloc, and are freed at the bottom of this loop.
The loop contains gotos to cleanup, which also frees b_entry_name and
buffer. Some of these gotos are before the reinitializations of
b_entry_name and buffer. To maintain the invariant that b_entry_name and
buffer are NULL at the top of the loop, and thus acceptable arguments to
kfree, these variables are now set to NULL after the kfrees.
This seems to be the simplest solution. A more complicated solution
would be to introduce more labels in the error handling code at the end of
the function.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@r@
identifier E;
expression E1;
iterator I;
statement S;
@@
*kfree(E);
... when != E = E1
when != I(E,...) S
when != &E
*kfree(E);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit cc3e1bea5d87635c519da657303690f5538bb4eb upstream (as of v2.6.33-rc3)
This is a bit complicated because we are trying to optimize when we
send barriers to the fs data disk. We could just throw in an extra
barrier to the data disk whenever we send a barrier to the journal
disk, but that's not always strictly necessary.
We only need to send a barrier during a commit when there are data
blocks which are must be written out due to an inode written in
ordered mode, or if fsync() depends on the commit to force data blocks
to disk. Finally, before we drop transactions from the beginning of
the journal during a checkpoint operation, we need to guarantee that
any blocks that were flushed out to the data disk are firmly on the
rust platter before we drop the transaction from the journal.
Thanks to Oleg Drokin for pointing out this flaw in ext3/ext4.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 034fb4c95fc0fed4ec4a50778127b92c6f2aec01 upstream (as of v2.6.33-rc3)
This patch fixes the Kernel BZ #14286. When the address of an extent
corresponding to a valid block is corrupted, a -EIO should be reported
instead of a BUG(). This situation should not normally not occur
except in the case of a corrupted filesystem. If however it does,
then the system should not panic directly but depending on the mount
time options appropriate action should be taken. If the mount options
so permit, the I/O should be gracefully aborted by returning a -EIO.
http://bugzilla.kernel.org/show_bug.cgi?id=14286
Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d21cd8f163ac44b15c465aab7306db931c606908 upstream (as of v2.6.33-rc2)
We have to delay vfs_dq_claim_space() until allocation context destruction.
Currently we have following call-trace:
ext4_mb_new_blocks()
/* task is already holding ac->alloc_semp */
->ext4_mb_mark_diskspace_used
->vfs_dq_claim_space() /* acquire dqptr_sem here. Possible deadlock */
->ext4_mb_release_context() /* drop ac->alloc_semp here */
Let's move quota claiming to ext4_da_update_reserve_space()
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-rc7 #18
-------------------------------------------------------
write-truncate-/3465 is trying to acquire lock:
(&s->s_dquot.dqptr_sem){++++..}, at: [<c025e73b>] dquot_claim_space+0x3b/0x1b0
but task is already holding lock:
(&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&meta_group_info[i]->alloc_sem){++++..}:
[<c017d04b>] __lock_acquire+0xd7b/0x1260
[<c017d5ea>] lock_acquire+0xba/0xd0
[<c0527191>] down_read+0x51/0x90
[<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
[<c02d0c1c>] ext4_mb_free_blocks+0x46c/0x870
[<c029c9d3>] ext4_free_blocks+0x73/0x130
[<c02c8cfc>] ext4_ext_truncate+0x76c/0x8d0
[<c02a8087>] ext4_truncate+0x187/0x5e0
[<c01e0f7b>] vmtruncate+0x6b/0x70
[<c022ec02>] inode_setattr+0x62/0x190
[<c02a2d7a>] ext4_setattr+0x25a/0x370
[<c022ee81>] notify_change+0x151/0x340
[<c021349d>] do_truncate+0x6d/0xa0
[<c0221034>] may_open+0x1d4/0x200
[<c022412b>] do_filp_open+0x1eb/0x910
[<c021244d>] do_sys_open+0x6d/0x140
[<c021258e>] sys_open+0x2e/0x40
[<c0103100>] sysenter_do_call+0x12/0x32
-> #2 (&ei->i_data_sem){++++..}:
[<c017d04b>] __lock_acquire+0xd7b/0x1260
[<c017d5ea>] lock_acquire+0xba/0xd0
[<c0527191>] down_read+0x51/0x90
[<c02a5787>] ext4_get_blocks+0x47/0x450
[<c02a74c1>] ext4_getblk+0x61/0x1d0
[<c02a7a7f>] ext4_bread+0x1f/0xa0
[<c02bcddc>] ext4_quota_write+0x12c/0x310
[<c0262d23>] qtree_write_dquot+0x93/0x120
[<c0261708>] v2_write_dquot+0x28/0x30
[<c025d3fb>] dquot_commit+0xab/0xf0
[<c02be977>] ext4_write_dquot+0x77/0x90
[<c02be9bf>] ext4_mark_dquot_dirty+0x2f/0x50
[<c025e321>] dquot_alloc_inode+0x101/0x180
[<c029fec2>] ext4_new_inode+0x602/0xf00
[<c02ad789>] ext4_create+0x89/0x150
[<c0221ff2>] vfs_create+0xa2/0xc0
[<c02246e7>] do_filp_open+0x7a7/0x910
[<c021244d>] do_sys_open+0x6d/0x140
[<c021258e>] sys_open+0x2e/0x40
[<c0103100>] sysenter_do_call+0x12/0x32
-> #1 (&sb->s_type->i_mutex_key#7/4){+.+...}:
[<c017d04b>] __lock_acquire+0xd7b/0x1260
[<c017d5ea>] lock_acquire+0xba/0xd0
[<c0526505>] mutex_lock_nested+0x65/0x2d0
[<c0260c9d>] vfs_load_quota_inode+0x4bd/0x5a0
[<c02610af>] vfs_quota_on_path+0x5f/0x70
[<c02bc812>] ext4_quota_on+0x112/0x190
[<c026345a>] sys_quotactl+0x44a/0x8a0
[<c0103100>] sysenter_do_call+0x12/0x32
-> #0 (&s->s_dquot.dqptr_sem){++++..}:
[<c017d361>] __lock_acquire+0x1091/0x1260
[<c017d5ea>] lock_acquire+0xba/0xd0
[<c0527191>] down_read+0x51/0x90
[<c025e73b>] dquot_claim_space+0x3b/0x1b0
[<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
[<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
[<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
[<c02a5966>] ext4_get_blocks+0x226/0x450
[<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
[<c02a6ed6>] ext4_da_writepages+0x506/0x790
[<c01de272>] do_writepages+0x22/0x50
[<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
[<c01d7b9b>] filemap_flush+0x2b/0x30
[<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
[<c029e595>] ext4_release_file+0x75/0xb0
[<c0216b59>] __fput+0xf9/0x210
[<c0216c97>] fput+0x27/0x30
[<c02122dc>] filp_close+0x4c/0x80
[<c014510e>] put_files_struct+0x6e/0xd0
[<c01451b7>] exit_files+0x47/0x60
[<c0146a24>] do_exit+0x144/0x710
[<c0147028>] do_group_exit+0x38/0xa0
[<c0159abc>] get_signal_to_deliver+0x2ac/0x410
[<c0102849>] do_notify_resume+0xb9/0x890
[<c01032d2>] work_notifysig+0x13/0x21
other info that might help us debug this:
3 locks held by write-truncate-/3465:
#0: (jbd2_handle){+.+...}, at: [<c02e1f8f>] start_this_handle+0x38f/0x5c0
#1: (&ei->i_data_sem){++++..}, at: [<c02a57f6>] ext4_get_blocks+0xb6/0x450
#2: (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
stack backtrace:
Pid: 3465, comm: write-truncate- Not tainted 2.6.32-rc7 #18
Call Trace:
[<c0524cb3>] ? printk+0x1d/0x22
[<c017ac9a>] print_circular_bug+0xca/0xd0
[<c017d361>] __lock_acquire+0x1091/0x1260
[<c016bca2>] ? sched_clock_local+0xd2/0x170
[<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
[<c017d5ea>] lock_acquire+0xba/0xd0
[<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
[<c0527191>] down_read+0x51/0x90
[<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
[<c025e73b>] dquot_claim_space+0x3b/0x1b0
[<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
[<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
[<c02c601d>] ? ext4_ext_find_extent+0x25d/0x280
[<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
[<c016bca2>] ? sched_clock_local+0xd2/0x170
[<c016be60>] ? sched_clock_cpu+0x120/0x160
[<c016beef>] ? cpu_clock+0x4f/0x60
[<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
[<c052712c>] ? down_write+0x8c/0xa0
[<c02a5966>] ext4_get_blocks+0x226/0x450
[<c016be60>] ? sched_clock_cpu+0x120/0x160
[<c016beef>] ? cpu_clock+0x4f/0x60
[<c017908b>] ? trace_hardirqs_off+0xb/0x10
[<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
[<c01d69cc>] ? find_get_pages_tag+0x16c/0x180
[<c01d6860>] ? find_get_pages_tag+0x0/0x180
[<c02a73bd>] ? __mpage_da_writepage+0x16d/0x1a0
[<c01dfc4e>] ? pagevec_lookup_tag+0x2e/0x40
[<c01ddf1b>] ? write_cache_pages+0xdb/0x3d0
[<c02a7250>] ? __mpage_da_writepage+0x0/0x1a0
[<c02a6ed6>] ext4_da_writepages+0x506/0x790
[<c016beef>] ? cpu_clock+0x4f/0x60
[<c016bca2>] ? sched_clock_local+0xd2/0x170
[<c016be60>] ? sched_clock_cpu+0x120/0x160
[<c016be60>] ? sched_clock_cpu+0x120/0x160
[<c02a69d0>] ? ext4_da_writepages+0x0/0x790
[<c01de272>] do_writepages+0x22/0x50
[<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
[<c01d7b9b>] filemap_flush+0x2b/0x30
[<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
[<c029e595>] ext4_release_file+0x75/0xb0
[<c0216b59>] __fput+0xf9/0x210
[<c0216c97>] fput+0x27/0x30
[<c02122dc>] filp_close+0x4c/0x80
[<c014510e>] put_files_struct+0x6e/0xd0
[<c01451b7>] exit_files+0x47/0x60
[<c0146a24>] do_exit+0x144/0x710
[<c017b163>] ? lock_release_holdtime+0x33/0x210
[<c0528137>] ? _spin_unlock_irq+0x27/0x30
[<c0147028>] do_group_exit+0x38/0xa0
[<c017babb>] ? trace_hardirqs_on+0xb/0x10
[<c0159abc>] get_signal_to_deliver+0x2ac/0x410
[<c0102849>] do_notify_resume+0xb9/0x890
[<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
[<c017b163>] ? lock_release_holdtime+0x33/0x210
[<c0165b50>] ? autoremove_wake_function+0x0/0x50
[<c017ba54>] ? trace_hardirqs_on_caller+0x134/0x190
[<c017babb>] ? trace_hardirqs_on+0xb/0x10
[<c0300ba4>] ? security_file_permission+0x14/0x20
[<c0215761>] ? vfs_write+0x131/0x190
[<c0214f50>] ? do_sync_write+0x0/0x120
[<c0103115>] ? sysenter_do_call+0x27/0x32
[<c01032d2>] work_notifysig+0x13/0x21
CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit aca92ff6f57c000d1b4523e383c8bd6b8269b8b1 upstream.
ext4_fiemap() rounds the length of the requested range down to
blocksize, which is is not the true number of blocks that cover the
requested region. This problem is especially impressive if the user
requests only the first byte of a file: not a single extent will be
reported.
We fix this by calculating the last block of the region and then
subtract to find the number of blocks in the extents.
Signed-off-by: Leonard Michlmayr <leonard.michlmayr@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit a1de02dccf906faba2ee2d99cac56799bda3b96a upstream.
The "offset" member in ext4_io_end holds bytes, not blocks, so
ext4_lblk_t is wrong - and too small (u32).
This caused the async i/o writes to sparse files beyond 4GB to fail
when they wrapped around to 0.
Also fix up the type of arguments to ext4_convert_unwritten_extents(),
it gets ssize_t from ext4_end_aio_dio_nolock() and
ext4_ext_direct_IO().
Reported-by: Giel de Nijs <giel@vectorwise.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c8afb44682fcef6273e8b8eb19fab13ddd05b386 upstream.
Creating many small files in rapid succession on a small
filesystem can lead to spurious ENOSPC; on a 104MB filesystem:
for i in `seq 1 22500`; do
echo -n > $SCRATCH_MNT/$i
echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i
done
leads to ENOSPC even though after a sync, 40% of the fs is free
again.
This is because we reserve worst-case metadata for delalloc writes,
and when data is allocated that worst-case reservation is not
usually needed.
When freespace is low, kicking off an async writeback will start
converting that worst-case space usage into something more realistic,
almost always freeing up space to continue.
This resolves the testcase for me, and survives all 4 generic
ENOSPC tests in xfstests.
We'll still need a hard synchronous sync to squeeze out the last bit,
but this fixes things up to a large degree.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 39bc680a8160bb9d6743f7873b535d553ff61058 upstream.
Unlock i_block_reservation_lock before vfs_dq_reserve_block().
This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=14739
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit a9e7f4472075fb6937c545af3f6329e9946bbe66 upstream.
This patch also fixes write vs chown race condition.
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit fab3a549e204172236779f502eccb4f9bf0dc87d)
Fix the following potential circular locking dependency between
mm->mmap_sem and ei->i_data_sem:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-04115-gec044c5 #37
-------------------------------------------------------
ureadahead/1855 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: [<ffffffff81107224>] might_fault+0x5c/0xac
but task is already holding lock:
(&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&ei->i_data_sem){++++..}:
[<ffffffff81099bfa>] __lock_acquire+0xb67/0xd0f
[<ffffffff81099e7e>] lock_acquire+0xdc/0x102
[<ffffffff81516633>] down_read+0x51/0x84
[<ffffffff811a2414>] ext4_get_blocks+0x50/0x2a5
[<ffffffff811a3453>] ext4_get_block+0xab/0xef
[<ffffffff81154f39>] do_mpage_readpage+0x198/0x48d
[<ffffffff81155360>] mpage_readpages+0xd0/0x114
[<ffffffff811a104b>] ext4_readpages+0x1d/0x1f
[<ffffffff810f8644>] __do_page_cache_readahead+0x12f/0x1bc
[<ffffffff810f86f2>] ra_submit+0x21/0x25
[<ffffffff810f0cfd>] filemap_fault+0x19f/0x32c
[<ffffffff81107b97>] __do_fault+0x55/0x3a2
[<ffffffff81109db0>] handle_mm_fault+0x327/0x734
[<ffffffff8151aaa9>] do_page_fault+0x292/0x2aa
[<ffffffff81518205>] page_fault+0x25/0x30
[<ffffffff812a34d8>] clear_user+0x38/0x3c
[<ffffffff81167e16>] padzero+0x20/0x31
[<ffffffff81168b47>] load_elf_binary+0x8bc/0x17ed
[<ffffffff81130e95>] search_binary_handler+0xc2/0x259
[<ffffffff81166d64>] load_script+0x1b8/0x1cc
[<ffffffff81130e95>] search_binary_handler+0xc2/0x259
[<ffffffff8113255f>] do_execve+0x1ce/0x2cf
[<ffffffff81027494>] sys_execve+0x43/0x5a
[<ffffffff8102918a>] stub_execve+0x6a/0xc0
-> #0 (&mm->mmap_sem){++++++}:
[<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f
[<ffffffff81099e7e>] lock_acquire+0xdc/0x102
[<ffffffff81107251>] might_fault+0x89/0xac
[<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda
[<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157
[<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1
[<ffffffff811be21e>] ext4_fiemap+0x13c/0x159
[<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6
[<ffffffff811392ca>] sys_ioctl+0x56/0x79
[<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
1 lock held by ureadahead/1855:
#0: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159
stack backtrace:
Pid: 1855, comm: ureadahead Not tainted 2.6.32-04115-gec044c5 #37
Call Trace:
[<ffffffff81098c70>] print_circular_bug+0xa8/0xb7
[<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f
[<ffffffff8102f229>] ? sched_clock+0x9/0xd
[<ffffffff81099e7e>] lock_acquire+0xdc/0x102
[<ffffffff81107224>] ? might_fault+0x5c/0xac
[<ffffffff81107251>] might_fault+0x89/0xac
[<ffffffff81107224>] ? might_fault+0x5c/0xac
[<ffffffff81124b44>] ? __kmalloc+0x13b/0x18c
[<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda
[<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157
[<ffffffff811bca0b>] ? ext4_ext_fiemap_cb+0x0/0x157
[<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1
[<ffffffff811be21e>] ext4_fiemap+0x13c/0x159
[<ffffffff81107224>] ? might_fault+0x5c/0xac
[<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6
[<ffffffff8129f6d0>] ? __up_read+0x8d/0x95
[<ffffffff81517fb5>] ? retint_swapgs+0x13/0x1b
[<ffffffff811392ca>] sys_ioctl+0x56/0x79
[<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 4a58579b9e4e2a35d57e6c9c8483e52f6f1b7fd6)
This patch fixes three problems in the handling of the
EXT4_IOC_MOVE_EXT ioctl:
1. In current EXT4_IOC_MOVE_EXT, there are read access mode checks for
original and donor files, but they allow the illegal write access to
donor file, since donor file is overwritten by original file data. To
fix this problem, change access mode checks of original (r->r/w) and
donor (r->w) files.
2. Disallow the use of donor files that have a setuid or setgid bits.
3. Call mnt_want_write() and mnt_drop_write() before and after
ext4_move_extents() calling to get write access to a mount.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit b436b9bef84de6893e86346d8fbf7104bc520645)
We cannot rely on buffer dirty bits during fsync because pdflush can come
before fsync is called and clear dirty bits without forcing a transaction
commit. What we do is that we track which transaction has last changed
the inode and which transaction last changed allocation and force it to
disk on fsync.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 194074acacebc169ded90a4657193f5180015051)
Inside ->setattr() call both ATTR_UID and ATTR_GID may be valid
This means that we may end-up with transferring all quotas. Add
we have to reserve QUOTA_DEL_BLOCKS for all quotas, as we do in
case of QUOTA_INIT_BLOCKS.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 5aca07eb7d8f14d90c740834d15ca15277f4820c)
Currently all quota block reservation macros contains hard-coded "2"
aka MAXQUOTAS value. This is no good because in some places it is not
obvious to understand what does this digit represent. Let's introduce
new macro with self descriptive name.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Acked-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 8aa6790f876e81f5a2211fe1711a5fe3fe2d7b20)
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit b844167edc7fcafda9623955c05e4c1b3c32ebc7)
This fixes a leak of blocks in an inode prealloc list if device failures
cause ext4_mb_mark_diskspace_used() to fail.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit d4edac314e9ad0b21ba20ba8bc61b61f186f79e1)
There is a potential race when a transaction is committing right when
the file system is being umounting. This could reduce in a race
because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super
before the commit code calls a callback so the mballoc code can
release freed blocks in the transaction, resulting in a panic trying
to access the freed s_group_info.
The fix is to wait for the transaction to finish committing before we
shutdown the multiblock allocator.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit b9a4207d5e911b938f73079a83cc2ae10524ec7f)
When ext4_write_begin fails after allocating some blocks or
generic_perform_write fails to copy data to write, we truncate blocks
already instantiated beyond i_size. Although these blocks were never
inside i_size, we have to truncate the pagecache of these blocks so
that corresponding buffers get unmapped. Otherwise subsequent
__block_prepare_write (called because we are retrying the write) will
find the buffers mapped, not call ->get_block, and thus the page will
be backed by already freed blocks leading to filesystem and data
corruption.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit c09eef305dd43846360944ad072f051f964fa383)
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit ac48b0a1d068887141581bea8285de5fcab182b0)
Integrate duplicate lines (acquire/release semaphore and invalidate
extent cache in move_extent_per_page()) into mext_replace_branches(),
to reduce source and object code size.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 446aaa6e7e993b38a6f21c6acfa68f3f1af3dbe3)
The move_extent.moved_len is used to pass back the number of exchanged
blocks count to user space. Currently the caller must clear this
field; but we spend more code space checking for this requirement than
simply zeroing the field ourselves, so let's just make life easier for
everyone all around.
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 94d7c16cbbbd0e03841fcf272bcaf0620ad39618)
At the beginning of ext4_move_extent(), we call
ext4_discard_preallocations() to discard inode PAs of orig and donor
inodes. But in the following case, blocks can be double freed, so
move ext4_discard_preallocations() to the end of ext4_move_extents().
1. Discard inode PAs of orig and donor inodes with
ext4_discard_preallocations() in ext4_move_extents().
orig : [ DATA1 ]
donor: [ DATA2 ]
2. While data blocks are exchanging between orig and donor inodes, new
inode PAs is created to orig by other process's block allocation.
(Since there are semaphore gaps in ext4_move_extents().) And new
inode PAs is used partially (2-1).
2-1 Create new inode PAs to orig inode
orig : [ DATA1 | used PA1 | free PA1 ]
donor: [ DATA2 ]
3. Donor inode which has old orig inode's blocks is deleted after
EXT4_IOC_MOVE_EXT finished (3-1, 3-2). So the block bitmap
corresponds to old orig inode's blocks are freed.
3-1 After EXT4_IOC_MOVE_EXT finished
orig : [ DATA2 | free PA1 ]
donor: [ DATA1 | used PA1 ]
3-2 Delete donor inode
orig : [ DATA2 | free PA1 ]
donor: [ FREE SPACE(DATA1) | FREE SPACE(used PA1) ]
4. The double-free of blocks is occurred, when close() is called to
orig inode. Because ext4_discard_preallocations() for orig inode
frees used PA1 and free PA1, though used PA1 is already freed in 3.
4-1 Double-free of blocks is occurred
orig : [ DATA2 | FREE SPACE(free PA1) ]
donor: [ FREE SPACE(DATA1) | DOUBLE FREE(used PA1) ]
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit e3bb52ae2bb9573e84c17b8e3560378d13a5c798)
Users on the linux-ext4 list recently complained about differences
across filesystems w.r.t. how to mount without a journal replay.
In the discussion it was noted that xfs's "norecovery" option is
perhaps more descriptively accurate than "noload," so let's make
that an alias for ext4.
Also show this status in /proc/mounts
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 5328e635315734d42080de9a5a1ee87bf4cae0a4)
It is anticipated that when sb_issue_discard starts doing
real work on trim-capable devices, we may see issues. Make
this mount-time optional, and default it to off until we know
that things are working out OK.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 2bba702d4f88d7b010ec37e2527b552588404ae7)
When an error happened in ext4_splice_branch we failed to notice that
in ext4_ind_get_blocks and mapped the buffer anyway. Fix the problem
by checking for error properly.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 6b17d902fdd241adfa4ce780df20547b28bf5801)
We don't to issue an I/O barrier on an error or if we force commit
because we are doing data journaling.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 1032988c71f3f85483b2b4319684d1205a704c02)
The block validity checks used by ext4_data_block_valid() wasn't
correctly written to check file systems with the meta_bg feature. Fix
this.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 8dadb198cb70ef811916668fe67eeec82e8858dd)
The number of old-style block group descriptor blocks is
s_meta_first_bg when the meta_bg feature flag is set.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 3f8fb9490efbd300887470a2a880a64e04dcc3f5)
commit a71ce8c6c9bf269b192f352ea555217815cf027e updated ext4_statfs()
to update the on-disk superblock counters, but modified this buffer
directly without any journaling of the change. This is one of the
accesses that was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 86ebfd08a1930ccedb8eac0aeb1ed4b8b6a41dbc)
ext4_xattr_set_handle() was zeroing out an inode outside
of journaling constraints; this is one of the accesses that
was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.
Reviewed-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 30c6e07a92ea4cb87160d32ffa9bce172576ae4c)
We need to be testing the i_flags field in the ext4 specific portion
of the inode, instead of the (confusingly aliased) i_flags field in
the generic struct inode.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 50689696867d95b38d9c7be640a311494a04fb86)
When an inode gets unlinked, the functions ext4_clear_blocks() and
ext4_remove_blocks() call ext4_forget() for all the buffer heads
corresponding to the deleted inode's data blocks. If the inode is a
directory or a symlink, the is_metadata parameter must be non-zero so
ext4_forget() will revoke them via jbd2_journal_revoke(). Otherwise,
if these blocks are reused for a data file, and the system crashes
before a journal checkpoint, the journal replay could end up
corrupting these data blocks.
Thanks to Curt Wohlgemuth for pointing out potential problems in this
area.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 567f3e9a70d71e5c9be03701b8578be77857293b)
One of the invalid error paths in ext4_iget() forgot to brelse() the
inode buffer head. Fix it by adding a brelse() in the common error
return path, which also simplifies function.
Thanks to Andi Kleen <ak@linux.intel.com> reporting the problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 49bd22bc4d603a2a4fc2a6a60e156cbea52eb494)
If CONFIG_PROVE_LOCKING is enabled, the double_down_write_data_sem()
will trigger a false-positive warning of a recursive lock. Since we
take i_data_sem for the two inodes ordered by their inode numbers,
this isn't a problem. Use of down_write_nested() will notify the lock
dependency checker machinery that there is no problem here.
This problem was reported by Brian Rogers:
http://marc.info/?l=linux-ext4&m=125115356928011&w=1
Reported-by: Brian Rogers <brian@xyzw.org>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit fc04cb49a898c372a22b21fffc47f299d8710801)
ext4_move_extents() checks the logical block contiguousness
of original file with ext4_find_extent() and mext_next_extent().
Therefore the extent which ext4_ext_path structure indicates
must not be changed between above functions.
But in current implementation, there is no i_data_sem protection
between ext4_ext_find_extent() and mext_next_extent(). So the extent
which ext4_ext_path structure indicates may be overwritten by
delalloc. As a result, ext4_move_extents() will exchange wrong blocks
between original and donor files. I change the place where
acquire/release i_data_sem to solve this problem.
Moreover, I changed move_extent_per_page() to start transaction first,
and then acquire i_data_sem. Without this change, there is a
possibility of the deadlock between mmap() and ext4_move_extents():
* NOTE: "A", "B" and "C" mean different processes
A-1: ext4_ext_move_extents() acquires i_data_sem of two inodes.
B: do_page_fault() starts the transaction (T),
and then tries to acquire i_data_sem.
But process "A" is already holding it, so it is kept waiting.
C: While "A" and "B" running, kjournald2 tries to commit transaction (T)
but it is under updating, so kjournald2 waits for it.
A-2: Call ext4_journal_start with holding i_data_sem,
but transaction (T) is locked.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit f868a48d06f8886cb0367568a12367fa4f21ea0d)
If the EXT4_IOC_MOVE_EXT ioctl fails, the number of blocks that were
exchanged before the failure should be returned to the userspace
caller. Unfortunately, currently if the block size is not the same as
the page size, the returned block count that is returned is the
page-aligned block count instead of the actual block count. This
commit addresses this bug.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 503358ae01b70ce6909d19dd01287093f6b6271c)
If s_log_groups_per_flex is greater than 31, then groups_per_flex will
will overflow and cause a divide by zero error. This can cause kernel
BUG if such a file system is mounted.
Thanks to Nageswara R Sastry for analyzing the failure and providing
an initial patch.
http://bugzilla.kernel.org/show_bug.cgi?id=14287
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit 2de770a406b06dfc619faabbf5d85c835ed3f2e1)
Previously add_dirent_to_buf() did not free its passed-in buffer head
in the case of ENOSPC, since in some cases the caller still needed it.
However, this led to potential buffer head leaks since not all callers
dealt with this correctly. Fix this by making simplifying the freeing
convention; now add_dirent_to_buf() *never* frees the passed-in buffer
head, and leaves that to the responsibility of its caller. This makes
things cleaner and easier to prove that the code is neither leaking
buffer heads or calling brelse() one time too many.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
This is a partial revert of commit 6487a9d (only the changes made to
fs/ext4/namei.c), since it is causing the following brelse()
double-free warning when running fsstress on a file system with 1k
blocksize and we run into a block allocation failure while converting
a single-block directory to a multi-block hash-tree indexed directory.
WARNING: at fs/buffer.c:1197 __brelse+0x2e/0x33()
Hardware name:
VFS: brelse: Trying to free free buffer
Modules linked in:
Pid: 2226, comm: jbd2/sdd-8 Not tainted 2.6.32-rc6-00577-g0003f55 #101
Call Trace:
[<c01587fb>] warn_slowpath_common+0x65/0x95
[<c0158869>] warn_slowpath_fmt+0x29/0x2c
[<c021168e>] __brelse+0x2e/0x33
[<c0288a9f>] jbd2_journal_refile_buffer+0x67/0x6c
[<c028a9ed>] jbd2_journal_commit_transaction+0x319/0x14d8
[<c0164d73>] ? try_to_del_timer_sync+0x58/0x60
[<c0175bcc>] ? sched_clock_cpu+0x12a/0x13e
[<c017f6b4>] ? trace_hardirqs_off+0xb/0xd
[<c0175c1f>] ? cpu_clock+0x3f/0x5b
[<c017f6ec>] ? lock_release_holdtime+0x36/0x137
[<c0664ad0>] ? _spin_unlock_irqrestore+0x44/0x51
[<c0180af3>] ? trace_hardirqs_on_caller+0x103/0x124
[<c0180b1f>] ? trace_hardirqs_on+0xb/0xd
[<c0164d73>] ? try_to_del_timer_sync+0x58/0x60
[<c0290d1c>] kjournald2+0x11a/0x310
[<c017118e>] ? autoremove_wake_function+0x0/0x38
[<c0290c02>] ? kjournald2+0x0/0x310
[<c0170ee6>] kthread+0x66/0x6b
[<c0170e80>] ? kthread+0x0/0x6b
[<c01251b3>] kernel_thread_helper+0x7/0x10
---[ end trace 5579351b86af61e3 ]---
Commit 6487a9d was an attempt some buffer head leaks in an ENOSPC
error path, but in some cases it actually results in an excess ENOSPC,
as shown above. Fixing this means cleaning up who is responsible for
releasing the buffer heads from the callee to the caller of
add_dirent_to_buf().
Since that's a relatively complex change, and we're late in the rcX
development cycle, I'm reverting this now, and holding back a more
complete fix until after 2.6.32 ships. We've lived with this
buffer_head leak on ENOSPC in ext3 and ext4 for a very long time; a
few more months won't kill us.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Curt Wohlgemuth <curtw@google.com>
|
|
To prepare for a direct I/O write, we need to split the unwritten
extents before submitting the I/O. When no extents needed to be
split, ext4_split_unwritten_extents() was incorrectly returning 0
instead of the size of uninitialized extents. This bug caused the
wrong return value sent back to VFS code when it gets called from
async IO path, leading to an unnecessary fall back to buffered IO.
This bug also hid the fact that the check to see whether or not a
split would be necessary was incorrect; we can only skip splitting the
extent if the write completely covers the uninitialized extent.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The ext4_debug() call in ext4_end_io_dio() should be moved after the
check to make sure that io_end is non-NULL.
The comment above ext4_get_block_dio_write() ("Maximum number of
blocks...") is a duplicate; the original and correct comment is above
the #define DIO_MAX_BLOCKS up above.
Based on review comments from Curt Wohlgemuth.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
At the end of direct I/O operation, ext4_ext_direct_IO() always called
ext4_convert_unwritten_extents(), regardless of whether there were any
unwritten extents involved in the I/O or not.
This commit adds a state flag so that ext4_ext_direct_IO() only calls
ext4_convert_unwritten_extents() when necessary.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
After a direct I/O request covering an uninitalized extent (i.e.,
created using the fallocate system call) or a hole in a file, ext4
will convert the uninitialized extent so it is marked as initialized
by calling ext4_convert_unwritten_extents(). This function returns
zero on success.
This return value was getting returned by ext4_direct_IO(); however
the file system's direct_IO function is supposed to return the number
of bytes read or written on a success. By returning zero, it confused
the direct I/O code into falling back to buffered I/O unnecessarily.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Th |