aboutsummaryrefslogtreecommitdiff
path: root/fs/ext4
AgeCommit message (Collapse)Author
2014-02-20ext4: protect group inode free counting with group lockTao Ma
commit 6f2e9f0e7d795214b9cf5a47724a273b705fd113 upstream. Now when we set the group inode free count, we don't have a proper group lock so that multiple threads may decrease the inode free count at the same time. And e2fsck will complain something like: Free inodes count wrong for group #1 (1, counted=0). Fix? no Free inodes count wrong for group #2 (3, counted=0). Fix? no Directories count wrong for group #2 (780, counted=779). Fix? no Free inodes count wrong for group #3 (2272, counted=2273). Fix? no So this patch try to protect it with the ext4_lock_group. btw, it is found by xfstests test case 269 and the volume is mkfsed with the parameter "-O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr" and I have run it 100 times and the error in e2fsck doesn't show up again. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-08ext4: check for overlapping extents in ext4_valid_extent_entries()Eryu Guan
commit 5946d089379a35dda0e531710b48fca05446a196 upstream. A corrupted ext4 may have out of order leaf extents, i.e. extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT ^^^^ overlap with previous extent Reading such extent could hit BUG_ON() in ext4_es_cache_extent(). BUG_ON(end < lblk); The problem is that __read_extent_tree_block() tries to cache holes as well but assumes 'lblk' is greater than 'prev' and passes underflowed length to ext4_es_cache_extent(). Fix it by checking for overlapping extents in ext4_valid_extent_entries(). I hit this when fuzz testing ext4, and am able to reproduce it by modifying the on-disk extent by hand. Also add the check for (ee_block + len - 1) in ext4_valid_extent() to make sure the value is not overflow. Ran xfstests on patched ext4 and no regression. Cc: Lukáš Czerner <lczerner@redhat.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-08ext4: fix use-after-free in ext4_mb_new_blocksJunho Ryu
commit 4e8d2139802ce4f41936a687f06c560b12115247 upstream. ext4_mb_put_pa should hold pa->pa_lock before accessing pa->pa_count. While ext4_mb_use_preallocated checks pa->pa_deleted first and then increments pa->count later, ext4_mb_put_pa decrements pa->pa_count before holding pa->pa_lock and then sets pa->pa_deleted. * Free sequence ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count ext4_mb_put_pa (2): lock pa->pa_lock ext4_mb_put_pa (3): check pa->pa_deleted ext4_mb_put_pa (4): set pa->pa_deleted=1 ext4_mb_put_pa (5): unlock pa->pa_lock ext4_mb_put_pa (6): remove pa from a list ext4_mb_pa_callback: free pa * Use sequence ext4_mb_use_preallocated (1): iterate over preallocation ext4_mb_use_preallocated (2): lock pa->pa_lock ext4_mb_use_preallocated (3): check pa->pa_deleted ext4_mb_use_preallocated (4): increase pa->pa_count ext4_mb_use_preallocated (5): unlock pa->pa_lock ext4_mb_release_context: access pa * Use-after-free sequence [initial status] <pa->pa_deleted = 0, pa_count = 1> ext4_mb_use_preallocated (1): iterate over preallocation ext4_mb_use_preallocated (2): lock pa->pa_lock ext4_mb_use_preallocated (3): check pa->pa_deleted ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count [pa_count decremented] <pa->pa_deleted = 0, pa_count = 0> ext4_mb_use_preallocated (4): increase pa->pa_count [pa_count incremented] <pa->pa_deleted = 0, pa_count = 1> ext4_mb_use_preallocated (5): unlock pa->pa_lock ext4_mb_put_pa (2): lock pa->pa_lock ext4_mb_put_pa (3): check pa->pa_deleted ext4_mb_put_pa (4): set pa->pa_deleted=1 [race condition!] <pa->pa_deleted = 1, pa_count = 1> ext4_mb_put_pa (5): unlock pa->pa_lock ext4_mb_put_pa (6): remove pa from a list ext4_mb_pa_callback: free pa ext4_mb_release_context: access pa AddressSanitizer has detected use-after-free in ext4_mb_new_blocks Bug report: http://goo.gl/rG1On3 Signed-off-by: Junho Ryu <jayr@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-04ext4: avoid bh leak in retry path of ext4_expand_extra_isize_ea()Theodore Ts'o
commit dcb9917ba041866686fe152850364826c4622a36 upstream. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-22ext4: fix memory leak in xattrDave Jones
commit 6e4ea8e33b2057b85d75175dd89b93f5e26de3bc upstream. If we take the 2nd retry path in ext4_expand_extra_isize_ea, we potentionally return from the function without having freed these allocations. If we don't do the return, we over-write the previous allocation pointers, so we leak either way. Spotted with Coverity. [ Fixed by tytso to set is and bs to NULL after freeing these pointers, in case in the retry loop we later end up triggering an error causing a jump to cleanup, at which point we could have a double free bug. -- Ted ] Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13ext4: avoid hang when mounting non-journal filesystems with orphan listTheodore Ts'o
commit 0e9a9a1ad619e7e987815d20262d36a2f95717ca upstream. When trying to mount a file system which does not contain a journal, but which does have a orphan list containing an inode which needs to be truncated, the mount call with hang forever in ext4_orphan_cleanup() because ext4_orphan_del() will return immediately without removing the inode from the orphan list, leading to an uninterruptible loop in kernel code which will busy out one of the CPU's on the system. This can be trivially reproduced by trying to mount the file system found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs source tree. If a malicious user were to put this on a USB stick, and mount it on a Linux desktop which has automatic mounts enabled, this could be considered a potential denial of service attack. (Not a big deal in practice, but professional paranoids worry about such things, and have even been known to allocate CVE numbers for such problems.) -js: This is a fix for CVE-2013-2015. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-20jbd2: Fix use after free after error in jbd2_journal_dirty_metadata()Jan Kara
commit 91aa11fae1cf8c2fd67be0609692ea9741cdcc43 upstream. When jbd2_journal_dirty_metadata() returns error, __ext4_handle_dirty_metadata() stops the handle. However callers of this function do not count with that fact and still happily used now freed handle. This use after free can result in various issues but very likely we oops soon. The motivation of adding __ext4_journal_stop() into __ext4_handle_dirty_metadata() in commit 9ea7a0df seems to be only to improve error reporting. So replace __ext4_journal_stop() with ext4_journal_abort_handle() which was there before that commit and add WARN_ON_ONCE() to dump stack to provide useful information. Reported-by: Sage Weil <sage@inktank.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-14ext4: fix mount/remount error messages for incompatible mount optionsPiotr Sarna
commit 6ae6514b33f941d3386da0dfbe2942766eab1577 upstream. Commit 5688978 ("ext4: improve handling of conflicting mount options") introduced incorrect messages shown while choosing wrong mount options. First of all, both cases of incorrect mount options, "data=journal,delalloc" and "data=journal,dioread_nolock" result in the same error message. Secondly, the problem above isn't solved for remount option: the mismatched parameter is simply ignored. Moreover, ext4_msg states that remount with options "data=journal,delalloc" succeeded, which is not true. To fix it up, I added a simple check after parse_options() call to ensure that data=journal and delalloc/dioread_nolock parameters are not present at the same time. Signed-off-by: Piotr Sarna <p.sarna@partner.samsung.com> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-14ext4: make sure group number is bumped after a inode allocation raceTheodore Ts'o
commit a34eb503742fd25155fd6cff6163daacead9fbc3 upstream. When we try to allocate an inode, and there is a race between two CPU's trying to grab the same inode, _and_ this inode is the last free inode in the block group, make sure the group number is bumped before we continue searching the rest of the block groups. Otherwise, we end up searching the current block group twice, and we end up skipping searching the last block group. So in the unlikely situation where almost all of the inodes are allocated, it's possible that we will return ENOSPC even though there might be free inodes in that last block group. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext4: don't allow ext4_free_blocks() to fail due to ENOMEMTheodore Ts'o
commit e7676a704ee0a1ef71a6b23760b5a8f6896cb1a1 upstream. The filesystem should not be marked inconsistent if ext4_free_blocks() is not able to allocate memory. Unfortunately some callers (most notably ext4_truncate) don't have a way to reflect an error back up to the VFS. And even if we did, most userspace applications won't deal with most system calls returning ENOMEM anyway. Reported-by: Nagachandra P <nagachandra@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext4: fix overflow when counting used blocks on 32-bit architecturesJan Kara
commit 8af8eecc1331dbf5e8c662022272cf667e213da5 upstream. The arithmetics adding delalloc blocks to the number of used blocks in ext4_getattr() can easily overflow on 32-bit archs as we first multiply number of blocks by blocksize and then divide back by 512. Make the arithmetics more clever and also use proper type (unsigned long long instead of unsigned long). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext4: fix data offset overflow in ext4_xattr_fiemap() on 32-bit archsJan Kara
commit a60697f411eb365fb09e639e6f183fe33d1eb796 upstream. On 32-bit architectures with 32-bit sector_t computation of data offset in ext4_xattr_fiemap() can overflow resulting in reporting bogus data location. Fix the problem by typing block number to proper type before shifting. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()Al Viro
commit 64cb927371cd2ec43758d8a094a003d27bc3d0dc upstream. Both ext3 and ext4 htree_dirblock_to_tree() is just filling the in-core rbtree for use by call_filldir(). All updates of ->f_pos are done by the latter; bumping it here (on error) is obviously wrong - we might very well have it nowhere near the block we'd found an error in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-19ext4: limit group search loop for non-extent filesLachlan McIlroy
commit e6155736ad76b2070652745f9e54cdea3f0d8567 upstream. In the case where we are allocating for a non-extent file, we must limit the groups we allocate from to those below 2^32 blocks, and ext4_mb_regular_allocator() attempts to do this initially by putting a cap on ngroups for the subsequent search loop. However, the initial target group comes in from the allocation context (ac), and it may already be beyond the artificially limited ngroups. In this case, the limit if (group == ngroups) group = 0; at the top of the loop is never true, and the loop will run away. Catch this case inside the loop and reset the search to start at group 0. [sandeen@redhat.com: add commit msg & comments] Signed-off-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-11ext4: add check for inodes_count overflow in new resize ioctlTheodore Ts'o
commit 3f8a6411fbada1fa482276591e037f3b1adcf55b upstream. Addresses-Red-Hat-Bugzilla: #913245 Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Lingzhu Xiang <lxiang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-07ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUGTheodore Ts'o
commit 7f3e3c7cfcec148ccca9c0dd2dbfd7b00b7ac10f upstream. Fox the Kconfig documentation for CONFIG_EXT4_DEBUG to match the change made by commit a0b30c1229: ext4: use module parameters instead of debugfs for mballoc_debug Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-07ext4: fix online resizing for ext3-compat file systemsTheodore Ts'o
commit c5c72d814cf0f650010337c73638b25e6d14d2d4 upstream. Commit fb0a387dcdc restricts block allocations for indirect-mapped files to block groups less than s_blockfile_groups. However, the online resizing code wasn't setting s_blockfile_groups, so the newly added block groups were not available for non-extent mapped files. Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-07ext4: fix journal callback list traversalDmitry Monakhov
commit 5d3ee20855e28169d711b394857ee608a5023094 upstream. It is incorrect to use list_for_each_entry_safe() for journal callback traversial because ->next may be removed by other task: ->ext4_mb_free_metadata() ->ext4_mb_free_metadata() ->ext4_journal_callback_del() This results in the following issue: WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250() Hardware name: list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod Pid: 16400, comm: jbd2/dm-1-8 Tainted: G W 3.8.0-rc3+ #107 Call Trace: [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0 [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0 [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250 [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0 [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570 [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0 [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0 [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0 [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40 [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80 [<ffffffff810ac6be>] kthread+0x10e/0x120 [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70 [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0 [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70 This patch fix the issue as follows: - ext4_journal_commit_callback() make list truly traversial safe simply by always starting from list_head - fix race between two ext4_journal_callback_del() and ext4_journal_callback_try_del() Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-05ext4: use atomic64_t for the per-flexbg free_clusters countTheodore Ts'o
commit 90ba983f6889e65a3b506b30dc606aa9d1d46cd2 upstream. A user who was using a 8TB+ file system and with a very large flexbg size (> 65536) could cause the atomic_t used in the struct flex_groups to overflow. This was detected by PaX security patchset: http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551 This bug was introduced in commit 9f24e4208f7e, so it's been around since 2.6.30. :-( Fix this by using an atomic64_t for struct orlav_stats's free_clusters. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Lingzhu Xiang <lxiang@redhat.com> Reviewed-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-05ext4: convert number of blocks to clusters properlyLukas Czerner
commit 810da240f221d64bf90020f25941b05b378186fe upstream. We're using macro EXT4_B2C() to convert number of blocks to number of clusters for bigalloc file systems. However, we should be using EXT4_NUM_B2C(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Lingzhu Xiang <lxiang@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-28ext4: fix data=journal fast mount/umount hangTheodore Ts'o
commit 2b405bfa84063bfa35621d2d6879f52693c614b0 upstream. In data=journal mode, if we unmount the file system before a transaction has a chance to complete, when the journal inode is being evicted, we can end up calling into jbd2_log_wait_commit() for the last transaction, after the journalling machinery has been shut down. Arguably we should adjust ext4_should_journal_data() to return FALSE for the journal inode, but the only place it matters is ext4_evict_inode(), and so to save a bit of CPU time, and to make the patch much more obviously correct by inspection(tm), we'll fix it by explicitly not trying to waiting for a journal commit when we are evicting the journal inode, since it's guaranteed to never succeed in this case. This can be easily replicated via: mount -t ext4 -o data=journal /dev/vdb /vdb ; umount /vdb ------------[ cut here ]------------ WARNING: at /usr/projects/linux/ext4/fs/jbd2/journal.c:542 __jbd2_log_start_commit+0xba/0xcd() Hardware name: Bochs JBD2: bad log_start_commit: 3005630206 3005630206 0 0 Modules linked in: Pid: 2909, comm: umount Not tainted 3.8.0-rc3 #1020 Call Trace: [<c015c0ef>] warn_slowpath_common+0x68/0x7d [<c02b7e7d>] ? __jbd2_log_start_commit+0xba/0xcd [<c015c177>] warn_slowpath_fmt+0x2b/0x2f [<c02b7e7d>] __jbd2_log_start_commit+0xba/0xcd [<c02b8075>] jbd2_log_start_commit+0x24/0x34 [<c0279ed5>] ext4_evict_inode+0x71/0x2e3 [<c021f0ec>] evict+0x94/0x135 [<c021f9aa>] iput+0x10a/0x110 [<c02b7836>] jbd2_journal_destroy+0x190/0x1ce [<c0175284>] ? bit_waitqueue+0x50/0x50 [<c028d23f>] ext4_put_super+0x52/0x294 [<c020efe3>] generic_shutdown_super+0x48/0xb4 [<c020f071>] kill_block_super+0x22/0x60 [<c020f3e0>] deactivate_locked_super+0x22/0x49 [<c020f5d6>] deactivate_super+0x30/0x33 [<c0222795>] mntput_no_expire+0x107/0x10c [<c02233a7>] sys_umount+0x2cf/0x2e0 [<c02233ca>] sys_oldumount+0x12/0x14 [<c08096b8>] syscall_call+0x7/0xb ---[ end trace 6a954cc790501c1f ]--- jbd2_log_wait_commit: error: j_commit_request=-1289337090, tid=0 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-28ext4: fix the wrong number of the allocated blocks in ext4_split_extent()Zheng Liu
commit 3a2256702e47f68f921dfad41b1764d05c572329 upstream. This commit fixes a wrong return value of the number of the allocated blocks in ext4_split_extent. When the length of blocks we want to allocate is greater than the length of the current extent, we return a wrong number. Let's see what happens in the following case when we call ext4_split_extent(). map: [48, 72] ex: [32, 64, u] 'ex' will be split into two parts: ex1: [32, 47, u] ex2: [48, 64, w] 'map->m_len' is returned from this function, and the value is 24. But the real length is 16. So it should be fixed. Meanwhile in this commit we use right length of the allocated blocks when get_reserved_cluster_alloc in ext4_ext_handle_uninitialized_extents is called. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-04ext4: fix free clusters calculation in bigalloc filesystemLukas Czerner
commit 304e220f0879198b1f5309ad6f0be862b4009491 upstream. ext4_has_free_clusters() should tell us whether there is enough free clusters to allocate, however number of free clusters in the file system is converted to blocks using EXT4_C2B() which is not only wrong use of the macro (we should have used EXT4_NUM_B2C) but it's also completely wrong concept since everything else is in cluster units. Moreover when calculating number of root clusters we should be using macro EXT4_NUM_B2C() instead of EXT4_B2C() otherwise the result might be off by one. However r_blocks_count should always be a multiple of the cluster ratio so doing a plain bit shift should be enough here. We avoid using EXT4_B2C() because it's confusing. As a result of the first problem number of free clusters is much bigger than it should have been and ext4_has_free_clusters() would return 1 even if there is really not enough free clusters available. Fix this by removing the EXT4_C2B() conversion of free clusters and using bit shift when calculating number of root clusters. This bug affects number of xfstests tests covering file system ENOSPC situation handling. With this patch most of the ENOSPC problems with bigalloc file system disappear, especially the errors caused by delayed allocation not having enough space when the actual allocation is finally requested. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-04ext4: fix xattr block allocation/release with bigallocLukas Czerner
commit 1231b3a1eb5740192aeebf5344dd6d6da000febf upstream. Currently when new xattr block is created or released we we would call dquot_free_block() or dquot_alloc_block() respectively, among the else decrementing or incrementing the number of blocks assigned to the inode by one block. This however does not work for bigalloc file system because we always allocate/free the whole cluster so we have to count with that in dquot_free_block() and dquot_alloc_block() as well. Use the clusters-to-blocks conversion EXT4_C2B() when passing number of blocks to the dquot_alloc/free functions to fix the problem. The problem has been revealed by xfstests #117 (and possibly others). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-04ext4: fix race in ext4_mb_add_n_trim()Niu Yawei
commit f1167009711032b0d747ec89a632a626c901a1ad upstream. In ext4_mb_add_n_trim(), lg_prealloc_lock should be taken when changing the lg_prealloc_list. Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-04ext4: check bh in ext4_read_block_bitmap()Eryu Guan
commit 15b49132fc972c63894592f218ea5a9a61b1a18f upstream. Validate the bh pointer before using it, since ext4_read_block_bitmap_nowait() might return NULL. I've seen this in fsfuzz testing. EXT4-fs error (device loop0): ext4_read_block_bitmap_nowait:385: comm touch: Cannot get buffer for block bitmap - block_group = 0, block_bitmap = 3925999616 BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8121de25>] ext4_wait_block_bitmap+0x25/0xe0 ... Call Trace: [<ffffffff8121e1e5>] ext4_read_block_bitmap+0x35/0x60 [<ffffffff8125e9c6>] ext4_free_blocks+0x236/0xb80 [<ffffffff811d0d36>] ? __getblk+0x36/0x70 [<ffffffff811d0a5f>] ? __find_get_block+0x8f/0x210 [<ffffffff81191ef3>] ? kmem_cache_free+0x33/0x140 [<ffffffff812678e5>] ext4_xattr_release_block+0x1b5/0x1d0 [<ffffffff812679be>] ext4_xattr_delete_inode+0xbe/0x100 [<ffffffff81222a7c>] ext4_free_inode+0x7c/0x4d0 [<ffffffff812277b8>] ? ext4_mark_inode_dirty+0x88/0x230 [<ffffffff8122993c>] ext4_evict_inode+0x32c/0x490 [<ffffffff811b8cd7>] evict+0xa7/0x1c0 [<ffffffff811b8ed3>] iput_final+0xe3/0x170 [<ffffffff811b8f9e>] iput+0x3e/0x50 [<ffffffff812316fd>] ext4_add_nondir+0x4d/0x90 [<ffffffff81231d0b>] ext4_create+0xeb/0x170 [<ffffffff811aae9c>] vfs_create+0xac/0xd0 [<ffffffff811ac845>] lookup_open+0x185/0x1c0 [<ffffffff8129e3b9>] ? selinux_inode_permission+0xa9/0x170 [<ffffffff811acb54>] do_last+0x2d4/0x7a0 [<ffffffff811af743>] path_openat+0xb3/0x480 [<ffffffff8116a8a1>] ? handle_mm_fault+0x251/0x3b0 [<ffffffff811afc49>] do_filp_open+0x49/0xa0 [<ffffffff811bbaad>] ? __alloc_fd+0xdd/0x150 [<ffffffff8119da28>] do_sys_open+0x108/0x1f0 [<ffffffff8119db51>] sys_open+0x21/0x30 [<ffffffff81618959>] system_call_fastpath+0x16/0x1b Also fix comment for ext4_read_block_bitmap_nowait() Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21ext4: init pagevec in ext4_da_block_invalidatepagesEric Sandeen
commit 66bea92c69477a75a5d37b9bfed5773c92a3c4b4 upstream. ext4_da_block_invalidatepages is missing a pagevec_init(), which means that pvec->cold contains random garbage. This affects whether the page goes to the front or back of the LRU when ->cold makes it to free_hot_cold_page() Reviewed-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: lock i_mutex when truncating orphan inodesTheodore Ts'o
commit 721e3eba21e43532e438652dd8f1fcdfce3187e7 upstream. Commit c278531d39 added a warning when ext4_flush_unwritten_io() is called without i_mutex being taken. It had previously not been taken during orphan cleanup since races weren't possible at that point in the mount process, but as a result of this c278531d39, we will now see a kernel WARN_ON in this case. Take the i_mutex in ext4_orphan_cleanup() to suppress this warning. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: do not try to write superblock on ro remount w/o journalMichael Tokarev
commit d096ad0f79a782935d2e06ae8fb235e8c5397775 upstream. When a journal-less ext4 filesystem is mounted on a read-only block device (blockdev --setro will do), each remount (for other, unrelated, flags, like suid=>nosuid etc) results in a series of scary messages from kernel telling about I/O errors on the device. This is becauese of the following code ext4_remount(): if (sbi->s_journal == NULL) ext4_commit_super(sb, 1); at the end of remount procedure, which forces writing (flushing) of a superblock regardless whenever it is dirty or not, if the filesystem is readonly or not, and whenever the device itself is readonly or not. We only need call ext4_commit_super when the file system had been previously mounted read/write. Thanks to Eric Sandeen for help in diagnosing this issue. Signed-off-By: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: check dioread_nolock on remountJan Kara
commit 261cb20cb2f0737a247aaf08dff7eb065e3e5b66 upstream. Currently we allow enabling dioread_nolock mount option on remount for filesystems where blocksize < PAGE_CACHE_SIZE. This isn't really supported so fix the bug by moving the check for blocksize != PAGE_CACHE_SIZE into parse_options(). Change the original PAGE_SIZE to PAGE_CACHE_SIZE along the way because that's what we are really interested in. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: fix extent tree corruption caused by hole punchForrest Liu
commit c36575e663e302dbaa4d16b9c72d2c9a913a9aef upstream. When depth of extent tree is greater than 1, logical start value of interior node is not correctly updated in ext4_ext_rm_idx. Signed-off-by: Forrest Liu <forrestl@synology.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Ashish Sangwan <ashishsangwan2@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: fix possible use after free with metadata csumTheodore Ts'o
commit aeb1e5d69a5be592e86a926be73efb38c55af404 upstream. Commit fa77dcfafeaa introduces block bitmap checksum calculation into ext4_new_inode() in the case that block group was uninitialized. However we brelse() the bitmap buffer before we attempt to checksum it so we have no guarantee that the buffer is still there. Fix this by releasing the buffer after the possible checksum computation. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17ext4: fix memory leak in ext4_xattr_set_acl()'s error pathEugene Shatokhin
commit 24ec19b0ae83a385ad9c55520716da671274b96c upstream. In ext4_xattr_set_acl(), if ext4_journal_start() returns an error, posix_acl_release() will not be called for 'acl' which may result in a memory leak. This patch fixes that. Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-05ext4: fix unjournaled inode bitmap modificationEric Sandeen
commit ffb5387e85d528fb6d0d924abfa3fbf0fc484071 upstream. commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 changed ext4_new_inode() such that the inode bitmap was being modified outside a transaction, which could lead to corruption, and was discovered when journal_checksum found a bad checksum in the journal during log replay. Nix ran into this when using the journal_async_commit mount option, which enables journal checksumming. The ensuing journal replay failures due to the bad checksums led to filesystem corruption reported as the now infamous "Apparent serious progressive ext4 data corruption bug" [ Changed by tytso to only call ext4_journal_get_write_access() only when we're fairly certain that we're going to allocate the inode. ] I've tested this by mounting with journal_checksum and running fsstress then dropping power; I've also tested by hacking DM to create snapshots w/o first quiescing, which allows me to test journal replay repeatedly w/o actually power-cycling the box. Without the patch I hit a journal checksum error every time. With this fix it survives many iterations. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-28ext4: Avoid underflow in ext4_trim_fs()Lukas Czerner
commit 5de35e8d5c02d271c20e18337e01bc20e6ef472e upstream. Currently if len argument in ext4_trim_fs() is smaller than one block, the 'end' variable underflow. Avoid that by returning EINVAL if len is smaller than file system block. Also remove useless unlikely(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-28ext4: race-condition protection for ext4_convert_unwritten_extents_endioDmitry Monakhov
commit dee1f973ca341c266229faa5a1a5bb268bed3531 upstream. We assumed that at the time we call ext4_convert_unwritten_extents_endio() extent in question is fully inside [map.m_lblk, map->m_len] because it was already split during submission. But this may not be true due to a race between writeback vs fallocate. If extent in question is larger than requested we will split it again. Special precautions should being done if zeroout required because [map.m_lblk, map->m_len] already contains valid data. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: fix fdatasync() for files with only i_size changesJan Kara
commit b71fc079b5d8f42b2a52743c8d2f1d35d655b1c5 upstream. Code tracking when transaction needs to be committed on fdatasync(2) forgets to handle a situation when only inode's i_size is changed. Thus in such situations fdatasync(2) doesn't force transaction with new i_size to disk and that can result in wrong i_size after a crash. Fix the issue by updating inode's i_datasync_tid whenever its size is updated. Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: always set i_op in ext4_mknod()Bernd Schubert
commit 6a08f447facb4f9e29fcc30fb68060bb5a0d21c2 upstream. ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR to mask those methods. And ext4_iget also always sets it, so there is an inconsistency. Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: online defrag is not supported for journaled filesDmitry Monakhov
commit f066055a3449f0e5b0ae4f3ceab4445bead47638 upstream. Proper block swap for inodes with full journaling enabled is truly non obvious task. In order to be on a safe side let's explicitly disable it for now. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: move_extent code cleanupDmitry Monakhov
commit 03bd8b9b896c8e940f282f540e6b4de90d666b7c upstream. - Remove usless checks, because it is too late to check that inode != NULL at the moment it was referenced several times. - Double lock routines looks very ugly and locking ordering relays on order of i_ino, but other kernel code rely on order of pointers. Let's make them simple and clean. - check that inodes belongs to the same SB as soon as possible. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: fix crash when accessing /proc/mounts concurrentlyHerton Ronaldo Krzesinski
commit 50df9fd55e4271e89a7adf3b1172083dd0ca199d upstream. The crash was caused by a variable being erronously declared static in token2str(). In addition to /proc/mounts, the problem can also be easily replicated by accessing /proc/fs/ext4/<partition>/options in parallel: $ cat /proc/fs/ext4/<partition>/options > options.txt ... and then running the following command in two different terminals: $ while diff /proc/fs/ext4/<partition>/options options.txt; do true; done This is also the cause of the following a crash while running xfstests #234, as reported in the following bug reports: https://bugs.launchpad.net/bugs/1053019 https://bugzilla.kernel.org/show_bug.cgi?id=47731 Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Brad Figg <brad.figg@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: fix potential deadlock in ext4_nonda_switch()Theodore Ts'o
commit 00d4e7362ed01987183e9528295de3213031309c upstream. In ext4_nonda_switch(), if the file system is getting full we used to call writeback_inodes_sb_if_idle(). The problem is that we can be holding i_mutex already, and this causes a potential deadlock when writeback_inodes_sb_if_idle() when it tries to take s_umount. (See lockdep output below). As it turns out we don't need need to hold s_umount; the fact that we are in the middle of the write(2) system call will keep the superblock pinned. Unfortunately writeback_inodes_sb() checks to make sure s_umount is taken, and the VFS uses a different mechanism for making sure the file system doesn't get unmounted out from under us. The simplest way of dealing with this is to just simply grab s_umount using a trylock, and skip kicking the writeback flusher thread in the very unlikely case that we can't take a read lock on s_umount without blocking. Also, we now check the cirteria for kicking the writeback thread before we decide to whether to fall back to non-delayed writeback, so if there are any outstanding delayed allocation writes, we try to get them resolved as soon as possible. [ INFO: possible circular locking dependency detected ] 3.6.0-rc1-00042-gce894ca #367 Not tainted ------------------------------------------------------- dd/8298 is trying to acquire lock: (&type->s_umount_key#18){++++..}, at: [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46 but task is already holding lock: (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3 which lock already depends on the new lock. 2 locks held by dd/8298: #0: (sb_writers#2){.+.+.+}, at: [<c01ddcc5>] generic_file_aio_write+0x56/0xd3 #1: (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3 stack backtrace: Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367 Call Trace: [<c015b79c>] ? console_unlock+0x345/0x372 [<c06d62a1>] print_circular_bug+0x190/0x19d [<c019906c>] __lock_acquire+0x86d/0xb6c [<c01999db>] ? mark_held_locks+0x5c/0x7b [<c0199724>] lock_acquire+0x66/0xb9 [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46 [<c06db935>] down_read+0x28/0x58 [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46 [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46 [<c026f3b2>] ext4_nonda_switch+0xe1/0xf4 [<c0271ece>] ext4_da_write_begin+0x27/0x193 [<c01dcdb0>] generic_file_buffered_write+0xc8/0x1bb [<c01ddc47>] __generic_file_aio_write+0x1dd/0x205 [<c01ddce7>] generic_file_aio_write+0x78/0xd3 [<c026d336>] ext4_file_write+0x480/0x4a6 [<c0198c1d>] ? __lock_acquire+0x41e/0xb6c [<c0180944>] ? sched_clock_cpu+0x11a/0x13e [<c01967e9>] ? trace_hardirqs_off+0xb/0xd [<c018099f>] ? local_clock+0x37/0x4e [<c0209f2c>] do_sync_write+0x67/0x9d [<c0209ec5>] ? wait_on_retry_sync_kiocb+0x44/0x44 [<c020a7b9>] vfs_write+0x7b/0xe6 [<c020a9a6>] sys_write+0x3b/0x64 [<c06dd4bd>] syscall_call+0x7/0xb Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: avoid duplicate writes of the backup bg descriptor blocksYongqiang Yang
commit 2ebd1704ded88a8ae29b5f3998b13959c715c4be upstream. The resize code was needlessly writing the backup block group descriptor blocks multiple times (once per block group) during an online resize. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: don't copy non-existent gdt blocks when resizingYongqiang Yang
commit 6df935ad2fced9033ab52078825fcaf6365f34b7 upstream. The resize code was copying blocks at the beginning of each block group in order to copy the superblock and block group descriptor table (gdt) blocks. This was, unfortunately, being done even for block groups that did not have super blocks or gdt blocks. This is a complete waste of perfectly good I/O bandwidth, to skip writing those blocks for sparse bg's. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13ext4: ignore last group w/o enough space when resizing instead of BUG'ingYongqiang Yang
commit 03c1c29053f678234dbd51bf3d65f3b7529021de upstream. If the last group does not have enough space for group tables, ignore it instead of calling BUG_ON(). Reported-by: Daniel Drake <dsd@laptop.org> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26ext4: fix kernel BUG on large-scale rm -rf commandsTheodore Ts'o
commit 89a4e48f8479f8145eca9698f39fe188c982212f upstream. Commit 968dee7722: "ext4: fix hole punch failure when depth is greater than 0" introduced a regression in v3.5.1/v3.6-rc1 which caused kernel crashes when users ran run "rm -rf" on large directory hierarchy on ext4 filesystems on RAID devices: BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 Process rm (pid: 18229, threadinfo ffff8801276bc000, task ffff880123631710) Call Trace: [<ffffffff81236483>] ? __ext4_handle_dirty_metadata+0x83/0x110 [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0 [<ffffffff8120a8cf>] ? ext4_mark_inode_dirty+0x7f/0x1f0 [<ffffffff81207e05>] ext4_truncate+0xf5/0x100 [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490 [<ffffffff811a1312>] evict+0xa2/0x1a0 [<ffffffff811a1513>] iput+0x103/0x1f0 [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0 [<ffffffff8118cc3a>] ? sys_newfstatat+0x2a/0x40 [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50 [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00 RIP [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0 This could be reproduced as follows: The problem in commit 968dee7722 was that caused the variable 'i' to be left uninitialized if the truncate required more space than was available in the journal. This resulted in the function ext4_ext_truncate_extend_restart() returning -EAGAIN, which caused ext4_ext_remove_space() to restart the truncate operation after starting a new jbd2 handle. Reported-by: Maciej Żenczykowski <maze@google.com> Reported-by: Marti Raudsepp <marti@juffo.org> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26ext4: fix long mount times on very big file systemsTheodore Ts'o
commit 0548bbb85337e532ca2ed697c3e9b227ff2ed4b4 upstream. Commit 8aeb00ff85a: "ext4: fix overhead calculation used by ext4_statfs()" introduced a O(n**2) calculation which makes very large file systems take forever to mount. Fix this with an optimization for non-bigalloc file systems. (For bigalloc file systems the overhead needs to be set in the the superblock.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26ext4: avoid kmemcheck complaint from reading uninitialized memoryTheodore Ts'o
commit 7e731bc9a12339f344cddf82166b82633d99dd86 upstream. Commit 03179fe923 introduced a kmemcheck complaint in ext4_da_get_block_prep() because we save and restore ei->i_da_metadata_calc_last_lblock even though it is left uninitialized in the case where i_da_metadata_calc_len is zero. This doesn't hurt anything, but silencing the kmemcheck complaint makes it easier for people to find real bugs. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631 (which is marked as a regression). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-26ext4: make sure the journal sb is written in ext4_clear_journal_err()Theodore Ts'o
commit d796c52ef0b71a988364f6109aeb63d79c5b116b upstream. After we transfer set the EXT4_ERROR_FS bit in the file system superblock, it's not enough to call jbd2_journal_clear_err() to clear the error indication from journal superblock --- we need to call jbd2_journal_update_sb_errno() as well. Otherwise, when the root file system is mounted read-only, the journal is replayed, and the error indicator is transferred to the superblock --- but the s_errno field in the jbd2 superblock is left set (since although we cleared it in memory, we never flushed it out to disk). This can end up confusing e2fsck. We should make e2fsck more robust in this case, but the kernel shouldn't be leaving things in this confused state, either. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09ext4: undo ext4_calc_metadata_amount if we fail to claim spaceTheodore Ts'o
commit 03179fe92318e7934c180d96f12eff2cb36ef7b6 upstream. The function ext4_calc_metadata_amount() has side effects, although it's not obvious from its function name. So if we fail to claim space, regardless of whether we retry to claim the space again, or return an error, we need to undo these side effects. Otherwise we can end up incorrectly calculating the number of metadata blocks needed for the operation, which was responsible for an xfstests failure for test #271 when using an ext2 file system with delalloc enabled. Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>