aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2009-12-14ext4: make "norecovery" an alias for "noload"Eric Sandeen
(cherry picked from commit e3bb52ae2bb9573e84c17b8e3560378d13a5c798) Users on the linux-ext4 list recently complained about differences across filesystems w.r.t. how to mount without a journal replay. In the discussion it was noted that xfs's "norecovery" option is perhaps more descriptively accurate than "noload," so let's make that an alias for ext4. Also show this status in /proc/mounts Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: make trim/discard optional (and off by default)Eric Sandeen
(cherry picked from commit 5328e635315734d42080de9a5a1ee87bf4cae0a4) It is anticipated that when sb_issue_discard starts doing real work on trim-capable devices, we may see issues. Make this mount-time optional, and default it to off until we know that things are working out OK. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: fix error handling in ext4_ind_get_blocks()Jan Kara
(cherry picked from commit 2bba702d4f88d7b010ec37e2527b552588404ae7) When an error happened in ext4_splice_branch we failed to notice that in ext4_ind_get_blocks and mapped the buffer anyway. Fix the problem by checking for error properly. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: avoid issuing unnecessary barriersTheodore Ts'o
(cherry picked from commit 6b17d902fdd241adfa4ce780df20547b28bf5801) We don't to issue an I/O barrier on an error or if we force commit because we are doing data journaling. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: fix block validity checks so they work correctly with meta_bgTheodore Ts'o
(cherry picked from commit 1032988c71f3f85483b2b4319684d1205a704c02) The block validity checks used by ext4_data_block_valid() wasn't correctly written to check file systems with the meta_bg feature. Fix this. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: fix uninit block bitmap initialization when s_meta_first_bg is non-zeroTheodore Ts'o
(cherry picked from commit 8dadb198cb70ef811916668fe67eeec82e8858dd) The number of old-style block group descriptor blocks is s_meta_first_bg when the meta_bg feature flag is set. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: don't update the superblock in ext4_statfs()Theodore Ts'o
(cherry picked from commit 3f8fb9490efbd300887470a2a880a64e04dcc3f5) commit a71ce8c6c9bf269b192f352ea555217815cf027e updated ext4_statfs() to update the on-disk superblock counters, but modified this buffer directly without any journaling of the change. This is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: journal all modifications in ext4_xattr_set_handleEric Sandeen
(cherry picked from commit 86ebfd08a1930ccedb8eac0aeb1ed4b8b6a41dbc) ext4_xattr_set_handle() was zeroing out an inode outside of journaling constraints; this is one of the accesses that was causing the crc errors in journal replay as seen in kernel.org bugzilla #14354. Reviewed-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: fix i_flags access in ext4_da_writepages_trans_blocks()Julia Lawall
(cherry picked from commit 30c6e07a92ea4cb87160d32ffa9bce172576ae4c) We need to be testing the i_flags field in the ext4 specific portion of the inode, instead of the (confusingly aliased) i_flags field in the generic struct inode. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: make sure directory and symlink blocks are revokedTheodore Ts'o
(cherry picked from commit 50689696867d95b38d9c7be640a311494a04fb86) When an inode gets unlinked, the functions ext4_clear_blocks() and ext4_remove_blocks() call ext4_forget() for all the buffer heads corresponding to the deleted inode's data blocks. If the inode is a directory or a symlink, the is_metadata parameter must be non-zero so ext4_forget() will revoke them via jbd2_journal_revoke(). Otherwise, if these blocks are reused for a data file, and the system crashes before a journal checkpoint, the journal replay could end up corrupting these data blocks. Thanks to Curt Wohlgemuth for pointing out potential problems in this area. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: plug a buffer_head leak in an error path of ext4_iget()Theodore Ts'o
(cherry picked from commit 567f3e9a70d71e5c9be03701b8578be77857293b) One of the invalid error paths in ext4_iget() forgot to brelse() the inode buffer head. Fix it by adding a brelse() in the common error return path, which also simplifies function. Thanks to Andi Kleen <ak@linux.intel.com> reporting the problem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: fix possible recursive locking warning in EXT4_IOC_MOVE_EXTAkira Fujita
(cherry picked from commit 49bd22bc4d603a2a4fc2a6a60e156cbea52eb494) If CONFIG_PROVE_LOCKING is enabled, the double_down_write_data_sem() will trigger a false-positive warning of a recursive lock. Since we take i_data_sem for the two inodes ordered by their inode numbers, this isn't a problem. Use of down_write_nested() will notify the lock dependency checker machinery that there is no problem here. This problem was reported by Brian Rogers: http://marc.info/?l=linux-ext4&m=125115356928011&w=1 Reported-by: Brian Rogers <brian@xyzw.org> Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: fix lock order problem in ext4_move_extents()Akira Fujita
(cherry picked from commit fc04cb49a898c372a22b21fffc47f299d8710801) ext4_move_extents() checks the logical block contiguousness of original file with ext4_find_extent() and mext_next_extent(). Therefore the extent which ext4_ext_path structure indicates must not be changed between above functions. But in current implementation, there is no i_data_sem protection between ext4_ext_find_extent() and mext_next_extent(). So the extent which ext4_ext_path structure indicates may be overwritten by delalloc. As a result, ext4_move_extents() will exchange wrong blocks between original and donor files. I change the place where acquire/release i_data_sem to solve this problem. Moreover, I changed move_extent_per_page() to start transaction first, and then acquire i_data_sem. Without this change, there is a possibility of the deadlock between mmap() and ext4_move_extents(): * NOTE: "A", "B" and "C" mean different processes A-1: ext4_ext_move_extents() acquires i_data_sem of two inodes. B: do_page_fault() starts the transaction (T), and then tries to acquire i_data_sem. But process "A" is already holding it, so it is kept waiting. C: While "A" and "B" running, kjournald2 tries to commit transaction (T) but it is under updating, so kjournald2 waits for it. A-2: Call ext4_journal_start with holding i_data_sem, but transaction (T) is locked. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: fix the returned block count if EXT4_IOC_MOVE_EXT failsAkira Fujita
(cherry picked from commit f868a48d06f8886cb0367568a12367fa4f21ea0d) If the EXT4_IOC_MOVE_EXT ioctl fails, the number of blocks that were exchanged before the failure should be returned to the userspace caller. Unfortunately, currently if the block size is not the same as the page size, the returned block count that is returned is the page-aligned block count instead of the actual block count. This commit addresses this bug. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: avoid divide by zero when trying to mount a corrupted file systemTheodore Ts'o
(cherry picked from commit 503358ae01b70ce6909d19dd01287093f6b6271c) If s_log_groups_per_flex is greater than 31, then groups_per_flex will will overflow and cause a divide by zero error. This can cause kernel BUG if such a file system is mounted. Thanks to Nageswara R Sastry for analyzing the failure and providing an initial patch. http://bugzilla.kernel.org/show_bug.cgi?id=14287 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: fix potential buffer head leak when add_dirent_to_buf() returns ENOSPCTheodore Ts'o
(cherry picked from commit 2de770a406b06dfc619faabbf5d85c835ed3f2e1) Previously add_dirent_to_buf() did not free its passed-in buffer head in the case of ENOSPC, since in some cases the caller still needed it. However, this led to potential buffer head leaks since not all callers dealt with this correctly. Fix this by making simplifying the freeing convention; now add_dirent_to_buf() *never* frees the passed-in buffer head, and leaves that to the responsibility of its caller. This makes things cleaner and easier to prove that the code is neither leaking buffer heads or calling brelse() one time too many. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Curt Wohlgemuth <curtw@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/OMingming
(cherry picked from commit ba230c3f6dc88ec008806adb27b12088486d508e) To prepare for a direct I/O write, we need to split the unwritten extents before submitting the I/O. When no extents needed to be split, ext4_split_unwritten_extents() was incorrectly returning 0 instead of the size of uninitialized extents. This bug caused the wrong return value sent back to VFS code when it gets called from async IO path, leading to an unnecessary fall back to buffered IO. This bug also hid the fact that the check to see whether or not a split would be necessary was incorrect; we can only skip splitting the extent if the write completely covers the uninitialized extent. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: code clean up for dio fallocate handlingMingming
(cherry picked from commit 4b70df181611012a3556f017b57dfcef7e1d279f) The ext4_debug() call in ext4_end_io_dio() should be moved after the check to make sure that io_end is non-NULL. The comment above ext4_get_block_dio_write() ("Maximum number of blocks...") is a duplicate; the original and correct comment is above the #define DIO_MAX_BLOCKS up above. Based on review comments from Curt Wohlgemuth. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: skip conversion of uninit extents after direct IO if there isn't anyMingming
(cherry picked from commit 5f5249507e4b5c4fc0f9c93f33d133d8c95f47e1) At the end of direct I/O operation, ext4_ext_direct_IO() always called ext4_convert_unwritten_extents(), regardless of whether there were any unwritten extents involved in the I/O or not. This commit adds a state flag so that ext4_ext_direct_IO() only calls ext4_convert_unwritten_extents() when necessary. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: fix ext4_ext_direct_IO()'s return value after converting uninit extentsMingming
(cherry picked from commit 109f55651954def97fa41ee71c464d268c512ab0) After a direct I/O request covering an uninitalized extent (i.e., created using the fallocate system call) or a hole in a file, ext4 will convert the uninitialized extent so it is marked as initialized by calling ext4_convert_unwritten_extents(). This function returns zero on success. This return value was getting returned by ext4_direct_IO(); however the file system's direct_IO function is supposed to return the number of bytes read or written on a success. By returning zero, it confused the direct I/O code into falling back to buffered I/O unnecessarily. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: discard preallocation when restarting a transaction during truncateAneesh Kumar K.V
(cherry picked from commit fa5d11133b07053270e18fa9c18560e66e79217e) When restart a transaction during a truncate operation, we drop and reacquire i_data_sem. After reacquiring i_data_sem, we need to discard any inode-based preallocation that might have been grabbed while we released i_data_sem (for example, if pdflush is allocating blocks and racing against the truncate). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: retry failed direct IO allocationsEric Sandeen
(cherry picked from commit fbbf69456619de5d251cb9f1df609069178c62d5) On a 256M filesystem, doing this in a loop: xfs_io -F -f -d -c 'pwrite 0 64m' test rm -f test eventually leads to ENOSPC. (the xfs_io command does a 64m direct IO write to the file "test") As with other block allocation callers, it looks like we need to potentially retry the allocations on the initial ENOSPC. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: fix a BUG_ON crash by checking that page has buffers attached to itTheodore Ts'o
(cherry picked from commit 1f94533d9cd75f6d2826018d54a971b9cc085992) In ext4_num_dirty_pages() we were calling page_buffers() before checking to see if the page actually had pages attached to it; this would cause a BUG check crash in the inline function page_buffers(). Thanks to Markus Trippelsdorf for reporting this bug. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Fix time encoding with extra epoch bitsTheodore Ts'o
(cherry picked from commit c1fccc0696bcaff6008c11865091f5ec4b0937ab) "Looking at ext4.h, I think the setting of extra time fields forgets to mask the epoch bits so the epoch part overwrites nsec part. The second change is only for coherency (2 -> EXT4_EPOCH_BITS)." Thanks to Damien Guibouret for pointing out this problem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Handle nested ext4_journal_start/stop calls without a journalCurt Wohlgemuth
(cherry picked from commit d3d1faf6a74496ea4435fd057c6a2cad49f3e523) This patch fixes a problem with handling nested calls to ext4_journal_start/ext4_journal_stop, when there is no journal present. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Make sure ext4_dirty_inode() updates the inode in no journal modeCurt Wohlgemuth
(cherry picked from commit f3dc272fd5e2ae08244796bb39e7e1ce4b25d3b3) This patch a problem that ext4_dirty_inode() was not calling ext4_mark_inode_dirty() if the current_handle is not valid, which it is the case in no journal mode. It also removes a test for non-matching transaction which can never happen. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Avoid updating the inode table bh twice in no journal modeFrank Mayhar
(cherry picked from commit 830156c79b0a99ddf0f62496bcf4de640f9f52cd) This is a cleanup of commit 91ac6f4. Since ext4_mark_inode_dirty() has already called ext4_mark_iloc_dirty(), which in turn calls ext4_do_update_inode(), it's not necessary to have ext4_write_inode() call ext4_do_update_inode() in no journal mode. Indeed, it would be duplicated work. Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes firstTheodore Ts'o
(cherry picked from commit f3ce8064b388ccf420012c5a4907aae4f13fe9d0) Move the check to make sure the original and donor inodes are different earlier, to avoid a potential deadlock by trying to lock the same inode twice. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: async direct IO for holes and fallocate supportMingming Cao
(cherry picked from commit 8d5d02e6b176565c77ff03604908b1453a22044d) For async direct IO that covers holes or fallocate, the end_io callback function now queued the convertion work on workqueue but don't flush the work rightaway as it might take too long to afford. But when fsync is called after all the data is completed, user expects the metadata also being updated before fsync returns. Thus we need to flush the conversion work when fsync() is called. This patch keep track of a listed of completed async direct io that has a work queued on workqueue. When fsync() is called, it will go through the list and do the conversion. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Use end_io callback to avoid direct I/O fallback to buffered I/OMingming Cao
(cherry picked from commit 4c0425ff68b1b87b802ffeda7b6a46ff7da7241c) Currently the DIO VFS code passes create = 0 when writing to the middle of file. It does this to avoid block allocation for holes, so as not to expose stale data out when there is a parallel buffered read (which does not hold the i_mutex lock). Direct I/O writes into holes falls back to buffered IO for this reason. Since preallocated extents are treated as holes when doing a get_block() look up (buffer is not mapped), direct IO over fallocate also falls back to buffered IO. Thus ext4 actually silently falls back to buffered IO in above two cases, which is undesirable. To fix this, this patch creates unitialized extents when a direct I/O write into holes in sparse files, and registering an end_io callback which converts the uninitialized extent to an initialized extent after the I/O is completed. Singed-Off-By: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Split uninitialized extents for direct I/OMingming Cao
(cherry picked from commit 0031462b5b392f90d17f1d75abb795883c44e969) When writing into an unitialized extent via direct I/O, and the direct I/O doesn't exactly cover the unitialized extent, split the extent into uninitialized and initialized extents before submitting the I/O. This avoids needing to deal with an ENOSPC error in the end_io callback that gets used for direct I/O. When the IO is complete, the written extent will be marked as initialized. Singed-Off-By: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: release reserved quota when block reservation for delalloc retryMingming Cao
(cherry picked from commit 9f0ccfd8e07d61b413e6536ffa02fbf60d2e20d8) ext4_da_reserve_space() can reserve quota blocks multiple times if ext4_claim_free_blocks() fail and we retry the allocation. We should release the quota reservation before restarting. Bug found by Jan Kara. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Adjust ext4_da_writepages() to write out larger contiguous chunksTheodore Ts'o
(cherry picked from commit 55138e0bc29c0751e2152df9ad35deea542f29b3) Work around problems in the writeback code to force out writebacks in larger chunks than just 4mb, which is just too small. This also works around limitations in the ext4 block allocator, which can't allocate more than 2048 blocks at a time. So we need to defeat the round-robin characteristics of the writeback code and try to write out as many blocks in one inode before allowing the writeback code to move on to another inode. We add a a new per-filesystem tunable, max_writeback_mb_bump, which caps this to a default of 128mb per inode. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Fix hueristic which avoids group preallocation for closed filesTheodore Ts'o
(cherry picked from commit 71780577306fd1e76c7a92e3b308db624d03adb9) The hueristic was designed to avoid using locality group preallocation when writing the last segment of a closed file. Fix it by move setting size to the maximum of size and isize until after we check whether size == isize. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Fix the alloc on close after a truncate hueristicTheodore Ts'o
(cherry picked from commit 5534fb5bb35a62a94e0bd1fa2421f7fb6e894f10) In an attempt to avoid doing an unneeded flush after opening a (previously non-existent) file with O_CREAT|O_TRUNC, the code only triggered the hueristic if ei->disksize was non-zero. Turns out that the VFS doesn't call ->truncate() if the file doesn't exist, and ei->disksize is always zero even if the file previously existed. So remove the test, since it isn't necessary and in fact disabled the hueristic. Thanks to Clemens Eisserer that he was seeing problems with files written using kwrite and eclipse after sudden crashes caused by a buggy Intel video driver. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: store EXT4_EXT_MIGRATE in i_state instead of i_flagsTheodore Ts'o
(cherry picked from commit 1b9c12f44c1eb614fd3b8822bfe8f1f5d8e53737) EXT4_EXT_MIGRATE is only intended to be used for an in-memory flag, and the hex value assigned to it collides with FS_DIRECTIO_FL (which is also stored in i_flags). There's no reason for the EXT4_EXT_MIGRATE bit to be stored in i_flags, so we switch it to use i_state instead. Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: limit block allocations for indirect-block files to < 2^32Eric Sandeen
(cherry picked from commit fb0a387dcdcd21aab1b09ee7fd80b7c979bdbbfd) Today, the ext4 allocator will happily allocate blocks past 2^32 for indirect-block files, which results in the block numbers getting truncated, and corruption ensues. This patch limits such allocations to < 2^32, and adds BUG_ONs if we do get blocks larger than that. This should address RH Bug 519471, ext4 bitmap allocator must limit blocks to < 2^32 * ext4_find_goal() is modified to choose a goal < UINT_MAX, so that our starting point is in an acceptable range. * ext4_xattr_block_set() is modified such that the goal block is < UINT_MAX, as above. * ext4_mb_regular_allocator() is modified so that the group search does not continue into groups which are too high * ext4_mb_use_preallocated() has a check that we don't use preallocated space which is too far out * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs No attempt has been made to limit inode locations to < 2^32, so we may wind up with blocks far from their inodes. Doing this much already will lead to some odd ENOSPC issues when the "lower 32" gets full, and further restricting inodes could make that even weirder. For high inodes, choosing a goal of the original, % UINT_MAX, may be a bit odd, but then we're in an odd situation anyway, and I don't know of a better heuristic. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Fix different block exchange issue in EXT4_IOC_MOVE_EXTAkira Fujita
(cherry picked from commit c40ce3c9ea97425a12d7e44031a98fe50add6fc1) If logical block offset of original file which is passed to EXT4_IOC_MOVE_EXT is different from donor file's, a calculation error occurs in ext4_calc_swap_extents(), therefore wrong block is exchanged between original file and donor file. As a result, we hit ext4_error() in check_block_validity(). To detect the logical offset difference in EXT4_IOC_MOVE_EXT, add checks to mext_calc_swap_extents() and handle it as error, since data exchange must be done between the same blocks in EXT4_IOC_MOVE_EXT. Reported-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Add null extent check to ext_get_pathAkira Fujita
(cherry picked from commit 347fa6f1c7cb5df2b38d3c9167cfe242ce0cd1da) There is the possibility that path structure which is taken by ext4_ext_find_extent() indicates null extents. Because during data block exchanging in ext4_move_extents(), constitution of an extent tree may be changed. As a solution, the patch adds null extent check to ext_get_path(). Reported-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Replace BUG_ON() with ext4_error() in move_extents.cAkira Fujita
(cherry picked from commit 2147b1a6a48e28399120ca51d4a91840a278611f) Replace BUG_ON calls with a call to ext4_error() to print an error message if EXT4_IOC_MOVE_EXT failed with some kind of reasons. This will help to debug. Ted pointed this out, thanks. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Replace get_ext_path macro with an inline funcitonAkira Fujita
(cherry picked from commit e8505970af46658ece2545e9bc1fe594998fdcdf) Replace get_ext_path macro with an inline function, since this macro looks like a function call but its arguments get modified. Ted pointed this out, thanks. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Fix small typo for move_extent_per_page()Akira Fujita
(cherry picked from commit 44fc48f7048ab9657b524938a832fec4e0acea98) This function means moving extents every page, so change its name from move_exgtent_par_page(). Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.co.jp> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Fix include/trace/events/ext4.h to work with SystemtapTheodore Ts'o
(cherry picked from commit 3661d28615ea580c1db02a972fd4d3898df1cb01) Using relative pathnames in #include statements interacts badly with SystemTap, since the fs/ext4/*.h header files are not packaged up as part of a distribution kernel's header files. Since systemtap doesn't use TP_fast_assign(), we can use a blind structure definition and then make sure the needed header files are defined before the ext4 source files #include the trace/events/ext4.h header file. https://bugzilla.redhat.com/show_bug.cgi?id=512478 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Fix initalization of s_flex_groupsTheodore Ts'o
(cherry picked from commit 7ad9bb651fc2036ea94bed94da76a4b08959a911) The s_flex_groups array should have been initialized using atomic_add to sum up the free counts from the block groups that make up a flex_bg. By using atomic_set, the value of the s_flex_groups array was set to the values of the last block group in the flex_bg. The impact of this bug is that the block and inode allocation algorithms might not pick the best flex_bg for new allocation. Thanks to Damien Guibouret for pointing out this problem! Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Always set dx_node's fake_dirent explicitly.Andreas Schlick
(cherry picked from commit 1f7bebb9e911d870fa8f997ddff838e82b5715ea) When ext4_dx_add_entry() has to split an index node, it has to ensure that name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck won't recognise it as an intermediate htree node and consider the htree to be corrupted. Signed-off-by: Andreas Schlick <schlick@lavabit.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Don't update superblock write time when filesystem is read-onlyTheodore Ts'o
(cherry picked from commit 71290b368ad5e1e0b0b300c9d5638490a9fd1a2d) This avoids updating the superblock write time when we are mounting the root file system read/only but we need to replay the journal; at that point, for people who are east of GMT and who make their clock tick in localtime for Windows bug-for-bug compatibility, and this will cause e2fsck to complain and force a full file system check. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: check for need init flag in ext4_mb_load_buddyAneesh Kumar K.V
(cherry picked from commit f41c0750538667b87a19c93952e5d42fcc069bd7) We should check for need init flag with the group's alloc_sem held, to make sure while we are loading the buddy cache and holding a reference to it, a file system resize can't add new blocks to same group. The patch also drops the need init flag check in ext4_mb_regular_allocator() because doing the check without holding alloc_sem is racy. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: move ext4_mb_init_group() function earlier in the mballoc.cAneesh Kumar K.V
(cherry picked from commit b6a758ec3af3ec236dbfdcf6a06b84ac8f94957e) This moves the function around so that it can be called from ext4_mb_load_buddy(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Make non-journal fsync work properlyFrank Mayhar
(cherry picked from commit 91ac6f43317c0bf99969665f98016548011dfa38) Teach ext4_write_inode() and ext4_do_update_inode() about non-journal mode: If we're not using a journal, ext4_write_inode() now calls ext4_do_update_inode() (after getting the iloc via ext4_get_inode_loc()) with a new "do_sync" parameter. If that parameter is nonzero _and_ we're not using a journal, ext4_do_update_inode() calls sync_dirty_buffer() instead of ext4_handle_dirty_metadata(). This problem was found in power-fail testing, checking the amount of loss of files and blocks after a power failure when using fsync() and when not using fsync(). It turned out that using fsync() was actually worse than not doing so, possibly because it increased the likelihood that the inodes would remain unflushed and would therefore be lost at the power failure. Signed-off-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-14ext4: Assure that metadata blocks are written during fsync in no journal modeTheodore Ts'o
(cherry picked from commit fe188c0e084bdf3038dc0ac963c21d764f53f7da) When there is no journal present, we must attach buffer heads associated with extent tree and indirect blocks to the inode's mapping->private_list via mark_buffer_dirty_inode() so that ext4_sync_file() --- which is called to service fsync() and fdatasync() system calls --- can write out the inode's metadata blocks by calling sync_mapping_buffers(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>