path: root/fs/ext3
Age  Commit message  Author
2006-12-07  [PATCH] ext3/4: don't do orphan processing on readonly devices  Eric Sandeen
    If you do something like:

        # touch foo
        # tail -f foo &
        # rm foo
        # <take snapshot>
        # <mount snapshot>

    you'll panic, because ext3/4 tries to do orphan list processing on the
    readonly snapshot device, and:

        kernel: journal commit I/O error
        kernel: Assertion failure in journal_flush_Rsmp_e2f189ce() at journal.c:1356: "!journal->j_checkpoint_transactions"
        kernel: Kernel panic: Fatal exception

    For a truly readonly underlying device, it's reasonable and necessary to
    just skip orphan list processing.

    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Cc: <linux-ext4@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07  [PATCH] ext3: fix reservation extension  Mingming Cao
    Hugh Dickins wrote:

    > Not found anything relevant, but I keep noticing these lines
    > in ext2_try_to_allocate_with_rsv(), ext3 and ext4 similar:
    >
    >     } else if (grp_goal > 0 &&
    >                (my_rsv->rsv_end - grp_goal + 1) < *count)
    >             try_to_extend_reservation(my_rsv, sb,
    >                     *count-my_rsv->rsv_end + grp_goal - 1);
    >
    > They're wrong, a no-op in most groups, aren't they?  rsv_end is an
    > absolute block number, whereas grp_goal is group-relative, so the
    > calculation ought to bring in group_first_block?  Or I'm confused.

    Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    Cc: "linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
    Cc: Hugh Dickins <hugh@veritas.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
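The unit mismatch Hugh describes can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `blocks_left_in_window()` and `blocks_to_extend()` are hypothetical, and the code is not the change that was merged. It shows the group-relative goal being converted to an absolute block number before it is compared with the absolute `rsv_end`.

```c
#include <stdint.h>

/*
 * Illustrative helpers only; the names are hypothetical and this is not
 * the kernel's implementation.
 *
 *   rsv_end            last block of the reservation window (absolute)
 *   group_first_block  first block of the block group (absolute)
 *   grp_goal           goal block, relative to the start of the group
 */
static int64_t blocks_left_in_window(uint64_t rsv_end,
                                     uint64_t group_first_block,
                                     int64_t grp_goal)
{
    /* Convert the group-relative goal to an absolute block number first. */
    uint64_t goal_abs = group_first_block + (uint64_t)grp_goal;

    /* Blocks from the goal up to and including rsv_end. */
    return (int64_t)rsv_end - (int64_t)goal_abs + 1;
}

/* How many more blocks the window needs to satisfy a request of `count`. */
static int64_t blocks_to_extend(uint64_t rsv_end, uint64_t group_first_block,
                                int64_t grp_goal, int64_t count)
{
    int64_t left = blocks_left_in_window(rsv_end, group_first_block, grp_goal);

    return left < count ? count - left : 0;
}
```

With `group_first_block = 32768`, `rsv_end = 32871` and `grp_goal = 100`, four blocks remain from the goal onward, so a request for 8 blocks needs 4 more. Mixing the units (`rsv_end - grp_goal + 1`) would give 32772 and the extension would be skipped, which matches the "no-op in most groups" observation.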
2006-12-07  [PATCH] retries in ext3_prepare_write() violate ordering requirements  Andrey Savochkin
    In journal=ordered or journal=data mode, a retry in ext3_prepare_write()
    breaks the ordering requirements of journaling data with respect to
    metadata.  The fix is to call commit_write to commit the allocated zero
    blocks before retrying.

    Signed-off-by: Kirill Korotaev <dev@openvz.org>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Ken Chen <kenneth.w.chen@intel.com>
    Cc: <linux-ext4@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07  [PATCH] ext3: uninline large functions  Andrew Morton
    Saves nearly 4kbytes on x86.

    Cc: Arnaldo Carvalho de Melo <acme@mandriva.com>
    Cc: <linux-ext4@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07  [PATCH] handle ext3 directory corruption better  Eric Sandeen
    I've been using Steve Grubb's purely evil "fsfuzzer" tool, at
    http://people.redhat.com/sgrubb/files/fsfuzzer-0.4.tar.gz

    Basically it makes a filesystem, splats some random bits over it, then
    tries to mount it and do some simple filesystem actions.

    At best, the filesystem catches the corruption gracefully.  At worst,
    things spin out of control.

    As you might guess, we found a couple places in ext3 where things spin
    out of control :)

    First, we had a corrupted directory that was never checked for
    consistency... it was corrupt, and pointed to another bad "entry" of
    length 0.  The for() loop looped forever, since the length of
    ext3_next_entry(de) was 0, and we kept looking at the same pointer over
    and over and over and over...

    I modeled this check and subsequent action on what is done for other
    directory types in ext3_readdir...

    (adding this check adds some computational expense; I am testing a
    followup patch to reduce the number of times we check and re-check these
    directory entries, in all cases.  Thanks for the idea, Andreas).

    Next we had a root directory inode which had a corrupted size, claimed
    to be > 200M on a 4M filesystem.  There was only really 1 block in the
    directory, but because the size was so large, readdir kept coming back
    for more, spewing thousands of printk's along the way.

    Per Andreas' suggestion, if we're in this read error condition and we're
    trying to read an offset which is greater than i_blocks worth of bytes,
    stop trying, and break out of the loop.

    With these two changes fsfuzz test survives quite well on ext3.

    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Cc: <linux-ext4@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
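The zero-length entry problem can be illustrated with a small, self-contained directory walker. This is a sketch under stated assumptions, not the ext3 code: the function name `count_dir_entries()` is hypothetical, the record length is assumed to sit at byte offset 4 of an 8-byte entry header, and it is read in host byte order for simplicity (the on-disk field is little-endian).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed size of the fixed part of a directory entry:
 * inode (4 bytes), rec_len (2), name_len (1), file_type (1). */
#define DIRENT_HEADER_LEN 8u

/*
 * Walk the entries in one directory block and count them.
 * Returns -1 as soon as an entry has an implausible record length.
 * Without the check, a rec_len of 0 would leave `off` unchanged and the
 * loop would examine the same entry forever.
 */
static int count_dir_entries(const unsigned char *block, size_t block_size)
{
    size_t off = 0;
    int entries = 0;

    while (off + DIRENT_HEADER_LEN <= block_size) {
        uint16_t rec_len;

        /* Host byte order is an assumption made for this sketch. */
        memcpy(&rec_len, block + off + 4, sizeof(rec_len));

        if (rec_len < DIRENT_HEADER_LEN ||   /* too short, includes 0 */
            rec_len % 4 != 0 ||              /* entries are 4-byte aligned */
            rec_len > block_size - off)      /* would cross the block end */
            return -1;

        entries++;
        off += rec_len;
    }
    return entries;
}
```

The same idea applies to the second fix described above: a loop driven by on-disk values needs an independent bound (there, `i_blocks` worth of bytes) so that corrupt metadata cannot keep it running.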
2006-12-07  [PATCH] Remove superfluous lock_super() in extN xattr code  Andreas Gruenbacher
    lock_super() is unnecessary for setting super-block feature flags.  Use
    the provided *_SET_COMPAT_FEATURE() macros as well.

    Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
    Cc: <linux-ext4@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07  [PATCH] ext3: fsid for statvfs  Pekka Enberg
    Update ext3_statfs to return an FSID that is a 64 bit XOR of the 128 bit
    filesystem UUID as suggested by Andreas Dilger.  See the following
    Bugzilla entry for details:

        http://bugzilla.kernel.org/show_bug.cgi?id=136

    Cc: Andreas Dilger <adilger@clusterfs.com>
    Cc: Stephen Tweedie <sct@redhat.com>
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
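A minimal sketch of folding a 128-bit UUID into a 64-bit identifier by XOR follows. The commit message only says "64 bit XOR of the 128 bit filesystem UUID"; splitting the UUID into two halves read as little-endian integers is an assumption of this example, and `load_le64()` and `uuid_to_fsid()` are hypothetical names.

```c
#include <stdint.h>

/* Read 8 bytes as a little-endian 64-bit integer, independent of host order. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Fold the 16-byte UUID into 64 bits by XOR-ing its two halves. */
static uint64_t uuid_to_fsid(const unsigned char uuid[16])
{
    return load_le64(uuid) ^ load_le64(uuid + 8);
}
```

A caller that has to fill a two-word `f_fsid` could then split the result into its low and high 32 bits.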
2006-12-07  [PATCH] slab: remove kmem_cache_t  Christoph Lameter
    Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

        #!/bin/sh
        #
        # Replace one string by another in all the kernel sources.
        #

        set -e

        for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
                quilt add $file
                sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
                mv /tmp/$$ $file
                quilt refresh
        done

    The script was run like this

        sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter <clameter@sgi.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07  [PATCH] slab: remove SLAB_NOFS  Christoph Lameter
    SLAB_NOFS is an alias of GFP_NOFS.

    Signed-off-by: Christoph Lameter <clameter@sgi.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11  [PATCH] ext3: errors behaviour fix  Dmitry Mishin
    Current error behaviour for ext2 and ext3 filesystems does not fully
    correspond to the documentation and should be fixed.

    According to man 8 mount, ext2 and ext3 file systems allow one of 3
    different on-errors behaviours to be set:

    ---- start of quote man 8 mount ----
    errors=continue / errors=remount-ro / errors=panic
        Define the behaviour when an error is encountered.  (Either ignore
        errors and just mark the file system erroneous and continue, or
        remount the file system read-only, or panic and halt the system.)
        The default is set in the filesystem superblock, and can be changed
        using tune2fs(8).
    ---- end of quote ----

    However EXT3_ERRORS_CONTINUE is not read from the superblock, and thus
    ERRORS_CONT is not saved in sbi->s_mount_opt.  This leads to incorrect
    handling of errors on ext3.

    Then we've checked the corresponding code in ext2 and discovered that it
    is buggy as well:

    - EXT2_ERRORS_CONTINUE is not read from the superblock (the same);
    - parse_option() does not clean the alternative values and thus
      something like (ERRORS_CONT|ERRORS_RO) can be set;
    - if options are omitted, parse_option() does not set any of these
      options.

    Therefore it is possible to set any combination of these options on
    ext2:

    - none of them may be set: EXT2_ERRORS_CONTINUE on superblock / empty
      mount options;
    - any one of them may be set using mount options;
    - any 2 options may be set: by using EXT2_ERRORS_RO/EXT2_ERRORS_PANIC on
      the superblock and another value in mount options;
    - and finally all three options may be set by adding a third option in
      remount.

    Currently ext2 uses these values only in ext2_error() and this does not
    lead to any noticeable trouble.  However somebody may be discouraged
    when they try to work around EXT2_ERRORS_PANIC on the superblock by
    using errors=continue in mount options.

    This patch:

    EXT3_ERRORS_CONTINUE should be taken from the superblock as default
    value for error behaviour.

    Signed-off-by: Dmitry Mishin <dim@openvz.org>
    Acked-by: Vasily Averin <vvs@sw.ru>
    Acked-by: Kirill Korotaev <dev@openvz.org>
    Cc: <linux-ext4@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
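The two rules described above (take the default from the superblock, and make the three behaviours mutually exclusive) can be sketched as follows. All identifiers and bit values here are hypothetical and chosen for illustration; they are not the kernel's definitions. Falling back to remount-read-only for an unrecognised superblock value is likewise an assumption of this sketch.

```c
/* Hypothetical in-memory mount option bits (not the kernel's values). */
#define OPT_ERRORS_CONT  0x1u
#define OPT_ERRORS_RO    0x2u
#define OPT_ERRORS_PANIC 0x4u
#define OPT_ERRORS_MASK  (OPT_ERRORS_CONT | OPT_ERRORS_RO | OPT_ERRORS_PANIC)

/* Hypothetical on-disk superblock values for the default behaviour. */
enum sb_errors {
    SB_ERRORS_CONTINUE = 1,
    SB_ERRORS_RO       = 2,
    SB_ERRORS_PANIC    = 3
};

/* Derive the default option from the superblock, including "continue". */
static unsigned int default_errors_opt(int sb_errors)
{
    switch (sb_errors) {
    case SB_ERRORS_CONTINUE:
        return OPT_ERRORS_CONT;
    case SB_ERRORS_PANIC:
        return OPT_ERRORS_PANIC;
    default:
        /* Assumption: treat RO and any unknown value as remount-ro. */
        return OPT_ERRORS_RO;
    }
}

/* Apply an errors= mount option: clear the alternatives, then set one. */
static unsigned int set_errors_opt(unsigned int opts, unsigned int choice)
{
    return (opts & ~OPT_ERRORS_MASK) | choice;
}
```

Clearing the mask before setting the new bit is what prevents combinations such as `ERRORS_CONT|ERRORS_RO`, and it lets `errors=continue` on the command line override a panic default stored in the superblock.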
2006-10-01  [PATCH] r/o bind mounts: monitor zeroing of i_nlink  Dave Hansen
    Some filesystems, instead of simply decrementing i_nlink, simply zero it
    during an unlink operation.  We need to catch these in addition to the
    decrement operations.

    Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
    Acked-by: Christoph Hellwig <hch@lst.de>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01  [PATCH] r/o bind mount prepwork: inc_nlink() helper  Dave Hansen
    This is mostly included for parity with dec_nlink(), where we will have
    some more hooks.  This one should stay pretty darn straightforward for
    now.

    Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
    Acked-by: Christoph Hellwig <hch@lst.de>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01  [PATCH] r/o bind mounts: unlink: monitor i_nlink  Dave Hansen
    When a filesystem decrements i_nlink to zero, it means that a write must
    be performed in order to drop the inode from the filesystem.

    We're shortly going to have to keep filesystems from being remounted r/o
    between the time that this i_nlink decrement and that write occurs.

    So, add a little helper function to do the decrements.  We'll tie into
    it in a bit to note when i_nlink hits zero.

    Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
    Acked-by: Christoph Hellwig <hch@lst.de>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01  [PATCH] Remove readv/writev methods and use aio_read/aio_write instead  Badari Pulavarty
    This patch removes readv() and writev() methods and replaces them with
    aio_read()/aio_write() methods.

    Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01  [PATCH] Vectorize aio_read/aio_write fileop methods  Badari Pulavarty
    This patch vectorizes aio_read() and aio_write() methods to prepare for
    collapsing all aio & vectored operations into one interface - which is
    aio_read()/aio_write().

    Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-30  [PATCH] BLOCK: Move the Ext3 device ioctl compat stuff to the Ext3 driver [try #6]  David Howells
    Move the Ext3 device ioctl compat stuff from fs/compat_ioctl.c to the
    Ext3 driver so that the Ext3 header file doesn't need to be included.

    Signed-Off-By: David Howells <dhowells@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30  [PATCH] ext3: make meta data reads use READ_META  Jens Axboe
    Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-27  [PATCH] inode-diet: Eliminate i_blksize from the inode structure  Theodore Ts'o
    This eliminates the i_blksize field from struct inode.  Filesystems that
    want to provide a per-inode st_blksize can do so by providing their own
    getattr routine instead of using the generic_fillattr() function.

    Note that some filesystems were providing pretty much random (and
    incorrect) values for i_blksize.

    [bunk@stusta.de: cleanup]
    [akpm@osdl.org: generic_fillattr() fix]
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Signed-off-by: Adrian Bunk <bunk@stusta.de>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27  [PATCH] Really ignore kmem_cache_destroy return value  Alexey Dobriyan
    * Roughly half of callers already do it by not checking return value
    * Code in drivers/acpi/osl.c does the following to be sure:

            (void)kmem_cache_destroy(cache);

    * Those who check it printk something, however, slab_error already
      printed the name of failed cache.
    * XFS BUGs on failed kmem_cache_destroy which is not the decision
      low-level filesystem driver should make.  Converted to ignore.

    Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27  [PATCH] fs: Removing useless casts  Panagiotis Issaris
    * Removing useless casts
    * Removing useless wrapper
    * Conversion from kmalloc+memset to kzalloc

    Signed-off-by: Panagiotis Issaris <takis@issaris.org>
    Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27  [PATCH] fs: Conversions from kmalloc+memset to k(z|c)alloc  Panagiotis Issaris
    Conversions from kmalloc+memset to kzalloc.

    Signed-off-by: Panagiotis Issaris <takis@issaris.org>
    Jffs2-bit-acked-by: David Woodhouse <dwmw2@infradead.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27  [PATCH] more ext3 16T overflow fixes  Eric Sandeen
    Some of the changes in balloc.c are just cosmetic, as Andreas pointed
    out - if they overflow they'll then underflow and things are fine.

    5th hunk actually fixes an overflow problem.

    Also check for potential overflows in inode & block counts when
    resizing.

    Signed-off-by: Eric Sandeen <esandeen@redhat.com>
    Cc: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27  [PATCH] ext3: Fix sparse warnings  Dave Kleikamp
    Fixing up some endian-ness warnings in preparation to clone ext4 from
    ext3.

    Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27  [PATCH] ext3: More whitespace cleanups  Dave Kleikamp
    More white space cleanups in preparation of cloning ext4 from ext3.
    Removing spaces that precede a tab.

    Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27  [PATCH] ext3: wrong error behavior  Vasily Averin
    SWsoft Virtuozzo/OpenVZ Linux kernel team has discovered that ext3 error
    behavior was broken in linux kernels since 2.5.x versions by the
    following patch:

        2002/10/31 02:15:26-05:00 tytso@snap.thunk.org
        Default mount options from superblock for ext2/3 filesystems
        http://linux.bkbits.net:8080/linux-2.6/gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQ

    In case ext3 file system is mounted with errors=continue
    (EXT3_ERRORS_CONTINUE) errors should be ignored when possible.  However
    at present in case of any error kernel aborts journal and remounts
    filesystem to read-only.  Such behavior was hit number of times and
    noted to differ from that of 2.4.x kernels.

    This patch fixes this:

    - do nothing in case of EXT3_ERRORS_CONTINUE,
    - set EXT3_MOUNT_ABORT and call journal_abort() in all other cases
    - panic() should be called after ext3_commit_super() to save sb marked
      as EXT3_ERROR_FS

    Signed-off-by: Vasily Averin <vvs@sw.ru>
    Acked-by: Kirill Korotaev <dev@sw.ru>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Cc: "Stephen C. Tweedie" <sct@redhat.com>
    Cc: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27  [PATCH] ext3: more comments about block allocation/reservation code  Mingming Cao
    Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    Acked-by: Randy Dunlap <rdunlap@xenotime.net>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27  [PATCH] ext3: turn on reservation dump on block allocation errors  Mingming Cao
    In the past there were a few kernel panics related to block reservation
    tree operations failure (insert/remove etc).  It would be very useful to
    get the block allocation reservation map info when such error happens.

    Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27  [PATCH] ext3: inode numbers are unsigned long  Eric Sandeen
    This is primarily format string fixes, with changes to ialloc.c where
    large inode counts could overflow, and also pass around journal_inum as
    an unsigned long, just to be pedantic about it....

    Signed-off-by: Eric Sandeen <esandeen@redhat.com>
    Cc: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27  [PATCH] fix ext3 mounts at 16T  Eric Sandeen
    I need to do some actual IO testing now, but this gets things mounting
    for a 16T ext3 filesystem.  (patched up e2fsprogs is needed too, I'll
    send that off the kernel list)

    This patch fixes these issues in the kernel:

    o sbi->s_groups_count overflows in ext3_fill_super()

            sbi->s_groups_count = (le32_to_cpu(es->s_blocks_count) -
                                   le32_to_cpu(es->s_first_data_block) +
                                   EXT3_BLOCKS_PER_GROUP(sb) - 1) /
                                  EXT3_BLOCKS_PER_GROUP(sb);

      at 16T, s_blocks_count is already maxed out; adding
      EXT3_BLOCKS_PER_GROUP(sb) overflows it and groups_count comes out to
      0.  Not really what we want, and causes a failed mount.

      Feel free to check my math (actually, please do!), but changing it
      this way should work & avoid the overflow:

            (A + B - 1)/B changed to: ((A - 1)/B) + 1

    o ext3_check_descriptors() overflows range checks

      ext3_check_descriptors() iterates over all block groups making sure
      that various bits are within the right block ranges... on the last
      pass through, it is checking the error case

            [item] >= block + EXT3_BLOCKS_PER_GROUP(sb)

      where "block" is the first block in the last block group.  The last
      block in this group (and the last one that will fit in 32 bits) is
      block + EXT3_BLOCKS_PER_GROUP(sb) - 1.
      block + EXT3_BLOCKS_PER_GROUP(sb) wraps back around to 0.

      so, make things clearer with "first_block" and "last_block" where
      those are first and last, inclusive, and use <, > rather than <, >=.

      Finally, the last block group may be smaller than the rest, so account
      for this on the last pass through:

            last_block = sb->s_blocks_count - 1;

    (a similar patch could be done for ext2; does anyone in their right mind
    use ext2 at 16T?  I'll send an ext2 patch doing the same thing if that's
    warranted)

    Signed-off-by: Eric Sandeen <esandeen@redhat.com>
    Cc: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
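The rounding-up overflow can be reproduced with plain 32-bit unsigned arithmetic. The sketch below compares the two formulas quoted in the message, `(A + B - 1)/B` and `((A - 1)/B) + 1`; the function names `groups_old()` and `groups_new()` are hypothetical, and the code only models the arithmetic, not the superblock handling.

```c
#include <stdint.h>

/* Old form: (A + B - 1) / B, where A = blocks - first_data_block.
 * The addition wraps around when A is close to 2^32. */
static uint32_t groups_old(uint32_t blocks, uint32_t first_data_block,
                           uint32_t blocks_per_group)
{
    return (blocks - first_data_block + blocks_per_group - 1) /
           blocks_per_group;
}

/* New form: ((A - 1) / B) + 1.  Same result for A >= 1, no wrap-around. */
static uint32_t groups_new(uint32_t blocks, uint32_t first_data_block,
                           uint32_t blocks_per_group)
{
    return ((blocks - first_data_block - 1) / blocks_per_group) + 1;
}
```

For a small filesystem both forms agree (100000 blocks with 32768 blocks per group gives 4 groups either way). With the block count at its 32-bit maximum, the old form wraps and yields 0 while the new form yields 131072. The new form does assume at least one block after `first_data_block`.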
2006-09-27  [PATCH] ext3 and jbd cleanup: remove whitespace  Mingming Cao
    Remove whitespace from ext3 and jbd, before we clone ext4.

    Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-16  [PATCH] ext3 sequential read regression fix  Suparna Bhattacharya
    ext3-get-blocks support caused ~20% degrade in Sequential read
    performance (tiobench).  Problem is with marking the buffer boundary so
    IO can be submitted right away.  Here is the patch to fix it.

    2.6.18-rc6:
    -----------
    # ./iotest
    1048576+0 records in
    1048576+0 records out
    4294967296 bytes (4.3 GB) copied, 75.2726 seconds, 57.1 MB/s

    real    1m15.285s
    user    0m0.276s
    sys     0m3.884s

    2.6.18-rc6 + fix:
    -----------------
    [root@elm3a241 ~]# ./iotest
    1048576+0 records in
    1048576+0 records out
    4294967296 bytes (4.3 GB) copied, 62.9356 seconds, 68.2 MB/s

    The boundary block check in ext3_get_blocks_handle needs to be adjusted
    against the count of blocks mapped in this call, now that it can map
    more than one block.

    Signed-off-by: Suparna Bhattacharya <suparna@in.ibm.com>
    Tested-by: Badari Pulavarty <pbadari@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-16  [PATCH] knfsd: Make ext3 reject filehandles referring to invalid inode number  NeilBrown
    Inodes earlier than the 'first' inode (e.g. journal, resize) should be
    rejected early - except the root inode.  Also inode numbers that are too
    big should be rejected early.

    [akpm@osdl.org: cleanup]
    Signed-off-by: Neil Brown <neilb@suse.de>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
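The early rejection rule can be written as a small predicate. This is a sketch with hypothetical names (`ino_may_be_valid()` and its parameters); the root inode number, first regular inode number, and total inode count are passed in rather than taken from a superblock, and the values used in any example are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether an inode number taken from an untrusted file handle is
 * worth looking up at all.
 *
 *   root_ino      inode number of the root directory
 *   first_ino     first inode number available for ordinary files
 *   inodes_count  total number of inodes in the filesystem
 */
static bool ino_may_be_valid(uint32_t ino, uint32_t root_ino,
                             uint32_t first_ino, uint32_t inodes_count)
{
    /* Reserved inodes (journal, resize, ...) are rejected, except root. */
    if (ino != root_ino && ino < first_ino)
        return false;

    /* Inode numbers beyond the end of the inode table are rejected. */
    if (ino > inodes_count)
        return false;

    return true;
}
```

Because the check runs before any lookup, a bad handle can be answered with a stale-handle error instead of reaching code that treats an out-of-range inode number as filesystem corruption.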
2006-09-08  [PATCH] ext3_getblk() should handle HOLE correctly  Badari Pulavarty
    It has been reported that ext3_getblk() is not doing the right thing and
    triggering following WARN():

    BUG: warning at fs/ext3/inode.c:1016/ext3_getblk()
     <c01c5140> ext3_getblk+0x98/0x2a6
     <c03b2806> md_wakeup_thread+0x26/0x2a
     <c01c536d> ext3_bread+0x1f/0x88
     <c01cedf9> ext3_quota_read+0x136/0x1ae
     <c018b683> v1_read_dqblk+0x61/0xac
     <c0188f32> dquot_acquire+0xf6/0x107
     <c01ceaba> ext3_acquire_dquot+0x46/0x68
     <c01897d4> dqget+0x155/0x1e7
     <c018a97b> dquot_transfer+0x3e0/0x3e9
     <c016fe52> dput+0x23/0x13e
     <c01c7986> ext3_setattr+0xc3/0x240
     <c0120f66> current_fs_time+0x52/0x6a
     <c017320e> notify_change+0x2bd/0x30d
     <c0159246> chown_common+0x9c/0xc5
     <c02a222c> strncpy_from_user+0x3b/0x68
     <c0167fe6> do_path_lookup+0xdf/0x266
     <c016841b> __user_walk_fd+0x44/0x5a
     <c01592b9> sys_chown+0x4a/0x55
     <c015a43c> vfs_write+0xe7/0x13c
     <c01695d4> sys_mkdir+0x1f/0x23
     <c0102a97> syscall_call+0x7/0xb

    Looking at the code, it looks like it's not handling HOLE correctly.  It
    ends up returning -EIO.  Here is the patch to fix it.

    If we really want to be paranoid, we can allow return values 0 (HOLE),
    1 (we asked for one block) and return -EIO for more than 1 block.  But I
    really don't see a reason for doing it - all we need is the block# here.
    (doesn't matter how many blocks are mapped).

    ext3_get_blocks_handle() returns number of blocks it mapped.  It returns
    0 in case of HOLE.  ext3_getblk() should handle HOLE properly (currently
    it's dumping a warning stack and returning -EIO).

    Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
    Acked-by: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27  [PATCH] ext3 filesystem bogus ENOSPC with reservation fix  Mingming Cao
    To handle the earlier bogus ENOSPC error caused by filesystem full of
    block reservation, current code falls back to non block reservation,
    starts to allocate block(s) from the goal allocation block group as if
    there is no block reservation.

    Current code needs to re-load the corresponding block group descriptor
    for the initial goal block group in this case.  The patch fixes this.

    Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-31  [PATCH] ext3 -nobh option causes oops  Badari Pulavarty
    For files other than IFREG, nobh option doesn't make sense.
    Modifications to them are journalled and need buffer heads to do that.
    Without this patch, we get kernel oops in page_buffers().

    Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
    Cc: <stable@kernel.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-31  [PATCH] ext3: avoid triggering ext3_error on bad NFS file handle  Neil Brown
    The inode number out of an NFS file handle gets passed eventually to
    ext3_get_inode_block() without any checking.  If ext3_get_inode_block()
    allows it to trigger an error, then bad filehandles can have unpleasant
    effect - ext3_error() will usually cause a forced read-only remount, or
    a panic if `errors=panic' was used.

    So remove the call to ext3_error there and put a matching check in
    ext3/namei.c where inode numbers are read off storage.

    [akpm@osdl.org: fix off-by-one error]
    Signed-off-by: Neil Brown <neilb@suse.de>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Cc: Marcel Holtmann <marcel@holtmann.org>
    Cc: <stable@kernel.org>
    Cc: "Stephen C. Tweedie" <sct@redhat.com>
    Cc: Eric Sandeen <esandeen@redhat.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10  [PATCH] Remove leftover ext3 acl declarations  Andreas Gruenbacher
    These functions no longer exist; remove their declarations.

    Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03  [PATCH] lockdep: annotate the quota code  Arjan van de Ven
    The quota code plays interesting games with the lock ordering; to quote
    Jan:

    | i_mutex of inode containing quota file is acquired after all other
    | quota locks. i_mutex of all other inodes is acquired before quota
    | locks. Quota code makes sure (by resetting inode operations and
    | setting special flag on inode) that noone tries to enter quota code
    | while holding i_mutex on a quota file...

    The good news is that all of this special case i_mutex grabbing happens
    in the (per filesystem) low level quota write function.  For this
    special case we need a new I_MUTEX_* nesting level, since this just
    entirely outside any of the regular VFS locking rules for i_mutex.  I
    trust Jan on his blue eyes that this is not ever going to deadlock; and
    based on that the patch below is what it takes to inform lockdep of
    these very interesting new locking rules.

    The new locking rule for the I_MUTEX_QUOTA nesting level is that this is
    the deepest possible level of nesting for i_mutex, and that this only
    should be used in quota write (and possibly read) function of
    filesystems.  This makes the lock ordering of the I_MUTEX_* levels:

        I_MUTEX_PARENT -> I_MUTEX_CHILD -> I_MUTEX_NORMAL -> I_MUTEX_QUOTA

    Has no effect on non-lockdep kernels.

    Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
    Acked-by: Ingo Molnar <mingo@elte.hu>
    Cc: Jan Kara <jack@ucw.cz>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30  Remove obsolete #include <linux/config.h>  Jörn Engel
    Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
    Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-28  [PATCH] mark address_space_operations const  Christoph Hellwig
    Same as we already do with the file operations: keep them in .rodata and
    prevent people from doing runtime patching.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Cc: Steven French <sfrench@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26  [PATCH] ext3: Add "-o bh" option  Badari Pulavarty
    This patch adds "-o bh" option to force use of buffer_heads.  This
    option is needed when we make "nobh" as default - and if we run into
    problems.

    Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25  [PATCH] ext3: cleanup dead code in ext3_add_entry()  Johann Lombardi
    The variables nlen and rlen are defined/initialized but not used in
    ext3_add_entry().

    Signed-off-by: Johann Lombardi <johann.lombardi@bull.net>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25  [PATCH] ext3_fsblk_t: the rest of in-kernel filesystem blocks conversion  Mingming Cao
    Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t.  Convert
    the rest of all unsigned long type in-kernel filesystem blocks to
    ext3_fsblk_t, and replace the printk format string correspondingly.

    Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25  [PATCH] ext3_fsblk_t: filesystem, group blocks and bug fixes  Mingming Cao
    Some of the in-kernel ext3 block variable types are treated as signed 4
    byte int type, thus limiting the ext3 filesystem to 8TB (4k block size
    based).  While trying to fix them, it seems quite confusing in the ext3
    code where some blocks are filesystem-wide blocks, some are group
    relative offsets that need to be signed values (as -1 has special
    meaning).  So it seems saner to define two types of physical blocks: one
    is filesystem wide blocks, another is group-relative blocks.  The
    following patches clarify these two types of blocks in the ext3 code,
    and fix the type bugs which limit the current 32 bit ext3 filesystem to
    8TB.

    With this series of patches and the percpu counter data type changes in
    the mm tree, we are able to extend the ext3 filesystem limit to 16TB.

    This work is also a prerequisite for the recent >32 bit ext3 work, and
    makes it a lot easier for the kernel to address 48 bit ext3 blocks:
    simply redefine ext3_fsblk_t from unsigned long to sector_t and redefine
    the format string for ext3 filesystem blocks correspondingly.

    Two RFCs with a series of patches have been posted to the ext2-devel
    list and have been reviewed and discussed:

        http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2
        http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2

    Patches are tested on both 32 bit machine and 64 bit machine, <8TB ext3
    and >8TB ext3 filesystem (with the latest to be released
    e2fsprogs-1.39).  Tests include overnight fsx, tiobench, dbench and
    fsstress.

    This patch:

    Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for
    filesystem wide blocks.

    This patch classifies all block group relative blocks, and ext3_fsblk_t
    blocks that occur in the same function, where this used to be confusing
    before.  Also includes kernel bug fixes for filesystem wide in-kernel
    block variables.  There are some filesystem wide blocks that are treated
    as int/unsigned int type in the kernel currently, especially in ext3
    block allocation and reservation code.

    This patch fixed those bugs by converting those variables to
    ext3_fsblk_t (unsigned long) type.

    Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
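The 8TB and 16TB figures, and the split between filesystem-wide and group-relative block numbers, can be sketched with two illustrative typedefs. The names `fsblk_t`, `grpblk_t` and the helper functions are stand-ins chosen for this example and are not the kernel's `ext3_fsblk_t` or `ext3_grpblk_t` definitions; in particular `fsblk_t` is fixed at 32 bits here, whereas the message describes `unsigned long`.

```c
#include <stdint.h>

/* Illustrative stand-ins for the two block types described above. */
typedef uint32_t fsblk_t;   /* filesystem-wide block number (unsigned) */
typedef int32_t  grpblk_t;  /* group-relative offset; signed so that -1
                             * can keep its special "no goal" meaning */

/* Bytes addressable with `index_bits` of block number and blocks of
 * 2^blocksize_bits bytes. */
static uint64_t addressable_bytes(unsigned int index_bits,
                                  unsigned int blocksize_bits)
{
    return (UINT64_C(1) << index_bits) << blocksize_bits;
}

/* Which block group a filesystem-wide block belongs to. */
static uint32_t block_group_of(fsblk_t block, fsblk_t first_data_block,
                               uint32_t blocks_per_group)
{
    return (block - first_data_block) / blocks_per_group;
}

/* Offset of a filesystem-wide block inside its block group. */
static grpblk_t offset_in_group(fsblk_t block, fsblk_t first_data_block,
                                uint32_t blocks_per_group)
{
    return (grpblk_t)((block - first_data_block) % blocks_per_group);
}
```

With 4 KiB blocks (12 bits), a signed 32-bit block number leaves 31 usable bits and reaches 8 TiB, while an unsigned one uses all 32 bits and reaches 16 TiB. A block number such as 3000000000 only fits the unsigned type; it falls in group 91552 at offset 24064 when groups hold 32768 blocks.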
2006-06-25  [PATCH] ext3: remove inconsistent space before exclamation point in mount code  Theodore Ts'o
    This was reported as Debian bug #336604.

    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25  [PATCH] Avoid disk sector_t overflow for >2TB ext3 filesystem  Mingming Cao
    If ext3 filesystem is larger than 2TB, and sector_t is a u32 (i.e.
    CONFIG_LBD not defined in the kernel), the calculation of the disk
    sector will overflow.  Add check at ext3_fill_super() and
    ext3_group_extend() to prevent mount/remount/resize of a >2TB ext3
    filesystem if sector_t size is 4 bytes.

    Verified this patch on a 32 bit platform without CONFIG_LBD defined
    (sector_t is 32 bits long): mount refuses to mount a 10TB ext3.

    Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    Acked-by: Andreas Dilger <adilger@clusterfs.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
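The 2TB limit follows from 512-byte sectors and a 32-bit sector number. A minimal sketch of such a guard is shown below; `fs_fits_in_sector_type()` is a hypothetical name, the width of the sector type is passed as a parameter instead of being derived from `sector_t`, and the block size is assumed to be at least 512 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Can every block of a filesystem be expressed as a 512-byte sector
 * number of `sector_bits` bits?
 *
 *   blocks          number of filesystem blocks
 *   blocksize_bits  log2 of the block size (assumed >= 9)
 *   sector_bits     width of the sector number type, 32 or 64
 */
static bool fs_fits_in_sector_type(uint64_t blocks,
                                   unsigned int blocksize_bits,
                                   unsigned int sector_bits)
{
    uint64_t max_sector = sector_bits >= 64
                              ? UINT64_MAX
                              : (UINT64_C(1) << sector_bits) - 1;

    /* Each block covers 2^(blocksize_bits - 9) sectors, so compare
     * against the largest block count whose sectors still fit. */
    return blocks <= (max_sector >> (blocksize_bits - 9));
}
```

With 4 KiB blocks and a 32-bit sector number, a 1 TiB filesystem (268435456 blocks) passes and a 10 TiB one (2684354560 blocks) does not, which is consistent with the refused 10TB mount reported above.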
2006-06-23  [PATCH] percpu counter data type changes to support more than 2**31 ext3 free blocks counter  Mingming Cao
    The percpu counter data types are changed in this set of patches to
    support more users like ext3 who need more than 32 bits to store the
    free blocks total in the filesystem.

    - Generic percpu counter data type changes.  The sizes of the global
      counter and local counter are explicitly specified using s64 and s32.
      The global counter is changed from long to s64, while the local
      counter is changed from long to s32, so we can avoid doing 64 bit
      updates in most cases.

    - Users of the percpu counters are updated to make use of the new
      percpu_counter_init() routine, which now takes an additional parameter
      to allow users to pass the initial value of the global counter.

    Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
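The split between a 64-bit global count and small 32-bit per-CPU deltas can be modelled in a single-threaded sketch. Everything here is hypothetical and simplified: the `model_counter` type, the fixed CPU count, the batch threshold of 32, and the absence of locking are assumptions of this example and do not describe the kernel's percpu_counter implementation.

```c
#include <stdint.h>

#define MODEL_CPUS  4    /* assumed number of CPUs in this model */
#define MODEL_BATCH 32   /* assumed threshold for folding a local delta */

/* Single-threaded model: one 64-bit global count plus 32-bit local deltas. */
struct model_counter {
    int64_t count;
    int32_t local[MODEL_CPUS];
};

/* Initialise with a starting value, which may exceed 32 bits. */
static void model_counter_init(struct model_counter *c, int64_t initial)
{
    c->count = initial;
    for (int i = 0; i < MODEL_CPUS; i++)
        c->local[i] = 0;
}

/* Add to one CPU's delta; touch the 64-bit global only when the delta
 * grows past the batch threshold. */
static void model_counter_add(struct model_counter *c, int cpu, int32_t amount)
{
    int64_t sum = (int64_t)c->local[cpu] + amount;

    if (sum >= MODEL_BATCH || sum <= -MODEL_BATCH) {
        c->count += sum;
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = (int32_t)sum;
    }
}

/* Exact value: the global count plus every outstanding local delta. */
static int64_t model_counter_sum(const struct model_counter *c)
{
    int64_t total = c->count;

    for (int i = 0; i < MODEL_CPUS; i++)
        total += c->local[i];
    return total;
}
```

Starting from 5000000000 free blocks, a small decrement stays in the local delta and leaves the global count untouched; only once the accumulated delta reaches the batch threshold is the 64-bit value updated, which is the "avoid doing 64 bit updates in most cases" point made above.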
2006-06-23  [PATCH] ext3_clear_inode(): avoid kfree(NULL)  Andrew Morton
    Steven Rostedt <rostedt@goodmis.org> points out that `rsv' here is
    usually NULL, so we should avoid calling kfree().

    Also, fix up some nearby whitespace damage.

    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23  [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry  David Howells
    Give the statfs superblock operation a dentry pointer rather than a
    superblock pointer.

    This complements the get_sb() patch.  That reduced the significance of
    sb->s_root, allowing NFS to place a fake root there.  However, NFS does
    require a dentry to use as a target for the statfs operation.  This
    permits the root in the vfsmount to be used instead.

    linux/mount.h has been added where necessary to make allyesconfig build
    successfully.

    Interest has also been expressed for use with the FUSE and XFS
    filesystems.

    Signed-off-by: David Howells <dhowells@redhat.com>
    Acked-by: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Nathan Scott <nathans@sgi.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23  [PATCH] VFS: Permit filesystem to override root dentry on mount  David Howells
    Extend the get_sb() filesystem operation to take an extra argument that
    permits the VFS to pass in the target vfsmount that defines the
    mountpoint.

    The filesystem is then required to manually set the superblock and root
    dentry pointers.  For most filesystems, this should be done with
    simple_set_mnt() which will set the superblock pointer and then set the
    root dentry to the superblock's s_root (as per the old default
    behaviour).

    The get_sb() op now returns an integer as there's now no need to return
    the superblock pointer.

    This patch permits a superblock to be implicitly shared amongst several
    mount points, such as can be done with NFS to avoid potential inode
    aliasing.  In such a case, simple_set_mnt() would not be called, and
    instead the mnt_root and mnt_sb would be set directly.

    The patch also makes the following changes:

     (*) the get_sb_*() convenience functions in the core kernel now take a
         vfsmount pointer argument and return an integer, so most
         filesystems have to change very little.

     (*) If one of the convenience function is not used, then get_sb()
         should normally call simple_set_mnt() to instantiate the vfsmount.
         This will always return 0, and so can be tail-called from get_sb().

     (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up
         the dcache upon superblock destruction rather than
         shrink_dcache_anon().

         This is required because the superblock may now have multiple trees
         that aren't actually bound to s_root, but that still need to be
         cleaned up.  The currently called functions assume that the whole
         tree is rooted at s_root, and that anonymous dentries are not the
         roots of trees which results in dentries being left unculled.

         However, with the way NFS superblock sharing are currently set to
         be implemented, these assumptions are violated: the root of the
         filesystem is simply a dummy dentry and inode (the real inode for
         '/' may well be inaccessible), and all the vfsmounts are rooted on
         anonymous[*] dentries with child trees.

         [*] Anonymous until discovered from another tree.

     (*) The documentation has been adjusted, including the additional bit
         of changing ext2_* into foo_* in the documentation.

    [akpm@osdl.org: convert ipath_fs, do other stuff]
    Signed-off-by: David Howells <dhowells@redhat.com>
    Acked-by: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Nathan Scott <nathans@sgi.com>
    Cc: Roland Dreier <rolandd@cisco.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>