aboutsummaryrefslogtreecommitdiff
path: root/fs/ext2/inode.c
AgeCommit message (Collapse)Author
2014-01-25ext2/3/4: use generic posix ACL infrastructureChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-05ext2: Fix fs corruption in ext2_get_xip_mem()Jan Kara
Commit 8e3dffc651cb "Ext2: mark inode dirty after the function dquot_free_block_nodirty is called" unveiled a bug in __ext2_get_block() called from ext2_get_xip_mem(). That function called ext2_get_block() mistakenly asking it to map 0 blocks while 1 was intended. Before the above mentioned commit things worked out fine by luck but after that commit we started returning that we allocated 0 blocks while we in fact allocated 1 block and thus allocation was looping until all blocks in the filesystem were exhausted. Fix the problem by properly asking for one block and also add assertion in ext2_get_blocks() to catch similar problems. Reported-and-tested-by: Andiry Xu <andiry.xu@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-09-12truncate: drop 'oldsize' truncate_pagecache() parameterKirill A. Shutemov
truncate_pagecache() doesn't care about old size since commit cedabed49b39 ("vfs: Fix vmtruncate() regression"). Let's drop it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07aio: don't include aio.h in sched.hKent Overstreet
Faster kernel compiles by way of fewer unnecessary includes. [akpm@linux-foundation.org: fix fallout] [akpm@linux-foundation.org: fix build] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-13ext2: Fix BUG_ON in evict() on inode deletionJan Kara
Commit 8e3dffc6 introduced a regression where deleting inode with large extended attributes leads to triggering BUG_ON(inode->i_state != (I_FREEING | I_CLEAR)) in fs/inode.c:evict(). That happens because freeing of xattr block dirtied the inode and it happened after clear_inode() has been called. Fix the issue by moving removal of xattr block into ext2_evict_inode() before clear_inode() call close to a place where data blocks are truncated. That is also more logical place and removes surprising requirement that ext2_free_blocks() mustn't dirty the inode. Reported-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21Ext2: free memory allocated and forget buffer head when io error happensWang Shilong
Add a necessary check when an io error happens. If io error happens,free the memory allocated and forget buffer head. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-31ext2: Implement freezingJan Kara
The only missing piece to make freezing work reliably with ext2 is to stop iput() of unlinked inode from deleting the inode on frozen filesystem. So add a necessary protection to ext2_evict_inode(). We also provide appropriate ->freeze_fs and ->unfreeze_fs functions. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-28Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linuxLinus Torvalds
Pull writeback tree from Wu Fengguang: "Mainly from Jan Kara to avoid iput() in the flusher threads." * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Avoid iput() from flusher thread vfs: Rename end_writeback() to clear_inode() vfs: Move waiting for inode writeback from end_writeback() to evict_inode() writeback: Refactor writeback_single_inode() writeback: Remove wb->list_lock from writeback_single_inode() writeback: Separate inode requeueing after writeback writeback: Move I_DIRTY_PAGES handling writeback: Move requeueing when I_SYNC set to writeback_sb_inodes() writeback: Move clearing of I_SYNC into inode_sync_complete() writeback: initialize global_dirty_limit fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds mm: page-writeback.c: local functions should not be exposed globally
2012-05-15userns: Convert ext2 to use kuid/kgid where appropriate.Eric W. Biederman
Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-06vfs: Rename end_writeback() to clear_inode()Jan Kara
After we moved inode_sync_wait() from end_writeback() it doesn't make sense to call the function end_writeback() anymore. Rename it to clear_inode() which well says what the function really does - set I_CLEAR flag. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-01-09ext2/3/4: delete unneeded includes of module.hPaul Gortmaker
Delete any instances of include module.h that were not strictly required. In the case of ext2, the declaration of MODULE_LICENSE etc. were in inode.c but the module_init/exit were in super.c, so relocate the MODULE_LICENCE/AUTHOR block to super.c which makes it consistent with ext3 and ext4 at the same time. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jan Kara <jack@suse.cz>
2011-11-02filesystems: add set_nlink()Miklos Szeredi
Replace remaining direct i_nlink updates with a new set_nlink() updater function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-07-20fs: simplify the blockdev_direct_IO prototypeChristoph Hellwig
Simple filesystems always pass inode->i_sb_bdev as the block device argument, and never need a end_io handler. Let's simply things for them and for my grepping activity by dropping these arguments. The only thing not falling into that scheme is ext4, which passes and end_io handler without needing special flags (yet), but given how messy the direct I/O code there is use of __blockdev_direct_IO in one instead of two out of three cases isn't going to make a large difference anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20fs: move inode_dio_wait calls into ->setattrChristoph Hellwig
Let filesystems handle waiting for direct I/O requests themselves instead of doing it beforehand. This means filesystem-specific locks to prevent new dio referenes from appearing can be held. This is important to allow generalizing i_dio_count to non-DIO_LOCKING filesystems. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-10block: remove per-queue pluggingJens Axboe
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-25fs: add sync_inode_metadataChristoph Hellwig
Add a new helper to write out the inode using the writeback code, that is including the correct dirty bit and list manipulation. A few of filesystems already opencode this, and a lot of others should be using it instead of using write_inode_now which also writes out the data. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-09-23ext2: fix a typo on comment in ext2/inode.cNamhyung Kim
'excpet' should be 'except'. 'ext3_get_branch' should be 'ext2_get_branch'. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-08-09merge ext2 delete_inode and clear_inode, switch to ->evict_inode()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09Take dirtying the inode to callers of ext2_free_blocks()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09check ATTR_SIZE contraints in inode_change_okChristoph Hellwig
Make sure we check the truncate constraints early on in ->setattr by adding those checks to inode_change_ok. Also clean up and document inode_change_ok to make this obvious. As a fallout we don't have to call inode_newsize_ok from simple_setsize and simplify it down to a truncate_setsize which doesn't return an error. This simplifies a lot of setattr implementations and means we use truncate_setsize almost everywhere. Get rid of fat_setsize now that it's trivial and mark ext2_setsize static to make the calling convention obvious. Keep the inode_newsize_ok in vmtruncate for now as all callers need an audit for its removal anyway. Note: setattr code in ecryptfs doesn't call inode_change_ok at all and needs a deeper audit, but that is left for later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09rename generic_setattrChristoph Hellwig
Despite its name it's now a generic implementation of ->setattr, but rather a helper to copy attributes from a struct iattr to the inode. Rename it to setattr_copy to reflect this fact. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09get rid of block_write_begin_newtruncChristoph Hellwig
Move the call to vmtruncate to get rid of accessive blocks to the callers in preparation of the new truncate sequence and rename the non-truncating version to block_write_begin. While we're at it also remove several unused arguments to block_write_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09clean up write_begin usage for directories in pagecacheChristoph Hellwig
For filesystem that implement directories in pagecache we call block_write_begin with an already allocated page for this code, while the normal regular file write path uses the default block_write_begin behaviour. Get rid of the __foofs_write_begin helper and opencode the normal write_begin call in foofs_write_begin, while adding a new foofs_prepare_chunk helper for the directory code. The added benefit is that foofs_prepare_chunk has a much saner calling convention. Note that the interruptible flag passed into block_write_begin is always ignored if we already pass in a page (see next patch for details), and we never were doing truncations of exessive blocks for this case either so we can switch directly to block_write_begin_newtrunc. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09get rid of nobh_write_begin_newtruncChristoph Hellwig
Move the call to vmtruncate to get rid of accessive blocks to the only remaining caller and rename the non-truncating version to nobh_write_begin. Get rid of the superflous file argument to it while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09sort out blockdev_direct_IO variantsChristoph Hellwig
Move the call to vmtruncate to get rid of accessive blocks to the callers in prepearation of the new truncate calling sequence. This was only done for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant was not needed anyway. Get rid of blockdev_direct_IO_no_locking and its _newtrunc variant while at it as just opencoding the two additional paramters is shorted than the name suffix. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-06-04fix truncate inode time modification breakageNick Piggin
mtime and ctime should be changed only if the file size has actually changed. Patches changing ext2 and tmpfs from vmtruncate to new truncate sequence has caused regressions where they always update timestamps. There is some strange cases in POSIX where truncate(2) must not update times unless the size has acutally changed, see 6e656be89. This area is all still rather buggy in different ways in a lot of filesystems and needs a cleanup and audit (ideally the vfs will provide a simple attribute or call to direct all filesystems exactly which attributes to change). But coming up with the best solution will take a while and is not appropriate for rc anyway. So fix recent regression for now. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27ext2: convert to use the new truncate convention.npiggin@suse.de
I also have commented a possible bug in existing ext2 code, marked with XXX. Cc: linux-ext4@vger.kernel.org Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21quota: unify quota init condition in setattrDmitry Monakhov
Quota must being initialized if size or uid/git changes requested. But initialization performed in two different places: in case of i_size file system is responsible for dquot init , but in case of uid/gid init will be called internally in dquot_transfer(). This ambiguity makes code harder to understand. Let's move this logic to one common helper function. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21BKL: Remove BKL from ext2 filesystemJan Blunck
The BKL is still used in ext2_put_super(), ext2_fill_super(), ext2_sync_fs() ext2_remount() and ext2_write_inode(). From these calls ext2_put_super(), ext2_fill_super() and ext2_remount() are protected against each other by the struct super_block s_umount rw semaphore. The call in ext2_write_inode() could only protect the modification of the ext2_sb_info through ext2_update_dynamic_rev() against concurrent ext2_sync_fs() or ext2_remount(). ext2_fill_super() and ext2_put_super() can be left out because you need a valid filesystem reference in all three cases, which you do not have when you are one of these functions. If the BKL is only protecting the modification of the ext2_sb_info it can safely be removed since this is protected by the struct ext2_sb_info s_lock. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21ext2: Add ext2_sb_info s_lock spinlockJan Blunck
Add a spinlock that protects against concurrent modifications of s_mount_state, s_blocks_last, s_overhead_last and the content of the superblock's buffer pointed to by sbi->s_es. The spinlock is now used in ext2_xattr_update_super_block() which was setting the EXT2_FEATURE_COMPAT_EXT_ATTR flag on the superblock without protection before. Likewise the spinlock is used in ext2_show_options() to have a consistent view of the mount options. This is a preparation patch for removing the BKL from ext2 in the next patch. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits) quota: stop using QUOTA_OK / NO_QUOTA dquot: cleanup dquot initialize routine dquot: move dquot initialization responsibility into the filesystem dquot: cleanup dquot drop routine dquot: move dquot drop responsibility into the filesystem dquot: cleanup dquot transfer routine dquot: move dquot transfer responsibility into the filesystem dquot: cleanup inode allocation / freeing routines dquot: cleanup space allocation / freeing routines ext3: add writepage sanity checks ext3: Truncate allocated blocks if direct IO write fails to update i_size quota: Properly invalidate caches even for filesystems with blocksize < pagesize quota: generalize quota transfer interface quota: sb_quota state flags cleanup jbd: Delay discarding buffers in journal_unmap_buffer ext3: quota_write cross block boundary behaviour quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota quota: split out compat_sys_quotactl support from quota.c quota: split out netlink notification support from quota.c quota: remove invalid optimization from quota_sync_all ... Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c
2010-03-05pass writeback_control to ->write_inodeChristoph Hellwig
This gives the filesystem more information about the writeback that is happening. Trond requested this for the NFS unstable write handling, and other filesystems might benefit from this too by beeing able to distinguish between the different callers in more detail. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05dquot: cleanup dquot initialize routineChristoph Hellwig
Get rid of the initialize dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_initialize helper to __dquot_initialize and vfs_dq_init to dquot_initialize to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05dquot: move dquot initialization responsibility into the filesystemChristoph Hellwig
Currently various places in the VFS call vfs_dq_init directly. This means we tie the quota code into the VFS. Get rid of that and make the filesystem responsible for the initialization. For most metadata operations this is a straight forward move into the methods, but for truncate and open it's a bit more complicated. For truncate we currently only call vfs_dq_init for the sys_truncate case because open already takes care of it for ftruncate and open(O_TRUNC) - the new code causes an additional vfs_dq_init for those which is harmless. For open the initialization is moved from do_filp_open into the open method, which means it happens slightly earlier now, and only for regular files. The latter is fine because we don't need to initialize it for operations on special files, and we already do it as part of the namespace operations for directories. Add a dquot_file_open helper that filesystems that support generic quotas can use to fill in ->open. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05dquot: cleanup dquot transfer routineChristoph Hellwig
Get rid of the transfer dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_transfer helper to __dquot_transfer and vfs_dq_transfer to dquot_transfer to have a consistent namespace, and make the new dquot_transfer return a normal negative errno value which all callers expect. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10ext2: Unify log messages in ext2Alexey Fisher
make messages produced by ext2 more unified. It should be easy to parse. dmesg before patch: [ 4893.684892] reservations ON [ 4893.684896] xip option not supported [ 4893.684961] EXT2-fs warning: mounting ext3 filesystem as ext2 [ 4893.684964] EXT2-fs warning: maximal mount count reached, running e2fsck is recommended [ 4893.684990] EXT II FS: 0.5b, 95/08/09, bs=1024, fs=1024, gc=2, bpg=8192, ipg=1280, mo=80010] dmesg after patch: [ 4893.684892] EXT2-fs (loop0): reservations ON [ 4893.684896] EXT2-fs (loop0): xip option not supported [ 4893.684961] EXT2-fs (loop0): warning: mounting ext3 filesystem as ext2 [ 4893.684964] EXT2-fs (loop0): warning: maximal mount count reached, running e2fsck is recommended [ 4893.684990] EXT2-fs (loop0): 0.5b, 95/08/09, bs=1024, fs=1024, gc=2, bpg=8192, ipg=1280, mo=80010] Signed-off-by: Alexey Fisher <bug-track@fisher-privat.net> Reviewed-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-16HWPOISON: Enable .remove_error_page for migration aware file systemsAndi Kleen
Enable removing of corrupted pages through truncation for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs These should cover most server needs. I chose the set of migration aware file systems for this for now, assuming they have been especially audited. But in general it should be safe for all file systems on the data area that support read/write and truncate. Caveat: the hardware error handler does not take i_mutex for now before calling the truncate function. Is that ok? Cc: tytso@mit.edu Cc: hch@infradead.org Cc: mfasheh@suse.com Cc: aia21@cantab.net Cc: hugh.dickins@tiscali.co.uk Cc: swhiteho@redhat.com Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-09-14ext2: Update comment about generic_osync_inodeJan Kara
We rely on generic_write_sync() now. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2009-06-24switch ext2 to inode->i_aclAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11switch ext2 to simple_fsync()Al Viro
kill ext2_sync_file() (along with ext2/fsync.c), get rid of ext2_update_inode() - it's an alias of ext2_write_inode(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-13ext2: fix data corruption for racing writesJan Kara
If two writers allocating blocks to file race with each other (e.g. because writepages races with ordinary write or two writepages race with each other), ext2_getblock() can be called on the same inode in parallel. Before we are going to allocate new blocks, we have to recheck the block chain we have obtained so far without holding truncate_mutex. Otherwise we could overwrite the indirect block pointer set by the other writer leading to data loss. The below test program by Ying is able to reproduce the data loss with ext2 on in BRD in a few minutes if the machine is under memory pressure: long kMemSize = 50 << 20; int kPageSize = 4096; int main(int argc, char **argv) { int status; int count = 0; int i; char *fname = "/mnt/test.mmap"; char *mem; unlink(fname); int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0600); status = ftruncate(fd, kMemSize); mem = mmap(0, kMemSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); // Fill the memory with 1s. memset(mem, 1, kMemSize); sleep(2); for (i = 0; i < kMemSize; i++) { int byte_good = mem[i] != 0; if (!byte_good && ((i % kPageSize) == 0)) { //printf("%d ", i / kPageSize); count++; } } munmap(mem, kMemSize); close(fd); unlink(fname); if (count > 0) { printf("Running %d bad page\n", count); return 1; } return 0; } Cc: Ying Han <yinghan@google.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Jan Kara <jack@suse.cz> Cc: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-26ext2: Use lowercase names of quota functionsJan Kara
Use lowercase names of quota functions instead of old uppercase ones. Signed-off-by: Jan Kara <jack@suse.cz> CC: linux-ext4@vger.kernel.org
2009-01-08ext2: fix ext2_splice_branch() commentsQinghuang Feng
There is no argument named @chain in ext2_splice_branch, remove references to it. Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-31ext2: ensure fast symlinks are NUL-terminatedDuane Griffin
Ensure fast symlink targets are NUL-terminated, even if corrupted on-disk. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-03generic block based fiemap implementationJosef Bacik
Any block based fs (this patch includes ext3) just has to declare its own fiemap() function and then call this generic function with its own get_block_t. This works well for block based filesystems that will map multiple contiguous blocks at one time, but will work for filesystems that only map one block at a time, you will just end up with an "extent" for each block. One gotcha is this will not play nicely where there is hole+data after the EOF. This function will assume its hit the end of the data as soon as it hits a hole after the EOF, so if there is any data past that it will not pick that up. AFAIK no block based fs does this anyway, but its in the comments of the function anyway just in case. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org
2008-07-28vfs: pagecache usage optimization for pagesize!=blocksizeHisashi Hifumi
When we read some part of a file through pagecache, if there is a pagecache of corresponding index but this page is not uptodate, read IO is issued and this page will be uptodate. I think this is good for pagesize == blocksize environment but there is room for improvement on pagesize != blocksize environment. Because in this case a page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate. So I suggest that when all buffers which correspond to a part of a file that we want to read are uptodate, use this pagecache and copy data from this pagecache to user buffer even if a page is not uptodate. This can reduce read IO and improve system throughput. I wrote a benchmark program and got result number with this program. This benchmark do: 1: mount and open a test file. 2: create a 512MB file. 3: close a file and umount. 4: mount and again open a test file. 5: pwrite randomly 300000 times on a test file. offset is aligned by IO size(1024bytes). 6: measure time of preading randomly 100000 times on a test file. The result was: 2.6.26 330 sec 2.6.26-patched 226 sec Arch:i386 Filesystem:ext3 Blocksize:1024 bytes Memory: 1GB On ext3/4, a file is written through buffer/block. So random read/write mixed workloads or random read after random write workloads are optimized with this patch under pagesize != blocksize environment. This test result showed this. The benchmark program is as follows: #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <time.h> #include <stdlib.h> #include <string.h> #include <sys/mount.h> #define LEN 1024 #define LOOP 1024*512 /* 512MB */ main(void) { unsigned long i, offset, filesize; int fd; char buf[LEN]; time_t t1, t2; if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) { perror("cannot mount\n"); exit(1); } memset(buf, 0, LEN); fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC); if (fd < 0) { perror("cannot open file\n"); exit(1); } for (i = 0; i < LOOP; i++) write(fd, buf, LEN); close(fd); if (umount("/root/test1/") < 0) { perror("cannot umount\n"); exit(1); } if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) { perror("cannot mount\n"); exit(1); } fd = open("/root/test1/testfile", O_RDWR); if (fd < 0) { perror("cannot open file\n"); exit(1); } filesize = LEN * LOOP; for (i = 0; i < 300000; i++){ offset = (random() % filesize) & (~(LEN - 1)); pwrite(fd, buf, LEN, offset); } printf("start test\n"); time(&t1); for (i = 0; i < 100000; i++){ offset = (random() % filesize) & (~(LEN - 1)); pread(fd, buf, LEN, offset); } time(&t2); printf("%ld sec\n", t2-t1); close(fd); if (umount("/root/test1/") < 0) { perror("cannot umount\n"); exit(1); } } Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@ucw.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28ext2: use ext2_fsblk_t typeAkinobu Mita
Use ext2_fsblk_t type for filesystem-wide blocks number Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28ext2: use ext2_group_first_block_no()Akinobu Mita
Use ext2_group_first_block_no() and assign the return values to ext2_fsblk_t variables. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28xip: support non-struct page backed memoryNick Piggin
Convert XIP to support non-struct page backed memory, using VM_MIXEDMAP for the user mappings. This requires the get_xip_page API to be changed to an address based one. Improve the API layering a little bit too, while we're here. This is required in order to support XIP filesystems on memory that isn't backed with struct page (but memory with struct page is still supported too). Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Carsten Otte <cotte@de.ibm.com> Cc: Jared Hulbert <jaredeh@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>