aboutsummaryrefslogtreecommitdiff
path: root/fs/btrfs/ctree.h
AgeCommit message (Collapse)Author
2010-10-29Btrfs: allow subvol deletion by unprivileged user with -o user_subvol_rm_allowedSage Weil
Add a mount option user_subvol_rm_allowed that allows users to delete a (potentially non-empty!) subvol when they would otherwise we allowed to do an rmdir(2). We duplicate the may_delete() checks from the core VFS code to implement identical security checks (minus the directory size check). We additionally require that the user has write+exec permission on the subvol root inode. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: async transaction commitSage Weil
Add support for an async transaction commit that is ordered such that any subsequent operations will join the following transaction, but does not wait until the current commit is fully on disk. This avoids much of the latency associated with the btrfs_commit_transaction for callers concerned with serialization and not safety. The wait_for_unblock flag controls whether we wait for the 'middle' portion of commit_transaction to complete, which is necessary if the caller expects some of the modifications contained in the commit to be available (this is the case for subvol/snapshot creation). Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Merge branch 'bug-fixes' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work Conflicts: fs/btrfs/extent-tree.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-29Btrfs: Add a clear_cache mount optionJosef Bacik
If something goes wrong with the free space cache we need a way to make sure it's not loaded on mount and that it's cleared for everybody. When you pass the clear_cache option it will make it so all block groups are setup to be cleared, which keeps them from being loaded and then they will be truncated when the transaction is committed. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-29Btrfs: add support for mixed data+metadata block groupsJosef Bacik
There are just a few things that need to be fixed in the kernel to support mixed data+metadata block groups. Mostly we just need to make sure that if we are using mixed block groups that we continue to allocate mixed block groups as we need them. Also we need to make sure __find_space_info will find our space info if we search for DATA or METADATA only. Tested this with xfstests and it works nicely. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-29Btrfs: write out free space cacheJosef Bacik
This is a simple bit, just dump the free space cache out to our preallocated inode when we're writing out dirty block groups. There are a bunch of changes in inode.c in order to account for special cases. Mostly when we're doing the writeout we're holding trans_mutex, so we need to use the nolock transacation functions. Also we can't do asynchronous completions since the async thread could be blocked on already completed IO waiting for the transaction lock. This has been tested with xfstests and btrfs filesystem balance, as well as my ENOSPC tests. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-28Btrfs: create special free space cache inodeJosef Bacik
In order to save free space cache, we need an inode to hold the data, and we need a special item to point at the right inode for the right block group. So first, create a special item that will point to the right inode, and the number of extent entries we will have and the number of bitmaps we will have. We truncate and pre-allocate space everytime to make sure it's uptodate. This feature will be turned on as soon as you mount with -o space_cache, however it is safe to boot into old kernels, they will just generate the cache the old fashion way. When you boot back into a newer kernel we will notice that we modified and not the cache and automatically discard the cache. Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-22Btrfs: rework how we reserve metadata bytesJosef Bacik
With multi-threaded writes we were getting ENOSPC early because somebody would come in, start flushing delalloc because they couldn't make their reservation, and in the meantime other threads would come in and use the space that was getting freed up, so when the original thread went to check to see if they had space they didn't and they'd return ENOSPC. So instead if we have some free space but not enough for our reservation, take the reservation and then start doing the flushing. The only time we don't take reservations is when we've already overcommitted our space, that way we don't have people who come late to the party way overcommitting ourselves. This also moves all of the retrying and flushing code into reserve_metdata_bytes so it's all uniform. This keeps my fs_mark test from returning -ENOSPC as soon as it starts and actually lets me fill up the disk. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-22Btrfs: re-work delalloc flushingJosef Bacik
Currently we try and flush delalloc, but we only do that in a sort of weak way, which works fine in most cases but if we're under heavy pressure we need to be able to wait for flushing to happen. Also instead of checking the bytes reserved in the block_rsv, check the space info since it is more accurate. The sync option will be used in a future patch. Signed-off-by: Josef Bacik <josef@redhat.com>
2010-10-22Btrfs: fix df regressionJosef Bacik
The new ENOSPC stuff breaks out the raid types which breaks the way we were reporting df to the system. This fixes it back so that Available is the total space available to data and used is the actual bytes used by the filesystem. This means that Available is Total - data used - all of the metadata space. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-08-09Make ->drop_inode() just return whether inode needs to be droppedAl Viro
... and let iput_final() do the actual eviction or retention Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09convert btrfs to ->evict_inode()Al Viro
NB: do we want btrfs_wait_ordered_range() on eviction of inodes with positive i_nlink on subvolume with zero root_refs? If not, btrfs_evict_inode() can be simplified by unconditionally bailing out in case of i_nlink > 0 in the very beginning... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27drop unused dentry argument to ->fsyncChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-25Btrfs: add basic DIO read/write supportJosef Bacik
This provides basic DIO support for reading and writing. It does not do the work to recover from mismatching checksums, that will come later. A few design changes have been made from Jim's code (sorry Jim!) 1) Use the generic direct-io code. Jim originally re-wrote all the generic DIO code in order to account for all of BTRFS's oddities, but thanks to that work it seems like the best bet is to just ignore compression and such and just opt to fallback on buffered IO. 2) Fallback on buffered IO for compressed or inline extents. Jim's code did it's own buffering to make dio with compressed extents work. Now we just fallback onto normal buffered IO. 3) Use ordered extents for the writes so that all of the lock_extent() lookup_ordered() type checks continue to work. 4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with DIO writes. I've tested this with fsx and everything works great. This patch depends on my dio and filemap.c patches to work. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Metadata ENOSPC handling for balanceYan, Zheng
This patch adds metadata ENOSPC handling for the balance code. It is consisted by following major changes: 1. Avoid COW tree leave in the phrase of merging tree. 2. Handle interaction with snapshot creation. 3. make the backref cache can live across transactions. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Pre-allocate space for data relocationYan, Zheng
Pre-allocate space for data relocation. This can detect ENOPSC condition caused by fragmentation of free space. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Metadata reservation for orphan inodesYan, Zheng
reserve metadata space for handling orphan inodes Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Introduce global metadata reservationYan, Zheng
Reserve metadata space for extent tree, checksum tree and root tree Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Update metadata reservation for delayed allocationYan, Zheng
Introduce metadata reservation context for delayed allocation and update various related functions. This patch also introduces EXTENT_FIRST_DELALLOC control bit for set/clear_extent_bit. It tells set/clear_bit_hook whether they are processing the first extent_state with EXTENT_DELALLOC bit set. This change is important if set/clear_extent_bit involves multiple extent_state. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Integrate metadata reservation with start_transactionYan, Zheng
Besides simplify the code, this change makes sure all metadata reservation for normal metadata operations are released after committing transaction. Changes since V1: Add code that check if unlink and rmdir will free space. Add ENOSPC handling for clone ioctl. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Introduce contexts for metadata reservationYan, Zheng
Introducing metadata reseravtion contexts has two major advantages. First, it makes metadata reseravtion more traceable. Second, it can reclaim freed space and re-add them to the itself after transaction committed. Besides add btrfs_block_rsv structure and related helper functions, This patch contains following changes: Move code that decides if freed tree block should be pinned into btrfs_free_tree_block(). Make space accounting more accurate, mainly for handling read only block groups. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Shrink delay allocated space in a synchronizedYan, Zheng
Shrink delayed allocation space in a synchronized manner is more controllable than flushing all delay allocated space in an async thread. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Kill allocate_wait in space_infoYan, Zheng
We already have fs_info->chunk_mutex to avoid concurrent chunk creation. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-05-25Btrfs: Link block groups of different raid typesYan, Zheng
The size of reserved space is stored in space_info. If block groups of different raid types are linked to separate space_info, changing allocation profile will corrupt reserved space accounting. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-04-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: add check for changed leaves in setup_leaf_for_split Btrfs: create snapshot references in same commit as snapshot Btrfs: fix small race with delalloc flushing waitqueue's Btrfs: use add_to_page_cache_lru, use __page_cache_alloc Btrfs: fix chunk allocate size calculation Btrfs: kill max_extent mount option Btrfs: fail to mount if we have problems reading the block groups Btrfs: check btrfs_get_extent return for IS_ERR() Btrfs: handle kmalloc() failure in inode lookup ioctl Btrfs: dereferencing freed memory Btrfs: Simplify num_stripes's calculation logical for __btrfs_alloc_chunk() Btrfs: Add error handle for btrfs_search_slot() in btrfs_read_chunk_tree() Btrfs: Remove unnecessary finish_wait() in wait_current_trans() Btrfs: add NULL check for do_walk_down() Btrfs: remove duplicate include in ioctl.c Fix trivial conflict in fs/btrfs/compression.c due to slab.h include cleanups.
2010-03-30Btrfs: kill max_extent mount optionJosef Bacik
As Yan pointed out, theres not much reason for all this complicated math to account for file extents being split up into max_extent chunks, since they are likely to all end up in the same leaf anyway. Since there isn't much reason to use max_extent, just remove the option altogether so we have one less thing we need to test. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (30 commits) Btrfs: fix the inode ref searches done by btrfs_search_path_in_tree Btrfs: allow treeid==0 in the inode lookup ioctl Btrfs: return keys for large items to the search ioctl Btrfs: fix key checks and advance in the search ioctl Btrfs: buffer results in the space_info ioctl Btrfs: use __u64 types in ioctl.h Btrfs: fix search_ioctl key advance Btrfs: fix gfp flags masking in the compression code Btrfs: don't look at bio flags after submit_bio btrfs: using btrfs_stack_device_id() get devid btrfs: use memparse Btrfs: add a "df" ioctl for btrfs Btrfs: cache the extent state everywhere we possibly can V2 Btrfs: cache ordered extent when completing io Btrfs: cache extent state in find_delalloc_range Btrfs: change the ordered tree to use a spinlock instead of a mutex Btrfs: finish read pages in the order they are submitted btrfs: fix btrfs_mkdir goto for no free objectids Btrfs: flush data on snapshot creation Btrfs: make df be a little bit more understandable ...
2010-03-15btrfs: use memparseAkinobu Mita
Use memparse() instead of its own private implementation. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-15Btrfs: cache the extent state everywhere we possibly can V2Josef Bacik
This patch just goes through and fixes everybody that does lock_extent() blah unlock_extent() to use lock_extent_bits() blah unlock_extent_cached() and pass around a extent_state so we only have to do the searches once per function. This gives me about a 3 mb/s boots on my random write test. I have not converted some things, like the relocation and ioctl's, since they aren't heavily used and the relocation stuff is in the middle of being re-written. I also changed the clear_extent_bit() to only unset the cached state if we are clearing EXTENT_LOCKED and related stuff, so we can do things like this lock_extent_bits() clear delalloc bits unlock_extent_cached() without losing our cached state. I tested this thoroughly and turned on LEAK_DEBUG to make sure we weren't leaking extent states, everything worked out fine. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-15Btrfs: add new defrag-range ioctl.Chris Mason
The btrfs defrag ioctl was limited to doing the entire file. This commit adds a new interface that can defrag a specific range inside the file. It can also force compression on the file, allowing you to selectively compress individual files after they were created, even when mount -o compress isn't turned on. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-15Btrfs: add ioctl and incompat flag to set the default mount subvolJosef Bacik
This patch needs to go along with my previous patch. This lets us set the default dir item's location to whatever root we want to use as our default mounting subvol. With this we don't have to use mount -o subvol=<tree id> anymore to mount a different subvol, we can just set the new one and it will just magically work. I've done some moderate testing with this, mostly just switching the default mount around, mounting subvols and the default mount at the same time and such, everything seems to work. Thanks, Older kernels would generally be able to still mount the filesystem with the default subvolume set, but it would result in a different volume being mounted, which could be an even more unpleasant suprise for users. So if you set your default subvolume, you can't go back to older kernels. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-15Btrfs: change how we mount subvolumesJosef Bacik
This work is in preperation for being able to set a different root as the default mounting root. There is currently a problem with how we mount subvolumes. We cannot currently mount a subvolume of a subvolume, you can only mount subvolumes/snapshots of the default subvolume. So say you take a snapshot of the default subvolume and call it snap1, and then take a snapshot of snap1 and call it snap2, so now you have / /snap1 /snap1/snap2 as your available volumes. Currently you can only mount / and /snap1, you cannot mount /snap1/snap2. To fix this problem instead of passing subvolid=<name> you must pass in subvolid=<treeid>, where <treeid> is the tree id that gets spit out via the subvolume listing you get from the subvolume listing patches (btrfs filesystem list). This allows us to mount /, /snap1 and /snap1/snap2 as the root volume. In addition to the above, we also now read the default dir item in the tree root to get the root key that it points to. For now this just points at what has always been the default subvolme, but later on I plan to change it to point at whatever root you want to be the new default root, so you can just set the default mount and not have to mount with -o subvolid=<treeid>. I tested this out with the above scenario and it worked perfectly. Thanks, mount -o subvol operates inside the selected subvolid. For example: mount -o subvol=snap1,subvolid=256 /dev/xxx /mnt /mnt will have the snap1 directory for the subvolume with id 256. mount -o subvol=snap /dev/xxx /mnt /mnt will be the snap directory of whatever the default subvolume is. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-15Btrfs: make set/get functions for the super compat_ro flags use compat_roJosef Bacik
Our set/get functions for compat_ro_flags actually look at compat_flags. This will mess any attempt to use compat flags up. The fix is obvious. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-03-05pass writeback_control to ->write_inodeChristoph Hellwig
This gives the filesystem more information about the writeback that is happening. Trond requested this for the NFS unstable write handling, and other filesystems might benefit from this too by beeing able to distinguish between the different callers in more detail. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-28Btrfs: Add mount -o compress-forceChris Mason
The default btrfs mount -o compress mode will quickly back off compressing a file if it notices that compression does not reduce the size of the data being written. This can save considerable CPU because all future writes to the file go through uncompressed. But some files are both very large and have mixed data stored in them. In that case, we want to add the ability to always try compressing data before writing it. This commit adds mount -o compress-force. A later commit will add a new inode flag that does the same thing. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17Btrfs: Fix per root used space accountingYan, Zheng
The bytes_used field in root item was originally planned to trace the amount of used data and tree blocks. But it never worked right since we can't trace freeing of data accurately. This patch changes it to only trace the amount of tree blocks. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17Btrfs: Add delayed iputYan, Zheng
iput() can trigger new transactions if we are dropping the final reference, so calling it in btrfs_commit_transaction may end up deadlock. This patch adds delayed iput to avoid the issue. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17Btrfs: Pass transaction handle to security and ACL initialization functionsYan, Zheng
Pass transaction handle down to security and ACL initialization functions, so we can avoid starting nested transactions Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-17Btrfs: Avoid orphan inodes cleanup while replaying logYan, Zheng
We do log replay in a single transaction, so it's not good to do unbound operations. This patch cleans up orphan inodes cleanup after replaying the log. It also avoids doing other unbound operations such as truncating a file during replaying log. These unbound operations are postponed to the orphan inode cleanup stage. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-15Btrfs: Rewrite btrfs_drop_extentsYan, Zheng
Rewrite btrfs_drop_extents by using btrfs_duplicate_item, so we can avoid calling lock_extent within transaction. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-12-15Btrfs: Add btrfs_duplicate_itemYan, Zheng
btrfs_duplicate_item duplicates item with new key, guaranteeing the source item and the new items are in the same tree leaf and contiguous. It allows us to split file extent in place, without using lock_extent to prevent bookend extent race. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-15Merge branch 'master' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: always pin metadata in discard mode Btrfs: enable discard support Btrfs: add -o discard option Btrfs: properly wait log writers during log sync Btrfs: fix possible ENOSPC problems with truncate Btrfs: fix btrfs acl #ifdef checks Btrfs: streamline tree-log btree block writeout Btrfs: avoid tree log commit when there are no changes Btrfs: only write one super copy during fsync
2009-10-14Btrfs: add -o discard optionChristoph Hellwig
Enable discard by default is not a good idea given the the trim speed of SSD prototypes we've seen, and the carecteristics for many high-end arrays. Turn of discards by default and require the -o discard option to enable them on. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-13Btrfs: fix btrfs acl #ifdef checksChris Mason
The btrfs acl code was #ifdefing for a define that didn't exist. This correctly matches it to the values used by the Kconfig file. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-13Btrfs: avoid tree log commit when there are no changesChris Mason
rpm has a habit of running fdatasync when the file hasn't changed. We already detect if a file hasn't been changed in the current transaction but it might have been sent to the tree-log in this transaction and not changed since the last call to fsync. In this case, we want to avoid a tree log sync, which includes a number of synchronous writes and barriers. This commit extends the existing tracking of the last transaction to change a file to also track the last sub-transaction. The end result is that rpm -ivh and -Uvh are roughly twice as fast, and on par with ext3. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix file clone ioctl for bookend extents Btrfs: fix uninit compiler warning in cow_file_range_nocow Btrfs: constify dentry_operations Btrfs: optimize back reference update during btrfs_drop_snapshot Btrfs: remove negative dentry when deleting subvolumne Btrfs: optimize fsync for the single writer case Btrfs: async delalloc flushing under space pressure Btrfs: release delalloc reservations on extent item insertion Btrfs: delay clearing EXTENT_DELALLOC for compressed extents Btrfs: cleanup extent_clear_unlock_delalloc flags Btrfs: fix possible softlockup in the allocator Btrfs: fix deadlock on async thread startup
2009-10-09Btrfs: constify dentry_operationsAlexey Dobriyan
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-08Btrfs: optimize fsync for the single writer caseJosef Bacik
This patch optimizes the tree logging stuff so it doesn't always wait 1 jiffie for new people to join the logging transaction if there is only ever 1 writer. This helps a little bit with latency where we have something like RPM where it will fdatasync every file it writes, and so waiting the 1 jiffie for every fdatasync really starts to add up. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-10-08Btrfs: async delalloc flushing under space pressureJosef Bacik
This patch moves the delalloc flushing that occurs when we are under space pressure off to a async thread pool. This helps since we only free up metadata space when we actually insert the extent item, which means it takes quite a while for space to be free'ed up if we wait on all ordered extents. However, if space is freed up due to inline extents being inserted, we can wake people who are waiting up early, and they can finish their work. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>