aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2011-01-07config fs: avoid switching ->d_op on live dentryNick Piggin
Switching d_op on a live dentry is racy in general, so avoid it. In this case it is a negative dentry, which is safer, but there are still concurrent ops which may be called on d_op in that case (eg. d_revalidate). So in general a filesystem may not do this. Fix configfs so as not to do this. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: use fast counters for vfs cachesNick Piggin
percpu_counter library generates quite nasty code, so unless you need to dynamically allocate counters or take fast approximate value, a simple per cpu set of counters is much better. The percpu_counter can never be made to work as well, because it has an indirection from pointer to percpu memory, and it can't use direct this_cpu_inc interfaces because it doesn't use static PER_CPU data, so code will always be worse. In the fastpath, it is the difference between this: incl %gs:nr_dentry # nr_dentry and this: movl percpu_counter_batch(%rip), %edx # percpu_counter_batch, movl $1, %esi #, movq $nr_dentry, %rdi #, call __percpu_counter_add # (plus I clobber registers) __percpu_counter_add: pushq %rbp # movq %rsp, %rbp #, subq $32, %rsp #, movq %rbx, -24(%rbp) #, movq %r12, -16(%rbp) #, movq %r13, -8(%rbp) #, movq %rdi, %rbx # fbc, fbc #APP # 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1 movq %gs:kernel_stack,%rax #, pfo_ret__ # 0 "" 2 #NO_APP incl -8124(%rax) # <variable>.preempt_count movq 32(%rdi), %r12 # <variable>.counters, tcp_ptr__ #APP # 78 "lib/percpu_counter.c" 1 add %gs:this_cpu_off, %r12 # this_cpu_off, tcp_ptr__ # 0 "" 2 #NO_APP movslq (%r12),%r13 #* tcp_ptr__, tmp73 movslq %edx,%rax # batch, batch addq %rsi, %r13 # amount, count cmpq %rax, %r13 # batch, count jge .L27 #, negl %edx # tmp76 movslq %edx,%rdx # tmp76, tmp77 cmpq %rdx, %r13 # tmp77, count jg .L28 #, .L27: movq %rbx, %rdi # fbc, call _raw_spin_lock # addq %r13, 8(%rbx) # count, <variable>.count movq %rbx, %rdi # fbc, movl $0, (%r12) #,* tcp_ptr__ call _raw_spin_unlock # .L29: #APP # 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1 movq %gs:kernel_stack,%rax #, pfo_ret__ # 0 "" 2 #NO_APP decl -8124(%rax) # <variable>.preempt_count movq -8136(%rax), %rax #, D.14625 testb $8, %al #, D.14625 jne .L32 #, .L31: movq -24(%rbp), %rbx #, movq -16(%rbp), %r12 #, movq -8(%rbp), %r13 #, leave ret .p2align 4,,10 .p2align 3 .L28: movl %r13d, (%r12) # count,* jmp .L29 # .L32: call preempt_schedule # .p2align 4,,6 jmp .L31 # .size __percpu_counter_add, .-__percpu_counter_add .p2align 4,,15 Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07vfs: revert per-cpu nr_unused counters for dentry and inodesNick Piggin
The nr_unused counters count the number of objects on an LRU, and as such they are synchronized with LRU object insertion and removal and scanning, and protected under the LRU lock. Making it per-cpu does not actually get any concurrency improvements because of this lock, and summing the counter is much slower, and incrementing/decrementing it costs more code size and is slower too. These counters should stay per-LRU, which currently means global. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: d_validate fixesNick Piggin
d_validate has been broken for a long time. kmem_ptr_validate does not guarantee that a pointer can be dereferenced if it can go away at any time. Even rcu_read_lock doesn't help, because the pointer might be queued in RCU callbacks but not executed yet. So the parent cannot be checked, nor the name hashed. The dentry pointer can not be touched until it can be verified under lock. Hashing simply cannot be used. Instead, verify the parent/child relationship by traversing parent's d_child list. It's slow, but only ncpfs and the destaged smbfs care about it, at this point. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-05Revert "fs: use RCU read side protection in d_validate"Nick Piggin
This reverts commit 3825bdb7ed920845961f32f364454bee5f469abb. You cannot dget() a dentry without having a reference, or holding a lock that guarantees it remains valid. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2010-12-23Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Fix system inodes cache overflow. ocfs2: Hold ip_lock when set/clear flags for indexed dir. ocfs2: Adjust masklog flag values Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem. ocfs2/dlm: Migrate lockres with no locks if it has a reference
2010-12-23Merge branch 'linus-hot-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'linus-hot-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix on-line resizing regression
2010-12-23ext4: fix on-line resizing regressionTheodore Ts'o
https://bugzilla.kernel.org/show_bug.cgi?id=25352 This regression was caused by commit a31437b85: "ext4: use sb_issue_zeroout in setup_new_group_blocks", by accidentally dropping the code which reserved the block group descriptor and inode table blocks. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-12-22logfs: fix "Kernel BUG at readwrite.c:1193"Prasad Joshi
This happens when __logfs_create() tries to write a new inode to the disk which is full. __logfs_create() associates the transaction pointer with inode. During the logfs_write_inode() function call chain this transaction pointer is moved from inode to page->private using function move_inode_to_page (do_write_inode() -> inode_to_page() -> move_inode_to_page) When the write inode fails, the transaction is aborted and iput is called on the failed inode. During delete_inode the same transaction pointer associated with the page is getting used. Thus causing kernel BUG. The patch checks for error in write_inode() and restores the page->private to NULL. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=20162 Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com> Cc: Joern Engel <joern@logfs.org> Cc: Florian Mickler <florian@mickler.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-22logfs: fix deadlock in logfs_get_wblocks, hold and wait on super->s_write_mutexPrasad Joshi
do_logfs_journal_wl_pass() should use GFP_NOFS for memory allocation GC code calls btree_insert32 with GFP_KERNEL while holding a mutex super->s_write_mutex. The same mutex is used in address_space_operations->writepage(), and a call to writepage() could be triggered as a result of memory allocation in btree_insert32, causing a deadlock. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=20342 Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com> Cc: Joern Engel <joern@logfs.org> Cc: Florian Mickler <florian@mickler.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-22ocfs2: Fix system inodes cache overflow.Tao Ma
When we store system inodes cache in ocfs2_super, we use a array for global system inodes. But unfortunately, the range is calculated wrongly which makes it overflow and pollute ocfs2_super->local_system_inodes. This patch fix it by setting the range properly. The corresponding bug is ossbug1303. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1303 Cc: stable@kernel.org Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-12-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: handle partial result from get_user_pages ceph: mark user pages dirty on direct-io reads ceph: fix null pointer dereference in ceph_init_dentry for nfs reexport ceph: fix direct-io on non-page-aligned buffers ceph: fix msgr_init error path
2010-12-20Fix btrfs b0rkageAl Viro
Buggered-in: 76dda93c6ae2 ("Btrfs: add snapshot/subvolume destroy ioctl") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-17ceph: mark user pages dirty on direct-io readsHenry C Chang
For read operation, we have to set the argument _write_ of get_user_pages to 1 since we will write data to pages. Also, we need to SetPageDirty before releasing these pages. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-17ceph: fix null pointer dereference in ceph_init_dentry for nfs reexportSage Weil
The fh_to_dentry etc. methods use ceph_init_dentry(), which assumes that d_parent is defined. It isn't for those callers, so check! Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-16Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notifyLinus Torvalds
* 'for-linus' of git://git.infradead.org/users/eparis/notify: fanotify: fill in the metadata_len field on struct fanotify_event_metadata fanotify: split version into version and metadata_len fanotify: Dont try to open a file descriptor for the overflow event fanotify: Introduce FAN_NOFD fanotify: do not leak user reference on allocation failure inotify: stop kernel memory leak on file creation failure fanotify: on group destroy allow all waiters to bypass permission check fanotify: Dont allow a mask of 0 if setting or removing a mark fanotify: correct broken ref counting in case adding a mark failed fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is called fanotify: remove packed from access response message fanotify: deny permissions when no event was sent
2010-12-16ocfs2: Hold ip_lock when set/clear flags for indexed dir.Tao Ma
When we set/clear the dyn_features for an inode we hold the ip_lock. So do it when we set/clear OCFS2_INDEXED_DIR_FL also. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-12-16ocfs2: Adjust masklog flag valuesSunil Mushran
Two masklogs had the same flag value. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-12-16nilfs2: fix regression of garbage collection ioctlRyusuke Konishi
On 2.6.37-rc1, garbage collection ioctl of nilfs was broken due to the commit 263d90cefc7d82a0 ("nilfs2: remove own inode hash used for GC"), and leading to filesystem corruption. The patch doesn't queue gc-inodes for log writer if they are reused through the vfs inode cache. Here, gc-inode is the inode which buffers blocks to be relocated on GC. That patch queues gc-inodes in nilfs_init_gcinode() function, but this function is not called when they don't have I_NEW flag. Thus, some of live blocks are wrongly overrode without being moved to new logs. This resolves the problem by moving the gc-inode queueing to an outer function to ensure it's done right. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-12-15ceph: fix direct-io on non-page-aligned buffersHenry C Chang
The user buffer may be 512-byte aligned, not page-aligned. We were assuming the buffer was page-aligned and only accounting for non-page-aligned io offsets. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-15Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix typo which broke '..' detection in ext4_find_entry() ext4: Turn off multiple page-io submission by default
2010-12-15install_special_mapping skips security_file_mmap check.Tavis Ormandy
The install_special_mapping routine (used, for example, to setup the vdso) skips the security check before insert_vm_struct, allowing a local attacker to bypass the mmap_min_addr security restriction by limiting the available pages for special mappings. bprm_mm_init() also skips the check, and although I don't think this can be used to bypass any restrictions, I don't see any reason not to have the security check. $ uname -m x86_64 $ cat /proc/sys/vm/mmap_min_addr 65536 $ cat install_special_mapping.s section .bss resb BSS_SIZE section .text global _start _start: mov eax, __NR_pause int 0x80 $ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s $ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o $ ./install_special_mapping & [1] 14303 $ cat /proc/14303/maps 0000f000-00010000 r-xp 00000000 00:00 0 [vdso] 00010000-00011000 r-xp 00001000 00:19 2453665 /home/taviso/install_special_mapping 00011000-ffffe000 rwxp 00000000 00:00 0 [stack] It's worth noting that Red Hat are shipping with mmap_min_addr set to 4096. Signed-off-by: Tavis Ormandy <taviso@google.com> Acked-by: Kees Cook <kees@ubuntu.com> Acked-by: Robert Swiecki <swiecki@google.com> [ Changed to not drop the error code - akpm ] Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-15fanotify: fill in the metadata_len field on struct fanotify_event_metadataEric Paris
The fanotify_event_metadata now has a field which is supposed to indicate the length of the metadata portion of the event. Fill in that field as well. Based-in-part-on-patch-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-14ext4: fix typo which broke '..' detection in ext4_find_entry()Aaro Koskinen
There should be a check for the NUL character instead of '0'. Fortunately the only thing that cares about this is NFS serving, which is why we didn't notice this in the merge window testing. Reported-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-12-14ext4: Turn off multiple page-io submission by defaultTheodore Ts'o
Jon Nelson has found a test case which causes postgresql to fail with the error: psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581 Under memory pressure, it looks like part of a file can end up getting replaced by zero's. Until we can figure out the cause, we'll roll back the change and use block_write_full_page() instead of ext4_bio_write_page(). The new, more efficient writing function can be used via the mount option mblk_io_submit, so we can test and fix the new page I/O code. To reproduce the problem, install postgres 8.4 or 9.0, and pin enough memory such that the system just at the end of triggering writeback before running the following sql script: begin; create temporary table foo as select x as a, ARRAY[x] as b FROM generate_series(1, 10000000 ) AS x; create index foo_a_idx on foo (a); create index foo_b_idx on foo USING GIN (b); rollback; If the temporary table is created on a hard drive partition which is encrypted using dm_crypt, then under memory pressure, approximately 30-40% of the time, pgsql will issue the above failure. This patch should fix this problem, and the problem will come back if the file system is mounted with the mblk_io_submit mount option. Reported-by: Jon Nelson <jnelson@jamponi.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-12-14Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: nfsd: Fix possible BUG_ON firing in set_change_info sunrpc: prevent use-after-free on clearing XPT_BUSY
2010-12-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: prevent RAID level downgrades when space is low Btrfs: account for missing devices in RAID allocation profiles Btrfs: EIO when we fail to read tree roots Btrfs: fix compiler warnings Btrfs: Make async snapshot ioctl more generic Btrfs: pwrite blocked when writing from the mmaped buffer of the same page Btrfs: Fix a crash when mounting a subvolume Btrfs: fix sync subvol/snapshot creation Btrfs: Fix page leak in compressed writeback path Btrfs: do not BUG if we fail to remove the orphan item for dead snapshots Btrfs: fixup return code for btrfs_del_orphan_item Btrfs: do not do fast caching if we are allocating blocks for tree_root Btrfs: deal with space cache errors better Btrfs: fix use after free in O_DIRECT
2010-12-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: verify ioctl retries fuse: fix ioctl when server is 32bit
2010-12-14Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: log timestamp changes to the source inode in rename
2010-12-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix ioctl magic ceph: Behave better when handling file lock replies. ceph: pass lock information by struct file_lock instead of as individual params. ceph: Handle file locks in replies from the MDS. ceph: avoid possible null deref in readdir after dir llseek
2010-12-14Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Fix panic after nfs_umount() nfs: remove extraneous and problematic calls to nfs_clear_request nfs: kernel should return EPROTONOSUPPORT when not support NFSv4 NFS: Fix fcntl F_GETLK not reporting some conflicts nfs: Discard ACL cache on mode update NFS: Readdir cleanups NFS: nfs_readdir_search_for_cookie() don't mark as eof if cookie not found NFS: Fix a memory leak in nfs_readdir Call the filesystem back whenever a page is removed from the page cache NFS: Ensure we use the correct cookie in nfs_readdir_xdr_filler
2010-12-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: remove bogus remapping of error in cifs_filldir() cifs: allow calling cifs_build_path_to_root on incomplete cifs_sb cifs: fix check of error return from is_path_accessable cifs: remove Local_System_Name cifs: fix use of CONFIG_CIFS_ACL cifs: add attribute cache timeout (actimeo) tunable
2010-12-13Btrfs: prevent RAID level downgrades when space is lowChris Mason
The extent allocator has code that allows us to fill allocations from any available block group, even if it doesn't match the raid level we've requested. This was put in because adding a new drive to a filesystem made with the default mkfs options actually upgrades the metadata from single spindle dup to full RAID1. But, the code also allows us to allocate from a raid0 chunk when we really want a raid1 or raid10 chunk. This can cause big trouble because mkfs creates a small (4MB) raid0 chunk for data and metadata which then goes unused for raid1/raid10 installs. The allocator will happily wander in and allocate from that chunk when things get tight, which is not correct. The fix here is to make sure that we provide duplication when the caller has asked for it. It does all the dups to be any raid level, which preserves the dup->raid1 upgrade abilities. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-12-13Btrfs: account for missing devices in RAID allocation profilesChris Mason
When we mount in RAID degraded mode without adding a new device to replace the failed one, we can end up using the wrong RAID flags for allocations. This results in strange combinations of block groups (raid1 in a raid10 filesystem) and corruptions when we try to allocate blocks from single spindle chunks on drives that are actually missing. The first device has two small 4MB chunks in it that mkfs creates and these are usually unused in a raid1 or raid10 setup. But, in -o degraded, the allocator will fall back to these because the mask of desired raid groups isn't correct. The fix here is to count the missing devices as we build up the list of devices in the system. This count is used when picking the raid level to make sure we continue using the same levels that were in place before we lost a drive. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-12-13Btrfs: EIO when we fail to read tree rootsChris Mason
If we just get a plain IO error when we read tree roots, the code wasn't properly sending that error up the chain. This allowed mounts to continue when they should failed, and allowed operations on partially setup root structs. The end result was usually oopsen on spinlocks that hadn't been spun up correctly. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-12-10Btrfs: fix compiler warningsJan Beulich
... regarding an unused function when !MIGRATION, and regarding a printk() format string vs argument mismatch. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-12-10Btrfs: Make async snapshot ioctl more genericLi Zefan
If we had reserved some bytes in struct btrfs_ioctl_vol_args, we wouldn't have to create a new structure for async snapshot creation. Here we convert async snapshot ioctl to use a more generic ABI, as we'll add more ioctls for snapshots/subvolumes in the future, readonly snapshots for example. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-12-10Btrfs: pwrite blocked when writing from the mmaped buffer of the same pageXin Zhong
This problem is found in meego testing: http://bugs.meego.com/show_bug.cgi?id=6672 A file in btrfs is mmaped and the mmaped buffer is passed to pwrite to write to the same page of the same file. In btrfs_file_aio_write(), the pages is locked by prepare_pages(). So when btrfs_copy_from_user() is called, page fault happens and the same page needs to be locked again in filemap_fault(). The fix is to move iov_iter_fault_in_readable() before prepage_pages() to make page fault happen before pages are locked. And also disable page fault in critical region in btrfs_copy_from_user(). Reviewed-by: Yan, Zheng<zheng.z.yan@intel.com> Signed-off-by: Zhong, Xin <xin.zhong@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-12-10Btrfs: Fix a crash when mounting a subvolumeLi Zefan
We should drop dentry before deactivating the superblock, otherwise we can hit this bug: BUG: Dentry f349a690{i=100,n=/} still in use (1) [unmount of btrfs loop1] ... Steps to reproduce the bug: # mount /dev/loop1 /mnt # mkdir save # btrfs subvolume snapshot /mnt save/snap1 # umount /mnt # mount -o subvol=save/snap1 /dev/loop1 /mnt (crash) Reported-by: Michael Niederle <mniederle@gmx.at> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-12-10Btrfs: fix sync subvol/snapshot creationSage Weil
We were incorrectly taking the async path even for the sync ioctls by passing in &transid unconditionally. There's ample room for further cleanup here, but this keeps the fix simple. Signed-off-by: Sage Weil <sage@newdream.net> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-12-10Btrfs: Fix page leak in compressed writeback pathYan, Zheng
"start + num_bytes >= actual_end" can happen when compressed page writeback races with file truncation. In that case we need unlock and release pages past the end of file. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-12-10Btrfs: do not BUG if we fail to remove the orphan item for dead snapshotsJosef Bacik
Not being able to delete an orphan item isn't a horrible thing. The worst that happens is the next time around we try and do the orphan cleanup and we can't find the referenced object and just delete the item and move on. Signed-off-by: Josef Bacik <josef@redhat.com>
2010-12-10NFS: Fix panic after nfs_umount()Chuck Lever
After a few unsuccessful NFS mount attempts in which the client and server cannot agree on an authentication flavor both support, the client panics. nfs_umount() is invoked in the kernel in this case. Turns out nfs_umount()'s UMNT RPC invocation causes the RPC client to write off the end of the rpc_clnt's iostat array. This is because the mount client's nrprocs field is initialized with the count of defined procedures (two: MNT and UMNT), rather than the size of the client's proc array (four). The fix is to use the same initialization technique used by most other upper layer clients in the kernel. Introduced by commit 0b524123, which failed to update nrprocs when support was added for UMNT in the kernel. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=24302 BugLink: http://bugs.launchpad.net/bugs/683938 Reported-by: Stefan Bader <stefan.bader@canonical.com> Tested-by: Stefan Bader <stefan.bader@canonical.com> Cc: stable@kernel.org # >= 2.6.32 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-09Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem.Tristan Ye
Due to newly-introduced 'coherency=full' O_DIRECT writes also takes the EX rw_lock like buffered writes did(rw_level == 1), it turns out messing the usage of 'level' in ocfs2_dio_end_io() up, which caused i_alloc_sem being failed to get up_read'd correctly. This patch tries to teach ocfs2_dio_end_io to understand well on all locking stuffs by explicitly introducing a new bit for i_alloc_sem in iocb's private data, just like what we did for rw_lock. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-12-09ocfs2/dlm: Migrate lockres with no locks if it has a referenceSunil Mushran
o2dlm was not migrating resources with zero locks because it assumed that that resource would get purged by dlm_thread. However, some usage patterns involve creating and dropping locks at a high rate leading to the migrate thread seeing zero locks but the purge thread seeing an active reference. When this happens, the dlm_thread cannot purge the resource and the migrate thread sees no reason to migrate that resource. The spell is broken when the migrate thread catches the resource with a lock. The fix is to make the migrate thread also consider the reference map. This usage pattern can be triggered by userspace on userdlm locks and flocks. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-12-09xfs: log timestamp changes to the source inode in renameChristoph Hellwig
Now that we don't mark VFS inodes dirty anymore for internal timestamp changes, but rely on the transaction subsystem to push them out, we need to explicitly log the source inode in rename after updating it's timestamps to make sure the changes actually get forced out by sync/fsync or an AIL push. We already account for the fourth inode in the log reservation, as a rename of directories needs to update the nlink field, so just adding the xfs_trans_log_inode call is enough. This fixes the xfsqa 065 regression introduced by: "xfs: don't use vfs writeback for pure metadata modifications" Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-12-09Btrfs: fixup return code for btrfs_del_orphan_itemJosef Bacik
If the orphan item doesn't exist, we return 1, which doesn't make any sense to the callers. Instead return -ENOENT if we didn't find the item. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-12-09Btrfs: do not do fast caching if we are allocating blocks for tree_rootJosef Bacik
Since the fast caching uses normal tree locking, we can possibly deadlock if we get to the caching via a btrfs_search_slot() on the tree_root. So just check to see if the root we are on is the tree root, and just don't do the fast caching. Reported-by: Sage Weil <sage@newdream.net> Signed-off-by: Josef Bacik <josef@redhat.com>
2010-12-09Btrfs: deal with space cache errors betterJosef Bacik
Currently if the space cache inode generation number doesn't match the generation number in the space cache header we will just fail to load the space cache, but we won't mark the space cache as an error, so we'll keep getting that error each time somebody tries to cache that block group until we actually clear the thing. Fix this by marking the space cache as having an error so we only get the message once. This patch also makes it so that we don't try and setup space cache for a block group that isn't cached, since we won't be able to write it out anyway. None of these problems are actual problems, they are just annoying and sub-optimal. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2010-12-09Btrfs: fix use after free in O_DIRECTJosef Bacik
This fixes a bug where we use dip after we have freed it. Instead just use the file_offset that was passed to the function. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>