aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-07-21ext4: fix ext4_get_group_number()Theodore Ts'o
commit 960fd856fdc3b08b3638f3f9b6b4bfceb77660c7 upstream. The function ext4_get_group_number() was introduced as an optimization in commit bd86298e60b8. Unfortunately, this commit incorrectly calculate the group number for file systems with a 1k block size (when s_first_data_block is 1 instead of zero). This could cause the following kernel BUG: [ 568.877799] ------------[ cut here ]------------ [ 568.877833] kernel BUG at fs/ext4/mballoc.c:3728! [ 568.877840] Oops: Exception in kernel mode, sig: 5 [#1] [ 568.877845] SMP NR_CPUS=32 NUMA pSeries [ 568.877852] Modules linked in: binfmt_misc [ 568.877861] CPU: 1 PID: 3516 Comm: fs_mark Not tainted 3.10.0-03216-g7c6809f-dirty #1 [ 568.877867] task: c0000001fb0b8000 ti: c0000001fa954000 task.ti: c0000001fa954000 [ 568.877873] NIP: c0000000002f42a4 LR: c0000000002f4274 CTR: c000000000317ef8 [ 568.877879] REGS: c0000001fa956ed0 TRAP: 0700 Not tainted (3.10.0-03216-g7c6809f-dirty) [ 568.877884] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 24000428 XER: 00000000 [ 568.877902] SOFTE: 1 [ 568.877905] CFAR: c0000000002b5464 [ 568.877908] GPR00: 0000000000000001 c0000001fa957150 c000000000c6a408 c0000001fb588000 GPR04: 0000000000003fff c0000001fa9571c0 c0000001fa9571c4 000138098c50625f GPR08: 1301200000000000 0000000000000002 0000000000000001 0000000000000000 GPR12: 0000000024000422 c00000000f33a300 0000000000008000 c0000001fa9577f0 GPR16: c0000001fb7d0100 c000000000c29190 c0000000007f46e8 c000000000a14672 GPR20: 0000000000000001 0000000000000008 ffffffffffffffff 0000000000000000 GPR24: 0000000000000100 c0000001fa957278 c0000001fdb2bc78 c0000001fa957288 GPR28: 0000000000100100 c0000001fa957288 c0000001fb588000 c0000001fdb2bd10 [ 568.877993] NIP [c0000000002f42a4] .ext4_mb_release_group_pa+0xec/0x1c0 [ 568.877999] LR [c0000000002f4274] .ext4_mb_release_group_pa+0xbc/0x1c0 [ 568.878004] Call Trace: [ 568.878008] [c0000001fa957150] [c0000000002f4274] .ext4_mb_release_group_pa+0xbc/0x1c0 (unreliable) [ 568.878017] [c0000001fa957200] [c0000000002fb070] .ext4_mb_discard_lg_preallocations+0x394/0x444 [ 568.878025] [c0000001fa957340] [c0000000002fb45c] .ext4_mb_release_context+0x33c/0x734 [ 568.878032] [c0000001fa957440] [c0000000002fbcf8] .ext4_mb_new_blocks+0x4a4/0x5f4 [ 568.878039] [c0000001fa957510] [c0000000002ef56c] .ext4_ext_map_blocks+0xc28/0x1178 [ 568.878047] [c0000001fa957640] [c0000000002c1a94] .ext4_map_blocks+0x2c8/0x490 [ 568.878054] [c0000001fa957730] [c0000000002c536c] .ext4_writepages+0x738/0xc60 [ 568.878062] [c0000001fa957950] [c000000000168a78] .do_writepages+0x5c/0x80 [ 568.878069] [c0000001fa9579d0] [c00000000015d1c4] .__filemap_fdatawrite_range+0x88/0xb0 [ 568.878078] [c0000001fa957aa0] [c00000000015d23c] .filemap_write_and_wait_range+0x50/0xfc [ 568.878085] [c0000001fa957b30] [c0000000002b8edc] .ext4_sync_file+0x220/0x3c4 [ 568.878092] [c0000001fa957be0] [c0000000001f849c] .vfs_fsync_range+0x64/0x80 [ 568.878098] [c0000001fa957c70] [c0000000001f84f0] .vfs_fsync+0x38/0x4c [ 568.878105] [c0000001fa957d00] [c0000000001f87f4] .do_fsync+0x54/0x90 [ 568.878111] [c0000001fa957db0] [c0000000001f8894] .SyS_fsync+0x28/0x3c [ 568.878120] [c0000001fa957e30] [c000000000009c88] syscall_exit+0x0/0x7c [ 568.878125] Instruction dump: [ 568.878130] 60000000 813d0034 81610070 38000000 7f8b4800 419e001c 813f007c 7d2bfe70 [ 568.878144] 7d604a78 7c005850 54000ffe 7c0007b4 <0b000000> e8a10076 e87f0090 7fa4eb78 [ 568.878160] ---[ end trace 594d911d9654770b ]--- In addition fix the STD_GROUP optimization so that it works for bigalloc file systems as well. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Li Zhong <lizhongfs@gmail.com> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext4: fix overflow when counting used blocks on 32-bit architecturesJan Kara
commit 8af8eecc1331dbf5e8c662022272cf667e213da5 upstream. The arithmetics adding delalloc blocks to the number of used blocks in ext4_getattr() can easily overflow on 32-bit archs as we first multiply number of blocks by blocksize and then divide back by 512. Make the arithmetics more clever and also use proper type (unsigned long long instead of unsigned long). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext4: fix data offset overflow in ext4_xattr_fiemap() on 32-bit archsJan Kara
commit a60697f411eb365fb09e639e6f183fe33d1eb796 upstream. On 32-bit architectures with 32-bit sector_t computation of data offset in ext4_xattr_fiemap() can overflow resulting in reporting bogus data location. Fix the problem by typing block number to proper type before shifting. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext4: fix overflows in SEEK_HOLE, SEEK_DATA implementationsJan Kara
commit e7293fd146846e2a44d29e0477e0860c60fb856b upstream. ext4_lblk_t is just u32 so multiplying it by blocksize can easily overflow for files larger than 4 GB. Fix that by properly typing the block offsets before shifting. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext4: fix data offset overflow on 32-bit archs in ext4_inline_data_fiemap()Jan Kara
commit eaf3793728d07d995f1e74250b2d0005f7ae98b5 upstream. On 32-bit archs when sector_t is defined as 32-bit the logic computing data offset in ext4_inline_data_fiemap(). Fix that by properly typing the shifted value. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21Btrfs: only do the tree_mod_log_free_eb if this is our last refJosef Bacik
commit 7fb7d76f96bfcbea25007d190ba828b18e13d29d upstream. There is another bug in the tree mod log stuff in that we're calling tree_mod_log_free_eb every single time a block is cow'ed. The problem with this is that if this block is shared by multiple snapshots we will call this multiple times per block, so if we go to rewind the mod log for this block we'll BUG_ON() in __tree_mod_log_rewind because we try to rewind a free twice. We only want to call tree_mod_log_free_eb if we are actually freeing the block. With this patch I no longer hit the panic in __tree_mod_log_rewind. Thanks, Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21Btrfs: hold the tree mod lock in __tree_mod_log_rewindJosef Bacik
commit f1ca7e98a67da618d8595866e0860308525154da upstream. We need to hold the tree mod log lock in __tree_mod_log_rewind since we walk forward in the tree mod entries, otherwise we'll end up with random entries and trip the BUG_ON() at the front of __tree_mod_log_rewind. This fixes the panics people were seeing when running find /whatever -type f -exec btrfs fi defrag {} \; Thansk, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21Btrfs: fix estale with btrfs sendJosef Bacik
commit 139f807a1eba1e484941a98fb93ee32ad859a6a1 upstream. This fixes bugzilla 57491. If we take a snapshot of a fs with a unlink ongoing and then try to send that root we will run into problems. When comparing with a parent root we will search the parents and the send roots commit_root, which if we've just created the snapshot will include the file that needs to be evicted by the orphan cleanup. So when we find a changed extent we will try and copy that info into the send stream, but when we lookup the inode we use the normal root, which no longer has the inode because the orphan cleanup deleted it. The best solution I have for this is to check our otransid with the generation of the commit root and if they match just commit the transaction again, that way we get the changes from the orphan cleanup. With this patch the reproducer I made for this bugzilla no longer returns ESTALE when trying to do the send. Thanks, Reported-by: Chris Wilson <jakdaw@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ocfs2: xattr: fix inlined xattr reflinkJunxiao Bi
commit ef962df057aaafd714f5c22ba3de1be459571fdf upstream. Inlined xattr shared free space of inode block with inlined data or data extent record, so the size of the later two should be adjusted when inlined xattr is enabled. See ocfs2_xattr_ibody_init(). But this isn't done well when reflink. For inode with inlined data, its max inlined data size is adjusted in ocfs2_duplicate_inline_data(), no problem. But for inode with data extent record, its record count isn't adjusted. Fix it, or data extent record and inlined xattr may overwrite each other, then cause data corruption or xattr failure. One panic caused by this bug in our test environment is the following: kernel BUG at fs/ocfs2/xattr.c:1435! invalid opcode: 0000 [#1] SMP Pid: 10871, comm: multi_reflink_t Not tainted 2.6.39-300.17.1.el5uek #1 RIP: ocfs2_xa_offset_pointer+0x17/0x20 [ocfs2] RSP: e02b:ffff88007a587948 EFLAGS: 00010283 RAX: 0000000000000000 RBX: 0000000000000010 RCX: 00000000000051e4 RDX: ffff880057092060 RSI: 0000000000000f80 RDI: ffff88007a587a68 RBP: ffff88007a587948 R08: 00000000000062f4 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000010 R13: ffff88007a587a68 R14: 0000000000000001 R15: ffff88007a587c68 FS: 00007fccff7f06e0(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000015cf000 CR3: 000000007aa76000 CR4: 0000000000000660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process multi_reflink_t Call Trace: ocfs2_xa_reuse_entry+0x60/0x280 [ocfs2] ocfs2_xa_prepare_entry+0x17e/0x2a0 [ocfs2] ocfs2_xa_set+0xcc/0x250 [ocfs2] ocfs2_xattr_ibody_set+0x98/0x230 [ocfs2] __ocfs2_xattr_set_handle+0x4f/0x700 [ocfs2] ocfs2_xattr_set+0x6c6/0x890 [ocfs2] ocfs2_xattr_user_set+0x46/0x50 [ocfs2] generic_setxattr+0x70/0x90 __vfs_setxattr_noperm+0x80/0x1a0 vfs_setxattr+0xa9/0xb0 setxattr+0xc3/0x120 sys_fsetxattr+0xa8/0xd0 system_call_fastpath+0x16/0x1b Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext4: check error return from ext4_write_inline_data_end()Theodore Ts'o
commit 42c832debbbf819f6c4ad8601baa559c44105ba4 upstream. The function ext4_write_inline_data_end() can return an error. So we need to assign it to a signed integer variable to check for an error return (since copied is an unsigned int). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()Al Viro
commit 64cb927371cd2ec43758d8a094a003d27bc3d0dc upstream. Both ext3 and ext4 htree_dirblock_to_tree() is just filling the in-core rbtree for use by call_filldir(). All updates of ->f_pos are done by the latter; bumping it here (on error) is obviously wrong - we might very well have it nowhere near the block we'd found an error in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21ext4: fix corruption when online resizing a fs with 1K block sizeMaarten ter Huurne
commit 6ca792edc13c409e8d4eb9001e048264c6a2eb64 upstream. Subtracting the number of the first data block places the superblock backups one block too early, corrupting the file system. When the block size is larger than 1K, the first data block is 0, so the subtraction has no effect and no corruption occurs. Signed-off-by: Maarten ter Huurne <maarten@treewalker.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21jbd2: fix theoretical race in jbd2__journal_restartTheodore Ts'o
commit 39c04153fda8c32e85b51c96eb5511a326ad7609 upstream. Once we decrement transaction->t_updates, if this is the last handle holding the transaction from closing, and once we release the t_handle_lock spinlock, it's possible for the transaction to commit and be released. In practice with normal kernels, this probably won't happen, since the commit happens in a separate kernel thread and it's unlikely this could all happen within the space of a few CPU cycles. On the other hand, with a real-time kernel, this could potentially happen, so save the tid found in transaction->t_tid before we release t_handle_lock. It would require an insane configuration, such as one where the jbd2 thread was set to a very high real-time priority, perhaps because a high priority real-time thread is trying to read or write to a file system. But some people who use real-time kernels have been known to do insane things, including controlling laser-wielding industrial robots. :-) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21jbd2: move superblock checksum calculation to jbd2_write_superblock()Theodore Ts'o
commit fe52d17cdd343ac43c85cf72940a58865b9d3bfb upstream. Some of the functions which modify the jbd2 superblock were not updating the checksum before calling jbd2_write_superblock(). Move the call to jbd2_superblock_csum_set() to jbd2_write_superblock(), so that the checksum is calculated consistently. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21CIFS: Fix a deadlock when a file is reopenedPavel Shilovsky
commit 689c3db4d57a73bee6c5ad7797fce7b54d32a87c upstream. If we request reading or writing on a file that needs to be reopened, it causes the deadlock: we are already holding rw semaphore for reading and then we try to acquire it for writing in cifs_relock_file. Fix this by acquiring the semaphore for reading in cifs_relock_file due to we don't make any changes in locks and don't need a write access. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-21CIFS use sensible file nlink values if unprovidedSteve French
commit 6658b9f70ebca5fc0795b1d6d733996af1e2caa7 upstream. Certain servers may not set the NumberOfLinks field in query file/path info responses. In such a case, cifs_inode_needs_reval() assumes that all regular files are hardlinks and triggers revalidation, leading to excessive and unnecessary network traffic. This change hardcodes cf_nlink (and subsequently i_nlink) when not returned by the server, similar to what already occurs in cifs_mkdir(). Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-13nfsd4: fix decoding of compounds across page boundariesJ. Bruce Fields
commit 247500820ebd02ad87525db5d9b199e5b66f6636 upstream. A freebsd NFSv4.0 client was getting rare IO errors expanding a tarball. A network trace showed the server returning BAD_XDR on the final getattr of a getattr+write+getattr compound. The final getattr started on a page boundary. I believe the Linux client ignores errors on the post-write getattr, and that that's why we haven't seen this before. Reported-by: Rick Macklem <rmacklem@uoguelph.ca> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-13NFSv4.1 end back channel session drainingAndy Adamson
commit 62f288a02f97bd9f6b2361a6fff709729fe9e110 upstream. We need to ensure that we clear NFS4_SLOT_TBL_DRAINING on the back channel when we're done recovering the session. Regression introduced by commit 774d5f14e (NFSv4.1 Fix a pNFS session draining deadlock) Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: Changed order to start back-channel first. Minor code cleanup] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-13hpfs: better test for errorsMikulas Patocka
commit 3ebacb05044f82c5f0bb456a894eb9dc57d0ed90 upstream. The test if bitmap access is out of bound could errorneously pass if the device size is divisible by 16384 sectors and we are asking for one bitmap after the end. Check for invalid size in the superblock. Invalid size could cause integer overflows in the rest of the code. Signed-off-by: Mikulas Patocka <mpatocka@artax.karlin.mff.cuni.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-13ceph: fix sleeping function called from invalid context.majianpeng
commit a1dc1937337a93e699eaa56968b7de6e1a9e77cf upstream. [ 1121.231883] BUG: sleeping function called from invalid context at kernel/rwsem.c:20 [ 1121.231935] in_atomic(): 1, irqs_disabled(): 0, pid: 9831, name: mv [ 1121.231971] 1 lock held by mv/9831: [ 1121.231973] #0: (&(&ci->i_ceph_lock)->rlock){+.+...},at:[<ffffffffa02bbd38>] ceph_getxattr+0x58/0x1d0 [ceph] [ 1121.231998] CPU: 3 PID: 9831 Comm: mv Not tainted 3.10.0-rc6+ #215 [ 1121.232000] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011 [ 1121.232027] ffff88006d355a80 ffff880092f69ce0 ffffffff8168348c ffff880092f69cf8 [ 1121.232045] ffffffff81070435 ffff88006d355a20 ffff880092f69d20 ffffffff816899ba [ 1121.232052] 0000000300000004 ffff8800b76911d0 ffff88006d355a20 ffff880092f69d68 [ 1121.232056] Call Trace: [ 1121.232062] [<ffffffff8168348c>] dump_stack+0x19/0x1b [ 1121.232067] [<ffffffff81070435>] __might_sleep+0xe5/0x110 [ 1121.232071] [<ffffffff816899ba>] down_read+0x2a/0x98 [ 1121.232080] [<ffffffffa02baf70>] ceph_vxattrcb_layout+0x60/0xf0 [ceph] [ 1121.232088] [<ffffffffa02bbd7f>] ceph_getxattr+0x9f/0x1d0 [ceph] [ 1121.232093] [<ffffffff81188d28>] vfs_getxattr+0xa8/0xd0 [ 1121.232097] [<ffffffff8118900b>] getxattr+0xab/0x1c0 [ 1121.232100] [<ffffffff811704f2>] ? final_putname+0x22/0x50 [ 1121.232104] [<ffffffff81155f80>] ? kmem_cache_free+0xb0/0x260 [ 1121.232107] [<ffffffff811704f2>] ? final_putname+0x22/0x50 [ 1121.232110] [<ffffffff8109e63d>] ? trace_hardirqs_on+0xd/0x10 [ 1121.232114] [<ffffffff816957a7>] ? sysret_check+0x1b/0x56 [ 1121.232120] [<ffffffff81189c9c>] SyS_fgetxattr+0x6c/0xc0 [ 1121.232125] [<ffffffff81695782>] system_call_fastpath+0x16/0x1b [ 1121.232129] BUG: scheduling while atomic: mv/9831/0x10000002 [ 1121.232154] 1 lock held by mv/9831: [ 1121.232156] #0: (&(&ci->i_ceph_lock)->rlock){+.+...}, at: [<ffffffffa02bbd38>] ceph_getxattr+0x58/0x1d0 [ceph] I think move the ci->i_ceph_lock down is safe because we can't free ceph_inode_info at there. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ubifs fixes from Al Viro: "A couple of ubifs readdir/lseek race fixes. Stable fodder, really nasty..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: UBIFS: fix a horrid bug UBIFS: prepare to fix a horrid bug
2013-06-29Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Ingo Molnar: "One more fix for a recently discovered bug" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Disable monitoring on setuid processes for regular users
2013-06-29UBIFS: fix a horrid bugArtem Bityutskiy
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are in the middle of 'ubifs_readdir()'. This means that 'file->private_data' can be freed while 'ubifs_readdir()' uses it, and this is a very bad bug: not only 'ubifs_readdir()' can return garbage, but this may corrupt memory and lead to all kinds of problems like crashes an security holes. This patch fixes the problem by using the 'file->f_version' field, which '->llseek()' always unconditionally sets to zero. We set it to 1 in 'ubifs_readdir()' and whenever we detect that it became 0, we know there was a seek and it is time to clear the state saved in 'file->private_data'. I tested this patch by writing a user-space program which runds readdir and seek in parallell. I could easily crash the kernel without these patches, but could not crash it with these patches. Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29UBIFS: prepare to fix a horrid bugArtem Bityutskiy
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are in the middle of 'ubifs_readdir()'. First of all, this means that 'file->private_data' can be freed while 'ubifs_readdir()' uses it. But this particular patch does not fix the problem. This patch is only a preparation, and the fix will follow next. In this patch we make 'ubifs_readdir()' stop using 'file->f_pos' directly, because 'file->f_pos' can be changed by '->llseek()' at any point. This may lead 'ubifs_readdir()' to returning inconsistent data: directory entry names may correspond to incorrect file positions. So here we introduce a local variable 'pos', read 'file->f_pose' once at very the beginning, and then stick to 'pos'. The result of this is that when 'ubifs_dir_llseek()' changes 'file->f_pos' while we are in the middle of 'ubifs_readdir()', the latter "wins". Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-26perf: Disable monitoring on setuid processes for regular usersStephane Eranian
There was a a bug in setup_new_exec(), whereby the test to disabled perf monitoring was not correct because the new credentials for the process were not yet committed and therefore the get_dumpable() test was never firing. The patch fixes the problem by moving the perf_event test until after the credentials are committed. Signed-off-by: Stephane Eranian <eranian@google.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-25Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse bugfix from Miklos Szeredi: "This fixes a race between fallocate() and truncate()" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: hold i_mutex in fuse_file_fallocate()
2013-06-23fs: fix new splice.c kernel-doc warningRandy Dunlap
Fix new kernel-doc warning in fs/splice.c: Warning(fs/splice.c:1298): No description found for parameter 'opos' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-20splice: don't pass the address of ->f_pos to methodsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-18fuse: hold i_mutex in fuse_file_fallocate()Maxim Patlasov
Changing size of a file on server and local update (fuse_write_update_size) should be always protected by inode->i_mutex. Otherwise a race like this is possible: 1. Process 'A' calls fallocate(2) to extend file (~FALLOC_FL_KEEP_SIZE). fuse_file_fallocate() sends FUSE_FALLOCATE request to the server. 2. Process 'B' calls ftruncate(2) shrinking the file. fuse_do_setattr() sends shrinking FUSE_SETATTR request to the server and updates local i_size by i_size_write(inode, outarg.attr.size). 3. Process 'A' resumes execution of fuse_file_fallocate() and calls fuse_write_update_size(inode, offset + length). But 'offset + length' was obsoleted by ftruncate from previous step. Changed in v2 (thanks Brian and Anand for suggestions): - made relation between mutex_lock() and fuse_set_nowrite(inode) more explicit and clear. - updated patch description to use ftruncate(2) in example Signed-off-by: Maxim V. Patlasov <MPatlasov@parallels.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-06-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS fixes from Al Viro: "Several fixes + obvious cleanup (you've missed a couple of open-coded can_lookup() back then)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: snd_pcm_link(): fix a leak... use can_lookup() instead of direct checks of ->i_op->lookup move exit_task_namespaces() outside of exit_notify() fput: task_work_add() can fail if the caller has passed exit_task_work() ncpfs: fix rmdir returns Device or resource busy
2013-06-14Merge tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs fixes from Ben Myers: - Remove noisy warnings about experimental support which spams the logs - Add padding to align directory and attr structures correctly - Set block number on child buffer on a root btree split - Disable verifiers during log recovery for non-CRC filesystems * tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs: xfs: don't shutdown log recovery on validation errors xfs: ensure btree root split sets blkno correctly xfs: fix implicit padding in directory and attr CRC formats xfs: don't emit v5 superblock warnings on write
2013-06-15use can_lookup() instead of direct checks of ->i_op->lookupAl Viro
a couple of places got missed back when Linus has introduced that one... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-15fput: task_work_add() can fail if the caller has passed exit_task_work()Oleg Nesterov
fput() assumes that it can't be called after exit_task_work() but this is not true, for example free_ipc_ns()->shm_destroy() can do this. In this case fput() silently leaks the file. Change it to fallback to delayed_fput_work if task_work_add() fails. The patch looks complicated but it is not, it changes the code from if (PF_KTHREAD) { schedule_work(...); return; } task_work_add(...) to if (!PF_KTHREAD) { if (!task_work_add(...)) return; /* fallback */ } schedule_work(...); As for shm_destroy() in particular, we could make another fix but I think this change makes sense anyway. There could be another similar user, it is not safe to assume that task_work_add() can't fail. Reported-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-14xfs: don't shutdown log recovery on validation errorsDave Chinner
Unfortunately, we cannot guarantee that items logged multiple times and replayed by log recovery do not take objects back in time. When they are taken back in time, the go into an intermediate state which is corrupt, and hence verification that occurs on this intermediate state causes log recovery to abort with a corruption shutdown. Instead of causing a shutdown and unmountable filesystem, don't verify post-recovery items before they are written to disk. This is less than optimal, but there is no way to detect this issue for non-CRC filesystems If log recovery successfully completes, this will be undone and the object will be consistent by subsequent transactions that are replayed, so in most cases we don't need to take drastic action. For CRC enabled filesystems, leave the verifiers in place - we need to call them to recalculate the CRCs on the objects anyway. This recovery problem can be solved for such filesystems - we have a LSN stamped in all metadata at writeback time that we can to determine whether the item should be replayed or not. This is a separate piece of work, so is not addressed by this patch. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 9222a9cf86c0d64ffbedf567412b55da18763aa3)
2013-06-14xfs: ensure btree root split sets blkno correctlyDave Chinner
For CRC enabled filesystems, the BMBT is rooted in an inode, so it passes through a different code path on root splits than the freespace and inode btrees. This is much less traversed by xfstests than the other trees. When testing on a 1k block size filesystem, I've been seeing ASSERT failures in generic/234 like: XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_private.b.allocated == 0, file: fs/xfs/xfs_btree.c, line: 317 which are generally preceded by a lblock check failure. I noticed this in the bmbt stats: $ pminfo -f xfs.btree.block_map xfs.btree.block_map.lookup value 39135 xfs.btree.block_map.compare value 268432 xfs.btree.block_map.insrec value 15786 xfs.btree.block_map.delrec value 13884 xfs.btree.block_map.newroot value 2 xfs.btree.block_map.killroot value 0 ..... Very little coverage of root splits and merges. Indeed, on a 4k filesystem, block_map.newroot and block_map.killroot are both zero. i.e. the code is not exercised at all, and it's the only generic btree infrastructure operation that is not exercised by a default run of xfstests. Turns out that on a 1k filesystem, generic/234 accounts for one of those two root splits, and that is somewhat of a smoking gun. In fact, it's the same problem we saw in the directory/attr code where headers are memcpy()d from one block to another without updating the self describing metadata. Simple fix - when copying the header out of the root block, make sure the block number is updated correctly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit ade1335afef556df6538eb02e8c0dc91fbd9cc37)
2013-06-14xfs: fix implicit padding in directory and attr CRC formatsDave Chinner
Michael L. Semon has been testing CRC patches on a 32 bit system and been seeing assert failures in the directory code from xfs/080. Thanks to Michael's heroic efforts with printk debugging, we found that the problem was that the last free space being left in the directory structure was too small to fit a unused tag structure and it was being corrupted and attempting to log a region out of bounds. Hence the assert failure looked something like: ..... #5 calling xfs_dir2_data_log_unused() 36 32 #1 4092 4095 4096 #2 8182 8183 4096 XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568 Where #1 showed the first region of the dup being logged (i.e. the last 4 bytes of a directory buffer) and #2 shows the corrupt values being calculated from the length of the dup entry which overflowed the size of the buffer. It turns out that the problem was not in the logging code, nor in the freespace handling code. It is an initial condition bug that only shows up on 32 bit systems. When a new buffer is initialised, where's the freespace that is set up: [ 172.316249] calling xfs_dir2_leaf_addname() from xfs_dir_createname() [ 172.316346] #9 calling xfs_dir2_data_log_unused() [ 172.316351] #1 calling xfs_trans_log_buf() 60 63 4096 [ 172.316353] #2 calling xfs_trans_log_buf() 4094 4095 4096 Note the offset of the first region being logged? It's 60 bytes into the buffer. Once I saw that, I pretty much knew that the bug was going to be caused by this. Essentially, all direct entries are rounded to 8 bytes in length, and all entries start with an 8 byte alignment. This means that we can decode inplace as variables are naturally aligned. With the directory data supposedly starting on a 8 byte boundary, and all entries padded to 8 bytes, the minimum freespace in a directory block is supposed to be 8 bytes, which is large enough to fit a unused data entry structure (6 bytes in size). The fact we only have 4 bytes of free space indicates a directory data block alignment problem. And what do you know - there's an implicit hole in the directory data block header for the CRC format, which means the header is 60 byte on 32 bit intel systems and 64 bytes on 64 bit systems. Needs padding. And while looking at the structures, I found the same problem in the attr leaf header. Fix them both. Note that this only affects 32 bit systems with CRCs enabled. Everything else is just fine. Note that CRC enabled filesystems created before this fix on such systems will not be readable with this fix applied. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Debugged-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 8a1fd2950e1fe267e11fc8c85dcaa6b023b51b60)
2013-06-14xfs: don't emit v5 superblock warnings on writeDave Chinner
We write the superblock every 30s or so which results in the verifier being called. Right now that results in this output every 30s: XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled! Use of these features in this kernel is at your own risk! And spamming the logs. We don't need to check for whether we support v5 superblocks or whether there are feature bits we don't support set as these are only relevant when we first mount the filesytem. i.e. on superblock read. Hence for the write verification we can just skip all the checks (and hence verbose output) altogether. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 34510185abeaa5be9b178a41c0a03d30aec3db7e)
2013-06-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This is an assortment of crash fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: stop all workers before cleaning up roots Btrfs: fix use-after-free bug during umount Btrfs: init relocate extent_io_tree with a mapping btrfs: Drop inode if inode root is NULL Btrfs: don't delete fs_roots until after we cleanup the transaction
2013-06-12Merge branch 'akpm' (updates from Andrew Morton)Linus Torvalds
Merge misc fixes from Andrew Morton: "Bunch of fixes and one little addition to math64.h" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits) include/linux/math64.h: add div64_ul() mm: memcontrol: fix lockless reclaim hierarchy iterator frontswap: fix incorrect zeroing and allocation size for frontswap_map kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules() mm: migration: add migrate_entry_wait_huge() ocfs2: add missing lockres put in dlm_mig_lockres_handler mm/page_alloc.c: fix watermark check in __zone_watermark_ok() drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info() aio: fix io_destroy() regression by using call_rcu() rtc-at91rm9200: use shadow IMR on at91sam9x5 rtc-at91rm9200: add shadow interrupt mask rtc-at91rm9200: refactor interrupt-register handling rtc-at91rm9200: add configuration support rtc-at91rm9200: add match-table compile guard fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree cciss: fix broken mutex usage in ioctl audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel ...
2013-06-12ocfs2: add missing lockres put in dlm_mig_lockres_handlerXue jiufei
dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path. Signed-off-by: joyce <xuejiufei@huawei.com> Reviewed-by: shencanquan <shencanquan@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12aio: fix io_destroy() regression by using call_rcu()Kent Overstreet
There was a regression introduced by 36f5588905c1 ("aio: refcounting cleanup"), reported by Jens Axboe - the refcounting cleanup switched to using RCU in the shutdown path, but the synchronize_rcu() was done in the context of the io_destroy() syscall greatly increasing the time it could block. This patch switches it to call_rcu() and makes shutdown asynchronous (more asynchronous than it was originally; before the refcount changes io_destroy() would still wait on pending kiocbs). Note that there's a global quota on the max outstanding kiocbs, and that quota must be manipulated synchronously; otherwise io_setup() could return -EAGAIN when there isn't quota available, and userspace won't have any way of waiting until shutdown of the old kioctxs has finished (besides busy looping). So we release our quota before kioctx shutdown has finished, which should be fine since the quota never corresponded to anything real anyways. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Reported-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Tested-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directoryGoldwyn Rodrigues
While removing a non-empty directory, the kernel dumps a message: (rmdir,21743,1):ocfs2_unlink:953 ERROR: status = -39 Suppress the error message from being printed in the dmesg so users don't panic. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12ocfs2: ocfs2_prep_new_orphaned_file() should return retXiaowei.Hu
If an error occurs, for example an EIO in __ocfs2_prepare_orphan_dir, ocfs2_prep_new_orphaned_file will release the inode_ac, then when the caller of ocfs2_prep_new_orphaned_file gets a 0 return, it will refer to a NULL ocfs2_alloc_context struct in the following functions. A kernel panic happens. Signed-off-by: "Xiaowei.Hu" <xiaowei.hu@oracle.com> Reviewed-by: shencanquan <shencanquan@huawei.com> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Cc: Joe Jin <joe.jin@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12kmsg: honor dmesg_restrict sysctl on /dev/kmsgKees Cook
The dmesg_restrict sysctl currently covers the syslog method for access dmesg, however /dev/kmsg isn't covered by the same protections. Most people haven't noticed because util-linux dmesg(1) defaults to using the syslog method for access in older versions. With util-linux dmesg(1) defaults to reading directly from /dev/kmsg. To fix /dev/kmsg, let's compare the existing interfaces and what they allow: - /proc/kmsg allows: - open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive single-reader interface (SYSLOG_ACTION_READ). - everything, after an open. - syslog syscall allows: - anything, if CAP_SYSLOG. - SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if dmesg_restrict==0. - nothing else (EPERM). The use-cases were: - dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs. - sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the destructive SYSLOG_ACTION_READs. AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't clear the ring buffer. Based on the comments in devkmsg_llseek, it sounds like actions besides reading aren't going to be supported by /dev/kmsg (i.e. SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive syslog syscall actions. To this end, move the check as Josh had done, but also rename the constants to reflect their new uses (SYSLOG_FROM_CALL becomes SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC). SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC allows destructive actions after a capabilities-constrained SYSLOG_ACTION_OPEN check. - /dev/kmsg allows: - open if CAP_SYSLOG or dmesg_restrict==0 - reading/polling, after open Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192 [akpm@linux-foundation.org: use pr_warn_once()] Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Christian Kujau <lists@nerdbynature.de> Tested-by: Josh Boyer <jwboyer@redhat.com> Cc: Kay Sievers <kay@vrfy.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph fixes from Sage Weil: "There is a pair of fixes for double-frees in the recent bundle for 3.10, a couple of fixes for long-standing bugs (sleep while atomic and an endianness fix), and a locking fix that can be triggered when osds are going down" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix cleanup in rbd_add() rbd: don't destroy ceph_opts in rbd_add() ceph: ceph_pagelist_append might sleep while atomic ceph: add cpu_to_le32() calls when encoding a reconnect capability libceph: must hold mutex for reset_changed_osds()
2013-06-08hpfs: fix warnings when the filesystem fills upMikulas Patocka
This patch fixes warnings due to missing lock on write error path. WARNING: at fs/hpfs/hpfs_fn.h:353 hpfs_truncate+0x75/0x80 [hpfs]() Hardware name: empty Pid: 26563, comm: dd Tainted: P O 3.9.4 #12 Call Trace: hpfs_truncate+0x75/0x80 [hpfs] hpfs_write_begin+0x84/0x90 [hpfs] _hpfs_bmap+0x10/0x10 [hpfs] generic_file_buffered_write+0x121/0x2c0 __generic_file_aio_write+0x1c7/0x3f0 generic_file_aio_write+0x7c/0x100 do_sync_write+0x98/0xd0 hpfs_file_write+0xd/0x50 [hpfs] vfs_write+0xa2/0x160 sys_write+0x51/0xa0 page_fault+0x22/0x30 system_call_fastpath+0x1a/0x1f Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: stable@kernel.org # 2.6.39+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-08Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: - Trivial: unused variable removal - Posix-timers: Add the clock ID to the new proc interface to make it useful. The interface is new and should be functional when we reach the final 3.10 release. - Cure a false positive warning in the tick code introduced by the overhaul in 3.10 - Fix for a persistent clock detection regression introduced in this cycle * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Correct run-time detection of persistent_clock. ntp: Remove unused variable flags in __hardpps posix-timers: Show clock ID in proc file tick: Cure broadcast false positive pending bit warning
2013-06-08Btrfs: stop all workers before cleaning up rootsJosef Bacik
Dave reported a panic because the extent_root->commit_root was NULL in the caching kthread. That is because we just unset it in free_root_pointers, which is not the correct thing to do, we have to either wait for the caching kthread to complete or hold the extent_commit_sem lock so we know the thread has exited. This patch makes the kthreads all stop first and then we do our cleanup. This should fix the race. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-06-08Btrfs: fix use-after-free bug during umountLiu Bo
Commit be283b2e674a09457d4563729015adb637ce7cc1 ( Btrfs: use helper to cleanup tree roots) introduced the following bug, BUG: unable to handle kernel NULL pointer dereference at 0000000000000034 IP: [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] [...] Pid: 2463, comm: btrfs-cache-1 Tainted: G O 3.9.0+ #4 innotek GmbH VirtualBox/VirtualBox RIP: 0010:[<ffffffffa039368c>] [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] Process btrfs-cache-1 (pid: 2463, threadinfo ffff880112d60000, task ffff880117679730) [...] Call Trace: [<ffffffffa0398a99>] btrfs_search_slot+0x104/0x64d [btrfs] [<ffffffffa039aea4>] btrfs_next_old_leaf+0xa7/0x334 [btrfs] [<ffffffffa039b141>] btrfs_next_leaf+0x10/0x12 [btrfs] [<ffffffffa039ea13>] caching_thread+0x1a3/0x2e0 [btrfs] [<ffffffffa03d8811>] worker_loop+0x14b/0x48e [btrfs] [<ffffffffa03d86c6>] ? btrfs_queue_worker+0x25c/0x25c [btrfs] [<ffffffff81068d3d>] kthread+0x8d/0x95 [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43 [<ffffffff8151e5ac>] ret_from_fork+0x7c/0xb0 [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43 RIP [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] We've free'ed commit_root before actually getting to free block groups where caching thread needs valid extent_root->commit_root. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-06-08Btrfs: init relocate extent_io_tree with a mappingJosef Bacik
Dave reported a NULL pointer deref. This is caused because he thought he'd be smart and add sanity checks to the extent_io bit operations, but he didn't expect a tree to have a NULL mapping. To fix this we just need to init the relocation's processed_blocks with the btree_inode->i_mapping. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>