<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs, branch v3.2.59</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/fs?h=v3.2.59</id>
<link rel='self' href='https://git.amat.us/linux/atom/fs?h=v3.2.59'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2014-05-18T13:58:07Z</updated>
<entry>
<title>Btrfs: fix inode caching vs tree log</title>
<updated>2014-05-18T13:58:07Z</updated>
<author>
<name>Miao Xie</name>
<email>miaox@cn.fujitsu.com</email>
</author>
<published>2014-04-23T11:33:36Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d6db8ad79435e2bf42e446cc13ad135dd5ca1f71'/>
<id>urn:sha1:d6db8ad79435e2bf42e446cc13ad135dd5ca1f71</id>
<content type='text'>
commit 1c70d8fb4dfa95bee491816b2a6767b5ca1080e7 upstream.

Currently, with inode cache enabled, we will reuse its inode id immediately
after unlinking file, we may hit something like following:

|-&gt;iput inode
|-&gt;return inode id into inode cache
|-&gt;create dir,fsync
|-&gt;power off

An easy way to reproduce this problem is:

mkfs.btrfs -f /dev/sdb
mount /dev/sdb /mnt -o inode_cache,commit=100
dd if=/dev/zero of=/mnt/data bs=1M count=10 oflag=sync
inode_id=`ls -i /mnt/data | awk '{print $1}'`
rm -f /mnt/data

i=1
while [ 1 ]
do
        mkdir /mnt/dir_$i
        test1=`stat /mnt/dir_$i | grep Inode: | awk '{print $4}'`
        if [ $test1 -eq $inode_id ]
        then
		dd if=/dev/zero of=/mnt/dir_$i/data bs=1M count=1 oflag=sync
		echo b &gt; /proc/sysrq-trigger
	fi
	sleep 1
        i=$(($i+1))
done

mount /dev/sdb /mnt
umount /dev/sdb
btrfs check /dev/sdb

We fix this problem by adding unlinked inode's id into pinned tree,
and we can not reuse them until committing transaction.

Signed-off-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Wang Shilong &lt;wangsl.fnst@cn.fujitsu.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>Btrfs: Don't allocate inode that is already in use</title>
<updated>2014-05-18T13:58:07Z</updated>
<author>
<name>Stefan Behrens</name>
<email>sbehrens@giantdisaster.de</email>
</author>
<published>2013-10-15T18:08:15Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=8bbfb31d6b0eef58364978299a90bb37dc8e01f0'/>
<id>urn:sha1:8bbfb31d6b0eef58364978299a90bb37dc8e01f0</id>
<content type='text'>
commit ff76b0565523319d7c1c0b51d5a5a8915d33efab upstream.

Due to an off-by-one error, it is possible to reproduce a bug
when the inode cache is used.

The same inode number is assigned twice, the second time this
leads to an EEXIST in btrfs_insert_empty_items().

The issue can happen when a file is removed right after a subvolume
is created and then a new inode number is created before the
inodes in free_inode_pinned are processed.
unlink() calls btrfs_return_ino() which calls start_caching() in this
case which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by
searching for the highest inode (which already cannot find the
unlinked one anymore in btrfs_find_free_objectid()). So if this
unlinked inode's number is equal to the highest_ino + 1 (or &gt;= this value
instead of &gt; this value which was the off-by-one error), we mustn't add
the inode number to free_ino_pinned (caching_thread() does it right).
In this case we need to try directly to add the number to the inode_cache
which will fail in this case.

When this inode number is allocated while it is still in free_ino_pinned,
it is allocated and still added to the free inode cache when the
pinned inodes are processed, thus one of the following inode number
allocations will get an inode that is already in use and fail with EEXIST
in btrfs_insert_empty_items().

One example which was created with the reproducer below:
Create a snapshot, work in the newly created snapshot for the rest.
In unlink(inode 34284) call btrfs_return_ino() which calls start_caching().
start_caching() calls add_free_space [34284, 18446744073709517077].
In btrfs_return_ino(), call start_caching pinned [34284, 1] which is wrong.
mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
btrfs_unpin_free_ino calls add_free_space [34284, 1].
mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
EEXIST when the new inode is inserted.

One possible reproducer is this one:
 #!/bin/sh
 # preparation
TEST_DEV=/dev/sdc1
TEST_MNT=/mnt
umount ${TEST_MNT} 2&gt;/dev/null || true
mkfs.btrfs -f ${TEST_DEV}
mount ${TEST_DEV} ${TEST_MNT} -o \
 rw,relatime,compress=lzo,space_cache,inode_cache
btrfs subv create ${TEST_MNT}/s1
for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done
btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2
FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 | sed 's|^.*/\([^/]*\)$|\1|'`
rm ${TEST_MNT}/s2/$FILENAME
touch ${TEST_MNT}/s2/$FILENAME
 # the following steps can be repeated to reproduce the issue again and again
[ -e ${TEST_MNT}/s3 ] &amp;&amp; btrfs subv del ${TEST_MNT}/s3
btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3
rm ${TEST_MNT}/s3/$FILENAME
touch ${TEST_MNT}/s3/$FILENAME
ls -alFi ${TEST_MNT}/s?/$FILENAME
touch ${TEST_MNT}/s3/_1 || logger FAILED
ls -alFi ${TEST_MNT}/s?/_1
touch ${TEST_MNT}/s3/_2 || logger FAILED
ls -alFi ${TEST_MNT}/s?/_2
touch ${TEST_MNT}/s3/__1 || logger FAILED
ls -alFi ${TEST_MNT}/s?/__1
touch ${TEST_MNT}/s3/__2 || logger FAILED
ls -alFi ${TEST_MNT}/s?/__2
 # if the above is not enough, add the following loop:
for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
 #for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
 # one of the touch(1) calls in s3 fail due to EEXIST because the inode is
 # already in use that btrfs_find_ino_for_alloc() returns.

Signed-off-by: Stefan Behrens &lt;sbehrens@giantdisaster.de&gt;
Reviewed-by: Jan Schmidt &lt;list.btrfs@jan-o-sch.net&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fusionio.com&gt;
Signed-off-by: Chris Mason &lt;chris.mason@fusionio.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>nfsd: set timeparms.to_maxval in setup_callback_client</title>
<updated>2014-05-18T13:58:05Z</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@redhat.com</email>
</author>
<published>2014-04-15T12:51:48Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9b492e65e428b1246074f04702bf4b81ad11efd0'/>
<id>urn:sha1:9b492e65e428b1246074f04702bf4b81ad11efd0</id>
<content type='text'>
commit 3758cf7e14b753838fe754ede3862af10b35fdac upstream.

...otherwise the logic in the timeout handling doesn't work correctly.

Spotted-by: Trond Myklebust &lt;trond.myklebust@primarydata.com&gt;
Signed-off-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
[bwh: Backported to 3.2: max_cb_time() takes no parameters]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>locks: allow __break_lease to sleep even when break_time is 0</title>
<updated>2014-05-18T13:58:03Z</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@redhat.com</email>
</author>
<published>2014-04-15T10:17:49Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=48633e72d8546a791ac3d0d8c72f7629c9643ef7'/>
<id>urn:sha1:48633e72d8546a791ac3d0d8c72f7629c9643ef7</id>
<content type='text'>
commit f1c6bb2cb8b81013e8979806f8e15e3d53efb96d upstream.

A fl-&gt;fl_break_time of 0 has a special meaning to the lease break code
that basically means "never break the lease". knfsd uses this to ensure
that leases don't disappear out from under it.

Unfortunately, the code in __break_lease can end up passing this value
to wait_event_interruptible as a timeout, which prevents it from going
to sleep at all. This makes __break_lease to spin in a tight loop and
causes soft lockups.

Fix this by ensuring that we pass a minimum value of 1 as a timeout
instead.

Cc: J. Bruce Fields &lt;bfields@fieldses.org&gt;
Reported-by: Terry Barnaby &lt;terry1@beam.ltd.uk&gt;
Signed-off-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ext4: use i_size_read in ext4_unaligned_aio()</title>
<updated>2014-05-18T13:58:03Z</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2014-04-12T16:45:25Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=0289029f8bbdada97ee659ff4d9ac17940b56844'/>
<id>urn:sha1:0289029f8bbdada97ee659ff4d9ac17940b56844</id>
<content type='text'>
commit 6e6358fc3c3c862bfe9a5bc029d3f8ce43dc9765 upstream.

We haven't taken i_mutex yet, so we need to use i_size_read().

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ext4: note the error in ext4_end_bio()</title>
<updated>2014-05-18T13:58:03Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@linux.intel.com</email>
</author>
<published>2014-04-07T14:54:20Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=81692a1db0ee5276e3dcae9335346ba9712b93ce'/>
<id>urn:sha1:81692a1db0ee5276e3dcae9335346ba9712b93ce</id>
<content type='text'>
commit 9503c67c93ed0b95ba62d12d1fd09da6245dbdd6 upstream.

ext4_end_bio() currently throws away the error that it receives.  Chances
are this is part of a spate of errors, one of which will end up getting
the error returned to userspace somehow, but we shouldn't take that risk.
Also print out the errno to aid in debug.

Signed-off-by: Matthew Wilcox &lt;matthew.r.wilcox@intel.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ext4: FIBMAP ioctl causes BUG_ON due to handle EXT_MAX_BLOCKS</title>
<updated>2014-05-18T13:58:02Z</updated>
<author>
<name>Kazuya Mio</name>
<email>k-mio@sx.jp.nec.com</email>
</author>
<published>2014-04-07T14:53:28Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=5e76e584d0b4e24eed04871d807b3081e97bcfb2'/>
<id>urn:sha1:5e76e584d0b4e24eed04871d807b3081e97bcfb2</id>
<content type='text'>
commit 4adb6ab3e0fa71363a5ef229544b2d17de6600d7 upstream.

When we try to get 2^32-1 block of the file which has the extent
(ee_block=2^32-2, ee_len=1) with FIBMAP ioctl, it causes BUG_ON
in ext4_ext_put_gap_in_cache().

To avoid the problem, ext4_map_blocks() needs to check the file logical block
number. ext4_ext_put_gap_in_cache() called via ext4_map_blocks() cannot
handle 2^32-1 because the maximum file logical block number is 2^32-2.

Note that ext4_ind_map_blocks() returns -EIO when the block number is invalid.
So ext4_map_blocks() should also return the same errno.

Signed-off-by: Kazuya Mio &lt;k-mio@sx.jp.nec.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ocfs2: do not put bh when buffer_uptodate failed</title>
<updated>2014-04-30T15:23:25Z</updated>
<author>
<name>alex chen</name>
<email>alex.chen@huawei.com</email>
</author>
<published>2014-04-03T21:47:05Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4c5dd0ae37e1409250ab628b86facb574588716c'/>
<id>urn:sha1:4c5dd0ae37e1409250ab628b86facb574588716c</id>
<content type='text'>
commit f7cf4f5bfe073ad792ab49c04f247626b3e38db6 upstream.

Do not put bh when buffer_uptodate failed in ocfs2_write_block and
ocfs2_write_super_or_backup, because it will put bh in b_end_io.
Otherwise it will hit a warning "VFS: brelse: Trying to free free
buffer".

Signed-off-by: Alex Chen &lt;alex.chen@huawei.com&gt;
Reviewed-by: Joseph Qi &lt;joseph.qi@huawei.com&gt;
Reviewed-by: Srinivas Eeda &lt;srinivas.eeda@oracle.com&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Acked-by: Joel Becker &lt;jlbec@evilplan.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ocfs2: dlm: fix recovery hung</title>
<updated>2014-04-30T15:23:25Z</updated>
<author>
<name>Junxiao Bi</name>
<email>junxiao.bi@oracle.com</email>
</author>
<published>2014-04-03T21:46:51Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f5c4690c8baf89d07433af83ab029d3b554955ea'/>
<id>urn:sha1:f5c4690c8baf89d07433af83ab029d3b554955ea</id>
<content type='text'>
commit ded2cf71419b9353060e633b59e446c42a6a2a09 upstream.

There is a race window in dlm_do_recovery() between dlm_remaster_locks()
and dlm_reset_recovery() when the recovery master nearly finish the
recovery process for a dead node.  After the master sends FINALIZE_RECO
message in dlm_remaster_locks(), another node may become the recovery
master for another dead node, and then send the BEGIN_RECO message to
all the nodes included the old master, in the handler of this message
dlm_begin_reco_handler() of old master, dlm-&gt;reco.dead_node and
dlm-&gt;reco.new_master will be set to the second dead node and the new
master, then in dlm_reset_recovery(), these two variables will be reset
to default value.  This will cause new recovery master can not finish
the recovery process and hung, at last the whole cluster will hung for
recovery.

old recovery master:                                 new recovery master:
dlm_remaster_locks()
                                                  become recovery master for
                                                  another dead node.
                                                  dlm_send_begin_reco_message()
dlm_begin_reco_handler()
{
 if (dlm-&gt;reco.state &amp; DLM_RECO_STATE_FINALIZE) {
  return -EAGAIN;
 }
 dlm_set_reco_master(dlm, br-&gt;node_idx);
 dlm_set_reco_dead_node(dlm, br-&gt;dead_node);
}
dlm_reset_recovery()
{
 dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
 dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
}
                                                  will hang in dlm_remaster_locks() for
                                                  request dlm locks info

Before send FINALIZE_RECO message, recovery master should set
DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done,
this can break the race windows as the BEGIN_RECO messages will not be
handled before DLM_RECO_STATE_FINALIZE flag is cleared.

A similar race may happen between new recovery master and normal node
which is in dlm_finalize_reco_handler(), also fix it.

Signed-off-by: Junxiao Bi &lt;junxiao.bi@oracle.com&gt;
Reviewed-by: Srinivas Eeda &lt;srinivas.eeda@oracle.com&gt;
Reviewed-by: Wengang Wang &lt;wen.gang.wang@oracle.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ocfs2: dlm: fix lock migration crash</title>
<updated>2014-04-30T15:23:25Z</updated>
<author>
<name>Junxiao Bi</name>
<email>junxiao.bi@oracle.com</email>
</author>
<published>2014-04-03T21:46:49Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ca2ba53d4db54ef84aad522cef202f89443a52ca'/>
<id>urn:sha1:ca2ba53d4db54ef84aad522cef202f89443a52ca</id>
<content type='text'>
commit 34aa8dac482f1358d59110d5e3a12f4351f6acaa upstream.

This issue was introduced by commit 800deef3f6f8 ("ocfs2: use
list_for_each_entry where benefical") in 2007 where it replaced
list_for_each with list_for_each_entry.  The variable "lock" will point
to invalid data if "tmpq" list is empty and a panic will be triggered
due to this.  Sunil advised reverting it back, but the old version was
also not right.  At the end of the outer for loop, that
list_for_each_entry will also set "lock" to an invalid data, then in the
next loop, if the "tmpq" list is empty, "lock" will be an stale invalid
data and cause the panic.  So reverting the list_for_each back and reset
"lock" to NULL to fix this issue.

Another concern is that this seemes can not happen because the "tmpq"
list should not be empty.  Let me describe how.

old lock resource owner(node 1):                                  migratation target(node 2):
image there's lockres with a EX lock from node 2 in
granted list, a NR lock from node x with convert_type
EX in converting list.
dlm_empty_lockres() {
 dlm_pick_migration_target() {
   pick node 2 as target as its lock is the first one
   in granted list.
 }
 dlm_migrate_lockres() {
   dlm_mark_lockres_migrating() {
     res-&gt;state |= DLM_LOCK_RES_BLOCK_DIRTY;
     wait_event(dlm-&gt;ast_wq, !dlm_lockres_is_dirty(dlm, res));
	 //after the above code, we can not dirty lockres any more,
     // so dlm_thread shuffle list will not run
                                                                   downconvert lock from EX to NR
                                                                   upconvert lock from NR to EX
&lt;&lt;&lt; migration may schedule out here, then
&lt;&lt;&lt; node 2 send down convert request to convert type from EX to
&lt;&lt;&lt; NR, then send up convert request to convert type from NR to
&lt;&lt;&lt; EX, at this time, lockres granted list is empty, and two locks
&lt;&lt;&lt; in the converting list, node x up convert lock followed by
&lt;&lt;&lt; node 2 up convert lock.

	 // will set lockres RES_MIGRATING flag, the following
	 // lock/unlock can not run
     dlm_lockres_release_ast(dlm, res);
   }

   dlm_send_one_lockres()
                                                                 dlm_process_recovery_data()
                                                                   for (i=0; i&lt;mres-&gt;num_locks; i++)
                                                                     if (ml-&gt;node == dlm-&gt;node_num)
                                                                       for (j = DLM_GRANTED_LIST; j &lt;= DLM_BLOCKED_LIST; j++) {
                                                                        list_for_each_entry(lock, tmpq, list)
                                                                        if (lock) break; &lt;&lt;&lt; lock is invalid as grant list is empty.
                                                                       }
                                                                       if (lock-&gt;ml.node != ml-&gt;node)
                                                                         BUG() &gt;&gt;&gt; crash here
 }

I see the above locks status from a vmcore of our internal bug.

Signed-off-by: Junxiao Bi &lt;junxiao.bi@oracle.com&gt;
Reviewed-by: Wengang Wang &lt;wen.gang.wang@oracle.com&gt;
Cc: Sunil Mushran &lt;sunil.mushran@gmail.com&gt;
Reviewed-by: Srinivas Eeda &lt;srinivas.eeda@oracle.com&gt;
Cc: Joel Becker &lt;jlbec@evilplan.org&gt;
Cc: Mark Fasheh &lt;mfasheh@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
</feed>
