<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/xfs, branch v3.7.9</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/fs/xfs?h=v3.7.9</id>
<link rel='self' href='https://git.amat.us/linux/atom/fs/xfs?h=v3.7.9'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2013-02-17T18:53:11Z</updated>
<entry>
<title>Revert: xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end</title>
<updated>2013-02-17T18:53:11Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2013-02-14T19:22:53Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c3793e0d94af2071c6f3842dcdb5ea08bd011354'/>
<id>urn:sha1:c3793e0d94af2071c6f3842dcdb5ea08bd011354</id>
<content type='text'>
This reverts commit a56040731e5b00081c6d6c26b99e6e257a5d63d7 which was
commit eb178619f930fa2ba2348de332a1ff1c66a31424 upstream.

It has been reported to cause problems:
	http://bugzilla.redhat.com/show_bug.cgi?id=909602

Acked-by: Ben Myers &lt;bpm@sgi.com&gt;
Acked-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Brian Foster &lt;bfoster@redhat.com&gt;
Cc: CAI Qian &lt;caiqian@redhat.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>xfs: fix periodic log flushing</title>
<updated>2013-02-04T00:27:07Z</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2013-01-25T18:45:07Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=8121aecbe073a880758ab5464a305361dbcb7134'/>
<id>urn:sha1:8121aecbe073a880758ab5464a305361dbcb7134</id>
<content type='text'>
[Please take this patch for -stable in kernels 3.5-3.7.  It doesn't have an
equivalent upstream commit because the code was removed before the bug was
discovered.  See f661f1e0bf50 and 7e18530bef6a.]

There is a logic inversion in xfssyncd_worker() which means that the
log is not periodically forced or idled correctly. This means that
metadata changes aggregated in memory do not get flushed in a timely
manner, and hence if filesystem is not cleanly unmounted those
changes can be lost. This loss can manifest itself even hours after
the changes were made if the filesystem is left to idle without a
sync() occurring between the last modification and the
crash/shutdown occuring.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Ben Myers &lt;bpm@sgi.com&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end</title>
<updated>2013-02-04T00:27:06Z</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2013-01-21T12:53:52Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a56040731e5b00081c6d6c26b99e6e257a5d63d7'/>
<id>urn:sha1:a56040731e5b00081c6d6c26b99e6e257a5d63d7</id>
<content type='text'>
commit eb178619f930fa2ba2348de332a1ff1c66a31424 upstream.

When _xfs_buf_find is passed an out of range address, it will fail
to find a relevant struct xfs_perag and oops with a null
dereference. This can happen when trying to walk a filesystem with a
metadata inode that has a partially corrupted extent map (i.e. the
block number returned is corrupt, but is otherwise intact) and we
try to read from the corrupted block address.

In this case, just fail the lookup. If it is readahead being issued,
it will simply not be done, but if it is real read that fails we
will get an error being reported.  Ideally this case should result
in an EFSCORRUPTED error being reported, but we cannot return an
error through xfs_buf_read() or xfs_buf_get() so this lookup failure
may result in ENOMEM or EIO errors being reported instead.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Brian Foster &lt;bfoster@redhat.com&gt;
Reviewed-by: Ben Myers &lt;bpm@sgi.com&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
Cc: CAI Qian &lt;caiqian@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>xfs: Fix possible use-after-free with AIO</title>
<updated>2013-02-04T00:27:01Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2013-01-23T12:56:18Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=2698f16a24e6ec48de944f00aef812f19247af60'/>
<id>urn:sha1:2698f16a24e6ec48de944f00aef812f19247af60</id>
<content type='text'>
commit 4b05d09c18d9aa62d2e7fb4b057f54e5a38963f5 upstream.

Running AIO is pinning inode in memory using file reference. Once AIO
is completed using aio_complete(), file reference is put and inode can
be freed from memory. So we have to be sure that calling aio_complete()
is the last thing we do with the inode.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
CC: xfs@oss.sgi.com
CC: Ben Myers &lt;bpm@sgi.com&gt;
Reviewed-by: Ben Myers &lt;bpm@sgi.com&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>xfs: fix stray dquot unlock when reclaiming dquots</title>
<updated>2013-01-11T17:18:53Z</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2012-11-28T02:01:02Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f8b723cc9f1a9eec83c8527d67eedc6249b7fb66'/>
<id>urn:sha1:f8b723cc9f1a9eec83c8527d67eedc6249b7fb66</id>
<content type='text'>
commit b870553cdecb26d5291af09602352b763e323df2 upstream.

When we fail to get a dquot lock during reclaim, we jump to an error
handler that unlocks the dquot. This is wrong as we didn't lock the
dquot, and unlocking it means who-ever is holding the lock has had
it silently taken away, and hence it results in a lock imbalance.

Found by inspection while modifying the code for the numa-lru
patchset. This fixes a random hang I've been seeing on xfstest 232
for the past several months.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>xfs: fix direct IO nested transaction deadlock.</title>
<updated>2013-01-11T17:18:53Z</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2012-11-28T02:01:00Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=82251473514fde36ce8dc0167f41eaf7ef6a4cb9'/>
<id>urn:sha1:82251473514fde36ce8dc0167f41eaf7ef6a4cb9</id>
<content type='text'>
commit 437a255aa23766666aec78af63be4c253faa8d57 upstream.

The direct IO path can do a nested transaction reservation when
writing past the EOF. The first transaction is the append
transaction for setting the filesize at IO completion, but we can
also need a transaction for allocation of blocks. If the log is low
on space due to reservations and small log, the append transaction
can be granted after wating for space as the only active transaction
in the system. This then attempts a reservation for an allocation,
which there isn't space in the log for, and the reservation sleeps.
The result is that there is nothing left in the system to wake up
all the processes waiting for log space to come free.

The stack trace that shows this deadlock is relatively innocuous:

 xlog_grant_head_wait
 xlog_grant_head_check
 xfs_log_reserve
 xfs_trans_reserve
 xfs_iomap_write_direct
 __xfs_get_blocks
 xfs_get_blocks_direct
 do_blockdev_direct_IO
 __blockdev_direct_IO
 xfs_vm_direct_IO
 generic_file_direct_write
 xfs_file_dio_aio_writ
 xfs_file_aio_write
 do_sync_write
 vfs_write

This was discovered on a filesystem with a log of only 10MB, and a
log stripe unit of 256k whih increased the base reservations by
512k. Hence a allocation transaction requires 1.2MB of log space to
be available instead of only 260k, and so greatly increased the
chance that there wouldn't be enough log space available for the
nested transaction to succeed. The key to reproducing it is this
mkfs command:

mkfs.xfs -f -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b $SCRATCH_DEV

The test case was a 1000 fsstress processes running with random
freeze and unfreezes every few seconds. Thanks to Eryu Guan
(eguan@redhat.com) for writing the test that found this on a system
with a somewhat unique default configuration....

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Andrew Dahl &lt;adahl@sgi.com&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>xfs: drop buffer io reference when a bad bio is built</title>
<updated>2012-11-17T15:36:57Z</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2012-11-12T11:09:46Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d69043c42d8c6414fa28ad18d99973aa6c1c2e24'/>
<id>urn:sha1:d69043c42d8c6414fa28ad18d99973aa6c1c2e24</id>
<content type='text'>
Error handling in xfs_buf_ioapply_map() does not handle IO reference
counts correctly. We increment the b_io_remaining count before
building the bio, but then fail to decrement it in the failure case.
This leads to the buffer never running IO completion and releasing
the reference that the IO holds, so at unmount we can leak the
buffer. This leak is captured by this assert failure during unmount:

XFS: Assertion failed: atomic_read(&amp;pag-&gt;pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273

This is not a new bug - the b_io_remaining accounting has had this
problem for a long, long time - it's just very hard to get a
zero length bio being built by this code...

Further, the buffer IO error can be overwritten on a multi-segment
buffer by subsequent bio completions for partial sections of the
buffer. Hence we should only set the buffer error status if the
buffer is not already carrying an error status. This ensures that a
partial IO error on a multi-segment buffer will not be lost. This
part of the problem is a regression, however.

cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Mark Tinguely &lt;tinguely@sgi.com&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
</content>
</entry>
<entry>
<title>xfs: fix broken error handling in xfs_vm_writepage</title>
<updated>2012-11-17T15:35:42Z</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2012-11-12T11:09:45Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3daed8bc3e49b9695ae931b9f472b5b90d1965b3'/>
<id>urn:sha1:3daed8bc3e49b9695ae931b9f472b5b90d1965b3</id>
<content type='text'>
When we shut down the filesystem, it might first be detected in
writeback when we are allocating a inode size transaction. This
happens after we have moved all the pages into the writeback state
and unlocked them. Unfortunately, if we fail to set up the
transaction we then abort writeback and try to invalidate the
current page. This then triggers are BUG() in block_invalidatepage()
because we are trying to invalidate an unlocked page.

Fixing this is a bit of a chicken and egg problem - we can't
allocate the transaction until we've clustered all the pages into
the IO and we know the size of it (i.e. whether the last block of
the IO is beyond the current EOF or not). However, we don't want to
hold pages locked for long periods of time, especially while we lock
other pages to cluster them into the write.

To fix this, we need to make a clear delineation in writeback where
errors can only be handled by IO completion processing. That is,
once we have marked a page for writeback and unlocked it, we have to
report errors via IO completion because we've already started the
IO. We may not have submitted any IO, but we've changed the page
state to indicate that it is under IO so we must now use the IO
completion path to report errors.

To do this, add an error field to xfs_submit_ioend() to pass it the
error that occurred during the building on the ioend chain. When
this is non-zero, mark each ioend with the error and call
xfs_finish_ioend() directly rather than building bios. This will
immediately push the ioends through completion processing with the
error that has occurred.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Mark Tinguely &lt;tinguely@sgi.com&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
</content>
</entry>
<entry>
<title>xfs: fix attr tree double split corruption</title>
<updated>2012-11-17T15:34:13Z</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2012-11-12T11:09:44Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=42e2976f131d65555d5c1d6c3d47facc63577814'/>
<id>urn:sha1:42e2976f131d65555d5c1d6c3d47facc63577814</id>
<content type='text'>
In certain circumstances, a double split of an attribute tree is
needed to insert or replace an attribute. In rare situations, this
can go wrong, leaving the attribute tree corrupted. In this case,
the attr being replaced is the last attr in a leaf node, and the
replacement is larger so doesn't fit in the same leaf node.
When we have the initial condition of a node format attribute
btree with two leaves at index 1 and 2. Call them L1 and L2.  The
leaf L1 is completely full, there is not a single byte of free space
in it. L2 is mostly empty.  The attribute being replaced - call it X
- is the last attribute in L1.

The way an attribute replace is executed is that the replacement
attribute - call it Y - is first inserted into the tree, but has an
INCOMPLETE flag set on it so that list traversals ignore it. Once
this transaction is committed, a second transaction it run to
atomically mark Y as COMPLETE and X as INCOMPLETE, so that a
traversal will now find Y and skip X. Once that transaction is
committed, attribute X is then removed.

So, the initial condition is:

     +--------+     +--------+
     |   L1   |     |   L2   |
     | fwd: 2 |----&gt;| fwd: 0 |
     | bwd: 0 |&lt;----| bwd: 1 |
     | fsp: 0 |     | fsp: N |
     |--------|     |--------|
     | attr A |     | attr 1 |
     |--------|     |--------|
     | attr B |     | attr 2 |
     |--------|     |--------|
     ..........     ..........
     |--------|     |--------|
     | attr X |     | attr n |
     +--------+     +--------+

So now we go to replace X, and see that L1:fsp = 0 - it is full so
we can't insert Y in the same leaf. So we record the the location of
attribute X so we can track it for later use, then we split L1 into
L1 and L3 and reblance across the two leafs. We end with:

     +--------+     +--------+     +--------+
     |   L1   |     |   L3   |     |   L2   |
     | fwd: 3 |----&gt;| fwd: 2 |----&gt;| fwd: 0 |
     | bwd: 0 |&lt;----| bwd: 1 |&lt;----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|
     | attr A |     | attr X |     | attr 1 |
     |--------|     +--------+     |--------|
     | attr B |                    | attr 2 |
     |--------|                    |--------|
     ..........                    ..........
     |--------|                    |--------|
     | attr W |                    | attr n |
     +--------+                    +--------+

And we track that the original attribute is now at L3:0.

We then try to insert Y into L1 again, and find that there isn't
enough room because the new attribute is larger than the old one.
Hence we have to split again to make room for Y. We end up with
this:

     +--------+     +--------+     +--------+     +--------+
     |   L1   |     |   L4   |     |   L3   |     |   L2   |
     | fwd: 4 |----&gt;| fwd: 3 |----&gt;| fwd: 2 |----&gt;| fwd: 0 |
     | bwd: 0 |&lt;----| bwd: 1 |&lt;----| bwd: 4 |&lt;----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|     |--------|
     | attr A |     | attr Y |     | attr X |     | attr 1 |
     |--------|     + INCOMP +     +--------+     |--------|
     | attr B |     +--------+                    | attr 2 |
     |--------|                                   |--------|
     ..........                                   ..........
     |--------|                                   |--------|
     | attr W |                                   | attr n |
     +--------+                                   +--------+

And now we have the new (incomplete) attribute @ L4:0, and the
original attribute at L3:0. At this point, the first transaction is
committed, and we move to the flipping of the flags.

This is where we are supposed to end up with this:

     +--------+     +--------+     +--------+     +--------+
     |   L1   |     |   L4   |     |   L3   |     |   L2   |
     | fwd: 4 |----&gt;| fwd: 3 |----&gt;| fwd: 2 |----&gt;| fwd: 0 |
     | bwd: 0 |&lt;----| bwd: 1 |&lt;----| bwd: 4 |&lt;----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|     |--------|
     | attr A |     | attr Y |     | attr X |     | attr 1 |
     |--------|     +--------+     + INCOMP +     |--------|
     | attr B |                    +--------+     | attr 2 |
     |--------|                                   |--------|
     ..........                                   ..........
     |--------|                                   |--------|
     | attr W |                                   | attr n |
     +--------+                                   +--------+

But that doesn't happen properly - the attribute tracking indexes
are not pointing to the right locations. What we end up with is both
the old attribute to be removed pointing at L4:0 and the new
attribute at L4:1.  On a debug kernel, this assert fails like so:

XFS: Assertion failed: args-&gt;index2 &lt; be16_to_cpu(leaf2-&gt;hdr.count), file: fs/xfs/xfs_attr_leaf.c, line: 2725

because the new attribute location does not exist. On a production
kernel, this goes unnoticed and the code proceeds ahead merrily and
removes L4 because it thinks that is the block that is no longer
needed. This leaves the hash index node pointing to entries
L1, L4 and L2, but only blocks L1, L3 and L2 to exist. Further, the
leaf level sibling list is L1 &lt;-&gt; L4 &lt;-&gt; L2, but L4 is now free
space, and so everything is busted. This corruption is caused by the
removal of the old attribute triggering a join - it joins everything
correctly but then frees the wrong block.

xfs_repair will report something like:

bad sibling back pointer for block 4 in attribute fork for inode 131
problem with attribute contents in inode 131
would clear attr fork
bad nblocks 8 for inode 131, would reset to 3
bad anextents 4 for inode 131, would reset to 0

The problem lies in the assignment of the old/new blocks for
tracking purposes when the double leaf split occurs. The first split
tries to place the new attribute inside the current leaf (i.e.
"inleaf == true") and moves the old attribute (X) to the new block.
This sets up the old block/index to L1:X, and newly allocated
block to L3:0. It then moves attr X to the new block and tries to
insert attr Y at the old index. That fails, so it splits again.

With the second split, the rebalance ends up placing the new attr in
the second new block - L4:0 - and this is where the code goes wrong.
What is does is it sets both the new and old block index to the
second new block. Hence it inserts attr Y at the right place (L4:0)
but overwrites the current location of the attr to replace that is
held in the new block index (currently L3:0). It over writes it with
L4:1 - the index we later assert fail on.

Hopefully this table will show this in a foramt that is a bit easier
to understand:

Split		old attr index		new attr index
		vanilla	patched		vanilla	patched
before 1st	L1:26	L1:26		N/A	N/A
after 1st	L3:0	L3:0		L1:26	L1:26
after 2nd	L4:0	L3:0		L4:1	L4:0
                ^^^^			^^^^
		wrong			wrong

The fix is surprisingly simple, for all this analysis - just stop
the rebalance on the out-of leaf case from overwriting the new attr
index - it's already correct for the double split case.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Mark Tinguely &lt;tinguely@sgi.com&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
</content>
</entry>
<entry>
<title>xfs: fix reading of wrapped log data</title>
<updated>2012-11-08T17:10:51Z</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2012-11-02T00:38:44Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=6ce377afd1755eae5c93410ca9a1121dfead7b87'/>
<id>urn:sha1:6ce377afd1755eae5c93410ca9a1121dfead7b87</id>
<content type='text'>
Commit 4439647 ("xfs: reset buffer pointers before freeing them") in
3.0-rc1 introduced a regression when recovering log buffers that
wrapped around the end of log. The second part of the log buffer at
the start of the physical log was being read into the header buffer
rather than the data buffer, and hence recovery was seeing garbage
in the data buffer when it got to the region of the log buffer that
was incorrectly read.

Cc: &lt;stable@vger.kernel.org&gt; # 3.0.x, 3.2.x, 3.4.x 3.6.x
Reported-by: Torsten Kaiser &lt;just.for.lkml@googlemail.com&gt;
Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Mark Tinguely &lt;tinguely@sgi.com&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
</content>
</entry>
</feed>
