xfs: ensure WB_SYNC_ALL writeback handles partial pages correctly

commit 0d085a529b427d97710e6a41f8a4f23e1757cd12 upstream. XFS has been having trouble with stray delayed allocation extents beyond EOF for a long time. Recent changes to the collapse range code has triggered erroneous EBUSY errors on page invalidtion for block size smaller than page size filesystems. These have been caused by dirty buffers beyond EOF on a partial page which do not get written to disk during a sync. The issue is that write-ahead in xfs_cluster_write() finds such a partial page and handles it by leaving the page dirty but pushing it into a writeback state. This used to work just fine, as the write_cache_pages() code would then find the dirty partial page in the next mapping tree lookup as the dirty tag is still set. Unfortunately, when we moved to a mark and sweep approach to writeback to fix other writeback sync issues, we broken this. THe act of marking the page as under writeback now clears the TOWRITE tag in the radix tree, even though the page is still dirty. This causes the TOWRITE tag to be cleared, and hence the next lookup on the mapping tree does not find the dirty partial page and so doesn't try to write it again. This same writeback bug was found recently in ext4 and fixed in commit 1c8349a ("ext4: fix data integrity sync in ordered mode") without communication to the wider filesystem community. We can use exactly the same fix here so the TOWRITE flag is not cleared on partial page writes. cc: stable@vger.kernel.org # dependent on 1c8349a17137b93f0a83f276c764a6df1b9a116e Root-cause-found-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
author: Dave Chinner <dchinner@redhat.com> 2014-09-23 15:36:27 +1000
committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2014-10-30 09:40:18 -0700
commit: e81cffc4efaae6fdceb377e9b83bab05be663931 (patch)
tree: 6e2b9ba97450dd3378a0d6857aa35a2539d6890b
parent: 0419937b584f3a744b0a9bca633579b2fc346113 (diff)
1 files changed, 14 insertions, 2 deletions
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 02614349690..4ff074bc2a7 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -434,10 +434,22 @@ xfs_start_page_writeback(
 {
 	ASSERT(PageLocked(page));
 	ASSERT(!PageWriteback(page));
-	if (clear_dirty)
+
+	/*
+	 * if the page was not fully cleaned, we need to ensure that the higher
+	 * layers come back to it correctly. That means we need to keep the page
+	 * dirty, and for WB_SYNC_ALL writeback we need to ensure the
+	 * PAGECACHE_TAG_TOWRITE index mark is not removed so another attempt to
+	 * write this page in this writeback sweep will be made.
+	 */
+	if (clear_dirty) {
 		clear_page_dirty_for_io(page);
-	set_page_writeback(page);
+		set_page_writeback(page);
+	} else
+		set_page_writeback_keepwrite(page);
+
 	unlock_page(page);
+
 	/* If no buffers on the page are to be written, finish it here */
 	if (!buffers)
 		end_page_writeback(page);
author	Dave Chinner <dchinner@redhat.com>	2014-09-23 15:36:27 +1000
committer	Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-10-30 09:40:18 -0700
commit	e81cffc4efaae6fdceb377e9b83bab05be663931 (patch)
tree	6e2b9ba97450dd3378a0d6857aa35a2539d6890b
parent	0419937b584f3a744b0a9bca633579b2fc346113 (diff)