<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/ext4, branch v3.0.9</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/fs/ext4?h=v3.0.9</id>
<link rel='self' href='https://git.amat.us/linux/atom/fs/ext4?h=v3.0.9'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2011-11-11T17:37:16Z</updated>
<entry>
<title>ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining</title>
<updated>2011-11-11T17:37:16Z</updated>
<author>
<name>Jiaying Zhang</name>
<email>jiayingz@google.com</email>
</author>
<published>2011-08-31T15:50:51Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ef52f3936f9f5d770ea177e5c769e68af1701a90'/>
<id>urn:sha1:ef52f3936f9f5d770ea177e5c769e68af1701a90</id>
<content type='text'>
commit 8c0bec2151a47906bf779c6715a10ce04453ab77 upstream.

The i_mutex lock and flush_completed_IO() added by commit 2581fdc810
in ext4_evict_inode() causes lockdep complaining about potential
deadlock in several places.  In most/all of these LOCKDEP complaints
it looks like it's a false positive, since many of the potential
circular locking cases can't take place by the time the
ext4_evict_inode() is called; but since at the very least it may mask
real problems, we need to address this.

This change removes the flush_completed_IO() and i_mutex lock in
ext4_evict_inode().  Instead, we take a different approach to resolve
the software lockup that commit 2581fdc810 intends to fix.  Rather
than having ext4-dio-unwritten thread wait for grabing the i_mutex
lock of an inode, we use mutex_trylock() instead, and simply requeue
the work item if we fail to grab the inode's i_mutex lock.

This should speed up work queue processing in general and also
prevents the following deadlock scenario: During page fault,
shrink_icache_memory is called that in turn evicts another inode B.
Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero.  However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue.  As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock.  Since the i_mutex lock of inode A is
still hold before the page fault happened, we enter a deadlock.

Signed-off-by: Jiaying Zhang &lt;jiayingz@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: fix race in xattr block allocation path</title>
<updated>2011-11-11T17:36:34Z</updated>
<author>
<name>Eric Sandeen</name>
<email>sandeen@redhat.com</email>
</author>
<published>2011-10-29T14:15:35Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=628ee980d92cd25b8b794af59823fb93090500c3'/>
<id>urn:sha1:628ee980d92cd25b8b794af59823fb93090500c3</id>
<content type='text'>
commit 6d6a435190bdf2e04c9465cde5bdc3ac68cf11a4 upstream.

Ceph users reported that when using Ceph on ext4, the filesystem
would often become corrupted, containing inodes with incorrect
i_blocks counters.

I managed to reproduce this with a very hacked-up "streamtest"
binary from the Ceph tree.

Ceph is doing a lot of xattr writes, to out-of-inode blocks.
There is also another thread which does sync_file_range and close,
of the same files.  The problem appears to happen due to this race:

sync/flush thread               xattr-set thread
-----------------               ----------------

do_writepages                   ext4_xattr_set
ext4_da_writepages              ext4_xattr_set_handle
mpage_da_map_blocks             ext4_xattr_block_set
        set DELALLOC_RESERVE
                                ext4_new_meta_blocks
                                        ext4_mb_new_blocks
                                                if (!i_delalloc_reserved_flag)
                                                        vfs_dq_alloc_block
ext4_get_blocks
	down_write(i_data_sem)
        set i_delalloc_reserved_flag
	...
	up_write(i_data_sem)
                                        if (i_delalloc_reserved_flag)
                                                vfs_dq_alloc_block_nofail


In other words, the sync/flush thread pops in and sets
i_delalloc_reserved_flag on the inode, which makes the xattr thread
think that it's in a delalloc path in ext4_new_meta_blocks(),
and add the block for a second time, after already having added
it once in the !i_delalloc_reserved_flag case in ext4_mb_new_blocks

The real problem is that we shouldn't be using the DELALLOC_RESERVED
state flag, and instead we should be passing
EXT4_GET_BLOCKS_DELALLOC_RESERVE down to ext4_map_blocks() instead of
using an inode state flag.  We'll fix this for now with using
i_data_sem to prevent this race, but this is really not the right way
to fix things.

Signed-off-by: Eric Sandeen &lt;sandeen@redhat.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: call ext4_handle_dirty_metadata with correct inode in ext4_dx_add_entry</title>
<updated>2011-11-11T17:36:34Z</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2011-08-31T16:02:51Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3c607445bbf6ff07c755656df331069d753569ea'/>
<id>urn:sha1:3c607445bbf6ff07c755656df331069d753569ea</id>
<content type='text'>
commit 5930ea643805feb50a2f8383ae12eb6f10935e49 upstream.

ext4_dx_add_entry manipulates bh2 and frames[0].bh, which are two buffer_heads
that point to directory blocks assigned to the directory inode.  However, the
function calls ext4_handle_dirty_metadata with the inode of the file that's
being added to the directory, not the directory inode itself.  Therefore,
correct the code to dirty the directory buffers with the directory inode, not
the file inode.

Signed-off-by: Darrick J. Wong &lt;djwong@us.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: ext4_mkdir should dirty dir_block with newly created directory inode</title>
<updated>2011-11-11T17:36:34Z</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@us.ibm.com</email>
</author>
<published>2011-08-31T16:00:51Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=37915713a96e05b5d731b9457a0cf22ced00f36f'/>
<id>urn:sha1:37915713a96e05b5d731b9457a0cf22ced00f36f</id>
<content type='text'>
commit f9287c1f2d329f4d78a3bbc9cf0db0ebae6f146a upstream.

ext4_mkdir calls ext4_handle_dirty_metadata with dir_block and the inode "dir".
Unfortunately, dir_block belongs to the newly created directory (which is
"inode"), not the parent directory (which is "dir").  Fix the incorrect
association.

Signed-off-by: Darrick J. Wong &lt;djwong@us.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: ext4_rename should dirty dir_bh with the correct directory</title>
<updated>2011-11-11T17:36:33Z</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@us.ibm.com</email>
</author>
<published>2011-08-31T15:58:51Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a848dee39f54f1eb9d9b90ef172ec4e813815e21'/>
<id>urn:sha1:a848dee39f54f1eb9d9b90ef172ec4e813815e21</id>
<content type='text'>
commit bcaa992975041e40449be8c010c26192b8c8b409 upstream.

When ext4_rename performs a directory rename (move), dir_bh is a
buffer that is modified to update the '..' link in the directory being
moved (old_inode).  However, ext4_handle_dirty_metadata is called with
the old parent directory inode (old_dir) and dir_bh, which is
incorrect because dir_bh does not belong to the parent inode.  Fix
this error.

Signed-off-by: Darrick J. Wong &lt;djwong@us.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext2,ext3,ext4: don't inherit APPEND_FL or IMMUTABLE_FL for new inodes</title>
<updated>2011-11-11T17:36:32Z</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2011-08-31T15:54:51Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d24f405b711a4247f31358339dc1112ca659e6fe'/>
<id>urn:sha1:d24f405b711a4247f31358339dc1112ca659e6fe</id>
<content type='text'>
commit 1cd9f0976aa4606db8d6e3dc3edd0aca8019372a upstream.

This doesn't make much sense, and it exposes a bug in the kernel where
attempts to create a new file in an append-only directory using
O_CREAT will fail (but still leave a zero-length file).  This was
discovered when xfstests #79 was generalized so it could run on all
file systems.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>writeback: introduce .tagged_writepages for the WB_SYNC_NONE sync stage</title>
<updated>2011-10-03T18:40:43Z</updated>
<author>
<name>Wu Fengguang</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2010-06-06T16:38:15Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ac693061b11c33d5a5c5ec1925de7abd3fcb0971'/>
<id>urn:sha1:ac693061b11c33d5a5c5ec1925de7abd3fcb0971</id>
<content type='text'>
commit 6e6938b6d3130305a5960c86b1a9b21e58cf6144 upstream.

sync(2) is performed in two stages: the WB_SYNC_NONE sync and the
WB_SYNC_ALL sync. Identify the first stage with .tagged_writepages and
do livelock prevention for it, too.

Jan's commit f446daaea9 ("mm: implement writeback livelock avoidance
using page tagging") is a partial fix in that it only fixed the
WB_SYNC_ALL phase livelock.

Although ext4 is tested to no longer livelock with commit f446daaea9,
it may due to some "redirty_tail() after pages_skipped" effect which
is by no means a guarantee for _all_ the file systems.

Note that writeback_inodes_sb() is called by not only sync(), they are
treated the same because the other callers also need livelock prevention.

Impact:  It changes the order in which pages/inodes are synced to disk.
Now in the WB_SYNC_NONE stage, it won't proceed to write the next inode
until finished with the current inode.

Acked-by: Jan Kara &lt;jack@suse.cz&gt;
CC: Dave Chinner &lt;david@fromorbit.com&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: fix nomblk_io_submit option so it correctly converts uninit blocks</title>
<updated>2011-08-29T20:29:11Z</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2011-08-13T16:58:21Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=45df4b8977852ea12d6ed19f6c87e6765f6c31e5'/>
<id>urn:sha1:45df4b8977852ea12d6ed19f6c87e6765f6c31e5</id>
<content type='text'>
commit 9dd75f1f1a02d656a11a7b9b9e6c2759b9c1e946 upstream.

Bug discovered by Jan Kara:

Finally, commit 1449032be17abb69116dbc393f67ceb8bd034f92 returned back
the old IO submission code but apparently it forgot to return the old
handling of uninitialized buffers so we unconditionnaly call
block_write_full_page() without specifying end_io function. So AFAICS
we never convert unwritten extents to written in some cases. For
example when I mount the fs as: mount -t ext4 -o
nomblk_io_submit,dioread_nolock /dev/ubdb /mnt and do
        int fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0600);
        char buf[1024];
        memset(buf, 'a', sizeof(buf));
        fallocate(fd, 0, 0, 16384);
        write(fd, buf, sizeof(buf));

I get a file full of zeros (after remounting the filesystem so that
pagecache is dropped) instead of seeing the first KB contain 'a's.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN.</title>
<updated>2011-08-29T20:29:11Z</updated>
<author>
<name>Tao Ma</name>
<email>boyu.mt@taobao.com</email>
</author>
<published>2011-08-13T16:30:59Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4eddd2a50f2548c6c83081fde8fdbc3de07626f6'/>
<id>urn:sha1:4eddd2a50f2548c6c83081fde8fdbc3de07626f6</id>
<content type='text'>
commit 32c80b32c053dc52712dedac5e4d0aa7c93fc353 upstream.

EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten
should be done simultaneously since ext4_end_io_nolock always clear
the flag and decrease the counter in the same time.

We don't increase i_aiodio_unwritten when setting
EXT4_IO_END_UNWRITTEN so it will go nagative and causes some process
to wait forever.

Part of the patch came from Eric in his e-mail, but it doesn't fix the
problem met by Michael actually.

http://marc.info/?l=linux-ext4&amp;m=131316851417460&amp;w=2

Reported-and-Tested-by: Michael Tokarev&lt;mjt@tls.msk.ru&gt;
Signed-off-by: Eric Sandeen &lt;sandeen@redhat.com&gt;
Signed-off-by: Tao Ma &lt;boyu.mt@taobao.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode</title>
<updated>2011-08-29T20:29:10Z</updated>
<author>
<name>Jiaying Zhang</name>
<email>jiayingz@google.com</email>
</author>
<published>2011-08-13T16:17:13Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=2526f368949bccda6e8ed1bf74a4e955e3af42af'/>
<id>urn:sha1:2526f368949bccda6e8ed1bf74a4e955e3af42af</id>
<content type='text'>
commit 2581fdc810889fdea97689cb62481201d579c796 upstream.

Flush inode's i_completed_io_list before calling ext4_io_wait to
prevent the following deadlock scenario: A page fault happens while
some process is writing inode A. During page fault,
shrink_icache_memory is called that in turn evicts another inode
B. Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero. However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock. Since the i_mutex lock of inode A is
still hold before the page fault happened, we enter a deadlock.

Also moves ext4_flush_completed_IO and ext4_ioend_wait from
ext4_destroy_inode() to ext4_evict_inode(). During inode deleteion,
ext4_evict_inode() is called before ext4_destroy_inode() and in
ext4_evict_inode(), we may call ext4_truncate() without holding
i_mutex lock. As a result, there is a race between flush_completed_IO
that is called from ext4_ext_truncate() and ext4_end_io_work, which
may cause corruption on an io_end structure. This change moves
ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode()
to ext4_evict_inode() to resolve the race between ext4_truncate() and
ext4_end_io_work during inode deletion.

Signed-off-by: Jiaying Zhang &lt;jiayingz@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
</feed>
