linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2009-06-11	ext4: fix bb_prealloc_list corruption due to wrong group locking	Eric Sandeen
	(cherry-picked from commit d33a1976fbee1ee321d6f014333d8f03a39d526c) This is for Red Hat bug 490026: EXT4 panic, list corruption in ext4_mb_new_inode_pa ext4_lock_group(sb, group) is supposed to protect this list for each group, and a common code flow to remove an album is like this: ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL); ext4_lock_group(sb, grp); list_del(&pa->pa_group_list); ext4_unlock_group(sb, grp); so it's critical that we get the right group number back for this prealloc context, to lock the right group (the one associated with this pa) and prevent concurrent list manipulation. however, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a comment, "-1 is to protect from crossing allocation group". This makes sense for the group_pa, where pa_pstart is advanced by the length which has been used (in ext4_mb_release_context()), and when the entire length has been used, pa_pstart has been advanced to the first block of the next group. However, for inode_pa, pa_pstart is never advanced; it's just set once to the first block in the group and not moved after that. So in this case, if we subtract one in ext4_mb_put_pa(), we are actually locking the previous group, and opening the race with the other threads which do not subtract off the extra block. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	ext4: fix bogus BUG_ONs in in mballoc code	Eric Sandeen
	(cherry picked from commit 8d03c7a0c550e7ab24cadcef5e66656bfadec8b9) Thiemo Nagel reported that: # dd if=/dev/zero of=image.ext4 bs=1M count=2 # mkfs.ext4 -v -F -b 1024 -m 0 -g 512 -G 4 -I 128 -N 1 \ -O large_file,dir_index,flex_bg,extent,sparse_super image.ext4 # mount -o loop image.ext4 mnt/ # dd if=/dev/zero of=mnt/file oopsed, with a BUG_ON in ext4_mb_normalize_request because size == EXT4_BLOCKS_PER_GROUP It appears to me (esp. after talking to Andreas) that the BUG_ON is bogus; a request of exactly EXT4_BLOCKS_PER_GROUP should be allowed, though larger sizes do indicate a problem. Fix that an another (apparently rare) codepath with a similar check. Reported-by: Thiemo Nagel <thiemo.nagel@ph.tum.de> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	ext4: Print the find_group_flex() warning only once	Theodore Ts'o
	(cherry picked from commit 2842c3b5449f31470b61db716f1926b594fb6156) This is a short-term warning, and even printk_ratelimit() can result in too much noise in system logs. So only print it once as a warning. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	ext4: fix header check in ext4_ext_search_right() for deep extent trees.	Eric Sandeen
	(cherry picked from commit 395a87bfefbc400011417e9eaae33169f9f036c0) The ext4_ext_search_right() function is confusing; it uses a "depth" variable which is 0 at the root and maximum at the leaves, but the on-disk metadata uses a "depth" (actually eh_depth) which is opposite: maximum at the root, and 0 at the leaves. The ext4_ext_check_header() function is given a depth and checks the header agaisnt that depth; it expects the on-disk semantics, but we are giving it the opposite in the while loop in this function. We should be giving it the on-disk notion of "depth" which we can get from (p_depth - depth) - and if you look, the last (more commonly hit) call to ext4_ext_check_header() does just this. Sending in the wrong depth results in (incorrect) messages about corruption: EXT4-fs error (device sdb1): ext4_ext_search_right: bad header in inode #2621457: unexpected eh_depth - magic f30a, entries 340, max 340(0), depth 1(2) http://bugzilla.kernel.org/show_bug.cgi?id=12821 Reported-by: David Dindorp <ddi@dubex.dk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	ext4: fix ext4_free_inode() vs. ext4_claim_inode() race	Eric Sandeen
	(cherry picked from commit 7ce9d5d1f3c8736511daa413c64985a05b2feee3) I was seeing fsck errors on inode bitmaps after a 4 thread dbench run on a 4 cpu machine: Inode bitmap differences: -50736 -(50752--50753) etc... I believe that this is because ext4_free_inode() uses atomic bitops, and although ext4_new_inode() used to also use atomic bitops for synchronization, commit 393418676a7602e1d7d3f6e560159c65c8cbd50e changed this to use the sb_bgl_lock, so that we could also synchronize against read_inode_bitmap and initialization of uninit inode tables. However, that change left ext4_free_inode using atomic bitops, which I think leaves no synchronization between setting & unsetting bits in the inode table. The below patch fixes it for me, although I wonder if we're getting at all heavy-handed with this spinlock... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	nfs: Fix NFS v4 client handling of MAY_EXEC in nfs_permission.	Frank Filz
	commit 7ee2cb7f32b299c2b06a31fde155457203e4b7dd upstream. The problem is that permission checking is skipped if atomic open is possible, but when exec opens a file, it just opens it O_READONLY which means EXEC permission will not be checked at that time. This problem is observed by the following sequence (executed as root): mount -t nfs4 server:/ /mnt4 echo "ls" >/mnt4/foo chmod 744 /mnt4/foo su guest -c "mnt4/foo" Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Eugene Teo <eugeneteo@kernel.sg> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	ocfs2: fix i_mutex locking in ocfs2_splice_to_file()	Miklos Szeredi
	commit 328eaaba4e41a04c1dc4679d65bea3fee4349d86 upstream. Rearrange locking of i_mutex on destination and call to ocfs2_rw_lock() so locks are only held while buffers are copied with the pipe_to_file() actor, and not while waiting for more data on the pipe. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	splice: fix i_mutex locking in generic_splice_write()	Miklos Szeredi
	commit eb443e5a25d43996deb62b9bcee1a4ce5dea2ead upstream. Rearrange locking of i_mutex on destination so it's only held while buffers are copied with the pipe_to_file() actor, and not while waiting for more data on the pipe. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	splice: remove i_mutex locking in splice_from_pipe()	Miklos Szeredi
	commit 2933970b960223076d6affcf7a77e2bc546b8102 upstream. splice_from_pipe() is only called from two places: - generic_splice_sendpage() - splice_write_null() Neither of these require i_mutex to be taken on the destination inode. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	splice: split up __splice_from_pipe()	Miklos Szeredi
	commit b3c2d2ddd63944ef2a1e4a43077b602288107e01 upstream. Split up __splice_from_pipe() into four helper functions: splice_from_pipe_begin() splice_from_pipe_next() splice_from_pipe_feed() splice_from_pipe_end() splice_from_pipe_next() will wait (if necessary) for more buffers to be added to the pipe. splice_from_pipe_feed() will feed the buffers to the supplied actor and return when there's no more data available (or if all of the requested data has been copied). This is necessary so that implementations can do locking around the non-waiting splice_from_pipe_feed(). This patch should not cause any change in behavior. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	NFS: Fix the notifications when renaming onto an existing file	Trond Myklebust
	commit b1e4adf4ea41bb8b5a7bfc1a7001f137e65495df upstream. NFS appears to be returning an unnecessary "delete" notification when we're doing an atomic rename. See http://bugzilla.gnome.org/show_bug.cgi?id=575684 The fix is to get rid of the redundant call to d_delete(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	nfsd4: check for negative dentry before use in nfsv4 readdir	J. Bruce Fields
	commit b2c0cea6b1cb210e962f07047df602875564069e upstream. After 2f9092e1020246168b1309b35e085ecd7ff9ff72 "Fix i_mutex vs. readdir handling in nfsd" (and 14f7dd63 "Copy XFS readdir hack into nfsd code"), an entry may be removed between the first mutex_unlock and the second mutex_lock. In this case, lookup_one_len() will return a negative dentry. Check for this case to avoid a NULL dereference. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reviewed-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	epoll: fix size check in epoll_create()	Davide Libenzi
	commit bfe3891a5f5d3b78146a45f40e435d14f5ae39dd upstream. Fix a size check WRT the manual pages. This was inadvertently broken by commit 9fe5ad9c8cef9ad5873d8ee55d1cf00d9b607df0 ("flag parameters add-on: remove epoll_create size param"). Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: <Hiroyuki.Mach@gmail.com> Cc: rohit verma <rohit.170309@gmail.com> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	cifs: Fix unicode string area word alignment in session setup	Jeff Layton
	commit 27b87fe52baba0a55e9723030e76fce94fabcea4 refreshed. cifs: fix unicode string area word alignment in session setup The handling of unicode string area alignment is wrong. decode_unicode_ssetup improperly assumes that it will always be preceded by a pad byte. This isn't the case if the string area is already word-aligned. This problem, combined with the bad buffer sizing for the serverDomain string can cause memory corruption. The bad alignment can make it so that the alignment of the characters is off. This can make them translate to characters that are greater than 2 bytes each. If this happens we can overflow the allocation. Fix this by fixing the alignment in CIFS_SessSetup instead so we can verify it against the head of the response. Also, clean up the workaround for improperly terminated strings by checking for a odd-length unicode buffers and then forcibly terminating them. Finally, resize the buffer for serverDomain. Now that we've fixed the alignment, it's probably fine, but a malicious server could overflow it. A better solution for handling these strings is still needed, but this should be a suitable bandaid. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	cifs: Fix buffer size in cifs_convertUCSpath	Suresh Jayaraman
	Relevant commits 7fabf0c9479fef9fdb9528a5fbdb1cb744a744a4 and f58841666bc22e827ca0dcef7b71c7bc2758ce82. The upstream commits adds cifs_from_ucs2 that includes functionality of cifs_convertUCSpath and does cleanup. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Acked-by: Steve French <sfrench@us.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	cifs: Fix incorrect destination buffer size in cifs_strncpy_to_host	Suresh Jayaraman
	Relevant commits 968460ebd8006d55661dec0fb86712b40d71c413 and 066ce6899484d9026acd6ba3a8dbbedb33d7ae1b. Minimal hunks to fix buffer size and fix an existing problem pointed out by Guenter Kukuk that length of src is used for NULL termination of dst. cifs: Rename cifs_strncpy_to_host and fix buffer size There is a possibility for the path_name and node_name buffers to overflow if they contain charcters that are >2 bytes in the local charset. Resize the buffer allocation so to avoid this possibility. Also, as pointed out by Jeff Layton, it would be appropriate to rename the function to cifs_strlcpy_to_host to reflect the fact that the copied string is always NULL terminated. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	cifs: Increase size of tmp_buf in cifs_readdir to avoid potential overflows	Suresh Jayaraman
	Commit 7b0c8fcff47a885743125dd843db64af41af5a61 refreshed and use a #define from commit f58841666bc22e827ca0dcef7b71c7bc2758ce82. cifs: Increase size of tmp_buf in cifs_readdir to avoid potential overflows Increase size of tmp_buf to possible maximum to avoid potential overflows. Also moved UNICODE_NAME_MAX definition so that it can be used elsewhere. Pointed-out-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	cifs: Fix buffer size for tcon->nativeFileSystem field	Jeff Layton
	Commit f083def68f84b04fe3f97312498911afce79609e refreshed. cifs: fix buffer size for tcon->nativeFileSystem field The buffer for this was resized recently to fix a bug. It's still possible however that a malicious server could overflow this field by sending characters in it that are >2 bytes in the local charset. Double the size of the buffer to account for this possibility. Also get rid of some really strange and seemingly pointless NULL termination. It's NULL terminating the string in the source buffer, but by the time that happens, we've already copied the string. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	NFS: Close page_mkwrite() races	Trond Myklebust
	commit 7fdf523067666b0eaff330f362401ee50ce187c4 upstream Follow up to Nick Piggin's patches to ensure that nfs_vm_page_mkwrite returns with the page lock held, and sets the VM_FAULT_LOCKED flag. See http://bugzilla.kernel.org/show_bug.cgi?id=12913 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	NFS: Fix the return value in nfs_page_mkwrite()	Trond Myklebust
	commit 2b2ec7554cf7ec5e4412f89a5af6abe8ce950700 upstream Commit c2ec175c39f62949438354f603f4aa170846aabb ("mm: page_mkwrite change prototype to match fault") exposed a bug in the NFS implementation of page_mkwrite. We should be returning 0 on success... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	GFS2: Fix page_mkwrite() return code	Steven Whitehouse
	commit e56985da455b9dc0591b8cb2006cc94b6f4fb0f4 upstream This allows for the possibility of returning VM_FAULT_OOM as well as VM_FAULT_SIGBUS. This ensures that the correct action is taken. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	mm: close page_mkwrite races	Nick Piggin
	commit b827e496c893de0c0f142abfaeb8730a2fd6b37f upstream mm: close page_mkwrite races Change page_mkwrite to allow implementations to return with the page locked, and also change it's callers (in page fault paths) to hold the lock until the page is marked dirty. This allows the filesystem to have full control of page dirtying events coming from the VM. Rather than simply hold the page locked over the page_mkwrite call, we call page_mkwrite with the page unlocked and allow callers to return with it locked, so filesystems can avoid LOR conditions with page lock. The problem with the current scheme is this: a filesystem that wants to associate some metadata with a page as long as the page is dirty, will perform this manipulation in its ->page_mkwrite. It currently then must return with the page unlocked and may not hold any other locks (according to existing page_mkwrite convention). In this window, the VM could write out the page, clearing page-dirty. The filesystem has no good way to detect that a dirty pte is about to be attached, so it will happily write out the page, at which point, the filesystem may manipulate the metadata to reflect that the page is no longer dirty. It is not always possible to perform the required metadata manipulation in ->set_page_dirty, because that function cannot block or fail. The filesystem may need to allocate some data structure, for example. And the VM cannot mark the pte dirty before page_mkwrite, because page_mkwrite is allowed to fail, so we must not allow any window where the page could be written to if page_mkwrite does fail. This solution of holding the page locked over the 3 critical operations (page_mkwrite, setting the pte dirty, and finally setting the page dirty) closes out races nicely, preventing page cleaning for writeout being initiated in that window. This provides the filesystem with a strong synchronisation against the VM here. - Sage needs this race closed for ceph filesystem. - Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913). - I need it for fsblock. - I suspect other filesystems may need it too (eg. btrfs). - I have converted buffer.c to the new locking. Even simple block allocation under dirty pages might be susceptible to i_size changing under partial page at the end of file (we also have a buffer.c-side problem here, but it cannot be fixed properly without this patch). - Other filesystems (eg. NFS, maybe btrfs) will need to change their page_mkwrite functions themselves. [ This also moves page_mkwrite another step closer to fault, which should eventually allow page_mkwrite to be moved into ->fault, and thus avoiding a filesystem calldown and page lock/unlock cycle in __do_fault. ] [akpm@linux-foundation.org: fix derefs of NULL ->mapping] Cc: Sage Weil <sage@newdream.net> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	fs: fix page_mkwrite error cases in core code and btrfs	Nick Piggin
	commit 56a76f8275c379ed73c8a43cfa1dfa2f5e9cfa19 upstream fs: fix page_mkwrite error cases in core code and btrfs page_mkwrite is called with neither the page lock nor the ptl held. This means a page can be concurrently truncated or invalidated out from underneath it. Callers are supposed to prevent truncate races themselves, however previously the only thing they can do in case they hit one is to raise a SIGBUS. A sigbus is wrong for the case that the page has been invalidated or truncated within i_size (eg. hole punched). Callers may also have to perform memory allocations in this path, where again, SIGBUS would be wrong. The previous patch ("mm: page_mkwrite change prototype to match fault") made it possible to properly specify errors. Convert the generic buffer.c code and btrfs to return sane error values (in the case of page removed from pagecache, VM_FAULT_NOPAGE will cause the fault handler to exit without doing anything, and the fault will be retried properly). This fixes core code, and converts btrfs as a template/example. All other filesystems defining their own page_mkwrite should be fixed in a similar manner. Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	mm: page_mkwrite change prototype to match fault	Nick Piggin
	commit c2ec175c39f62949438354f603f4aa170846aabb upstream mm: page_mkwrite change prototype to match fault Change the page_mkwrite prototype to take a struct vm_fault, and return VM_FAULT_xxx flags. There should be no functional change. This makes it possible to return much more detailed error information to the VM (and also can provide more information eg. virtual_address to the driver, which might be important in some special cases). This is required for a subsequent fix. And will also make it easier to merge page_mkwrite() with fault() in future. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Felix Blyakher <felixb@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	dup2: Fix return value with oldfd == newfd and invalid fd	Jeff Mahoney
	commit 2b79bc4f7ebbd5af3c8b867968f9f15602d5f802 upstream. The return value of dup2 when oldfd == newfd and the fd isn't valid is not getting properly sign extended. We end up with 4294967287 instead of -EBADF. I've reproduced this on SLE11 (2.6.27.21), openSUSE Factory (2.6.29-rc5), and Ubuntu 9.04 (2.6.28). This patch uses a signed int for the error value so it is properly extended. Commit 6c5d0512a091480c9f981162227fdb1c9d70e555 introduced this regression. Reported-by: Jiri Dluhos <jdluhos@novell.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-08	proc: avoid information leaks to non-privileged processes	Jake Edge
	commit f83ce3e6b02d5e48b3a43b001390e2b58820389d upstream. By using the same test as is used for /proc/pid/maps and /proc/pid/smaps, only allow processes that can ptrace() a given process to see information that might be used to bypass address space layout randomization (ASLR). These include eip, esp, wchan, and start_stack in /proc/pid/stat as well as the non-symbolic output from /proc/pid/wchan. ASLR can be bypassed by sampling eip as shown by the proof-of-concept code at http://code.google.com/p/fuzzyaslr/ As part of a presentation (http://www.cr0.org/paper/to-jt-linux-alsr-leak.pdf) esp and wchan were also noted as possibly usable information leaks as well. The start_stack address also leaks potentially useful information. Cc: Stable Team <stable@kernel.org> Signed-off-by: Jake Edge <jake@lwn.net> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-08	pagemap: require aligned-length, non-null reads of /proc/pid/pagemap	Vitaly Mayatskikh
	commit 0816178638c15ce5472d39d771a96860dff4141a upstream. The intention of commit aae8679b0ebcaa92f99c1c3cb0cd651594a43915 ("pagemap: fix bug in add_to_pagemap, require aligned-length reads of /proc/pid/pagemap") was to force reads of /proc/pid/pagemap to be a multiple of 8 bytes, but now it allows to read 0 bytes, which actually puts some data to user's buffer. According to POSIX, if count is zero, read() should return zero and has no other results. Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com> Cc: Thomas Tuttle <ttuttle@google.com> Acked-by: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	fs core fixes	Hugh Dickins
	Please add the following 4 commits to 2.6.27-stable and 2.6.28-stable. However, there has been a lot of change here between 2.6.28 and 2.6.29: in particular, fs/exec.c's unsafe_exec() grew into the more complicated check_unsafe_exec(). So applying the original patches gives too many rejects: at the bottom is the diffstat and the combined patch required. 1 Commit: 53e9309e01277ec99c38e84e0ca16921287cf470 Author: Hugh Dickins <hugh@veritas.com> Date: Sat, 28 Mar 2009 23:16:03 +0000 (+0000) Subject: compat_do_execve should unshare_files 2 Commit: e426b64c412aaa3e9eb3e4b261dc5be0d5a83e78 Author: Hugh Dickins <hugh@veritas.com> Date: Sat, 28 Mar 2009 23:20:19 +0000 (+0000) Subject: fix setuid sometimes doesn't 3 Commit: 7c2c7d993044cddc5010f6f429b100c63bc7dffb Author: Hugh Dickins <hugh@veritas.com> Date: Sat, 28 Mar 2009 23:21:27 +0000 (+0000) Subject: fix setuid sometimes wouldn't 4 Commit: f1191b50ec11c8e2ca766d6d99eb5bb9d2c084a3 Author: Al Viro <viro@zeniv.linux.org.uk> Date: Mon, 30 Mar 2009 11:35:18 +0000 (-0400) Subject: check_unsafe_exec() doesn't care about signal handlers sharing Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	block: revert part of 18ce3751ccd488c78d3827e9f6bf54e6322676fb	Jens Axboe
	commit 78f707bfc723552e8309b7c38a8d0cc51012e813 upstream. The above commit added WRITE_SYNC and switched various places to using that for committing writes that will be waited upon immediately after submission. However, this causes a performance regression with AS and CFQ for ext3 at least, since sync_dirty_buffer() will submit some writes with WRITE_SYNC while ext3 has sumitted others dependent writes without the sync flag set. This causes excessive anticipation/idling in the IO scheduler because sync and async writes get interleaved, causing a big performance regression for the below test case (which is meant to simulate sqlite like behaviour). ---- test case ---- int main(int argc, char *argv) { int fdes, i; FILE fp; struct timeval start; struct timeval end; struct timeval res; gettimeofday(&start, NULL); for (i=0; i<ROWS; i++) { fp = fopen("test_file", "a"); fprintf(fp, "Some Text Data\n"); fdes = fileno(fp); fsync(fdes); fclose(fp); } gettimeofday(&end, NULL); timersub(&end, &start, &res); fprintf(stdout, "time to write %d lines is %ld(msec)\n", ROWS, (res.tv_sec*1000000 + res.tv_usec)/1000); return 0; } ------------------- Thanks to Sean.White@APCC.com for tracking down this performance regression and providing a test case. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	hugetlbfs: return negative error code for bad mount option	Akinobu Mita
	upstream commit: c12ddba09394c60e1120e6997794fa6ed52da884 This fixes the following BUG: # mount -o size=MM -t hugetlbfs none /huge hugetlbfs: Bad value 'MM' for mount option 'size=MM' ------------[ cut here ]------------ kernel BUG at fs/super.c:996! Due to BUG_ON(!mnt->mnt_sb); in vfs_kern_mount(). Also, remove unused #include <linux/quotaops.h> Cc: William Irwin <wli@holomorphy.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	splice: fix deadlock in splicing to file	Miklos Szeredi
	upstream commit: 7bfac9ecf0585962fe13584f5cf526d8c8e76f17 There's a possible deadlock in generic_file_splice_write(), splice_from_pipe() and ocfs2_file_splice_write(): - task A calls generic_file_splice_write() - this calls inode_double_lock(), which locks i_mutex on both pipe->inode and target inode - ordering depends on inode pointers, can happen that pipe->inode is locked first - __splice_from_pipe() needs more data, calls pipe_wait() - this releases lock on pipe->inode, goes to interruptible sleep - task B calls generic_file_splice_write(), similarly to the first - this locks pipe->inode, then tries to lock inode, but that is already held by task A - task A is interrupted, it tries to lock pipe->inode, but fails, as it is already held by task B - ABBA deadlock Fix this by explicitly ordering locks: the outer lock must be on target inode and the inner lock (which is later unlocked and relocked) must be on pipe->inode. This is OK, pipe inodes and target inodes form two nonoverlapping sets, generic_file_splice_write() and friends are not called with a target which is a pipe. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	vfs: skip I_CLEAR state inodes	Wu Fengguang
	upstream commit: b6fac63cc1f52ec27f29fe6c6c8494a2ffac33fd clear_inode() will switch inode state from I_FREEING to I_CLEAR, and do so _outside_ of inode_lock. So any I_FREEING testing is incomplete without a coupled testing of I_CLEAR. So add I_CLEAR tests to drop_pagecache_sb(), generic_sync_sb_inodes() and add_dquot_ref(). Masayoshi MIZUMA discovered the bug in drop_pagecache_sb() and Jan Kara reminds fixing the other two cases. Masayoshi MIZUMA has a nice panic flow: ===================================================================== [process A] \| [process B] \| \| \| prune_icache() \| drop_pagecache() \| spin_lock(&inode_lock) \| drop_pagecache_sb() \| inode->i_state \|= I_FREEING; \| \| \| spin_unlock(&inode_lock) \| V \| \| \| spin_lock(&inode_lock) \| V \| \| \| dispose_list() \| \| \| list_del() \| \| \| clear_inode() \| \| \| inode->i_state = I_CLEAR \| \| \| \| \| V \| \| \| if (inode->i_state & (I_FREEING\|I_WILL_FREE)) \| \| \| continue; <==== NOT MATCH \| \| \| \| \| \| (DANGER from here on! Accessing disposing inode!) \| \| \| \| \| \| __iget() \| \| \| list_move() <===== PANIC on poisoned list !! V V \| (time) ===================================================================== Reported-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [chrisw: backport to 2.6.29] Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	CIFS: Fix memory overwrite when saving nativeFileSystem field during mount	Steve French
	upstream commit: b363b3304bcf68c4541683b2eff70b29f0446a5b CIFS can allocate a few bytes to little for the nativeFileSystem field during tree connect response processing during mount. This can result in a "Redzone overwritten" message to be logged. Signed-off-by: Sridhar Vinay <vinaysridhar@in.ibm.com> Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com> [chrisw: minor backport to CHANGES file] Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	cifs: fix buffer format byte on NT Rename/hardlink	Jeff Layton
	upstream commit: fcc7c09d94be7b75c9ea2beb22d0fae191c6b4b9 Discovered at Connnectathon 2009... The buffer format byte and the pad are transposed in NT_RENAME calls (which are used to set hardlinks). Most servers seem to ignore this fact, but NetApp filers throw back an error due to this problem. This patch fixes it. CC: Stable <stable@kernel.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-23	eventfd: remove fput() call from possible IRQ context	Davide Libenzi
	commit 87c3a86e1c220121d0ced59d1a71e78ed9abc6dd upstream. Remove a source of fput() call from inside IRQ context. Myself, like Eric, wasn't able to reproduce an fput() call from IRQ context, but Jeff said he was able to, with the attached test program. Independently from this, the bug is conceptually there, so we might be better off fixing it. This patch adds an optimization similar to the one we already do on ->ki_filp, on ->ki_eventfd. Playing with ->f_count directly is not pretty in general, but the alternative here would be to add a brand new delayed fput() infrastructure, that I'm not sure is worth it. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Cc: Zach Brown <zach.brown@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-23	NFSD: provide encode routine for OP_OPENATTR	Benny Halevy
	commit 84f09f46b4ee9e4e9b6381f8af31817516d2091b upstream. Although this operation is unsupported by our implementation we still need to provide an encode routine for it to merely encode its (error) status back in the compound reply. Thanks for Bill Baker at sun.com for testing with the Sun OpenSolaris' client, finding, and reporting this bug at Connectathon 2009. This bug was introduced in 2.6.27 Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	ext4: Fix deadlock in ext4_write_begin() and ext4_da_write_begin()	Jan Kara
	(cherry picked from commit ebd3610b110bbb18ea6f9f2aeed1e1068c537227) Functions ext4_write_begin() and ext4_da_write_begin() call grab_cache_page_write_begin() without AOP_FLAG_NOFS. Thus it can happen that page reclaim is triggered in that function and it recurses back into the filesystem (or some other filesystem). But this can lead to various problems as a transaction is already started at that point. Add the necessary flag. http://bugzilla.kernel.org/show_bug.cgi?id=11688 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	ext4: Add fallback for find_group_flex	Theodore Ts'o
	(cherry picked from commit 05bf9e839d9de4e8a094274a0a2fd07beb47eaf1) This is a workaround for find_group_flex() which badly needs to be replaced. One of its problems (besides ignoring the Orlov algorithm) is that it is a bit hyperactive about returning failure under suspicious circumstances. This can lead to spurious ENOSPC failures even when there are inodes still available. Work around this for now by retrying the search using find_group_other() if find_group_flex() returns -1. If find_group_other() succeeds when find_group_flex() has failed, log a warning message. A better block/inode allocator that will fix this problem for real has been queued up for the next merge window. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	ext4: Fix NULL dereference in ext4_ext_migrate()'s error handling	Dan Carpenter
	(cherry picked from commit 090542641de833c6f756895fc2f139f046e298f9) This was found through a code checker (http://repo.or.cz/w/smatch.git/). It looks like you might be able to trigger the error by trying to migrate a readonly file system. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	ext4: Initialize preallocation list_head's properly	Aneesh Kumar K.V
	(cherry picked from commit d794bf8e0936dce45104565cd48c571061f4c1e3) When creating a new ext4_prealloc_space structure, we have to initialize its list_head pointers before we add them to any prealloc lists. Otherwise, with list debug enabled, we will get list corruption warnings. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	ext4: Fix lockdep warning	Aneesh Kumar K.V
	(cherry picked from commit ba4439165f0f0d25b2fe065cf0c1ff8130b802eb) We should not call ext4_mb_add_n_trim while holding alloc_semp. ============================================= [ INFO: possible recursive locking detected ] 2.6.29-rc4-git1-dirty #124 --------------------------------------------- ffsb/3116 is trying to acquire lock: (&meta_group_info[i]->alloc_sem){----}, at: [<ffffffff8035a6e8>] ext4_mb_load_buddy+0xd2/0x343 but task is already holding lock: (&meta_group_info[i]->alloc_sem){----}, at: [<ffffffff8035a6e8>] ext4_mb_load_buddy+0xd2/0x343 http://bugzilla.kernel.org/show_bug.cgi?id=12672 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	ext4: Fix to read empty directory blocks correctly in 64k	Wei Yongjun
	(cherry picked from commit 7be2baaa0322c59ba888aa5260a8c130666acd41) The rec_len field in the directory entry is 16 bits, so there was a problem representing rec_len for filesystems with a 64k block size in the case where the directory entry takes the entire 64k block. Unfortunately, there were two schemes that were proposed; one where all zeros meant 65536 and one where all ones (65535) meant 65536. E2fsprogs used 0, whereas the kernel used 65535. Oops. Fortunately this case happens extremely rarely, with the most common case being the lost+found directory, created by mke2fs. So we will be liberal in what we accept, and accept both encodings, but we will continue to encode 65536 as 65535. This will require a change in e2fsprogs, but with fortunately ext4 filesystems normally have the dir_index feature enabled, which precludes having a completely empty directory block. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate()	Jan Kara
	(cherry picked from commit 7f5aa215088b817add9c71914b83650bdd49f8a9) If we race with commit code setting i_transaction to NULL, we could possibly dereference it. Proper locking requires the journal pointer (to access journal->j_list_lock), which we don't have. So we have to change the prototype of the function so that filesystem passes us the journal pointer. Also add a more detailed comment about why the function jbd2_journal_begin_ordered_truncate() does what it does and how it should be used. Thanks to Dan Carpenter <error27@gmail.com> for pointing to the suspitious code. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Joel Becker <joel.becker@oracle.com> CC: linux-ext4@vger.kernel.org CC: mfasheh@suse.de CC: Dan Carpenter <error27@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	Revert "ext4: wait on all pending commits in ext4_sync_fs()"	Jan Kara
	(cherry picked from commit 9eddacf9e9c03578ef2c07c9534423e823d677f8) This undoes commit 14ce0cb411c88681ab8f3a4c9caa7f42e97a3184. Since jbd2_journal_start_commit() is now fixed to return 1 when we started a transaction commit, there's some transaction waiting to be committed or there's a transaction already committing, we don't need to call ext4_force_commit() in ext4_sync_fs(). Furthermore ext4_force_commit() can unnecessarily create sync transaction which is expensive so it's worthwhile to remove it when we can. http://bugzilla.kernel.org/show_bug.cgi?id=12224 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Eric Sandeen <sandeen@redhat.com> Cc: linux-ext4@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	jbd2: Fix return value of jbd2_journal_start_commit()	Jan Kara
	(cherry picked from commit c88ccea3143975294f5a52097546bcbb75975f52) The function jbd2_journal_start_commit() returns 1 if either a transaction is committing or the function has queued a transaction commit. But it returns 0 if we raced with somebody queueing the transaction commit as well. This resulted in ext4_sync_fs() not functioning correctly (description from Arthur Jones): In the case of a data=ordered umount with pending long symlinks which are delayed due to a long list of other I/O on the backing block device, this causes the buffer associated with the long symlinks to not be moved to the inode dirty list in the second phase of fsync_super. Then, before they can be dirtied again, kjournald exits, seeing the UMOUNT flag and the dirty pages are never written to the backing block device, causing long symlink corruption and exposing new or previously freed block data to userspace. This can be reproduced with a script created by Eric Sandeen <sandeen@redhat.com>: #!/bin/bash umount /mnt/test2 mount /dev/sdb4 /mnt/test2 rm -f /mnt/test2/* dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512 touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename /mnt/test2/link umount /mnt/test2 mount /dev/sdb4 /mnt/test2 ls /mnt/test2/ This patch fixes jbd2_journal_start_commit() to always return 1 when there's a transaction committing or queued for commit. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> CC: Eric Sandeen <sandeen@redhat.com> CC: linux-ext4@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	proc: fix PG_locked reporting in /proc/kpageflags	Helge Bahmann
	commit e07a4b9217d1e97d2f3a62b6b070efdc61212110 upstream. Expr always evaluates to zero. Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	proc: fix kflags to uflags copying in /proc/kpageflags	Wu Fengguang
	commit ad3bdefe877afb47480418fdb05ecd42842de65e upstream. Fix kpf_copy_bit(src,dst) to be kpf_copy_bit(dst,src) to match the actual call patterns, e.g. kpf_copy_bit(kflags, KPF_LOCKED, PG_locked). This misplacement of src/dst only affected reporting of PG_writeback, PG_reclaim and PG_buddy. For others kflags==uflags so not affected. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	inotify: fix GFP_KERNEL related deadlock	Ingo Molnar
	commit f04b30de3c82528f1ab4c58b3dd4c975f5341901 upstream. Enhanced lockdep coverage of __GFP_NOFS turned up this new lockdep assert: [ 1093.677775] [ 1093.677781] ================================= [ 1093.680031] [ INFO: inconsistent lock state ] [ 1093.680031] 2.6.29-rc5-tip-01504-gb49eca1-dirty #1 [ 1093.680031] --------------------------------- [ 1093.680031] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. [ 1093.680031] kswapd0/308 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 1093.680031] (&inode->inotify_mutex){+.+.?.}, at: [<c0205942>] inotify_inode_is_dead+0x20/0x80 [ 1093.680031] {RECLAIM_FS-ON-W} state was registered at: [ 1093.680031] [<c01696b9>] mark_held_locks+0x43/0x5b [ 1093.680031] [<c016baa4>] lockdep_trace_alloc+0x6c/0x6e [ 1093.680031] [<c01cf8b0>] kmem_cache_alloc+0x20/0x150 [ 1093.680031] [<c040d0ec>] idr_pre_get+0x27/0x6c [ 1093.680031] [<c02056e3>] inotify_handle_get_wd+0x25/0xad [ 1093.680031] [<c0205f43>] inotify_add_watch+0x7a/0x129 [ 1093.680031] [<c020679e>] sys_inotify_add_watch+0x20f/0x250 [ 1093.680031] [<c010389e>] sysenter_do_call+0x12/0x35 [ 1093.680031] [<ffffffff>] 0xffffffff [ 1093.680031] irq event stamp: 60417 [ 1093.680031] hardirqs last enabled at (60417): [<c018d5f5>] call_rcu+0x53/0x59 [ 1093.680031] hardirqs last disabled at (60416): [<c018d5b9>] call_rcu+0x17/0x59 [ 1093.680031] softirqs last enabled at (59656): [<c0146229>] __do_softirq+0x157/0x16b [ 1093.680031] softirqs last disabled at (59651): [<c0106293>] do_softirq+0x74/0x15d [ 1093.680031] [ 1093.680031] other info that might help us debug this: [ 1093.680031] 2 locks held by kswapd0/308: [ 1093.680031] #0: (shrinker_rwsem){++++..}, at: [<c01b0502>] shrink_slab+0x36/0x189 [ 1093.680031] #1: (&type->s_umount_key#4){+++++.}, at: [<c01e6d77>] shrink_dcache_memory+0x110/0x1fb [ 1093.680031] [ 1093.680031] stack backtrace: [ 1093.680031] Pid: 308, comm: kswapd0 Not tainted 2.6.29-rc5-tip-01504-gb49eca1-dirty #1 [ 1093.680031] Call Trace: [ 1093.680031] [<c016947a>] valid_state+0x12a/0x13d [ 1093.680031] [<c016954e>] mark_lock+0xc1/0x1e9 [ 1093.680031] [<c016a5b4>] ? check_usage_forwards+0x0/0x3f [ 1093.680031] [<c016ab74>] __lock_acquire+0x2c6/0xac8 [ 1093.680031] [<c01688d9>] ? register_lock_class+0x17/0x228 [ 1093.680031] [<c016b3d3>] lock_acquire+0x5d/0x7a [ 1093.680031] [<c0205942>] ? inotify_inode_is_dead+0x20/0x80 [ 1093.680031] [<c08824c4>] __mutex_lock_common+0x3a/0x4cb [ 1093.680031] [<c0205942>] ? inotify_inode_is_dead+0x20/0x80 [ 1093.680031] [<c08829ed>] mutex_lock_nested+0x2e/0x36 [ 1093.680031] [<c0205942>] ? inotify_inode_is_dead+0x20/0x80 [ 1093.680031] [<c0205942>] inotify_inode_is_dead+0x20/0x80 [ 1093.680031] [<c01e6672>] dentry_iput+0x90/0xc2 [ 1093.680031] [<c01e67a3>] d_kill+0x21/0x45 [ 1093.680031] [<c01e6a46>] __shrink_dcache_sb+0x27f/0x355 [ 1093.680031] [<c01e6dc5