linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2014-04-18	net: fix for a race condition in the inet frag code	Nikolay Aleksandrov
	[ Upstream commit 24b9bf43e93e0edd89072da51cf1fab95fc69dec ] I stumbled upon this very serious bug while hunting for another one, it's a very subtle race condition between inet_frag_evictor, inet_frag_intern and the IPv4/6 frag_queue and expire functions (basically the users of inet_frag_kill/inet_frag_put). What happens is that after a fragment has been added to the hash chain but before it's been added to the lru_list (inet_frag_lru_add) in inet_frag_intern, it may get deleted (either by an expired timer if the system load is high or the timer sufficiently low, or by the fraq_queue function for different reasons) before it's added to the lru_list, then after it gets added it's a matter of time for the evictor to get to a piece of memory which has been freed leading to a number of different bugs depending on what's left there. I've been able to trigger this on both IPv4 and IPv6 (which is normal as the frag code is the same), but it's been much more difficult to trigger on IPv4 due to the protocol differences about how fragments are treated. The setup I used to reproduce this is: 2 machines with 4 x 10G bonded in a RR bond, so the same flow can be seen on multiple cards at the same time. Then I used multiple instances of ping/ping6 to generate fragmented packets and flood the machines with them while running other processes to load the attacked machine. *It is very important to have the _same flow_ coming in on multiple CPUs concurrently. Usually the attacked machine would die in less than 30 minutes, if configured properly to have many evictor calls and timeouts it could happen in 10 minutes or so. An important point to make is that any caller (frag_queue or timer) of inet_frag_kill will remove both the timer refcount and the original/guarding refcount thus removing everything that's keeping the frag from being freed at the next inet_frag_put. All of this could happen before the frag was ever added to the LRU list, then it gets added and the evictor uses a freed fragment. An example for IPv6 would be if a fragment is being added and is at the stage of being inserted in the hash after the hash lock is released, but before inet_frag_lru_add executes (or is able to obtain the lru lock) another overlapping fragment for the same flow arrives at a different CPU which finds it in the hash, but since it's overlapping it drops it invoking inet_frag_kill and thus removing all guarding refcounts, and afterwards freeing it by invoking inet_frag_put which removes the last refcount added previously by inet_frag_find, then inet_frag_lru_add gets executed by inet_frag_intern and we have a freed fragment in the lru_list. The fix is simple, just move the lru_add under the hash chain locked region so when a removing function is called it'll have to wait for the fragment to be added to the lru_list, and then it'll remove it (it works because the hash chain removal is done before the lru_list one and there's no window between the two list adds when the frag can get dropped). With this fix applied I couldn't kill the same machine in 24 hours with the same setup. Fixes: 3ef0eb0db4bf ("net: frag, move LRU list maintenance outside of rwlock") CC: Florian Westphal <fw@strlen.de> CC: Jesper Dangaard Brouer <brouer@redhat.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	drm/cirrus: use drm_set_preferred_mode	Gerd Hoffmann
	commit 121a6a17439b000b9699c3fa876636db20fa4107 upstream. Explicitly set 1024x768 as default mode, so the display doesn't come up with the largest supported mode. While being at it drop first three drm_add_modes_noedid calls. As drm_add_modes_noedid fills the mode list with modes from the database up to the specified size it is pretty pointless to call it multiple times with different sizes. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	drm: add drm_set_preferred_mode	Gerd Hoffmann
	commit 3cf70dafd7bbbc91df0a9ecb081d46f9f3d867f6 upstream. New helper function to set the preferred video mode. Can be called after drm_add_modes_noedid if you don't want the largest supported video mode be used by default. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	fbdev: Make the switch from generic to native driver less alarming	Adam Jackson
	commit 13ba0ad4490c3dd08b15c430a7a01c6fb45d5bce upstream. Calling this "conflicting" just makes people think there's a problem when there's not. Signed-off-by: Adam Jackson <ajax@redhat.com> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	video/fb: Propagate error code from failing to unregister conflicting fb	Chris Wilson
	commit 46eeb2c144956e88197439b5ee5cf221a91b0a81 upstream. If we fail to remove a conflicting fb driver, we need to abort the loading of the second driver to avoid likely kernel panics. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	fb: reorder the lock sequence to fix potential dead lock	Gu Zheng
	commit 3a41c5dbe8bc396a7fb16ca8739e945bb003342e upstream. Following commits: 50e244cc79 fb: rework locking to fix lock ordering on takeover e93a9a8687 fb: Yet another band-aid for fixing lockdep mess 054430e773 fbcon: fix locking harder reworked locking to fix related lock ordering on takeover, and introduced console_lock into fbmem, but it seems that the new lock sequence(fb_info->lock ---> console_lock) is against with the one in console_callback(console_lock ---> fb_info->lock), and leads to a potential dead lock as following: [ 601.079000] ====================================================== [ 601.079000] [ INFO: possible circular locking dependency detected ] [ 601.079000] 3.11.0 #189 Not tainted [ 601.079000] ------------------------------------------------------- [ 601.079000] kworker/0:3/619 is trying to acquire lock: [ 601.079000] (&fb_info->lock){+.+.+.}, at: [<ffffffff81397566>] lock_fb_info+0x26/0x60 [ 601.079000] but task is already holding lock: [ 601.079000] (console_lock){+.+.+.}, at: [<ffffffff8141aae3>] console_callback+0x13/0x160 [ 601.079000] which lock already depends on the new lock. [ 601.079000] the existing dependency chain (in reverse order) is: [ 601.079000] -> #1 (console_lock){+.+.+.}: [ 601.079000] [<ffffffff810dc971>] lock_acquire+0xa1/0x140 [ 601.079000] [<ffffffff810c6267>] console_lock+0x77/0x80 [ 601.079000] [<ffffffff81399448>] register_framebuffer+0x1d8/0x320 [ 601.079000] [<ffffffff81cfb4c8>] efifb_probe+0x408/0x48f [ 601.079000] [<ffffffff8144a963>] platform_drv_probe+0x43/0x80 [ 601.079000] [<ffffffff8144853b>] driver_probe_device+0x8b/0x390 [ 601.079000] [<ffffffff814488eb>] __driver_attach+0xab/0xb0 [ 601.079000] [<ffffffff814463bd>] bus_for_each_dev+0x5d/0xa0 [ 601.079000] [<ffffffff81447e6e>] driver_attach+0x1e/0x20 [ 601.079000] [<ffffffff81447a07>] bus_add_driver+0x117/0x290 [ 601.079000] [<ffffffff81448fea>] driver_register+0x7a/0x170 [ 601.079000] [<ffffffff8144a10a>] __platform_driver_register+0x4a/0x50 [ 601.079000] [<ffffffff8144a12d>] platform_driver_probe+0x1d/0xb0 [ 601.079000] [<ffffffff81cfb0a1>] efifb_init+0x273/0x292 [ 601.079000] [<ffffffff81002132>] do_one_initcall+0x102/0x1c0 [ 601.079000] [<ffffffff81cb80a6>] kernel_init_freeable+0x15d/0x1ef [ 601.079000] [<ffffffff8166d2de>] kernel_init+0xe/0xf0 [ 601.079000] [<ffffffff816914ec>] ret_from_fork+0x7c/0xb0 [ 601.079000] -> #0 (&fb_info->lock){+.+.+.}: [ 601.079000] [<ffffffff810dc1d8>] __lock_acquire+0x1e18/0x1f10 [ 601.079000] [<ffffffff810dc971>] lock_acquire+0xa1/0x140 [ 601.079000] [<ffffffff816835ca>] mutex_lock_nested+0x7a/0x3b0 [ 601.079000] [<ffffffff81397566>] lock_fb_info+0x26/0x60 [ 601.079000] [<ffffffff813a4aeb>] fbcon_blank+0x29b/0x2e0 [ 601.079000] [<ffffffff81418658>] do_blank_screen+0x1d8/0x280 [ 601.079000] [<ffffffff8141ab34>] console_callback+0x64/0x160 [ 601.079000] [<ffffffff8108d855>] process_one_work+0x1f5/0x540 [ 601.079000] [<ffffffff8108e04c>] worker_thread+0x11c/0x370 [ 601.079000] [<ffffffff81095fbd>] kthread+0xed/0x100 [ 601.079000] [<ffffffff816914ec>] ret_from_fork+0x7c/0xb0 [ 601.079000] other info that might help us debug this: [ 601.079000] Possible unsafe locking scenario: [ 601.079000] CPU0 CPU1 [ 601.079000] ---- ---- [ 601.079000] lock(console_lock); [ 601.079000] lock(&fb_info->lock); [ 601.079000] lock(console_lock); [ 601.079000] lock(&fb_info->lock); [ 601.079000] * DEADLOCK * so we reorder the lock sequence the same as it in console_callback() to avoid this issue. And following Tomi's suggestion, fix these similar issues all in fb subsystem. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	drm: Prefer noninterlace cmdline mode unless explicitly specified	Takashi Iwai
	commit c683f427bdc43525f61e26609d34e799e7ea4c12 upstream. Currently drm_pick_cmdline_mode() doesn't care about the interlace when the given mode line has no "i" suffix. That is, when there are multiple entries for the same resolution, an interlace mode might be picked up just depending on the assigned order, and there is no way to exclude it. This patch changes the logic for the mode selection, to prefer the noninterlace mode unless the interlace mode is explicitly given. When no matching mode is found, it still tries the interlace mode as fallback. Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	drm/radeon: enable speaker allocation setup on dce3.2	Alex Deucher
	commit 3803c8e5b50946dd6bc18972d9190757d05648f0 upstream. Now that we disable audio while setting up the audio hw, we should be able to set this up without hangs. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	drm/radeon: change audio enable logic	Alex Deucher
	commit 832eafaf34ff7d0348fe701e417900c6cf1f5656 upstream. Disable audio around audio hw setup. This may avoid hangs on certain asics. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	drm/cirrus: Fix cirrus drm driver for fbdev + qemu	Martin Koegler
	commit 99d4a8ae93ead27b5a88cdbd09dc556fe96ac3a8 upstream. Xorg fbdev driver requires smem_start/smem_len, otherwise it tries to map 0 bytes as video memory. Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=856760 Signed-off-by: Martin Koegler <martin.koegler@chello.at> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	drm/i915: Undo the PIPEA quirk for i845	Chris Wilson
	commit a4945f9522d27e1e6d64a02ad055e83768cb0896 upstream. The PIPEA quirk is specifically for the issue with the PIPEB PLL on 830gm being slaved to the PIPEA PLL, and so to use PIPEB requires PIPEA running. i845 doesn't even have the second PLL or pipe, and enabling the quirk results in a blank DVO LVDS. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	floppy: bail out in open() if drive is not responding to block0 read	Jiri Kosina
	commit 7b7b68bba5ef23734c35ffb0d8d82079ed604d33 upstream. In case reading of block 0 during open() fails, it is not the right thing to let open() succeed. Fix this by introducing FD_OPEN_SHOULD_FAIL_BIT flag, and setting it in case the bio callback encounters an error while trying to read block 0. As a bonus, this works around certain broken userspace (blkid), which is not able to properly handle read()s returning IO errors. Hence be nice to those, and bail out during open() already; if block 0 is not readable, read()s are not going to provide any meaningful data anyway. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	ext4: Speedup WB_SYNC_ALL pass called from sync(2)	Jan Kara
	commit 10542c229a4e8e25b40357beea66abe9dacda2c0 upstream. When doing filesystem wide sync, there's no need to force transaction commit (or synchronously write inode buffer) separately for each inode because ext4_sync_fs() takes care of forcing commit at the end (VFS takes care of flushing buffer cache, respectively). Most of the time this slowness doesn't manifest because previous WB_SYNC_NONE writeback doesn't leave much to write but when there are processes aggressively creating new files and several filesystems to sync, the sync slowness can be noticeable. In the following test script sync(1) takes around 6 minutes when there are two ext4 filesystems mounted on a standard SATA drive. After this patch sync takes a couple of seconds so we have about two orders of magnitude improvement. function run_writers { for (( i = 0; i < 10; i++ )); do mkdir $1/dir$i for (( j = 0; j < 40000; j++ )); do dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null done & done } for dir in "$@"; do run_writers $dir done sleep 40 time sync Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	SUNRPC: Fix potential memory scribble in xprt_free_bc_request()	Trond Myklebust
	commit 628356791b04ea988fee070f66a748a823d001bb upstream. The call to xprt_free_allocation() will call list_del() on req->rq_bc_pa_list, which is not attached to a list. This patch moves the list_del() out of xprt_free_allocation() and into those callers that need it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	NFSv3: Fix return value of nfs3_proc_setacls	Trond Myklebust
	commit 8f493b9cfcd8941c6b27d6ce8e3b4a78c094b3c1 upstream. nfs3_proc_setacls is used internally by the NFSv3 create operations to set the acl after the file has been created. If the operation fails because the server doesn't support acls, then it must return '0', not -EOPNOTSUPP. Reported-by: Russell King <linux@arm.linux.org.uk> Link: http://lkml.kernel.org/r/20140201010328.GI15937@n2100.arm.linux.org.uk Cc: Christoph Hellwig <hch@lst.de> Tested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	nfs: initialize the ACL support bits to zero.	Malahal Naineni
	commit a1800acaf7d1c2bf6d68b9a8f4ab8560cc66555a upstream. Avoid returning incorrect acl mask attributes when the server doesn't support ACLs. Signed-off-by: Malahal Naineni <malahal@us.ibm.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-18	Char: ipmi_bt_sm, fix infinite loop	Jiri Slaby
	commit a94cdd1f4d30f12904ab528152731fb13a812a16 upstream. In read_all_bytes, we do unsigned char i; ... bt->read_data[0] = BMC2HOST; bt->read_count = bt->read_data[0]; ... for (i = 1; i <= bt->read_count; i++) bt->read_data[i] = BMC2HOST; If bt->read_data[0] == bt->read_count == 255, we loop infinitely in the 'for' loop. Make 'i' an 'int' instead of 'char' to get rid of the overflow and finish the loop after 255 iterations every time. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reported-and-debugged-by: Rui Hui Dian <rhdian@novell.com> Cc: Tomas Cech <tcech@suse.cz> Cc: Corey Minyard <minyard@acm.org> Cc: <openipmi-developer@lists.sourceforge.net> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-13	m68k: Skip futex_atomic_cmpxchg_inatomic() test	Finn Thain
	commit e571c58f313d35c56e0018470e3375ddd1fd320e upstream. Skip the futex_atomic_cmpxchg_inatomic() test in futex_init(). It causes a fatal exception on 68030 (and presumably 68020 also). Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1403061006440.5525@nippy.intranet Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-13	futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test	Heiko Carstens
	commit 03b8c7b623c80af264c4c8d6111e5c6289933666 upstream. If an architecture has futex_atomic_cmpxchg_inatomic() implemented and there is no runtime check necessary, allow to skip the test within futex_init(). This allows to get rid of some code which would always give the same result, and also allows the compiler to optimize a couple of if statements away. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Finn Thain <fthain@telegraphics.com.au> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Link: http://lkml.kernel.org/r/20140302120947.GA3641@osiris Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [geert: Backported to v3.10..v3.13] Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-13	selinux: correctly label /proc inodes in use before the policy is loaded	Paul Moore
	commit f64410ec665479d7b4b77b7519e814253ed0f686 upstream. This patch is based on an earlier patch by Eric Paris, he describes the problem below: "If an inode is accessed before policy load it will get placed on a list of inodes to be initialized after policy load. After policy load we call inode_doinit() which calls inode_doinit_with_dentry() on all inodes accessed before policy load. In the case of inodes in procfs that means we'll end up at the bottom where it does: /* Default to the fs superblock SID. */ isec->sid = sbsec->sid; if ((sbsec->flags & SE_SBPROC) && !S_ISLNK(inode->i_mode)) { if (opt_dentry) { isec->sclass = inode_mode_to_security_class(...) rc = selinux_proc_get_sid(opt_dentry, isec->sclass, &sid); if (rc) goto out_unlock; isec->sid = sid; } } Since opt_dentry is null, we'll never call selinux_proc_get_sid() and will leave the inode labeled with the label on the superblock. I believe a fix would be to mimic the behavior of xattrs. Look for an alias of the inode. If it can't be found, just leave the inode uninitialized (and pick it up later) if it can be found, we should be able to call selinux_proc_get_sid() ..." On a system exhibiting this problem, you will notice a lot of files in /proc with the generic "proc_t" type (at least the ones that were accessed early in the boot), for example: # ls -Z /proc/sys/kernel/shmmax \| awk '{ print $4 " " $5 }' system_u:object_r:proc_t:s0 /proc/sys/kernel/shmmax However, with this patch in place we see the expected result: # ls -Z /proc/sys/kernel/shmmax \| awk '{ print $4 " " $5 }' system_u:object_r:sysctl_kernel_t:s0 /proc/sys/kernel/shmmax Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-13	PCI: mvebu: move clock enable before register access	Sebastian Hesselbarth
	commit b42285f66f871a9898a0e79e2d74bc7e7a101995 upstream. The clock passed to PCI controller found on MVEBU SoCs may come from a clock gate. This requires the clock to be enabled before any registers are accessed. Therefore, move the clock enable before register iomap to ensure it is enabled. Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-13	powernow-k6: reorder frequencies	Mikulas Patocka
	commit 22c73795b101597051924556dce019385a1e2fa0 upstream. This patch reorders reported frequencies from the highest to the lowest, just like in other frequency drivers. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-13	powernow-k6: correctly initialize default parameters	Mikulas Patocka
	commit d82b922a4acc1781d368aceac2f9da43b038cab2 upstream. The powernow-k6 driver used to read the initial multiplier from the powernow register. However, there is a problem with this: * If there was a frequency transition before, the multiplier read from the register corresponds to the current multiplier. * If there was no frequency transition since reset, the field in the register always reads as zero, regardless of the current multiplier that is set using switches on the mainboard and that the CPU is running at. The zero value corresponds to multiplier 4.5, so as a consequence, the powernow-k6 driver always assumes multiplier 4.5. For example, if we have 550MHz CPU with bus frequency 100MHz and multiplier 5.5, the powernow-k6 driver thinks that the multiplier is 4.5 and bus frequency is 122MHz. The powernow-k6 driver then sets the multiplier to 4.5, underclocking the CPU to 450MHz, but reports the current frequency as 550MHz. There is no reliable way how to read the initial multiplier. I modified the driver so that it contains a table of known frequencies (based on parameters of existing CPUs and some common overclocking schemes) and sets the multiplier according to the frequency. If the frequency is unknown (because of unusual overclocking or underclocking), the user must supply the bus speed and maximum multiplier as module parameters. This patch should be backported to all stable kernels. If it doesn't apply cleanly, change it, or ask me to change it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-13	powernow-k6: disable cache when changing frequency	Mikulas Patocka
	commit e20e1d0ac02308e2211306fc67abcd0b2668fb8b upstream. I found out that a system with k6-3+ processor is unstable during network server load. The system locks up or the network card stops receiving. The reason for the instability is the CPU frequency scaling. During frequency transition the processor is in "EPM Stop Grant" state. The documentation says that the processor doesn't respond to inquiry requests in this state. Consequently, coherency of processor caches and bus master devices is not maintained, causing the system instability. This patch flushes the cache during frequency transition. It fixes the instability. Other minor changes: * u64 invalue changed to unsigned long because the variable is 32-bit * move the logic to set the multiplier to a separate function powernow_k6_set_cpu_multiplier * preserve lower 5 bits of the powernow port instead of 4 (the voltage field has 5 bits) * mask interrupts when reading the multiplier, so that the port is not open during other activity (running other kernel code with the port open shouldn't cause any misbehavior, but we should better be safe and keep the port closed) This patch should be backported to all stable kernels. If it doesn't apply cleanly, change it, or ask me to change it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	Linux 3.12.17v3.12.17	Jiri Slaby

2014-04-03	netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages	Daniel Borkmann
	commit b22f5126a24b3b2f15448c3f2a254fc10cbc2b92 upstream. Some occurences in the netfilter tree use skb_header_pointer() in the following way ... struct dccp_hdr _dh, *dh; ... skb_header_pointer(skb, dataoff, sizeof(_dh), &dh); ... where dh itself is a pointer that is being passed as the copy buffer. Instead, we need to use &_dh as the forth argument so that we're copying the data into an actual buffer that sits on the stack. Currently, we probably could overwrite memory on the stack (e.g. with a possibly mal-formed DCCP packet), but unintentionally, as we only want the buffer to be placed into _dh variable. Fixes: 2bc780499aa3 ("[NETFILTER]: nf_conntrack: add DCCP protocol support") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	mm: close PageTail race	David Rientjes
	commit 668f9abbd4334e6c29fa8acd71635c4f9101caa7 upstream. Commit bf6bddf1924e ("mm: introduce compaction and migration for ballooned pages") introduces page_count(page) into memory compaction which dereferences page->first_page if PageTail(page). This results in a very rare NULL pointer dereference on the aforementioned page_count(page). Indeed, anything that does compound_head(), including page_count() is susceptible to racing with prep_compound_page() and seeing a NULL or dangling page->first_page pointer. This patch uses Andrea's implementation of compound_trans_head() that deals with such a race and makes it the default compound_head() implementation. This includes a read memory barrier that ensures that if PageTail(head) is true that we return a head page that is neither NULL nor dangling. The patch then adds a store memory barrier to prep_compound_page() to ensure page->first_page is set. This is the safest way to ensure we see the head page that we are expecting, PageTail(page) is already in the unlikely() path and the memory barriers are unfortunately required. Hugetlbfs is the exception, we don't enforce a store memory barrier during init since no race is possible. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Holger Kiehl <Holger.Kiehl@dwd.de> Cc: Christoph Lameter <cl@linux.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	net: mvneta: fix usage as a module on RGMII configurations	Thomas Petazzoni
	commit e3a8786c10e75903f1269474e21fe8cb49c3a670 upstream. Commit 5445eaf309ff ('mvneta: Try to fix mvneta when compiled as module') fixed the mvneta driver to make it work properly when loaded as a module in SGMII configuration, which was tested successful by the author on the Armada XP OpenBlocks AX3, which uses SGMII. However, it turns out that the Armada XP GP, which uses RGMII, is affected by a similar problem: its SERDES configuration is lost when mvneta is loaded as a module, because this configuration is set by the bootloader, and then lost because the clock is gated by the clock framework until the mvneta driver is loaded again and the clock is re-enabled. However, it turns out that for the RGMII case, setting the SERDES configuration is not sufficient: the PCS enable bit in the MVNETA_GMAC_CTRL_2 register must also be set, like in the SGMII configuration. Therefore, this commit reworks the SGMII/RGMII initialization: the only difference between the two now is a different SERDES configuration, all the rest is identical. In detail, to achieve this, the commit: * Renames MVNETA_SGMII_SERDES_CFG to MVNETA_SERDES_CFG because it is not specific to SGMII, but also used on RGMII configurations. * Adds a MVNETA_RGMII_SERDES_PROTO definition, that must be used as the MVNETA_SERDES_CFG value in RGMII configurations. * Removes the mvneta_gmac_rgmii_set() and mvneta_port_sgmii_config() functions, and instead directly do the SGMII/RGMII configuration in mvneta_port_up(), from where those functions where called. It is worth mentioning that mvneta_gmac_rgmii_set() had an 'enable' parameter that was always passed as '1', so it was pretty useless. * Reworks the mvneta_port_up() function to set the MVNETA_SERDES_CFG register to the appropriate value depending on the RGMII vs. SGMII configuration. It also unconditionally set the PCS_ENABLE bit (was already done for SGMII, but is now also needed for RGMII), and sets the PORT_RGMII bit (which was already done for both SGMII and RGMII). This commit was successfully tested with mvneta compiled as a module, on both the OpenBlocks AX3 (SGMII configuration) and the Armada XP GP (RGMII configuration). Reported-by: Steve McIntyre <steve@einval.com> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	net: mvneta: rename MVNETA_GMAC2_PSC_ENABLE to MVNETA_GMAC2_PCS_ENABLE	Thomas Petazzoni
	commit a79121d3b57e7ad61f0b5d23eae05214054f3ccd upstream. Bit 3 of the MVNETA_GMAC_CTRL_2 is actually used to enable the PCS, not the PSC: there was a typo in the name of the define, which this commit fixes. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	make prepend_name() work correctly when called with negative *buflen	Al Viro
	commit e825196d48d2b89a6ec3a8eff280098d2a78207e upstream. In all callchains leading to prepend_name(), the value left in buflen is eventually discarded unused if prepend_name() has returned a negative. So we are free to do what prepend() does, and subtract from buflen before checking for underflow (which turns into checking the sign of subtraction result, of course). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	x86: fix boot on uniprocessor systems	Artem Fetishev
	commit 825600c0f20e595daaa7a6dd8970f84fa2a2ee57 upstream. On x86 uniprocessor systems topology_physical_package_id() returns -1 which causes rapl_cpu_prepare() to leave rapl_pmu variable uninitialized which leads to GPF in rapl_pmu_init(). See arch/x86/kernel/cpu/perf_event_intel_rapl.c. It turns out that physical_package_id and core_id can actually be retreived for uniprocessor systems too. Enabling them also fixes rapl_pmu code. Signed-off-by: Artem Fetishev <artem_fetishev@epam.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	drm/i915: Undo gtt scratch pte unmapping again	Daniel Vetter
	commit 8ee661b505613ef2747b350ca2871a31b3781bee upstream. It apparently blows up on some machines. This functionally reverts commit 828c79087cec61eaf4c76bb32c222fbe35ac3930 Author: Ben Widawsky <benjamin.widawsky@intel.com> Date: Wed Oct 16 09:21:30 2013 -0700 drm/i915: Disable GGTT PTEs on GEN6+ suspend Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64841 Reported-and-Tested-by: Brad Jackson <bjackson0971@gmail.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Todd Previte <tprevite@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	i2c: cpm: Fix build by adding of_address.h and of_irq.h	Scott Wood
	commit 5f12c5eca6e6b7aeb4b2028d579f614b4fe7a81f upstream. Fixes a build break due to the undeclared use of irq_of_parse_and_map() and of_iomap(). This build break was apparently introduced while the driver was unbuildable due to the bug fixed by 62c19c9d29e65086e5ae76df371ed2e6b23f00cd ("i2c: Remove usage of orphaned symbol OF_I2C"). When 62c19c was added in v3.14-rc7, the driver was enabled again, breaking the powerpc mpc85xx_defconfig and mpc85xx_smp_defconfig. 62c19c is marked for stable, so this should go there as well. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	Revert "xen: properly account for _PAGE_NUMA during xen pte translations"	David Vrabel
	commit 5926f87fdaad4be3ed10cec563bf357915e55a86 upstream. This reverts commit a9c8e4beeeb64c22b84c803747487857fe424b68. PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT is set and pseudo-physical addresses is _PAGE_PRESENT is clear. This is because during a domain save/restore (migration) the page table entries are "canonicalised" and uncanonicalised". i.e., MFNs are converted to PFNs during domain save so that on a restore the page table entries may be rewritten with the new MFNs on the destination. This canonicalisation is only done for PTEs that are present. This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or _PAGE_NUMA) was set but _PAGE_PRESENT was clear. These PTEs would be migrated as-is which would result in unexpected behaviour in the destination domain. Either a) the MFN would be translated to the wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE because the MFN is no longer owned by the domain; or c) the present bit would not get set. Symptoms include "Bad page" reports when munmapping after migrating a domain. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	xen/balloon: flush persistent kmaps in correct position	Wei Liu
	commit 09ed3d5ba06137913960f9c9385f71fc384193ab upstream. Xen balloon driver will update ballooned out pages' P2M entries to point to scratch page for PV guests. In 24f69373e2 ("xen/balloon: don't alloc page while non-preemptible", kmap_flush_unused was moved after updating P2M table. In that case for 32 bit PV guest we might end up with P2M X -----> S (S is mfn of balloon scratch page) M2P Y -----> X (Y is mfn in persistent kmap entry) kmap_flush_unused() iterates through all the PTEs in the kmap address space, using pte_to_page() to obtain the page. If the p2m and the m2p are inconsistent the incorrect page is returned. This will clear page->address on the wrong page which may cause subsequent oopses if that page is currently kmap'ed. Move the flush back between get_page and __set_phys_to_machine to fix this. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	Input: cypress_ps2 - don't report as a button pads	Hans de Goede
	commit 6797b39e6f6f34c74177736e146406e894b9482b upstream. The cypress PS/2 trackpad models supported by the cypress_ps2 driver emulate BTN_RIGHT events in firmware based on the finger position, as part of this no motion events are sent when the finger is in the button area. The INPUT_PROP_BUTTONPAD property is there to indicate to userspace that BTN_RIGHT events should be emulated in userspace, which is not necessary in this case. When INPUT_PROP_BUTTONPAD is advertised userspace will wait for a motion event before propagating the button event higher up the stack, as it needs current abs x + y data for its BTN_RIGHT emulation. Since in the cypress_ps2 pads don't report motion events in the button area, this means that clicks in the button area end up being ignored, so INPUT_PROP_BUTTONPAD actually causes problems for these touchpads, and removing it fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76341 Reported-by: Adam Williamson <awilliam@redhat.com> Tested-by: Adam Williamson <awilliam@redhat.com> Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	Input: synaptics - add manual min/max quirk for ThinkPad X240	Hans de Goede
	commit 8a0435d958fb36d93b8df610124a0e91e5675c82 upstream. This extends Benjamin Tissoires manual min/max quirk table with support for the ThinkPad X240. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	Input: synaptics - add manual min/max quirk	Benjamin Tissoires
	commit 421e08c41fda1f0c2ff6af81a67b491389b653a5 upstream. The new Lenovo Haswell series (-40's) contains a new Synaptics touchpad. However, these new Synaptics devices report bad axis ranges. Under Windows, it is not a problem because the Windows driver uses RMI4 over SMBus to talk to the device. Under Linux, we are using the PS/2 fallback interface and it occurs the reported ranges are wrong. Of course, it would be too easy to have only one range for the whole series, each touchpad seems to be calibrated in a different way. We can not use SMBus to get the actual range because I suspect the firmware will switch into the SMBus mode and stop talking through PS/2 (this is the case for hybrid HID over I2C / PS/2 Synaptics touchpads). So as a temporary solution (until RMI4 land into upstream), start a new list of quirks with the min/max manually set. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	Input: mousedev - fix race when creating mixed device	Dmitry Torokhov
	commit e4dbedc7eac7da9db363a36f2bd4366962eeefcc upstream. We should not be using static variable mousedev_mix in methods that can be called before that singleton gets assigned. While at it let's add open and close methods to mousedev structure so that we do not need to test if we are dealing with multiplexor or normal device and simply call appropriate method directly. This fixes: https://bugzilla.kernel.org/show_bug.cgi?id=71551 Reported-by: GiulioDP <depasquale.giulio@gmail.com> Tested-by: GiulioDP <depasquale.giulio@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	ext4: atomically set inode->i_flags in ext4_set_inode_flags()	Theodore Ts'o
	commit 00a1a053ebe5febcfc2ec498bd894f035ad2aa06 upstream. Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	s390/time,vdso: fix clock_gettime for CLOCK_MONOTONIC	Martin Schwidefsky
	commit ca5de58ba746b08c920b2024aaf01aa1500b110d upstream. With git commit 79c74ecbebf76732f91b82a62ce7fc8a88326962 "s390/time,vdso: convert to the new update_vsyscall interface" the new update_vsyscall function already does the sum of xtime and wall_to_monotonic. The old update_vsyscall function only copied the wall_to_monotonic offset. The vdso code needs to be modified to take this into consideration. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	Input: wacom - add support for three new Intuos Pro devices	Ping Cheng
	commit b5fd2a3e92ca5c8c1f3c20d31ac5daed3ec4d604 upstream. Two tablets in this series support both pen and touch. One (Intuos S) only supports pen. This patch also updates the driver to process wireless devices that do not support touch interface. Tested-by: Jason Gerecke <killertofu@gmail.com> Acked-by: Chris Bagwell <chris@cnpbagwell.com> Signed-off-by: Ping Cheng <pingc@wacom.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	ipvs: fix AF assignment in ip_vs_conn_new()	Michal Kubecek
	commit 2a971354e74f3837d14b9c8d7f7983b0c9c330e4 upstream. If a fwmark is passed to ip_vs_conn_new(), it is passed in vaddr, not daddr. Therefore we should set AF to AF_UNSPEC in vaddr assignment (like we do in ip_vs_ct_in_get()), otherwise we may copy only first 4 bytes of an IPv6 address into cp->daddr. Signed-off-by: Bogdano Arendartchuk <barendartchuk@suse.com> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	FS-Cache: Handle removal of unadded object to the fscache_object_list rb tree	David Howells
	commit 7026f1929e18921fd67bf478f475a8fdfdff16ae upstream. When FS-Cache allocates an object, the following sequence of events can occur: -->fscache_alloc_object() -->cachefiles_alloc_object() [via cache->ops->alloc_object] <--[returns new object] -->fscache_attach_object() <--[failed] -->cachefiles_put_object() [via cache->ops->put_object] -->fscache_object_destroy() -->fscache_objlist_remove() -->rb_erase() to remove the object from fscache_object_list. resulting in a crash in the rbtree code. The problem is that the object is only added to fscache_object_list on the success path of fscache_attach_object() where it calls fscache_objlist_add(). So if fscache_attach_object() fails, the object won't have been added to the objlist rbtree. We do, however, unconditionally try to remove the object from the tree. Thanks to NeilBrown for finding this and suggesting this solution. Reported-by: NeilBrown <neilb@suse.de> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: (a customer of) NeilBrown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	dlm: set zero linger time on sctp socket	Dongmao Zhang
	commit ece35848c1847cdf3dd07954578d3e99238ebbae upstream. The recovery time for a failed node was taking a long time because the failed node could not perform the full shutdown process. Removing the linger time speeds this up. The dlm does not care what happens to messages to or from the failed node. Signed-off-by: Dongmao Zhang <dmzhang@suse.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	NFSv4.1 free slot before resending I/O to MDS	Andy Adamson
	commit f9c96fcc501a43dbc292b17fc0ded4b54e63b79d upstream. Fix a dynamic session slot leak where a slot is preallocated and I/O is resent through the MDS. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	nfs: add memory barriers around NFS_INO_INVALID_DATA and NFS_INO_INVALIDATING	Jeff Layton
	commit 4db72b40fdbc706f8957e9773ae73b1574b8c694 upstream. If the setting of NFS_INO_INVALIDATING gets reordered to before the clearing of NFS_INO_INVALID_DATA, then another task may hit a race window where both appear to be clear, even though the inode's pages are still in need of invalidation. Fix this by adding the appropriate memory barriers. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	NFS: Fix races in nfs_revalidate_mapping	Trond Myklebust
	commit 17dfeb9113397a6119091a491ef7182649f0c5a9 upstream. Commit d529ef83c355f97027ff85298a9709fe06216a66 (NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping) introduces a potential race, since it doesn't test the value of nfsi->cache_validity and set the bitlock in nfsi->flags atomically. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03	NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping	Jeff Layton
	commit d529ef83c355f97027ff85298a9709fe06216a66 upstream. There is a possible race in how the nfs_invalidate_mapping function is handled. Currently, we go and invalidate the pages in the file and then clear NFS_INO_INVALID_DATA. The problem is that it's possible for a stale page to creep into the mapping after the page was invalidated (i.e., via readahead). If another writer comes along and sets the flag after that happens but before invalidate_inode_pages2 returns then we could clear the flag without the cache having been properly invalidated. So, we must clear the flag first and then invalidate the pages. Doing this however, opens another race: It's possible to have two concurrent read() calls that end up in nfs_revalidate_mapping at the same time. The first one clears the NFS_INO_INVALID_DATA flag and then goes to call nfs_invalidate_mapping. Just before calling that though, the other task races in, checks the flag and finds it cleared. At that point, it trusts that the mapping is good and gets the lock on the page, allowing the read() to be satisfied from the cache even though the data is no longer valid. These effects are easily manifested by running diotest3 from the LTP test suite on NFS. That program does a series of DIO writes and buffered reads. The operations are serialized and page-aligned but the existing code fails the test since it occasionally allows a read to come out of the cache incorrectly. While mixing direct and buffered I/O isn't recommended, I believe it's possible to hit this in other ways that just use buffered I/O, though that situation is much harder to reproduce. The problem is that the checking/clearing of that flag and the invalidation of the mapping really need to be atomic. Fix this by serializing concurrent invalidations with a bitlock. At the same time, we also need to allow other places that check NFS_INO_INVALID_DATA to check whether we might be in the middle of invalidating the file, so fix up a couple of places that do t