linux/fs, branch v3.2.2

proc: clear_refs: do not clear reserved pages

2012-01-26T00:13:58Z

commit 85e72aa5384b1a614563ad63257ded0e91d1a620 upstream. /proc/pid/clear_refs is used to clear the Referenced and YOUNG bits for pages and corresponding page table entries of the task with PID pid, which includes any special mappings inserted into the page tables in order to provide things like vDSOs and user helper functions. On ARM this causes a problem because the vectors page is mapped as a global mapping and since ec706dab ("ARM: add a vma entry for the user accessible vector page"), a VMA is also inserted into each task for this page to aid unwinding through signals and syscall restarts. Since the vectors page is required for handling faults, clearing the YOUNG bit (and subsequently writing a faulting pte) means that we lose the vectors page *globally* and cannot fault it back in. This results in a system deadlock on the next exception. To see this problem in action, just run: $ echo 1 > /proc/self/clear_refs on an ARM platform (as any user) and watch your system hang. I think this has been the case since 2.6.37 This patch avoids clearing the aforementioned bits for reserved pages, therefore leaving the vectors page intact on ARM. Since reserved pages are not candidates for swap, this change should not have any impact on the usefulness of clear_refs. Signed-off-by: Will Deacon Reported-by: Moussa Ba Acked-by: Hugh Dickins Cc: David Rientjes Cc: Russell King Acked-by: Nicolas Pitre Cc: Matt Mackall Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman

cifs: lower default wsize when unix extensions are not used

2012-01-26T00:13:57Z

commit ce91acb3acae26f4163c5a6f1f695d1a1e8d9009 upstream. We've had some reports of servers (namely, the Solaris in-kernel CIFS server) that don't deal properly with writes that are "too large" even though they set CAP_LARGE_WRITE_ANDX. Change the default to better mirror what windows clients do. Cc: Pavel Shilovsky Reported-by: Nick Davis Signed-off-by: Jeff Layton Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman

xfs: fix endian conversion issue in discard code

2012-01-26T00:13:55Z

commit b1c770c273a4787069306fc82aab245e9ac72e9d upstream When finding the longest extent in an AG, we read the value directly out of the AGF buffer without endian conversion. This will give an incorrect length, resulting in FITRIM operations potentially not trimming everything that it should. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Signed-off-by: Ben Myers Signed-off-by: Greg Kroah-Hartman

proc: clean up and fix /proc//mem handling

2012-01-26T00:13:43Z

commit e268337dfe26dfc7efd422a804dbb27977a3cccc upstream. Jüri Aedla reported that the /proc//mem handling really isn't very robust, and it also doesn't match the permission checking of any of the other related files. This changes it to do the permission checks at open time, and instead of tracking the process, it tracks the VM at the time of the open. That simplifies the code a lot, but does mean that if you hold the file descriptor open over an execve(), you'll continue to read from the _old_ VM. That is different from our previous behavior, but much simpler. If somebody actually finds a load where this matters, we'll need to revert this commit. I suspect that nobody will ever notice - because the process mapping addresses will also have changed as part of the execve. So you cannot actually usefully access the fd across a VM change simply because all the offsets for IO would have changed too. Reported-by: Jüri Aedla Cc: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman

fix cputime overflow in uptime_proc_show

2012-01-26T00:13:41Z

commit c3e0ef9a298e028a82ada28101ccd5cf64d209ee upstream. For 32-bit architectures using standard jiffies the idletime calculation in uptime_proc_show will quickly overflow. It takes (2^32 / HZ) seconds of idle-time, or e.g. 12.45 days with no load on a quad-core with HZ=1000. Switch to 64-bit calculations. Cc: Michael Abbott Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman

pnfsblock: limit bio page count

2012-01-26T00:13:37Z

commit 74a6eeb44ca6174d9cc93b9b8b4d58211c57bc80 upstream. One bio can have at most BIO_MAX_PAGES pages. We should limit it bec otherwise bio_alloc will fail when there are many pages in one read/write_pagelist. Signed-off-by: Peng Tao Signed-off-by: Benny Halevy Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman

pnfsblock: don't spinlock when freeing block_dev

2012-01-26T00:13:36Z

commit 93a3844ee0f843b05a1df4b52e1a19ff26b98d24 upstream. bl_free_block_dev() may sleep. We can not call it with spinlock held. Besides, there is no need to take bm_lock as we are last user freeing bm_devlist. Signed-off-by: Peng Tao Signed-off-by: Benny Halevy Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman

pnfsblock: acquire im_lock in _preload_range

2012-01-26T00:13:36Z

commit 39e567ae36fe03c2b446e1b83ee3d39bea08f90b upstream. When calling _add_entry, we should take the im_lock to protect agains other modifiers. Signed-off-by: Peng Tao Signed-off-by: Benny Halevy Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman

fix shrink_dcache_parent() livelock

2012-01-26T00:13:35Z

commit eaf5f9073533cde21c7121c136f1c3f072d9cf59 upstream. Two (or more) concurrent calls of shrink_dcache_parent() on the same dentry may cause shrink_dcache_parent() to loop forever. Here's what appears to happen: 1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1 2 - CPU1: select_parent(P) locks P->d_lock 3 - CPU0: shrink_dentry_list() locks C->d_lock dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock 4 - CPU1: select_parent(P) locks C->d_lock, moves C from dispose list being processed on CPU0 to the new dispose list, returns 1 5 - CPU0: shrink_dentry_list() finds dispose list empty, returns 6 - Goto 2 with CPU0 and CPU1 switched Basically select_parent() steals the dentry from shrink_dentry_list() and thinks it found a new one, causing shrink_dentry_list() to think it's making progress and loop over and over. One way to trigger this is to make udev calls stat() on the sysfs file while it is going away. Having a file in /lib/udev/rules.d/ with only this one rule seems to the trick: ATTR{vendor}=="0x8086", ATTR{device}=="0x10ca", ENV{PCI_SLOT_NAME}="%k", ENV{MATCHADDR}="$attr{address}", RUN+="/bin/true" Then execute the following loop: while true; do echo -bond0 > /sys/class/net/bonding_masters echo +bond0 > /sys/class/net/bonding_masters echo -bond1 > /sys/class/net/bonding_masters echo +bond1 > /sys/class/net/bonding_masters done One fix would be to check all callers and prevent concurrent calls to shrink_dcache_parent(). But I think a better solution is to stop the stealing behavior. This patch adds a new dentry flag that is set when the dentry is added to the dispose list. The flag is cleared in dentry_lru_del() in case the dentry gets a new reference just before being pruned. If the dentry has this flag, select_parent() will skip it and let shrink_dentry_list() retry pruning it. With select_parent() skipping those dentries there will not be the appearance of progress (new dentries found) when there is none, hence shrink_dcache_parent() will not loop forever. Set the flag is also set in prune_dcache_sb() for consistency as suggested by Linus. Signed-off-by: Miklos Szeredi Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman

dcache: use a dispose list in select_parent

2012-01-26T00:13:35Z

commit b48f03b319ba78f3abf9a7044d1f436d8d90f4f9 upstream. select_parent currently abuses the dentry cache LRU to provide cleanup features for child dentries that need to be freed. It moves them to the tail of the LRU, then tells shrink_dcache_parent() to calls __shrink_dcache_sb to unconditionally move them to a dispose list (as DCACHE_REFERENCED is ignored). __shrink_dcache_sb() has to relock the dentries to move them off the LRU onto the dispose list, but otherwise does not touch the dentries that select_parent() moved to the tail of the LRU. It then passses the dispose list to shrink_dentry_list() which tries to free the dentries. IOWs, the use of __shrink_dcache_sb() is superfluous - we can build exactly the same list of dentries for disposal directly in select_parent() and call shrink_dentry_list() instead of calling __shrink_dcache_sb() to do that. This means that we avoid long holds on the lru lock walking the LRU moving dentries to the dispose list We also avoid the need to relock each dentry just to move it off the LRU, reducing the numebr of times we lock each dentry to dispose of them in shrink_dcache_parent() from 3 to 2 times. Further, we remove one of the two callers of __shrink_dcache_sb(). This also means that __shrink_dcache_sb can be moved into back into prune_dcache_sb() and we no longer have to handle referenced dentries conditionally, simplifying the code. Signed-off-by: Dave Chinner Signed-off-by: Linus Torvalds Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman