linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2009-06-11	myr10ge: again fix lro_gen_skb() alignment	Stanislaw Gruszka
	[ Upstream commit 636d2f68a0814d84de26c021b2c15e3b4ffa29de ] Add LRO alignment initially committed in 621544eb8c3beaa859c75850f816dd9b056a00a3 ("[LRO]: fix lro_gen_skb() alignment") and removed in 0dcffac1a329be69bab0ac604bf7283737108e68 ("myri10ge: add multislices support") during conversion to multi-slice. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	vlan/macvlan: fix NULL pointer dereferences in ethtool handlers	Patrick McHardy
	[ Upstream commit 7816a0a862d851d0b05710e7d94bfe390f3180e2 ] Check whether the underlying device provides a set of ethtool ops before checking for individual handlers to avoid NULL pointer dereferences. Reported-by: Art van Breemen <ard@telegraafnet.nl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	bonding: fix alb mode locking regression	Jay Vosburgh
	[ Upstream commit 815bcc2719c12b6f5b511706e2d19728e07f0b02 ] Fix locking issue in alb MAC address management; removed incorrect locking and replaced with correct locking. This bug was introduced in commit 059fe7a578fba5bbb0fdc0365bfcf6218fa25eb0 ("bonding: Convert locks to _bh, rework alb locking for new locking") Bug reported by Paul Smith <paul@mad-scientist.net>, who also tested the fix. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc64: Reschedule KGDB capture to a software interrupt.	David S. Miller
	[ Upstream commit 42cc77c861e8e850e86252bb5b1e12e006261973 ] Otherwise it might interrupt switch_to() midstream and use half-cooked register window state. Reported-by: Chris Torek <chris.torek@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc64: Fix lost interrupts on sun4u.	David S. Miller
	[ Upstream commit d0cac39e4ec8097e4c7099d291b1fdcc0fe56b58 ] Based upon a report by Meelis Roos. Sparc64 SBUS and PCI controllers use a combination of IMAP and ICLR registers to manage device interrupts. The IMAP register contains the "valid" enable bit as well as CPU targetting information. Whereas the ICLR register is written with zero at the end of handling an interrupt to reset the state machine for that interrupt to IDLE so it can be sent again. For PCI slot and SBUS slot devices we can have multiple interrupts sharing the same IMAP register. There are individual ICLR registers but only one IMAP register for managing those. We represent each shared case with individual virtual IRQs so the generic IRQ layer thinks there is only one user of the IRQ instance. In such shared IMAP cases this is wrong, so if there are multiple active users then a free_irq() call will prematurely turn off the interrupt by clearing the Valid bit in the IMAP register even though there are other active users. Fix this by simply doing nothing in sun4u_disable_irq() and checking IRQF_DISABLED during IRQ dispatch. This situation doesn't exist in the hypervisor sun4v cases, so I left those alone. Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc64: Fix crash with /proc/iomem	Mikulas Patocka
	[ Upstream commit 192d7a4667c6d11d1a174ec4cad9a3c5d5f9043c ] When you compile kernel on Sparc64 with heap memory checking and type "cat /proc/iomem", you get a crash, because pointers in struct resource are uninitialized. Most code fills struct resource with zeros, so I assume that it is responsibility of the caller of request_resource to initialized it, not the responsibility of request_resource functuion. After 2.6.29 is out, there could be a check for uninitialized fields added to request_resource to avoid crashes like this. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc64: Flush TLB before releasing pages.	David S. Miller
	[ Upstream commit 86ee79c3dbd48d7430fd81edc1da3516c9f6dabc ] tlb_flush_mmu() needs to flush pending TLB entries before processing the mmu_gather ->pages list. Noticed by Benjamin Herrenschmidt. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc64: Fix MM refcount check in smp_flush_tlb_pending().	David S. Miller
	[ Upstream commit f9384d41c02408dd404aa64d66d0ef38adcf6479 ] As explained by Benjamin Herrenschmidt: > CPU 0 is running the context, task->mm == task->active_mm == your > context. The CPU is in userspace happily churning things. > > CPU 1 used to run it, not anymore, it's now running fancyfsd which > is a kernel thread, but current->active_mm still points to that > same context. > > Because there's only one "real" user, mm_users is 1 (but mm_count is > elevated, it's just that the presence on CPU 1 as active_mm has no > effect on mm_count(). > > At this point, fancyfsd decides to invalidate a mapping currently mapped > by that context, for example because a networked file has changed > remotely or something like that, using unmap_mapping_ranges(). > > So CPU 1 goes into the zapping code, which eventually ends up calling > flush_tlb_pending(). Your test will succeed, as current->active_mm is > indeed the target mm for the flush, and mm_users is indeed 1. So you > will -not- send an IPI to the other CPU, and CPU 0 will continue happily > accessing the pages that should have been unmapped. To fix this problem, check ->mm instead of ->active_mm, and this means: > So if you test current->mm, you effectively account for mm_users == 1, > so the only way the mm can be active on another processor is as a lazy > mm for a kernel thread. So your test should work properly as long > as you don't have a HW that will do speculative TLB reloads into the > TLB on that other CPU (and even if you do, you flush-on-switch-in should > get rid of any crap here). And therefore we should be OK. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc: Fix bus type probing for ESP and LE devices.	David S. Miller
	If there is a dummy "espdma" or "ledma" parent device above ESP scsi or LE ethernet device nodes, we have to match the bus as SBUS. Otherwise the address and size cell counts are wrong and we don't calculate the final physical device resource values correctly at all. Commit 5280267c1dddb8d413595b87dc406624bb497946 ("sparc: Fix handling of LANCE and ESP parent nodes in of_device.c") was meant to fix this problem, but that only influences the inner loop of build_device_resources(). We need this logic to also kick in at the beginning of build_device_resources() as well, when we make the first attempt to determine the device's immediate parent bus type for 'reg' property element extraction. Based almost entirely upon a patch by Friedrich Oslage. Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc64: Fix smp_callin() locking.	David S. Miller
	[ Upstream commit 8e255baa449df3049a8827a7f1f4f12b6921d0d1 ] Interrupts must be disabled when taking the IPI lock. Caught by lockdep. Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	TPM: get_event_name stack corruption	Eric Paris
	commit fbaa58696cef848de818768783ef185bd3f05158 upstream. get_event_name uses sprintf to fill a buffer declared on the stack. It fills the buffer 2 bytes at a time. What the code doesn't take into account is that sprintf(buf, "%02x", data) actually writes 3 bytes. 2 bytes for the data and then it nul terminates the string. Since we declare buf to be 40 characters long and then we write 40 bytes of data into buf sprintf is going to write 41 characters. The fix is to leave room in buf for the nul terminator. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	nfs: Fix NFS v4 client handling of MAY_EXEC in nfs_permission.	Frank Filz
	commit 7ee2cb7f32b299c2b06a31fde155457203e4b7dd upstream. The problem is that permission checking is skipped if atomic open is possible, but when exec opens a file, it just opens it O_READONLY which means EXEC permission will not be checked at that time. This problem is observed by the following sequence (executed as root): mount -t nfs4 server:/ /mnt4 echo "ls" >/mnt4/foo chmod 744 /mnt4/foo su guest -c "mnt4/foo" Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Eugene Teo <eugeneteo@kernel.sg> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	icom: fix rmmod crash	Breno Leitao
	commit 95caa0a9bdaf93607bd0cc8932f53112496f2f22 upstream. Actually the icom driver is crashing when is being removed because the driver is kfreeing the adapter structure before calling pci_release_regions(), which result in the following error: Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d33 Faulting instruction address: 0xc000000000246b80 Oops: Kernel access of bad area, sig: 11 [#1] .... [c000000012d436a0] [c0000000001002d0] .kfree+0x120/0x34c (unreliable) [c000000012d43730] [c000000000246d60] .pci_release_selected_regions+0x3c/0x68 [c000000012d437c0] [d000000002d54700] .icom_kref_release+0xf4/0x118 [icom] [c000000012d43850] [c000000000232e50] .kref_put+0x74/0x94 [c000000012d438d0] [d000000002d56c58] .icom_remove+0x40/0xa4 [icom] [c000000012d43960] [c000000000249e48] .pci_device_remove+0x50/0x90 [c000000012d439e0] [c0000000002d68d8] .__device_release_driver+0x94/0xd4 [c000000012d43a70] [c0000000002d7104] .driver_detach+0xf8/0x12c [c000000012d43b00] [c0000000002d549c] .bus_remove_driver+0xbc/0x11c [c000000012d43b90] [c0000000002d71dc] .driver_unregister+0x60/0x80 [c000000012d43c20] [c00000000024a07c] .pci_unregister_driver+0x44/0xe8 [c000000012d43cb0] [d000000002d56bf4] .icom_exit+0x1c/0x40 [icom] [c000000012d43d30] [c000000000095fa8] .SyS_delete_module+0x214/0x2a8 [c000000012d43e30] [c00000000000852c] syscall_exit+0x0/0x40 Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	Linux 2.6.27.24v2.6.27.24	Greg Kroah-Hartman

2009-05-19	ocfs2: fix i_mutex locking in ocfs2_splice_to_file()	Miklos Szeredi
	commit 328eaaba4e41a04c1dc4679d65bea3fee4349d86 upstream. Rearrange locking of i_mutex on destination and call to ocfs2_rw_lock() so locks are only held while buffers are copied with the pipe_to_file() actor, and not while waiting for more data on the pipe. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	splice: fix i_mutex locking in generic_splice_write()	Miklos Szeredi
	commit eb443e5a25d43996deb62b9bcee1a4ce5dea2ead upstream. Rearrange locking of i_mutex on destination so it's only held while buffers are copied with the pipe_to_file() actor, and not while waiting for more data on the pipe. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	splice: remove i_mutex locking in splice_from_pipe()	Miklos Szeredi
	commit 2933970b960223076d6affcf7a77e2bc546b8102 upstream. splice_from_pipe() is only called from two places: - generic_splice_sendpage() - splice_write_null() Neither of these require i_mutex to be taken on the destination inode. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	splice: split up __splice_from_pipe()	Miklos Szeredi
	commit b3c2d2ddd63944ef2a1e4a43077b602288107e01 upstream. Split up __splice_from_pipe() into four helper functions: splice_from_pipe_begin() splice_from_pipe_next() splice_from_pipe_feed() splice_from_pipe_end() splice_from_pipe_next() will wait (if necessary) for more buffers to be added to the pipe. splice_from_pipe_feed() will feed the buffers to the supplied actor and return when there's no more data available (or if all of the requested data has been copied). This is necessary so that implementations can do locking around the non-waiting splice_from_pipe_feed(). This patch should not cause any change in behavior. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	powerpc/5200: Don't specify IRQF_SHARED in PSC UART driver	Grant Likely
	commit d9f0c5f9bc74f16d0ea0f6c518b209e48783a796 upstream. The MPC5200 PSC device is wired up to a dedicated interrupt line which is never shared. This patch removes the IRQF_SHARED flag from the request_irq() call which eliminates the "IRQF_DISABLED is not guaranteed on shared IRQs" warning message from the console output. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Reviewed-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	ehea: fix invalid pointer access	Hannes Hering
	commit 0b2febf38a33d7c40fb7bb4a58c113a1fa33c412 upstream. This patch fixes an invalid pointer access in case the receive queue holds no pointer to the next skb when the queue is empty. Signed-off-by: Hannes Hering <hering2@de.ibm.com> Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	NFS: Fix the notifications when renaming onto an existing file	Trond Myklebust
	commit b1e4adf4ea41bb8b5a7bfc1a7001f137e65495df upstream. NFS appears to be returning an unnecessary "delete" notification when we're doing an atomic rename. See http://bugzilla.gnome.org/show_bug.cgi?id=575684 The fix is to get rid of the redundant call to d_delete(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	nfsd4: check for negative dentry before use in nfsv4 readdir	J. Bruce Fields
	commit b2c0cea6b1cb210e962f07047df602875564069e upstream. After 2f9092e1020246168b1309b35e085ecd7ff9ff72 "Fix i_mutex vs. readdir handling in nfsd" (and 14f7dd63 "Copy XFS readdir hack into nfsd code"), an entry may be removed between the first mutex_unlock and the second mutex_lock. In this case, lookup_one_len() will return a negative dentry. Check for this case to avoid a NULL dereference. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reviewed-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	epoll: fix size check in epoll_create()	Davide Libenzi
	commit bfe3891a5f5d3b78146a45f40e435d14f5ae39dd upstream. Fix a size check WRT the manual pages. This was inadvertently broken by commit 9fe5ad9c8cef9ad5873d8ee55d1cf00d9b607df0 ("flag parameters add-on: remove epoll_create size param"). Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: <Hiroyuki.Mach@gmail.com> Cc: rohit verma <rohit.170309@gmail.com> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	cifs: Fix unicode string area word alignment in session setup	Jeff Layton
	commit 27b87fe52baba0a55e9723030e76fce94fabcea4 refreshed. cifs: fix unicode string area word alignment in session setup The handling of unicode string area alignment is wrong. decode_unicode_ssetup improperly assumes that it will always be preceded by a pad byte. This isn't the case if the string area is already word-aligned. This problem, combined with the bad buffer sizing for the serverDomain string can cause memory corruption. The bad alignment can make it so that the alignment of the characters is off. This can make them translate to characters that are greater than 2 bytes each. If this happens we can overflow the allocation. Fix this by fixing the alignment in CIFS_SessSetup instead so we can verify it against the head of the response. Also, clean up the workaround for improperly terminated strings by checking for a odd-length unicode buffers and then forcibly terminating them. Finally, resize the buffer for serverDomain. Now that we've fixed the alignment, it's probably fine, but a malicious server could overflow it. A better solution for handling these strings is still needed, but this should be a suitable bandaid. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	cifs: Fix buffer size in cifs_convertUCSpath	Suresh Jayaraman
	Relevant commits 7fabf0c9479fef9fdb9528a5fbdb1cb744a744a4 and f58841666bc22e827ca0dcef7b71c7bc2758ce82. The upstream commits adds cifs_from_ucs2 that includes functionality of cifs_convertUCSpath and does cleanup. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Acked-by: Steve French <sfrench@us.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	cifs: Fix incorrect destination buffer size in cifs_strncpy_to_host	Suresh Jayaraman
	Relevant commits 968460ebd8006d55661dec0fb86712b40d71c413 and 066ce6899484d9026acd6ba3a8dbbedb33d7ae1b. Minimal hunks to fix buffer size and fix an existing problem pointed out by Guenter Kukuk that length of src is used for NULL termination of dst. cifs: Rename cifs_strncpy_to_host and fix buffer size There is a possibility for the path_name and node_name buffers to overflow if they contain charcters that are >2 bytes in the local charset. Resize the buffer allocation so to avoid this possibility. Also, as pointed out by Jeff Layton, it would be appropriate to rename the function to cifs_strlcpy_to_host to reflect the fact that the copied string is always NULL terminated. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	cifs: Increase size of tmp_buf in cifs_readdir to avoid potential overflows	Suresh Jayaraman
	Commit 7b0c8fcff47a885743125dd843db64af41af5a61 refreshed and use a #define from commit f58841666bc22e827ca0dcef7b71c7bc2758ce82. cifs: Increase size of tmp_buf in cifs_readdir to avoid potential overflows Increase size of tmp_buf to possible maximum to avoid potential overflows. Also moved UNICODE_NAME_MAX definition so that it can be used elsewhere. Pointed-out-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	cifs: Fix buffer size for tcon->nativeFileSystem field	Jeff Layton
	Commit f083def68f84b04fe3f97312498911afce79609e refreshed. cifs: fix buffer size for tcon->nativeFileSystem field The buffer for this was resized recently to fix a bug. It's still possible however that a malicious server could overflow this field by sending characters in it that are >2 bytes in the local charset. Double the size of the buffer to account for this possibility. Also get rid of some really strange and seemingly pointless NULL termination. It's NULL terminating the string in the source buffer, but by the time that happens, we've already copied the string. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Cc: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	NFS: Close page_mkwrite() races	Trond Myklebust
	commit 7fdf523067666b0eaff330f362401ee50ce187c4 upstream Follow up to Nick Piggin's patches to ensure that nfs_vm_page_mkwrite returns with the page lock held, and sets the VM_FAULT_LOCKED flag. See http://bugzilla.kernel.org/show_bug.cgi?id=12913 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	NFS: Fix the return value in nfs_page_mkwrite()	Trond Myklebust
	commit 2b2ec7554cf7ec5e4412f89a5af6abe8ce950700 upstream Commit c2ec175c39f62949438354f603f4aa170846aabb ("mm: page_mkwrite change prototype to match fault") exposed a bug in the NFS implementation of page_mkwrite. We should be returning 0 on success... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	GFS2: Fix page_mkwrite() return code	Steven Whitehouse
	commit e56985da455b9dc0591b8cb2006cc94b6f4fb0f4 upstream This allows for the possibility of returning VM_FAULT_OOM as well as VM_FAULT_SIGBUS. This ensures that the correct action is taken. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	mm: close page_mkwrite races	Nick Piggin
	commit b827e496c893de0c0f142abfaeb8730a2fd6b37f upstream mm: close page_mkwrite races Change page_mkwrite to allow implementations to return with the page locked, and also change it's callers (in page fault paths) to hold the lock until the page is marked dirty. This allows the filesystem to have full control of page dirtying events coming from the VM. Rather than simply hold the page locked over the page_mkwrite call, we call page_mkwrite with the page unlocked and allow callers to return with it locked, so filesystems can avoid LOR conditions with page lock. The problem with the current scheme is this: a filesystem that wants to associate some metadata with a page as long as the page is dirty, will perform this manipulation in its ->page_mkwrite. It currently then must return with the page unlocked and may not hold any other locks (according to existing page_mkwrite convention). In this window, the VM could write out the page, clearing page-dirty. The filesystem has no good way to detect that a dirty pte is about to be attached, so it will happily write out the page, at which point, the filesystem may manipulate the metadata to reflect that the page is no longer dirty. It is not always possible to perform the required metadata manipulation in ->set_page_dirty, because that function cannot block or fail. The filesystem may need to allocate some data structure, for example. And the VM cannot mark the pte dirty before page_mkwrite, because page_mkwrite is allowed to fail, so we must not allow any window where the page could be written to if page_mkwrite does fail. This solution of holding the page locked over the 3 critical operations (page_mkwrite, setting the pte dirty, and finally setting the page dirty) closes out races nicely, preventing page cleaning for writeout being initiated in that window. This provides the filesystem with a strong synchronisation against the VM here. - Sage needs this race closed for ceph filesystem. - Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913). - I need it for fsblock. - I suspect other filesystems may need it too (eg. btrfs). - I have converted buffer.c to the new locking. Even simple block allocation under dirty pages might be susceptible to i_size changing under partial page at the end of file (we also have a buffer.c-side problem here, but it cannot be fixed properly without this patch). - Other filesystems (eg. NFS, maybe btrfs) will need to change their page_mkwrite functions themselves. [ This also moves page_mkwrite another step closer to fault, which should eventually allow page_mkwrite to be moved into ->fault, and thus avoiding a filesystem calldown and page lock/unlock cycle in __do_fault. ] [akpm@linux-foundation.org: fix derefs of NULL ->mapping] Cc: Sage Weil <sage@newdream.net> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	fs: fix page_mkwrite error cases in core code and btrfs	Nick Piggin
	commit 56a76f8275c379ed73c8a43cfa1dfa2f5e9cfa19 upstream fs: fix page_mkwrite error cases in core code and btrfs page_mkwrite is called with neither the page lock nor the ptl held. This means a page can be concurrently truncated or invalidated out from underneath it. Callers are supposed to prevent truncate races themselves, however previously the only thing they can do in case they hit one is to raise a SIGBUS. A sigbus is wrong for the case that the page has been invalidated or truncated within i_size (eg. hole punched). Callers may also have to perform memory allocations in this path, where again, SIGBUS would be wrong. The previous patch ("mm: page_mkwrite change prototype to match fault") made it possible to properly specify errors. Convert the generic buffer.c code and btrfs to return sane error values (in the case of page removed from pagecache, VM_FAULT_NOPAGE will cause the fault handler to exit without doing anything, and the fault will be retried properly). This fixes core code, and converts btrfs as a template/example. All other filesystems defining their own page_mkwrite should be fixed in a similar manner. Acked-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	mm: page_mkwrite change prototype to match fault	Nick Piggin
	commit c2ec175c39f62949438354f603f4aa170846aabb upstream mm: page_mkwrite change prototype to match fault Change the page_mkwrite prototype to take a struct vm_fault, and return VM_FAULT_xxx flags. There should be no functional change. This makes it possible to return much more detailed error information to the VM (and also can provide more information eg. virtual_address to the driver, which might be important in some special cases). This is required for a subsequent fix. And will also make it easier to merge page_mkwrite() with fault() in future. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Felix Blyakher <felixb@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	i2c-algo-pca: Let PCA9564 recover from unacked data byte (state 0x30)	Enrik Berkhan
	commit 2196d1cf4afab93fb64c2e5b417096e49b661612 upstream Currently, the i2c-algo-pca driver does nothing if the chip enters state 0x30 (Data byte in I2CDAT has been transmitted; NOT ACK has been received). Thus, the i2c bus connected to the controller gets stuck afterwards. I have seen this kind of error on a custom board in certain load situations most probably caused by interference or noise. A possible reaction is to let the controller generate a STOP condition. This is documented in the PCA9564 data sheet (2006-09-01) and the same is done for other NACK states as well. Further, state 0x38 isn't handled completely, either. Try to do another START in this case like the data sheet says. As this couldn't be tested, I've added a comment to try to reset the chip if the START doesn't help as suggested by Wolfram Sang. Signed-off-by: Enrik Berkhan <Enrik.Berkhan@ge.com> Reviewed-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	i2c-algo-bit: Fix timeout test	Dave Airlie
	commit 0cdba07bb23cdd3e0d64357ec3d983e6b75e541f upstream When fetching DDC using i2c algo bit, we were often seeing timeouts before getting valid EDID on a retry. The VESA spec states 2ms is the DDC timeout, so when this translates into 1 jiffie and we are close to the end of the time period, it could return with a timeout less than 2ms. Change this code to use time_after instead of time_after_eq. Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	dup2: Fix return value with oldfd == newfd and invalid fd	Jeff Mahoney
	commit 2b79bc4f7ebbd5af3c8b867968f9f15602d5f802 upstream. The return value of dup2 when oldfd == newfd and the fd isn't valid is not getting properly sign extended. We end up with 4294967287 instead of -EBADF. I've reproduced this on SLE11 (2.6.27.21), openSUSE Factory (2.6.29-rc5), and Ubuntu 9.04 (2.6.28). This patch uses a signed int for the error value so it is properly extended. Commit 6c5d0512a091480c9f981162227fdb1c9d70e555 introduced this regression. Reported-by: Jiri Dluhos <jdluhos@novell.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	USB: Gadget: fix UTF conversion in the usbstring library	Alan Stern
	commit 0f43158caddcbb110916212ebe4e39993ae70864 upstream. This patch (as1234) fixes a bug in the UTF8 -> UTF-16 conversion routine in the gadget/usbstring library. In a UTF-8 multi-byte sequence, all bytes after the first should have their high-order two bits set to 10, not 11. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	md: remove ability to explicit set an inactive array to 'clean'.	NeilBrown
	commit 5bf295975416f8e97117bbbcfb0191c00bc3e2b4 upstream. Being able to write 'clean' to an 'array_state' of an inactive array to activate it in 'clean' mode is both unnecessary and inconvenient. It is unnecessary because the same can be achieved by writing 'active'. This activates and array, but it still remains 'clean' until the first write. It is inconvenient because writing 'clean' is more often used to cause an 'active' array to revert to 'clean' mode (thus blocking any writes until a 'write-pending' is promoted to 'active'). Allowing 'clean' to both activate an array and mark an active array as clean can lead to races: One program writes 'clean' to mark the active array as clean at the same time as another program writes 'inactive' to deactivate (stop) and active array. Depending on which writes first, the array could be deactivated and immediately reactivated which isn't what was desired. So just disable the use of 'clean' to activate an array. This avoids a race that can be triggered with mdadm-3.0 and external metadata, so it suitable for -stable. Reported-by: Rafal Marszewski <rafal.marszewski@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	md/raid10: don't clear bitmap during recovery if array will still be degraded.	NeilBrown
	commit 18055569127253755d01733f6ecc004ed02f88d0 upstream. If we have a raid10 with multiple missing devices, and we recover just one of these to a spare, then we risk (depending on the bitmap and array chunk size) clearing bits of the bitmap for which recovery isn't complete (because a device is still missing). This can lead to a subsequent "re-add" being recovered without any IO happening, which would result in loss of data. This patch takes the safe approach of not clearing bitmap bits if the array will still be degraded. This patch is suitable for all active -stable kernels. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	md: fix some (more) errors with bitmaps on devices larger than 2TB.	NeilBrown
	commit db305e507d554430a69ede901a6308e6ecb72349 upstream. If a write intent bitmap covers more than 2TB, we sometimes work with values beyond 32bit, so these need to be sector_t. This patches add the required casts to some unsigned longs that are being shifted up. This will affect any raid10 larger than 2TB, or any raid1/4/5/6 with member devices that are larger than 2TB. Signed-off-by: NeilBrown <neilb@suse.de> Reported-by: "Mario 'BitKoenig' Holbe" <Mario.Holbe@TU-Ilmenau.DE> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	md: fix loading of out-of-date bitmap.	NeilBrown
	commit b74fd2826c5acce20e6f691437b2d19372bc2057 upstream. When md is loading a bitmap which it knows is out of date, it fills each page with 1s and writes it back out again. However the write_page call makes used of bitmap->file_pages and bitmap->last_page_size which haven't been set correctly yet. So this can sometimes fail. Move the setting of file_pages and last_page_size to before the call to write_page. This bug can cause the assembly on an array to fail, thus making the data inaccessible. Hence I think it is a suitable candidate for -stable. Reported-by: Vojtech Pavlik <vojtech@suse.cz> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-08	Linux 2.6.27.23v2.6.27.23	Greg Kroah-Hartman

2009-05-08	rndis_wlan: fix initialization order for workqueue&workers	Jussi Kivilinna
	commit e805e4d0b53506dff4255a2792483f094e7fcd2c upstream. rndis_wext_link_change() might be called from rndis_command() at initialization stage and priv->workqueue/priv->work have not been initialized yet. This causes invalid opcode at rndis_wext_bind on some brands of bcm4320. Fix by initializing workqueue/workers in rndis_wext_bind() before rndis_command is used. This bug has existed since 2.6.25, reported at: http://bugzilla.kernel.org/show_bug.cgi?id=12794 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-08	proc: avoid information leaks to non-privileged processes	Jake Edge
	commit f83ce3e6b02d5e48b3a43b001390e2b58820389d upstream. By using the same test as is used for /proc/pid/maps and /proc/pid/smaps, only allow processes that can ptrace() a given process to see information that might be used to bypass address space layout randomization (ASLR). These include eip, esp, wchan, and start_stack in /proc/pid/stat as well as the non-symbolic output from /proc/pid/wchan. ASLR can be bypassed by sampling eip as shown by the proof-of-concept code at http://code.google.com/p/fuzzyaslr/ As part of a presentation (http://www.cr0.org/paper/to-jt-linux-alsr-leak.pdf) esp and wchan were also noted as possibly usable information leaks as well. The start_stack address also leaks potentially useful information. Cc: Stable Team <stable@kernel.org> Signed-off-by: Jake Edge <jake@lwn.net> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-08	mv643xx_eth: 64bit mib counter read fix	Lennert Buytenhek
	commit 93af7aca44f0e82e67bda10a0fb73d383edcc8bd upstream. On several mv643xx_eth hardware versions, the two 64bit mib counters for 'good octets received' and 'good octets sent' are actually 32bit counters, and reading from the upper half of the register has the same effect as reading from the lower half of the register: an atomic read-and-clear of the entire 32bit counter value. This can under heavy traffic occasionally lead to small numbers being added to the upper half of the 64bit mib counter even though no 32bit wrap has occured. Since we poll the mib counters at least every 30 seconds anyway, we might as well just skip the reads of the upper halves of the hardware counters without breaking the stats, which this patch does. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-08	Ignore madvise(MADV_WILLNEED) for hugetlbfs-backed regions	Mel Gorman
	commit a425a638c858fd10370b573bde81df3ba500e271 upstream. madvise(MADV_WILLNEED) forces page cache readahead on a range of memory backed by a file. The assumption is made that the page required is order-0 and "normal" page cache. On hugetlbfs, this assumption is not true and order-0 pages are allocated and inserted into the hugetlbfs page cache. This leaks hugetlbfs page reservations and can cause BUGs to trigger related to corrupted page tables. This patch causes MADV_WILLNEED to be ignored for hugetlbfs-backed regions. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-08	clockevents: prevent endless loop in tick_handle_periodic()	john stultz
	commit 74a03b69d1b5ce00a568e142ca97e76b7f5239c6 upstream. tick_handle_periodic() can lock up hard when a one shot clock event device is used in combination with jiffies clocksource. Avoid an endless loop issue by requiring that a highres valid clocksource be installed before we call tick_periodic() in a loop when using ONESHOT mode. The result is we will only increment jiffies once per interrupt until a continuous hardware clocksource is available. Without this, we can run into a endless loop, where each cycle through the loop, jiffies is updated which increments time by tick_period or more (due to clock steering), which can cause the event programming to think the next event was before the newly incremented time and fail causing tick_periodic() to be called again and the whole process loops forever. [ Impact: prevent hard lock up ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-08	USB: serial: fix lifetime and locking problems	Alan Stern
	This is commit 2d93148ab6988cad872e65d694c95e8944e1b626 back-ported to 2.6.27. This patch (as1229-1) fixes a few lifetime and locking problems in the usb-serial driver. The main symptom is that an invalid kevent is created when the serial device is unplugged while a connection is active. Ports should be unregistered when device is disconnected, not when the parent usb_serial structure is deallocated. Each open file should hold a reference to the corresponding port structure, and the reference should be released when the file is closed. serial->disc_mutex should be acquired in serial_open(), to resolve the classic race between open and disconnect. serial_close() doesn't need to hold both serial->disc_mutex and port->mutex at the same time. Release the subdriver's module reference only after releasing all the other references, in case one of the release routines needs to invoke some code in the subdriver module. Replace a call to flush_scheduled_work() (which is prone to deadlocks) with cancel_work_sync(). Also, add a call to cancel_work_sync() in the disconnect routine. Reduce the scope of serial->disc_mutex in serial_disconnect(). The only place it really needs to protect is where the "disconnected" flag is set. Call the shutdown method from within serial_disconnect() instead of destroy_serial(), because some subdrivers expect the port data structures still to be in existence when their shutdown method runs. This fixes the bug reported in http://bugs.freedesktop.org/show_bug.cgi?id=20703 Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-08	MIPS: CVE-2009-0029: Enable syscall wrappers	dann frazier
	Backport of upstream commits by: Ralf Baechle <ralf@linux-mips.org> Xiaotian Feng <Xiaotian.Feng@windriver.com> upstream commits: dbda6ac0897603f6c6dfadbbc37f9882177ec7ac d6c178e9694e7e0c7ffe0289cf4389a498cac735 c189846ecf900cd6b3ad7d3cef5b45a746ce646b Signed-off-by: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>