linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2010-10-28	guard page for stacks that grow upwards	Luck, Tony
	commit 8ca3eb08097f6839b2206e2242db4179aee3cfb3 upstream. pa-risc and ia64 have stacks that grow upwards. Check that they do not run into other mappings. By making VM_GROWSUP 0x0 on architectures that do not ever use it, we can avoid some unpleasant #ifdefs in check_stack_guard_page(). Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-20	compat: Make compat_alloc_user_space() incorporate the access_ok()	H. Peter Anvin
	commit c41d68a513c71e35a14f66d71782d27a79a81ea6 upstream. compat_alloc_user_space() expects the caller to independently call access_ok() to verify the returned area. A missing call could introduce problems on some architectures. This patch incorporates the access_ok() check into compat_alloc_user_space() and also adds a sanity check on the length. The existing compat_alloc_user_space() implementations are renamed arch_compat_alloc_user_space() and are used as part of the implementation of the new global function. This patch assumes NULL will cause __get_user()/__put_user() to either fail or access userspace on all architectures. This should be followed by checking the return value of compat_access_user_space() for NULL in the callers, at which time the access_ok() in the callers can also be removed. Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: James Bottomley <jejb@parisc-linux.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	math-emu: correct test for downshifting fraction in _FP_FROM_INT()	Mikael Pettersson
	commit f8324e20f8289dffc646d64366332e05eaacab25 upstream. The kernel's math-emu code contains a macro _FP_FROM_INT() which is used to convert an integer to a raw normalized floating-point value. It does this basically in three steps: 1. Compute the exponent from the number of leading zero bits. 2. Downshift large fractions to put the MSB in the right position for normalized fractions. 3. Upshift small fractions to put the MSB in the right position. There is an boundary error in step 2, causing a fraction with its MSB exactly one bit above the normalized MSB position to not be downshifted. This results in a non-normalized raw float, which when packed becomes a massively inaccurate representation for that input. The impact of this depends on a number of arch-specific factors, but it is known to have broken emulation of FXTOD instructions on UltraSPARC III, which was originally reported as GCC bug 44631 <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44631>. Any arch which uses math-emu to emulate conversions from integers to same-size floats may be affected. The fix is simple: the exponent comparison used to determine if the fraction should be downshifted must be "<=" not "<". I'm sending a kernel module to test this as a reply to this message. There are also SPARC user-space test cases in the GCC bug entry. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	vfs: add NOFOLLOW flag to umount(2)	Miklos Szeredi
	commit db1f05bb85d7966b9176e293f3ceead1cb8b5d79 upstream. Add a new UMOUNT_NOFOLLOW flag to umount(2). This is needed to prevent symlink attacks in unprivileged unmounts (fuse, samba, ncpfs). Additionally, return -EINVAL if an unknown flag is used (and specify an explicitly unused flag: UMOUNT_UNUSED). This makes it possible for the caller to determine if a flag is supported or not. CC: Eugene Teo <eugene@redhat.com> CC: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	sctp: Fix skb_over_panic resulting from multiple invalid parameter errors ↵	Neil Horman
	(CVE-2010-1173) (v4) commit 5fa782c2f5ef6c2e4f04d3e228412c9b4a4c8809 upstream. Ok, version 4 Change Notes: 1) Minor cleanups, from Vlads notes Summary: Hey- Recently, it was reported to me that the kernel could oops in the following way: <5> kernel BUG at net/core/skbuff.c:91! <5> invalid operand: 0000 [#1] <5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U) vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5 ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod <5> CPU: 0 <5> EIP: 0060:[<c02bff27>] Not tainted VLI <5> EFLAGS: 00010216 (2.6.9-89.0.25.EL) <5> EIP is at skb_over_panic+0x1f/0x2d <5> eax: 0000002c ebx: c033f461 ecx: c0357d96 edx: c040fd44 <5> esi: c033f461 edi: df653280 ebp: 00000000 esp: c040fd40 <5> ds: 007b es: 007b ss: 0068 <5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0) <5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180 e0c2947d <5> 00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004 df653490 <5> 00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e 00000004 <5> Call Trace: <5> [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp] <5> [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp] <5> [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp] <5> [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp] <5> [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp] <5> [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp] <5> [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp] <5> [<c01555a4>] cache_grow+0x140/0x233 <5> [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp] <5> [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp] <5> [<e0c34600>] sctp_rcv+0x454/0x509 [sctp] <5> [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter] <5> [<c02d005e>] nf_iterate+0x40/0x81 <5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151 <5> [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151 <5> [<c02d0362>] nf_hook_slow+0x83/0xb5 <5> [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9 <5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151 <5> [<c02e103e>] ip_rcv+0x334/0x3b4 <5> [<c02c66fd>] netif_receive_skb+0x320/0x35b <5> [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd] <5> [<c02c67a4>] process_backlog+0x6c/0xd9 <5> [<c02c690f>] net_rx_action+0xfe/0x1f8 <5> [<c012a7b1>] __do_softirq+0x35/0x79 <5> [<c0107efb>] handle_IRQ_event+0x0/0x4f <5> [<c01094de>] do_softirq+0x46/0x4d Its an skb_over_panic BUG halt that results from processing an init chunk in which too many of its variable length parameters are in some way malformed. The problem is in sctp_process_unk_param: if (NULL == errp) errp = sctp_make_op_error_space(asoc, chunk, ntohs(chunk->chunk_hdr->length)); if (errp) { sctp_init_cause(errp, SCTP_ERROR_UNKNOWN_PARAM, WORD_ROUND(ntohs(param.p->length))); sctp_addto_chunk(errp, WORD_ROUND(ntohs(param.p->length)), param.v); When we allocate an error chunk, we assume that the worst case scenario requires that we have chunk_hdr->length data allocated, which would be correct nominally, given that we call sctp_addto_chunk for the violating parameter. Unfortunately, we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error chunk, so the worst case situation in which all parameters are in violation requires chunk_hdr->length+(sizeof(sctp_errhdr_t)param_count) bytes of data. The result of this error is that a deliberately malformed packet sent to a listening host can cause a remote DOS, described in CVE-2010-1173: http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173 I've tested the below fix and confirmed that it fixes the issue. We move to a strategy whereby we allocate a fixed size error chunk and ignore errors we don't have space to report. Tested by me successfully Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26	nfsd: fix vm overcommit crash fix #2	Junjiro R. Okajima
	commit 1b79cd04fab80be61dcd2732e2423aafde9a4c1c upstream. The previous patch from Alan Cox ("nfsd: fix vm overcommit crash", commit 731572d39fcd3498702eda4600db4c43d51e0b26) fixed the problem where knfsd crashes on exported shmemfs objects and strict overcommit is set. But the patch forgot supporting the case when CONFIG_SECURITY is disabled. This patch copies a part of his fix which is mainly for detecting a bug earlier. Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Junjiro R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26	nfsd: fix vm overcommit crash	Alan Cox
	commit 731572d39fcd3498702eda4600db4c43d51e0b26 upstream. Junjiro R. Okajima reported a problem where knfsd crashes if you are using it to export shmemfs objects and run strict overcommit. In this situation the current->mm based modifier to the overcommit goes through a NULL pointer. We could simply check for NULL and skip the modifier but we've caught other real bugs in the past from mm being NULL here - cases where we did need a valid mm set up (eg the exec bug about a year ago). To preserve the checks and get the logic we want shuffle the checking around and add a new helper to the vm_ security wrappers Also fix a current->mm reference in nommu that should use the passed mm [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix build] Reported-by: Junjiro R. Okajima <hooanon05@yahoo.co.jp> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26	vfs: Remove the range_cont writeback mode.	Aneesh Kumar K.V
	commit 74baaaaec8b4f22e1ae279f5ecca4ff705b28912 upstream. Ext4 was the only user of range_cont writeback mode and ext4 switched to a different method. So remove the range_cont mode which is not used in the kernel. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> CC: linux-fsdevel@vger.kernel.org Signed-off-by: Jayson R. King <dev@jaysonking.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-26	percpu counter: clean up percpu_counter_sum_and_set()	Mingming Cao
	commit 1f7c14c62ce63805f9574664a6c6de3633d4a354 upstream. percpu_counter_sum_and_set() and percpu_counter_sum() is the same except the former updates the global counter after accounting. Since we are taking the fbc->lock to calculate the precise value of the counter in percpu_counter_sum() anyway, it should simply set fbc->count too, as the percpu_counter_sum_and_set() does. This patch merges these two interfaces into one. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jayson R. King <dev@jaysonking.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2010-04-01	KVM: VMX: Check cpl before emulating debug register access	Avi Kivity
	commit 0a79b009525b160081d75cef5dbf45817956acf2 upstream. Debug registers may only be accessed from cpl 0. Unfortunately, vmx will code to emulate the instruction even though it was issued from guest userspace, possibly leading to an unexpected trap later. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01	KVM: x86 emulator: limit instructions to 15 bytes	Avi Kivity
	commit eb3c79e64a70fb8f7473e30fa07e89c1ecc2c9bb upstream [ <cebbert@redhat.com>: backport to 2.6.27 ] While we are never normally passed an instruction that exceeds 15 bytes, smp games can cause us to attempt to interpret one, which will cause large latencies in non-preempt hosts. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01	sched: wakeup preempt when small overlap	Peter Zijlstra
	commit 15afe09bf496ae10c989e1a375a6b5da7bd3e16e upstream. Lin Ming reported a 10% OLTP regression against 2.6.27-rc4. The difference seems to come from different preemption agressiveness, which affects the cache footprint of the workload and its effective cache trashing. Aggresively preempt a task if its avg overlap is very small, this should avoid the task going to sleep and find it still running when we schedule back to it - saving a wakeup. Reported-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01	sched: fine-tune SD_SIBLING_INIT	Ingo Molnar
	commit 52c642f33b14bfa1b00ef2b68296effb34a573f3 upstream. fine-tune the HT sched-domains parameters as well. On a HT capable box, this increases lat_ctx performance from 23.87 usecs to 1.49 usecs: # before $ ./lat_ctx -s 0 2 "size=0k ovr=1.89 2 23.87 # after $ ./lat_ctx -s 0 2 "size=0k ovr=1.84 2 1.49 Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01	sched: fine-tune SD_MC_INIT	Mike Galbraith
	commit 14800984706bf6936bbec5187f736e928be5c218 upstream. Tune SD_MC_INIT the same way as SD_CPU_INIT: unset SD_BALANCE_NEWIDLE, and set SD_WAKE_BALANCE. This improves vmark by 5%: vmark 132102 125968 125497 messages/sec avg 127855.66 .984 vmark 139404 131719 131272 messages/sec avg 134131.66 1.033 Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01	x86: fix csum_ipv6_magic asm memory clobber	Samuel Thibault
	commit 392d814daf460a9564d29b2cebc51e1ea34e0504 upstream. Just like ip_fast_csum, the assembly snippet in csum_ipv6_magic needs a memory clobber, as it is only passed the address of the buffer, not a memory reference to the buffer itself. This caused failures in Hurd's pfinetv4 when we tried to compile it with gcc-4.3 (bogus checksums). Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-01	resource: add helpers for fetching rlimits	Jiri Slaby
	commit 3e10e716abf3c71bdb5d86b8f507f9e72236c9cd upstream. We want to be sure that compiler fetches the limit variable only once, so add helpers for fetching current and maximal resource limits which do that. Add them to sched.h (instead of resource.h) due to circular dependency sched.h->resource.h->task_struct Alternative would be to create a separate res_access.h or similar. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: James Morris <jmorris@namei.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06	ipv6: reassembly: use seperate reassembly queues for conntrack and local ↵	Patrick McHardy
	delivery commit 0b5ccb2ee250136dd7385b1c7da28417d0d4d32d upstream. Currently the same reassembly queue might be used for packets reassembled by conntrack in different positions in the stack (PREROUTING/LOCAL_OUT), as well as local delivery. This can cause "packet jumps" when the fragment completing a reassembled packet is queued from a different position in the stack than the previous ones. Add a "user" identifier to the reassembly queue key to seperate the queues of each caller, similar to what we do for IPv4. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18	signal: Fix alternate signal stack check	Sebastian Andrzej Siewior
	commit 2a855dd01bc1539111adb7233f587c5c468732ac upstream. All architectures in the kernel increment/decrement the stack pointer before storing values on the stack. On architectures which have the stack grow down sas_ss_sp == sp is not on the alternate signal stack while sas_ss_sp + sas_ss_size == sp is on the alternate signal stack. On architectures which have the stack grow up sas_ss_sp == sp is on the alternate signal stack while sas_ss_sp + sas_ss_size == sp is not on the alternate signal stack. The current implementation fails for architectures which have the stack grow down on the corner case where sas_ss_sp == sp.This was reported as Debian bug #544905 on AMD64. Simplified test case: http://download.breakpoint.cc/tc-sig-stack.c The test case creates the following stack scenario: 0xn0300 stack top 0xn0200 alt stack pointer top (when switching to alt stack) 0xn01ff alt stack end 0xn0100 alt stack start == stack pointer If the signal is sent the stack pointer is pointing to the base address of the alt stack and the kernel erroneously decides that it has already switched to the alternate stack because of the current check for "sp - sas_ss_sp < sas_ss_size" On parisc (stack grows up) the scenario would be: 0xn0200 stack pointer 0xn01ff alt stack end 0xn0100 alt stack start = alt stack pointer base (when switching to alt stack) 0xn0000 stack base This is handled correctly by the current implementation. [ tglx: Modified for archs which have the stack grow up (parisc) which would fail with the correct implementation for stack grows down. Added a check for sp >= current->sas_ss_sp which is strictly not necessary but makes the code symetric for both variants ] Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> LKML-Reference: <20091025143758.GA6653@Chamillionaire.breakpoint.cc> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-08	USB: usb-serial: replace shutdown with disconnect, release	Alan Stern
	This is commit f9c99bb8b3a1ec81af68d484a551307326c2e933 back-ported to 2.6.27.39. This patch (as1254-2) splits up the shutdown method of usb_serial_driver into a disconnect and a release method. The problem is that the usb-serial core was calling shutdown during disconnect handling, but drivers didn't expect it to be called until after all the open file references had been closed. The result was an oops when the close method tried to use memory that had been deallocated by shutdown. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Rory Filer <rfiler@SierraWireless.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09	printk: robustify printk	Peter Zijlstra
	commit b845b517b5e3706a3729f6ea83b88ab85f0725b0 upstream. Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd wakeup by polling from the timer tick. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09	irda: Add irda_skb_cb qdisc related padding	Samuel Ortiz
	commit 69c30e1e7492192f882a3fc11888b320fde5206a upstream. We need to pad irda_skb_cb in order to keep it safe accross dev_queue_xmit() calls. This is some ugly and temporary hack triggered by recent qisc code changes. Even though it fixes bugzilla.kernel.org bug #11795, it will be replaced by a proper fix before 2.6.29 is released. Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-11-09	8250_pci: add IBM Saturn serial card	Benjamin Herrenschmidt
	commit c68d2b1594548cda7f6dbac6a4d9d30a9b01558c upstream. The IBM Saturn serial card has only one port. Without that fixup, the kernel thinks it has two, which confuses userland setup and admin tools as well. [akpm@linux-foundation.org: fix pci-ids.h layout] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Alan Cox <alan@linux.intel.com> Cc: Michael Reed <mreed10@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-12	KVM: x86: Disallow hypercalls for guest callers in rings > 0 [CVE-2009-3290]	Jan Kiszka
	[ backport to 2.6.27 by Chuck Ebbert <cebbert@redhat.com> ] commit 07708c4af1346ab1521b26a202f438366b7bcffd upstream. So far unprivileged guest callers running in ring 3 can issue, e.g., MMU hypercalls. Normally, such callers cannot provide any hand-crafted MMU command structure as it has to be passed by its physical address, but they can still crash the guest kernel by passing random addresses. To close the hole, this patch considers hypercalls valid only if issued from guest ring 0. This may still be relaxed on a per-hypercall base in the future once required. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-12	x86: Increase MIN_GAP to include randomized stack	Michal Hocko
	[ trivial backport to 2.6.27: Chuck Ebbert <cebbert@redhat.com> ] commit 80938332d8cf652f6b16e0788cf0ca136befe0b5 upstream. Currently we are not including randomized stack size when calculating mmap_base address in arch_pick_mmap_layout for topdown case. This might cause that mmap_base starts in the stack reserved area because stack is randomized by 1GB for 64b (8MB for 32b) and the minimum gap is 128MB. If the stack really grows down to mmap_base then we can get silent mmap region overwrite by the stack values. Let's include maximum stack randomization size into MIN_GAP which is used as the low bound for the gap in mmap. Signed-off-by: Michal Hocko <mhocko@suse.cz> LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24	Short write in nfsd becomes a full write to the client	David Shaw
	commit 31dec2538e45e9fff2007ea1f4c6bae9f78db724 upstream. Short write in nfsd becomes a full write to the client If a filesystem being written to via NFS returns a short write count (as opposed to an error) to nfsd, nfsd treats that as a success for the entire write, rather than the short count that actually succeeded. For example, given a 8192 byte write, if the underlying filesystem only writes 4096 bytes, nfsd will ack back to the nfs client that all 8192 bytes were written. The nfs client does have retry logic for short writes, but this is never called as the client is told the complete write succeeded. There are probably other ways it could happen, but in my case it happened with a fuse (filesystem in userspace) filesystem which can rather easily have a partial write. Here is a patch to properly return the short write count to the client. Signed-off-by: David Shaw <dshaw@jabberwocky.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08	SUNRPC: Fix tcp reconnection	Trond Myklebust
	This fixes a problem that was reported as Red Hat Bugzilla entry number 485339, in which rpciod starts looping on the TCP connection code, rendering the NFS client unusable for 1/2 minute or so. It is basically a backport of commit f75e6745aa3084124ae1434fd7629853bdaf6798 (SUNRPC: Fix the problem of EADDRNOTAVAIL syslog floods on reconnect) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08	parport: quickfix the proc registration bug	Alan Cox
	commit 05ad709d04799125ed85dd816fdb558258102172 upstream parport: quickfix the proc registration bug Ideally we should have a directory of drivers and a link to the 'active' driver. For now just show the first device which is effectively the existing semantics without a warning. This is an update on the original buggy patch that I then forgot to resubmit. Confusingly it was proposed by Red Hat, written by Etched Pixels fixed and submitted by Intel ... Resolves-Bug: http://bugzilla.kernel.org/show_bug.cgi?id=9749 Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08	KVM: Reduce stack usage in kvm_pv_mmu_op()	Dave Hansen
	(cherry picked from commit 6ad18fba05228fb1d47cdbc0339fe8b3fca1ca26) We're in a hot path. We can't use kmalloc() because it might impact performance. So, we just stick the buffer that we need into the kvm_vcpu_arch structure. This is used very often, so it is not really a waste. We also have to move the buffer structure's definition to the arch-specific x86 kvm header. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16	NFS: Fix an O_DIRECT Oops...	Trond Myklebust
	commit 1ae88b2e446261c038f2c0c3150ffae142b227a2 upstream. We can't call nfs_readdata_release()/nfs_writedata_release() without first initialising and referencing args.context. Doing so inside nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment() causes an Oops. We should rather be calling nfs_readdata_free()/nfs_writedata_free() in those cases. Looking at the O_DIRECT code, the "struct nfs_direct_req" is already referencing the nfs_open_context for us. Since the readdata and writedata structures carry a reference to that, we can simplify things by getting rid of the extra nfs_open_context references, so that we can replace all instances of nfs_readdata_release()/nfs_writedata_release(). Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16	parisc: ensure broadcast tlb purge runs single threaded	Helge Deller
	commit e82a3b75127188f20c7780bec580e148beb29da7 upstream parisc: ensure broadcast tlb purge runs single threaded The TLB flushing functions on hppa, which causes PxTLB broadcasts on the system bus, needs to be protected by irq-safe spinlocks to avoid irq handlers to deadlock the kernel. The deadlocks only happened during I/O intensive loads and triggered pretty seldom, which is why this bug went so long unnoticed. Signed-off-by: Helge Deller <deller@gmx.de> [edited to use spin_lock_irqsave on UP as well since we'd been locking there all this time anyway, --kyle] Signed-off-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16	x86: fix assembly constraints in native_save_fl()	H. Peter Anvin
	commit f1f029c7bfbf4ee1918b90a431ab823bed812504 upstream. From Gabe Black in bugzilla 13888: native_save_fl is implemented as follows: 11static inline unsigned long native_save_fl(void) 12{ 13 unsigned long flags; 14 15 asm volatile("# __raw_save_flags\n\t" 16 "pushf ; pop %0" 17 : "=g" (flags) 18 : /* no input */ 19 : "memory"); 20 21 return flags; 22} If gcc chooses to put flags on the stack, for instance because this is inlined into a larger function with more register pressure, the offset of the flags variable from the stack pointer will change when the pushf is performed. gcc doesn't attempt to understand that fact, and address used for pop will still be the same. It will write to somewhere near flags on the stack but not actually into it and overwrite some other value. I saw this happen in the ide_device_add_all function when running in a simulator I work on. I'm assuming that some quirk of how the simulated hardware is set up caused the code path this is on to be executed when it normally wouldn't. A simple fix might be to change "=g" to "=r". Reported-by: Gabe Black <spamforgabe@umich.edu> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-30	x25: Fix sleep from timer on socket destroy.	David S. Miller
	[ Upstream commit 14ebaf81e13ce66bff275380b246796fd16cbfa1 ] If socket destuction gets delayed to a timer, we try to lock_sock() from that timer which won't work. Use bh_lock_sock() in that case. Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	security: use mmap_min_addr indepedently of security models	Christoph Lameter
	commit e0a94c2a63f2644826069044649669b5e7ca75d3 upstream. This patch removes the dependency of mmap_min_addr on CONFIG_SECURITY. It also sets a default mmap_min_addr of 4096. mmapping of addresses below 4096 will only be possible for processes with CAP_SYS_RAWIO. Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Eric Paris <eparis@redhat.com> Looks-ok-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-19	personality: fix PER_CLEAR_ON_SETID (CVE-2009-1895)	Julien Tinnes
	commit f9fabcb58a6d26d6efde842d1703ac7cfa9427b6 upstream. We have found that the current PER_CLEAR_ON_SETID mask on Linux doesn't include neither ADDR_COMPAT_LAYOUT, nor MMAP_PAGE_ZERO. The current mask is READ_IMPLIES_EXEC\|ADDR_NO_RANDOMIZE. We believe it is important to add MMAP_PAGE_ZERO, because by using this personality it is possible to have the first page mapped inside a process running as setuid root. This could be used in those scenarios: - Exploiting a NULL pointer dereference issue in a setuid root binary - Bypassing the mmap_min_addr restrictions of the Linux kernel: by running a setuid binary that would drop privileges before giving us control back (for instance by loading a user-supplied library), we could get the first page mapped in a process we control. By further using mremap and mprotect on this mapping, we can then completely bypass the mmap_min_addr restrictions. Less importantly, we believe ADDR_COMPAT_LAYOUT should also be added since on x86 32bits it will in practice disable most of the address space layout randomization (only the stack will remain randomized). Signed-off-by: Julien Tinnes <jt@cr0.org> Signed-off-by: Tavis Ormandy <taviso@sdf.lonestar.org> Acked-by: Christoph Hellwig <hch@infradead.org> Acked-by: Kees Cook <kees@ubuntu.com> Acked-by: Eugene Teo <eugene@redhat.com> [ Shortened lines and fixed whitespace as per Christophs' suggestion ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	IB/mlx4: Add strong ordering to local inval and fast reg work requests	Jack Morgenstein
	commit 2ac6bf4ddc87c3b6b609f8fa82f6ebbffeac12f4 upstream. The ConnectX Programmer's Reference Manual states that the "SO" bit must be set when posting Fast Register and Local Invalidate send work requests. When this bit is set, the work request will be executed only after all previous work requests on the send queue have been executed. (If the bit is not set, Fast Register and Local Invalidate WQEs may begin execution too early, which violates the defined semantics for these operations) This fixes the issue with NFS/RDMA reported in <http://lists.openfabrics.org/pipermail/general/2009-April/059253.html> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	firmware_map: fix hang with x86/32bit	Yinghai Lu
	commit 3b0fde0fac19c180317eb0601b3504083f4b9bf5 upstream. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13484 Peer reported: \| The bug is introduced from kernel 2.6.27, if E820 table reserve the memory \| above 4G in 32bit OS(BIOS-e820: 00000000fff80000 - 0000000120000000 \| (reserved)), system will report Int 6 error and hang up. The bug is caused by \| the following code in drivers/firmware/memmap.c, the resource_size_t is 32bit \| variable in 32bit OS, the BUG_ON() will be invoked to result in the Int 6 \| error. I try the latest 32bit Ubuntu and Fedora distributions, all hit this \| bug. \|====== \|static int firmware_map_add_entry(resource_size_t start, resource_size_t end, \| const char type, \| struct firmware_map_entry entry) and it only happen with CONFIG_PHYS_ADDR_T_64BIT is not set. it turns out we need to pass u64 instead of resource_size_t for that. [akpm@linux-foundation.org: add comment] Reported-and-tested-by: Peer Chen <pchen@nvidia.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	ocfs2: fix i_mutex locking in ocfs2_splice_to_file()	Miklos Szeredi
	commit 328eaaba4e41a04c1dc4679d65bea3fee4349d86 upstream. Rearrange locking of i_mutex on destination and call to ocfs2_rw_lock() so locks are only held while buffers are copied with the pipe_to_file() actor, and not while waiting for more data on the pipe. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	splice: split up __splice_from_pipe()	Miklos Szeredi
	commit b3c2d2ddd63944ef2a1e4a43077b602288107e01 upstream. Split up __splice_from_pipe() into four helper functions: splice_from_pipe_begin() splice_from_pipe_next() splice_from_pipe_feed() splice_from_pipe_end() splice_from_pipe_next() will wait (if necessary) for more buffers to be added to the pipe. splice_from_pipe_feed() will feed the buffers to the supplied actor and return when there's no more data available (or if all of the requested data has been copied). This is necessary so that implementations can do locking around the non-waiting splice_from_pipe_feed(). This patch should not cause any change in behavior. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-19	mm: page_mkwrite change prototype to match fault	Nick Piggin
	commit c2ec175c39f62949438354f603f4aa170846aabb upstream mm: page_mkwrite change prototype to match fault Change the page_mkwrite prototype to take a struct vm_fault, and return VM_FAULT_xxx flags. There should be no functional change. This makes it possible to return much more detailed error information to the VM (and also can provide more information eg. virtual_address to the driver, which might be important in some special cases). This is required for a subsequent fix. And will also make it easier to merge page_mkwrite() with fault() in future. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Felix Blyakher <felixb@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-08	drm/i915: add support for G41 chipset	Zhenyu Wang
	commit 72021788678523047161e97b3dfed695e802a5fd upstream. This had been delayed for some time due to failure to work on the one piece of G41 hardware we had, and lack of success reports from anybody else. Current hardware appears to be OK. Signed-off-by: Zhenyu Wang <zhenyu.z.wang@intel.com> [anholt: hand-applied due to conflicts with IGD patches] Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	PCI: fix incorrect mask of PM No_Soft_Reset bit	Yu Zhao
	commit 998dd7c719f62dcfa91d7bf7f4eb9c160e03d817 upstream. Reviewed-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Yu Zhao <yu.zhao@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	add some long-missing capabilities to fs_mask	Serge E. Hallyn
	upstream commit: 0ad30b8fd5fe798aae80df6344b415d8309342cc When POSIX capabilities were introduced during the 2.1 Linux cycle, the fs mask, which represents the capabilities which having fsuid==0 is supposed to grant, did not include CAP_MKNOD and CAP_LINUX_IMMUTABLE. However, before capabilities the privilege to call these did in fact depend upon fsuid==0. This patch introduces those capabilities into the fsmask, restoring the old behavior. See the thread starting at http://lkml.org/lkml/2009/3/11/157 for reference. Note that if this fix is deemed valid, then earlier kernel versions (2.4 and 2.2) ought to be fixed too. Changelog: [Mar 23] Actually delete old CAP_FS_SET definition... [Mar 20] Updated against J. Bruce Fields's patch Reported-by: Igor Zhbanov <izh1979@gmail.com> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: stable@kernel.org Cc: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	sched: do not count frozen tasks toward load	Nathan Lynch
	upstream commit: e3c8ca8336707062f3f7cb1cd7e6b3c753baccdd Freezing tasks via the cgroup freezer causes the load average to climb because the freezer's current implementation puts frozen tasks in uninterruptible sleep (D state). Some applications which perform job-scheduling functions consult the load average when making decisions. If a cgroup is frozen, the load average does not provide a useful measure of the system's utilization to such applications. This is especially inconvenient if the job scheduler employs the cgroup freezer as a mechanism for preempting low priority jobs. Contrast this with using SIGSTOP for the same purpose: the stopped tasks do not count toward system load. Change task_contributes_to_load() to return false if the task is frozen. This results in /proc/loadavg behavior that better meets users' expectations. Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Nigel Cunningham <nigel@tuxonice.net> Tested-by: Nigel Cunningham <nigel@tuxonice.net> Cc: containers@lists.linux-foundation.org Cc: linux-pm@lists.linux-foundation.org Cc: Matt Helsley <matthltc@us.ibm.com> LKML-Reference: <20090408194512.47a99b95@manatee.lan> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-23	Fix misreporting of #cores as #hyperthreads for Q9550	Joe Korty
	Fix misreporting of #cores for the Intel Quad Core Q9550. For the Q9550, in x86_64 mode, /proc/cpuinfo mistakenly reports the #cores present as the #hyperthreads present. i386 mode was not examined but is assumed to have the same problem. A backport of the following three 2.6.29-rc1 patches fixes the problem: 066941bd4eeb159307a5d7d795100d0887c00442: [PATCH] x86: unmask CPUID levels on Intel CPUs 99fb4d349db7e7dacb2099c5cc320a9e2d31c1ef: [PATCH] x86: unmask CPUID levels on Intel CPUs, fix bdf21a49bab28f0d9613e8d8724ef9c9168b61b9: [PATCH] x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h> From the first patch: "If the CPUID limit bit in MSR_IA32_MISC_ENABLE is set, clear it to make all CPUID information available. This is required for some features to work, in particular XSAVE." Originally-Developed-by: H. Peter Anvin <hpa@linux.intel.com> Backported-by: Joe Korty <joe.korty@ccur.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Joe Korty <joe.korty@ccur.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-23	nfsd: nfsd should drop CAP_MKNOD for non-root	J. Bruce Fields
	commit 76a67ec6fb79ff3570dcb5342142c16098299911 upstream. Since creating a device node is normally an operation requiring special privilege, Igor Zhbanov points out that it is surprising (to say the least) that a client can, for example, create a device node on a filesystem exported with root_squash. So, make sure CAP_MKNOD is among the capabilities dropped when an nfsd thread handles a request from a non-root user. Reported-by: Igor Zhbanov <izh1979@gmail.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	MIPS: compat: Implement is_compat_task.	Ralf Baechle
	commit 4302e5d53b9166d45317e3ddf0a7a9dab3efd43b upstream. This is a build fix required after "x86-64: seccomp: fix 32/64 syscall hole" (commit 5b1017404aea6d2e552e991b3fd814d839e9cd67). MIPS doesn't have the issue that was fixed for x86-64 by that patch. This also doesn't solve the N32 issue which is that N32 seccomp processes will be treated as non-compat processes thus only have access to N64 syscalls. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate()	Jan Kara
	(cherry picked from commit 7f5aa215088b817add9c71914b83650bdd49f8a9) If we race with commit code setting i_transaction to NULL, we could possibly dereference it. Proper locking requires the journal pointer (to access journal->j_list_lock), which we don't have. So we have to change the prototype of the function so that filesystem passes us the journal pointer. Also add a more detailed comment about why the function jbd2_journal_begin_ordered_truncate() does what it does and how it should be used. Thanks to Dan Carpenter <error27@gmail.com> for pointing to the suspitious code. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Joel Becker <joel.becker@oracle.com> CC: linux-ext4@vger.kernel.org CC: mfasheh@suse.de CC: Dan Carpenter <error27@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	x86-64: seccomp: fix 32/64 syscall hole	Roland McGrath
	commit 5b1017404aea6d2e552e991b3fd814d839e9cd67 upstream. On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with ljmp, and then use the "syscall" instruction to make a 64-bit system call. A 64-bit process make a 32-bit system call with int $0x80. In both these cases under CONFIG_SECCOMP=y, secure_computing() will use the wrong system call number table. The fix is simple: test TS_COMPAT instead of TIF_IA32. Here is an example exploit: /* test case for seccomp circumvention on x86-64 There are two failure modes: compile with -m64 or compile with -m32. The -m64 case is the worst one, because it does "chmod 777 ." (could be any chmod call). The -m32 case demonstrates it was able to do stat(), which can glean information but not harm anything directly. A buggy kernel will let the test do something, print, and exit 1; a fixed kernel will make it exit with SIGKILL before it does anything. / #define _GNU_SOURCE #include <assert.h> #include <inttypes.h> #include <stdio.h> #include <linux/prctl.h> #include <sys/stat.h> #include <unistd.h> #include <asm/unistd.h> int main (int argc, char *argv) { char buf[100]; static const char dot[] = "."; long ret; unsigned st[24]; if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0) perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?"); #ifdef __x86_64__ assert ((uintptr_t) dot < (1UL << 32)); asm ("int $0x80 # %0 <- %1(%2 %3)" : "=a" (ret) : "0" (15), "b" (dot), "c" (0777)); ret = snprintf (buf, sizeof buf, "result %ld (check mode on .!)\n", ret); #elif defined __i386__ asm (".code32\n" "pushl %%cs\n" "pushl $2f\n" "ljmpl $0x33, $1f\n" ".code64\n" "1: syscall # %0 <- %1(%2 %3)\n" "lretl\n" ".code32\n" "2:" : "=a" (ret) : "0" (4), "D" (dot), "S" (&st)); if (ret == 0) ret = snprintf (buf, sizeof buf, "stat . -> st_uid=%u\n", st[7]); else ret = snprintf (buf, sizeof buf, "result %ld\n", ret); #else # error "not this one" #endif write (1, buf, ret); syscall (__NR_exit, 1); return 2; } Signed-off-by: Roland McGrath <roland@redhat.com> [ I don't know if anybody actually uses seccomp, but it's enabled in at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	jsm: additional device support	Adam Lackorzynski
	commit ffa7525c13eb3db0fd19a3e1cffe2ce6f561f5f3 upstream. I have a Digi Neo 8 PCI card (114f:00b1) Serial controller: Digi International Digi Neo 8 (rev 05) that works with the jsm driver after using the following patch. Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de> Cc: Scott H Kilau <Scott_Kilau@digi.com> Cc: Wendy Xiong <wendyx@us.ibm.com> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	8250: fix boot hang with serial console when using with Serial Over Lan port	Mauro Carvalho Chehab
	commit b6adea334c6c89d5e6c94f9196bbf3a279cb53bd upstream. Intel 8257x Ethernet boards have a feature called Serial Over Lan. This feature works by emulating a serial port, and it is detected by kernel as a normal 8250 port. However, this emulation is not perfect, as also noticed on changeset 7500b1f602aad75901774a67a687ee985d85893f. Before this patch, the kernel were trying to check if the serial TX is capable of work using IRQ's. This were done with a code similar this: serial_outp(up, UART_IER, UART_IER_THRI); lsr = serial_in(up, UART_LSR); iir = serial_in(up, UART_IIR); serial_outp(up, UART_IER, 0); if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) up->bugs \|= UART_BUG_TXEN; This works fine for other 8250 ports, but, on 8250-emulated SoL port, the chip is a little lazy to down UART_IIR_NO_INT at UART_IIR register. Due to that, UART_BUG_TXEN is sometimes enabled. However, as TX IRQ keeps working, and the TX polling is now enabled, the driver miss-interprets the IRQ received later, hanging up the machine until a key is pressed at the serial console. This is the 6 version of this patch. Previous versions were trying to introduce a large enough delay between serial_outp and serial_in(up, UART_IIR), but not taking forever. However, the needed delay couldn't be safely determined. At the experimental tests, a delay of 1us solves most of the cases, but still hangs sometimes. Increasing the delay to 5us was better, but still doesn't solve. A very high delay of 50 ms seemed to work every time. However, poking around with delays and pray for it to be enough doesn't seem to be a good approach, even for a quirk. So, instead of playing with random large arbitrary delays, let's just disable UART_BUG_TXEN for all SoL ports. [akpm@linux-foundation.org: fix warnings] Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>