linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2010-08-02	net/core: neighbour update Oops	Doug Kehn
	commit 91a72a70594e5212c97705ca6a694bd307f7a26b upstream. When configuring DMVPN (GRE + openNHRP) and a GRE remote address is configured a kernel Oops is observed. The obserseved Oops is caused by a NULL header_ops pointer (neigh->dev->header_ops) in neigh_update_hhs() when void (update)(struct hh_cache, const struct net_device, const unsigned char ) = neigh->dev->header_ops->cache_update; is executed. The dev associated with the NULL header_ops is the GRE interface. This patch guards against the possibility that header_ops is NULL. This Oops was first observed in kernel version 2.6.26.8. Signed-off-by: Doug Kehn <rdkehn@yahoo.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	tcp: fix crash in tcp_xmit_retransmit_queue	Ilpo Järvinen
	commit 45e77d314585869dfe43c82679f7e08c9b35b898 upstream. It can happen that there are no packets in queue while calling tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns NULL and that gets deref'ed to get sacked into a local var. There is no work to do if no packets are outstanding so we just exit early. This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out guard to make joining diff nicer). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de> Tested-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	net: fix problem in reading sock TX queue	Tom Herbert
	commit b0f77d0eae0c58a5a9691a067ada112ceeae2d00 upstream. Fix problem in reading the tx_queue recorded in a socket. In dev_pick_tx, the TX queue is read by doing a check with sk_tx_queue_recorded on the socket, followed by a sk_tx_queue_get. The problem is that there is not mutual exclusion across these calls in the socket so it it is possible that the queue in the sock can be invalidated after sk_tx_queue_recorded is called so that sk_tx_queue get returns -1, which sets 65535 in queue_index and thus dev_pick_tx returns 65536 which is a bogus queue and can cause crash in dev_queue_xmit. We fix this by only calling sk_tx_queue_get which does the proper checks. The interface is that sk_tx_queue_get returns the TX queue if the sock argument is non-NULL and TX queue is recorded, else it returns -1. sk_tx_queue_recorded is no longer used so it can be completely removed. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	sky2: enable rx/tx in sky2_phy_reinit()	Brandon Philips
	commit 38000a94a902e94ca8b5498f7871c6316de8957a upstream. sky2_phy_reinit is called by the ethtool helpers sky2_set_settings, sky2_nway_reset and sky2_set_pauseparam when netif_running. However, at the end of sky2_phy_init GM_GP_CTRL has GM_GPCR_RX_ENA and GM_GPCR_TX_ENA cleared. So, doing these commands causes the device to stop working: $ ethtool -r eth0 $ ethtool -A eth0 autoneg off Fix this issue by enabling Rx/Tx after running sky2_phy_init in sky2_phy_reinit. Signed-off-by: Brandon Philips <bphilips@suse.de> Tested-by: Brandon Philips <bphilips@suse.de> Cc: stable@kernel.org Tested-by: Mike McCormack <mikem@ring3k.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	cpmac: do not leak struct net_device on phy_connect errors	Florian Fainelli
	commit ed770f01360b392564650bf1553ce723fa46afec upstream. If the call to phy_connect fails, we will return directly instead of freeing the previously allocated struct net_device. Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	ALSA: hda - Add Macbook 5,2 quirk	Luke Yelavich
	commit 3bfea98ff73d377ffce0d4c7f938b7ef958cdb35 upstream. BugLink: https://bugs.launchpad.net/bugs/463178 Set Macbook 5,2 (106b:4a00) hardware to use ALC885_MB5 Signed-off-by: Luke Yelavich <luke.yelavich@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	CIFS: Fix a malicious redirect problem in the DNS lookup code	David Howells
	commit 4c0c03ca54f72fdd5912516ad0a23ec5cf01bda7 upstream. Fix the security problem in the CIFS filesystem DNS lookup code in which a malicious redirect could be installed by a random user by simply adding a result record into one of their keyrings with add_key() and then invoking a CIFS CFS lookup [CVE-2010-2524]. This is done by creating an internal keyring specifically for the caching of DNS lookups. To enforce the use of this keyring, the module init routine creates a set of override credentials with the keyring installed as the thread keyring and instructs request_key() to only install lookup result keys in that keyring. The override is then applied around the call to request_key(). This has some additional benefits when a kernel service uses this module to request a key: (1) The result keys are owned by root, not the user that caused the lookup. (2) The result keys don't pop up in the user's keyrings. (3) The result keys don't come out of the quota of the user that caused the lookup. The keyring can be viewed as root by doing cat /proc/keys: 2a0ca6c3 I----- 1 perm 1f030000 0 0 keyring .dns_resolver: 1/4 It can then be listed with 'keyctl list' by root. # keyctl list 0x2a0ca6c3 1 key in keyring: 726766307: --alswrv 0 0 dns_resolver: foo.bar.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com> Acked-by: Steve French <smfrench@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	cifs: don't attempt busy-file rename unless it's in same directory	Jeff Layton
	commit ed0e3ace576d297a5c7015401db1060bbf677b94 upstream. Busy-file renames don't actually work across directories, so we need to limit this code to renames within the same dir. This fixes the bug detailed here: https://bugzilla.redhat.com/show_bug.cgi?id=591938 Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	cifs: remove bogus first_time check in NTLMv2 session setup code	Jeff Layton
	commit 8a224d489454b7457105848610cfebebdec5638d upstream. This bug appears to be the result of a cut-and-paste mistake from the NTLMv1 code. The function to generate the MAC key was commented out, but not the conditional above it. The conditional then ended up causing the session setup key not to be copied to the buffer unless this was the first session on the socket, and that made all but the first NTLMv2 session setup fail. Fix this by removing the conditional and all of the commented clutter that made it difficult to see. Reported-by: Gunther Deschner <gdeschne@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	hwmon: (it87) Fix in7 on IT8720F	Jean Delvare
	commit 436cad2a41a40c6c32bd9152b63d17eeb1f7c99b upstream. The IT8720F has no VIN7 pin, so VCCH should always be routed internally to VIN7 with an internal divider. Curiously, there still is a configuration bit to control this, which means it can be set incorrectly. And even more curiously, many boards out there are improperly configured, even though the IT8720F datasheet claims that the internal routing of VCCH to VIN7 is the default setting. So we force the internal routing in this case. It turns out that all boards with the wrong setting are from Gigabyte, so I suspect a BIOS bug. But it's easy enough to workaround in the driver, so let's do it. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Jean-Marc Spaggiari <jean-marc@spaggiari.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	hwmon: (coretemp) Skip duplicate CPU entries	Jean Delvare
	commit d883b9f0977269d519469da72faec6a7f72cb489 upstream. On hyper-threaded CPUs, each core appears twice in the CPU list. Skip the second entry to avoid duplicate sensors. Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Huaxu Wan <huaxu.wan@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	hwmon: (coretemp) Properly label the sensors	Jean Delvare
	commit 3f4f09b4be35d38d6e2bf22c989443e65e70fc4c upstream. Don't assume that CPU entry number and core ID always match. It worked in the simple cases (single CPU, no HT) but fails on multi-CPU systems. Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Huaxu Wan <huaxu.wan@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	hwmon: (k10temp) Do not blacklist known working CPU models	Jean Delvare
	commit eefc2d9e3d4f8820f2c128a0e44a23de28b1ed64 upstream. When detecting AM2+ or AM3 socket with DDR2, only blacklist cores which are known to exist in AM2+ format. Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Clemens Ladisch <clemens@ladisch.de> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	hwmon: (k8temp) Fix temperature reporting for ASB1 processor revisions	Andreas Herrmann
	commit d535bad90dad4eb42ec6528043fcfb53627d4f89 upstream. Reported temperature for ASB1 CPUs is too high. Add ASB1 CPU revisions (these are also non-desktop variants) to the list of CPUs for which the temperature fixup is not required. Example: (from LENOVO ThinkPad Edge 13, 01972NG, system was idle) Current kernel reports $ sensors k8temp-pci-00c3 Adapter: PCI adapter Core0 Temp: +74.0 C Core0 Temp: +70.0 C Core1 Temp: +69.0 C Core1 Temp: +70.0 C With this patch I have $ sensors k8temp-pci-00c3 Adapter: PCI adapter Core0 Temp: +54.0 C Core0 Temp: +51.0 C Core1 Temp: +48.0 C Core1 Temp: +49.0 C Cc: Rudolf Marek <r.marek@assembler.cz> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	hwmon: (k8temp) Bypass core swapping on single-core processors	Jean Delvare
	commit cd4de21f7e65a8cd04860f5661b3c18648ee52a1 upstream. Commit a2e066bba2aad6583e3ff648bf28339d6c9f0898 introduced core swapping for CPU models 64 and later. I recently had a report about a Sempron 3200+, model 95, for which this patch broke temperature reading. It happens that this is a single-core processor, so the effect of the swapping was to read a temperature value for a core that didn't exist, leading to an incorrect value (-49 degrees C.) Disabling core swapping on singe-core processors should fix this. Additional comment from Andreas: The BKDG says Thermal Sensor Core Select (ThermSenseCoreSel)-Bit 2. This bit selects the CPU whose temperature is reported in the CurTemp field. This bit only applies to dual core processors. For single core processors CPU0 Thermal Sensor is always selected. k8temp_probe() correctly detected that SEL_CORE can't be used on single core CPU. Thus k8temp did never update the temperature values stored in temp[1][x] and -49 degrees was reported. For single core CPUs we must use the values read into temp[0][x]. Signed-off-by: Jean Delvare <khali@linux-fr.org> Tested-by: Rick Moritz <rhavin@gmx.net> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	ssb: Handle Netbook devices where the SPROM address is changed	Christoph Fritz
	For some Netbook computers with Broadcom BCM4312 wireless interfaces, the SPROM has been moved to a new location. When the ssb driver tries to read the old location, the systems hangs when trying to read a non-existent location. Such freezes are particularly bad as they do not log the failure. This patch is modified from commit da1fdb02d9200ff28b6f3a380d21930335fe5429 with some pieces from other mainline changes so that it can be applied to stable 2.6.34.Y. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	veth: Dont kfree_skb() after dev_forward_skb()	Eric Dumazet
	[ Upstream commit 6ec82562ffc6f297d0de36d65776cff8e5704867 ] In case of congestion, netif_rx() frees the skb, so we must assume dev_forward_skb() also consume skb. Bug introduced by commit 445409602c092 (veth: move loopback logic to common location) We must change dev_forward_skb() to always consume skb, and veth to not double free it. Bug report : http://marc.info/?l=linux-netdev&m=127310770900442&w=3 Reported-by: Martín Ferrari <martin.ferrari@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	tcp: use correct net ns in cookie_v4_check()	Eric Dumazet
	[ Upstream commit c44649216522cd607a4027d2ebf4a8147d3fa94c ] Its better to make a route lookup in appropriate namespace. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	tcp: tcp_synack_options() fix	Eric Dumazet
	[ Upstream commit de213e5eedecdfb1b1eea7e6be28bc64cac5c078 ] Commit 33ad798c924b4a (tcp: options clean up) introduced a problem if MD5+SACK+timestamps were used in initial SYN message. Some stacks (old linux for example) try to negotiate MD5+SACK+TSTAMP sessions, but since 40 bytes of tcp options space are not enough to store all the bits needed, we chose to disable timestamps in this case. We send a SYN-ACK _without_ timestamp option, but socket has timestamps enabled and all further outgoing messages contain a TS block, all with the initial timestamp of the remote peer. Fix is to really disable timestamps option for the whole session. Reported-by: Bijay Singh <Bijay.Singh@guavus.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	r8169: fix mdio_read and update mdio_write according to hw specs	Timo Teräs
	[ Upstream commit 81a95f049962ec20a9aed888e676208b206f0f2e ] Realtek confirmed that a 20us delay is needed after mdio_read and mdio_write operations. Reduce the delay in mdio_write, and add it to mdio_read too. Also add a comment that the 20us is from hw specs. Signed-off-by: Timo Teräs <timo.teras@iki.fi> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	r8169: fix random mdio_write failures	Timo Teräs
	[ Upstream commit 024a07bacf8287a6ddfa83e9d5b951c5e8b4070e ] Some configurations need delay between the "write completed" indication and new write to work reliably. Realtek driver seems to use longer delay when polling the "write complete" bit, so it waits long enough between writes with high probability (but could probably break too). This patch adds a new udelay to make sure we wait unconditionally some time after the write complete indication. This caused a regression with XID 18000000 boards when the board specific phy configuration writing many mdio registers was added in commit 2e955856ff (r8169: phy init for the 8169scd). Some of the configration mdio writes would almost always fail, and depending on failure might leave the PHY in non-working state. Signed-off-by: Timo Teräs <timo.teras@iki.fi> Acked-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	pegasus: fix USB device ID for ETX-US2	Tadashi Abe
	[ Upstream commit 95718c1c25370b2c85061a4d8dfab2831b3ad280 ] USB device ID definition for I-O Data ETX-US2 is wrong. Correct ID is 0x093a. Here's snippet from /proc/bus/usb/devices; T: Bus=01 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 2 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=ff(vend.) Sub=ff Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=04bb ProdID=093a Rev= 1.01 S: Manufacturer=I-O DATA DEVICE,INC. S: Product=I-O DATA ETX2-US2 S: SerialNumber=A26427 C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=224mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=00 Driver=pegasus E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=83(I) Atr=03(Int.) MxPS= 8 Ivl=125us This patch enables pegasus driver to work fine with ETX-US2. Signed-off-by: Tadashi Abe <tabe@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	net: Fix FDDI and TR config checks in ipv4 arp and LLC.	David S. Miller
	[ Upstream commit f0ecde1466f21edf577b809735f4f35f354777a0 ] Need to check both CONFIG_FOO and CONFIG_FOO_MODULE Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	ipv6: Fix default multicast hops setting.	David S. Miller
	[ Upstream commit f935aa9e99d6ec74a50871c120e6b21de7256efb ] As per RFC 3493 the default multicast hops setting for a socket should be "1" just like ipv4. Ironically we have a IPV6_DEFAULT_MCASTHOPS macro it just wasn't being used. Reported-by: Elliot Hughes <enh@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	gro: Fix bogus gso_size on the first fraglist entry	Herbert Xu
	[ Upstream commit 622e0ca1cd4d459f5af4f2c65f4dc0dd823cb4c3 ] When GRO produces fraglist entries, and the resulting skb hits an interface that is incapable of TSO but capable of FRAGLIST, we end up producing a bogus packet with gso_size non-zero. This was reported in the field with older versions of KVM that did not set the TSO bits on tuntap. This patch fixes that. Reported-by: Igor Zhang <yugzhang@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	net/dccp: expansion of error code size	Yoichi Yuasa
	[ Upstream commit d9b52dc6fd1fbb2bad645cbc86a60f984c1cb179 ] Because MIPS's EDQUOT value is 1133(0x46d). It's larger than u8. Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02	bridge: fdb cleanup runs too often	stephen hemminger
	[ Upstream commit 25442e06d20aaba7d7b16438078a562b3e4cf19b ] It is common in end-node, non STP bridges to set forwarding delay to zero; which causes the forwarding database cleanup to run every clock tick. Change to run only as soon as needed or at next ageing timer interval which ever is sooner. Use round_jiffies_up macro rather than attempting round up by changing value. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	Linux 2.6.33.6v2.6.33.6	Greg Kroah-Hartman

2010-07-05	sctp: fix append error cause to ERROR chunk correctly	Wei Yongjun
	commit 2e3219b5c8a2e44e0b83ae6e04f52f20a82ac0f2 upstream. commit 5fa782c2f5ef6c2e4f04d3e228412c9b4a4c8809 sctp: Fix skb_over_panic resulting from multiple invalid \ parameter errors (CVE-2010-1173) (v4) cause 'error cause' never be add the the ERROR chunk due to some typo when check valid length in sctp_init_cause_fixed(). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	qla2xxx: Disable MSI on qla24xx chips other than QLA2432.	Ben Hutchings
	commit 6377a7ae1ab82859edccdbc8eaea63782efb134d upstream. On specific platforms, MSI is unreliable on some of the QLA24xx chips, resulting in fatal I/O errors under load, as reported in <http://bugs.debian.org/572322> and by some RHEL customers. Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	KEYS: find_keyring_by_name() can gain access to a freed keyring	Toshiyuki Okajima
	commit cea7daa3589d6b550546a8c8963599f7c1a3ae5c upstream. find_keyring_by_name() can gain access to a keyring that has had its reference count reduced to zero, and is thus ready to be freed. This then allows the dead keyring to be brought back into use whilst it is being destroyed. The following timeline illustrates the process: \|(cleaner) (user) \| \| free_user(user) sys_keyctl() \| \| \| \| key_put(user->session_keyring) keyctl_get_keyring_ID() \| \|\| //=> keyring->usage = 0 \| \| \|schedule_work(&key_cleanup_task) lookup_user_key() \| \|\| \| \| kmem_cache_free(,user) \| \| . \|[KEY_SPEC_USER_KEYRING] \| . install_user_keyrings() \| . \|\| \| key_cleanup() [<= worker_thread()] \|\| \| \| \|\| \| [spin_lock(&key_serial_lock)] \|[mutex_lock(&key_user_keyr..mutex)] \| \| \|\| \| atomic_read() == 0 \|\| \| \|{ rb_ease(&key->serial_node,) } \|\| \| \| \|\| \| [spin_unlock(&key_serial_lock)] \|find_keyring_by_name() \| \| \|\|\| \| keyring_destroy(keyring) \|\|[read_lock(&keyring_name_lock)] \| \|\| \|\|\| \| \|[write_lock(&keyring_name_lock)] \|\|atomic_inc(&keyring->usage) \| \|. \|\|\| * GET freeing keyring * \| \|. \|\|[read_unlock(&keyring_name_lock)] \| \|\| \|\| \| \|list_del() \|[mutex_unlock(&key_user_k..mutex)] \| \|\| \| \| \|[write_unlock(&keyring_name_lock)] INVALID keyring is returned \| \| . \| kmem_cache_free(,keyring) . \| . \| atomic_dec(&keyring->usage) v * DESTROYED * TIME If CONFIG_SLUB_DEBUG=y then we may see the following message generated: ============================================================================= BUG key_jar: Poison overwritten ----------------------------------------------------------------------------- INFO: 0xffff880197a7e200-0xffff880197a7e200. First byte 0x6a instead of 0x6b INFO: Allocated in key_alloc+0x10b/0x35f age=25 cpu=1 pid=5086 INFO: Freed in key_cleanup+0xd0/0xd5 age=12 cpu=1 pid=10 INFO: Slab 0xffffea000592cb90 objects=16 used=2 fp=0xffff880197a7e200 flags=0x200000000000c3 INFO: Object 0xffff880197a7e200 @offset=512 fp=0xffff880197a7e300 Bytes b4 0xffff880197a7e1f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ Object 0xffff880197a7e200: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b jkkkkkkkkkkkkkkk Alternatively, we may see a system panic happen, such as: BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 IP: [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9 PGD 6b2b4067 PUD 6a80d067 PMD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/kernel/kexec_crash_loaded CPU 1 ... Pid: 31245, comm: su Not tainted 2.6.34-rc5-nofixed-nodebug #2 D2089/PRIMERGY RIP: 0010:[<ffffffff810e61a3>] [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9 RSP: 0018:ffff88006af3bd98 EFLAGS: 00010002 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88007d19900b RDX: 0000000100000000 RSI: 00000000000080d0 RDI: ffffffff81828430 RBP: ffffffff81828430 R08: ffff88000a293750 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000100000 R12: 00000000000080d0 R13: 00000000000080d0 R14: 0000000000000296 R15: ffffffff810f20ce FS: 00007f97116bc700(0000) GS:ffff88000a280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000001 CR3: 000000006a91c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process su (pid: 31245, threadinfo ffff88006af3a000, task ffff8800374414c0) Stack: 0000000512e0958e 0000000000008000 ffff880037f8d180 0000000000000001 0000000000000000 0000000000008001 ffff88007d199000 ffffffff810f20ce 0000000000008000 ffff88006af3be48 0000000000000024 ffffffff810face3 Call Trace: [<ffffffff810f20ce>] ? get_empty_filp+0x70/0x12f [<ffffffff810face3>] ? do_filp_open+0x145/0x590 [<ffffffff810ce208>] ? tlb_finish_mmu+0x2a/0x33 [<ffffffff810ce43c>] ? unmap_region+0xd3/0xe2 [<ffffffff810e4393>] ? virt_to_head_page+0x9/0x2d [<ffffffff81103916>] ? alloc_fd+0x69/0x10e [<ffffffff810ef4ed>] ? do_sys_open+0x56/0xfc [<ffffffff81008a02>] ? system_call_fastpath+0x16/0x1b Code: 0f 1f 44 00 00 49 89 c6 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 60 e8 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 18 <48> 8b 04 03 49 89 00 eb 14 4c 89 f9 83 ca ff 44 89 e6 48 89 ef RIP [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9 This problem is that find_keyring_by_name does not confirm that the keyring is valid before accepting it. Skipping keyrings that have been reduced to a zero count seems the way to go. To this end, use atomic_inc_not_zero() to increment the usage count and skip the candidate keyring if that returns false. The following script _may_ cause the bug to happen, but there's no guarantee as the window of opportunity is small: #!/bin/sh LOOP=100000 USER=dummy_user /bin/su -c "exit;" $USER \|\| { /usr/sbin/adduser -m $USER; add=1; } for ((i=0; i<LOOP; i++)) do /bin/su -c "echo '$i' > /dev/null" $USER done (( add == 1 )) && /usr/sbin/userdel -r $USER exit Note that the nominated user must not be in use. An alternative way of testing this may be: for ((i=0; i<100000; i++)) do keyctl session foo /bin/true \|\| break done >&/dev/null as that uses a keyring named "foo" rather than relying on the user and user-session named keyrings. Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	KEYS: Return more accurate error codes	Dan Carpenter
	commit 4d09ec0f705cf88a12add029c058b53f288cfaa2 upstream. We were using the wrong variable here so the error codes weren't being returned properly. The original code returns -ENOKEY. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	parisc: clear floating point exception flag on SIGFPE signal	Helge Deller
	commit 550f0d922286556c7ea43974bb7921effb5a5278 upstream. Clear the floating point exception flag before returning to user space. This is needed, else the libc trampoline handler may hit the same SIGFPE again while building up a trampoline to a signal handler. Fixes debian bug #559406. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-07-05	KVM: SVM: Don't allow nested guest to VMMCALL into host	Joerg Roedel
	This patch disables the possibility for a l2-guest to do a VMMCALL directly into the host. This would happen if the l1-hypervisor doesn't intercept VMMCALL and the l2-guest executes this instruction. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit 0d945bd9351199744c1e89d57a70615b6ee9f394)
2010-07-05	KVM: x86: Inject #GP with the right rip on efer writes	Roedel, Joerg
	This patch fixes a bug in the KVM efer-msr write path. If a guest writes to a reserved efer bit the set_efer function injects the #GP directly. The architecture dependent wrmsr function does not see this, assumes success and advances the rip. This results in a #GP in the guest with the wrong rip. This patch fixes this by reporting efer write errors back to the architectural wrmsr function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit b69e8caef5b190af48c525f6d715e7b7728a77f6)
2010-07-05	KVM: x86: Add missing locking to arch specific vcpu ioctls	Avi Kivity
	Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit 8fbf065d625617bbbf6b72d5f78f84ad13c8b547)
2010-07-05	KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls	Avi Kivity
	Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit 98001d8d017cea1ee0f9f35c6227bbd63ef5005b)
2010-07-05	KVM: Fix wallclock version writing race	Avi Kivity
	Wallclock writing uses an unprotected global variable to hold the version; this can cause one guest to interfere with another if both write their wallclock at the same time. Acked-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit 9ed3c444ab8987c7b219173a2f7807e3f71e234e)
2010-07-05	KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots	Avi Kivity
	On svm, kvm_read_pdptr() may require reading guest memory, which can sleep. Push the spinlock into mmu_alloc_roots(), and only take it after we've read the pdptr. Tested-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit 8facbbff071ff2b19268d3732e31badc60471e21)
2010-07-05	KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)	Shane Wang
	Per document, for feature control MSR: Bit 1 enables VMXON in SMX operation. If the bit is clear, execution of VMXON in SMX operation causes a general-protection exception. Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution of VMXON outside SMX operation causes a general-protection exception. This patch is to enable this kind of check with SMX for VMXON in KVM. Signed-off-by: Shane Wang <shane.wang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit cafd66595d92591e4bd25c3904e004fc6f897e2d)
2010-07-05	KVM: MMU: Segregate shadow pages with different cr0.wp	Avi Kivity
	When cr0.wp=0, we may shadow a gpte having u/s=1 and r/w=0 with an spte having u/s=0 and r/w=1. This allows excessive access if the guest sets cr0.wp=1 and accesses through this spte. Fix by making cr0.wp part of the base role; we'll have different sptes for the two cases and the problem disappears. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit 3dbe141595faa48a067add3e47bba3205b79d33c)
2010-07-05	KVM: x86: Check LMA bit before set_efer	Sheng Yang
	kvm_x86_ops->set_efer() would execute vcpu->arch.efer = efer, so the checking of LMA bit didn't work. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit a3d204e28579427609c3d15d2310127ebaa47d94)
2010-07-05	KVM: Don't allow lmsw to clear cr0.pe	Avi Kivity
	The current lmsw implementation allows the guest to clear cr0.pe, contrary to the manual, which breaks EMM386.EXE. Fix by ORing the old cr0.pe with lmsw's operand. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit f78e917688edbf1f14c318d2e50dc8e7dad20445)
2010-07-05	x86, paravirt: Add a global synchronization point for pvclock	Glauber Costa
	In recent stress tests, it was found that pvclock-based systems could seriously warp in smp systems. Using ingo's time-warp-test.c, I could trigger a scenario as bad as 1.5mi warps a minute in some systems. (to be fair, it wasn't that bad in most of them). Investigating further, I found out that such warps were caused by the very offset-based calculation pvclock is based on. This happens even on some machines that report constant_tsc in its tsc flags, specially on multi-socket ones. Two reads of the same kernel timestamp at approx the same time, will likely have tsc timestamped in different occasions too. This means the delta we calculate is unpredictable at best, and can probably be smaller in a cpu that is legitimately reading clock in a forward ocasion. Some adjustments on the host could make this window less likely to happen, but still, it pretty much poses as an intrinsic problem of the mechanism. A while ago, I though about using a shared variable anyway, to hold clock last state, but gave up due to the high contention locking was likely to introduce, possibly rendering the thing useless on big machines. I argue, however, that locking is not necessary. We do a read-and-return sequence in pvclock, and between read and return, the global value can have changed. However, it can only have changed by means of an addition of a positive value. So if we detected that our clock timestamp is less than the current global, we know that we need to return a higher one, even though it is not exactly the one we compared to. OTOH, if we detect we're greater than the current time source, we atomically replace the value with our new readings. This do causes contention on big boxes (but big here means BIG), but it seems like a good trade off, since it provide us with a time source guaranteed to be stable wrt time warps. After this patch is applied, I don't see a single warp in time during 5 days of execution, in any of the machines I saw them before. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> CC: Jeremy Fitzhardinge <jeremy@goop.org> CC: Avi Kivity <avi@redhat.com> CC: Marcelo Tosatti <mtosatti@redhat.com> CC: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit 489fb490dbf8dab0249ad82b56688ae3842a79e8)
2010-07-05	KVM: SVM: Report emulated SVM features to userspace	Joerg Roedel
	This patch implements the reporting of the emulated SVM features to userspace instead of the real hardware capabilities. Every real hardware capability needs emulation in nested svm so the old behavior was broken. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit c2c63a493924e09a1984d1374a0e60dfd54fc0b0)
2010-07-05	KVM: x86: Add callback to let modules decide over some supported cpuid bits	Joerg Roedel
	This patch adds the get_supported_cpuid callback to kvm_x86_ops. It will be used in do_cpuid_ent to delegate the decission about some supported cpuid bits to the architecture modules. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit d4330ef2fb2236a1e3a176f0f68360f4c0a8661b)
2010-07-05	KVM: PPC: Do not create debugfs if fail to create vcpu	Wei Yongjun
	If fail to create the vcpu, we should not create the debugfs for it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Alexander Graf <agraf@suse.de> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit 06056bfb944a0302a8f22eb45f09123de7fb417b)
2010-07-05	KVM: s390: Fix possible memory leak of in kvm_arch_vcpu_create()	Wei Yongjun
	This patch fixed possible memory leak in kvm_arch_vcpu_create() under s390, which would happen when kvm_arch_vcpu_create() fails. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Carsten Otte <cotte@de.ibm.com> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit 7b06bf2ffa15e119c7439ed0b024d44f66d7b605)
2010-07-05	KVM: SVM: Fix wrong interrupt injection in enable_irq_windows	Joerg Roedel
	The nested_svm_intr() function does not execute the vmexit anymore. Therefore we may still be in the nested state after that function ran. This patch changes the nested_svm_intr() function to return wether the irq window could be enabled. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit 8fe546547cf6857a9d984bfe2f2194910f3fc5d0)
2010-07-05	KVM: SVM: Don't sync nested cr8 to lapic and back	Joerg Roedel
	This patch makes syncing of the guest tpr to the lapic conditional on !nested. Otherwise a nested guest using the TPR could freeze the guest. Another important change this patch introduces is that the cr8 intercept bits are no longer ORed at vmrun emulation if the guest sets VINTR_MASKING in its VMCB. The reason is that nested cr8 accesses need alway be handled by the nested hypervisor because they change the shadow version of the tpr. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> (cherry picked from commit 88ab24adc7142506c8583ac36a34fa388300b750)