linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2011-10-03	fs/9p: Always ask new inode in lookup for cache mode disabled	Aneesh Kumar K.V
	commit 73f507171cfa407b19f254aef95cbb058c8180cf upstream. This make sure we don't end up reusing the unlinked inode object. The ideal way is to use inode i_generation. But i_generation is not available in userspace always. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	net/9p: Fix kernel crash with msize 512K	Aneesh Kumar K.V
	commit b49d8b5d7007a673796f3f99688b46931293873e upstream. With msize equal to 512K (PAGE_SIZE * VIRTQUEUE_NUM), we hit multiple crashes. This patch fix those. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	fs/9p: Add OS dependent open flags in 9p protocol	Aneesh Kumar K.V
	commit f88657ce3f9713a0c62101dffb0e972a979e77b9 upstream. Some of the flags are OS/arch dependent we add a 9p protocol value which maps to asm-generic/fcntl.h values in Linux Based on the original patch from Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> [extra comments from author as to why this needs to go to stable: Earlier for different operation such as open we used the values of open flag as defined by the OS. But some of these flags such as O_DIRECT are arch dependent. So if we have the 9p client and server running on different architectures, we end up with client sending client architecture value of these open flag and server will try to map these values to what its architecture states. For ex: O_DIRECT on a x86 client maps to #define O_DIRECT 00040000 Where as on sparc server it will maps to #define O_DIRECT 0x100000 Hence we need to map these open flags to OS/arch independent flag values. Getting these changes to an early version of kernel ensures us that we work with different combination of client and server. We should ideally backport this patch to all possible kernel version.] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	fs/9p: Don't update file type when updating file attributes	Aneesh Kumar K.V
	commit 45089142b1497dab2327d60f6c71c40766fc3ea4 upstream. We should only update attributes that we can change on stat2inode. Also do file type initialization in v9fs_init_inode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	fs/9p: Add fid before dentry instantiation	Aneesh Kumar K.V
	commit 5441ae5eb3614d3c28f77073370738a2820c88e4 upstream. d_instantiate marks the dentry positive. So a parallel lookup and mkdir of the directory can find dentry that doesn't have fid attached. This can result in both the code path doing v9fs_fid_add which results in v9fs_dentry leak. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	ACPICA: Do not repair _TSS return package if _PSS is present	Fenghua Yu
	commit 8f9c91273e36e5762c617c23e4fd48d5172e0dac upstream. We can only sort the _TSS return package if there is no _PSS in the same scope. This is because if _PSS is present, the ACPI specification dictates that the _TSS Power Dissipation field is to be ignored, and therefore some BIOSs leave garbage values in the _TSS Power field(s). In this case, it is best to just return the _TSS package as-is. Reported-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	iommu/amd: Make sure iommu->need_sync contains correct value	Joerg Roedel
	commit f1ca1512e765337a7c09eb875eedef8ea4e07654 upstream. The value is only set to true but never set back to false, which causes to many completion-wait commands to be sent to hardware. Fix it with this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	iommu/amd: Don't take domain->lock recursivly	Joerg Roedel
	commit e33acde91140f1809952d1c135c36feb66a51887 upstream. The domain_flush_devices() function takes the domain->lock. But this function is only called from update_domain() which itself is already called unter the domain->lock. This causes a deadlock situation when the dma-address-space of a domain grows larger than 1GB. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	irda: fix smsc-ircc2 section mismatch warning	Randy Dunlap
	commit f470e5ae34d68880a38aa79ee5c102ebc2a1aef6 upstream. Fix section mismatch warning: WARNING: drivers/net/irda/smsc-ircc2.o(.devinit.text+0x1a7): Section mismatch in reference from the function smsc_ircc_pnp_probe() to the function .init.text:smsc_ircc_open() Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	9p: close ACL leaks	Al Viro
	commit 1ec95bf34d976b38897d1977b155a544d77b05e7 upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	net/9p: Fix the msize calculation.	Venkateswararao Jujjuri (JV)
	commit c9ffb05ca5b5098d6ea468c909dd384d90da7d54 upstream. msize represents the maximum PDU size that includes P9_IOHDRSZ. Signed-off-by: Venkateswararao Jujjuri "<jvrao@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	fs/9p: Always ask new inode in create	Aneesh Kumar K.V
	commit ed80fcfac2565fa866d93ba14f0e75de17a8223e upstream. This make sure we don't end up reusing the unlinked inode object. The ideal way is to use inode i_generation. But i_generation is not available in userspace always. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	fs/9p: Fix invalid mount options/args	Prem Karat
	commit a2dd43bb0d7b9ce28f8a39254c25840c0730498e upstream. Without this fix, if any invalid mount options/args are passed while mouting the 9p fs, no error (-EINVAL) is returned and default arg value is assigned. This fix returns -EINVAL when an invalid arguement is found while parsing mount options. Signed-off-by: Prem Karat <prem.karat@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	fs/9p: When doing inode lookup compare qid details and inode mode bits.	Aneesh Kumar K.V
	commit fd2421f54423f307ecd31bdebdca6bc317e0c492 upstream. This make sure we don't use wrong inode from the inode hash. The inode number of the file deleted is reused by the next file system object created and if we only use inode number for inode hash lookup we could end up with wrong struct inode. Also compare inode generation number. Not all Linux file system provide st_gen in userspace. So it could be 0; Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	fs/9p: Fid is not valid after a failed clunk.	Aneesh Kumar K.V
	commit 5034990e28efb2d232ee82443a9edd62defd17ba upstream. free the fid even in case of failed clunk. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	VirtIO can transfer VIRTQUEUE_NUM of pages.	jvrao
	commit 7f781679dd596c8abde8336b4d0d166d6a4aad04 upstream. Signed-off-by: Venkateswararao Jujjuri "<jvrao@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	Fix the size of receive buffer packing onto VirtIO ring.	jvrao
	commit 114e6f3a5ede73d5b56e145f04680c61c3dd67c4 upstream. Signed-off-by: Venkateswararao Jujjuri "<jvrao@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	net/9p: fix client code to fail more gracefully on protocol error	Eric Van Hensbergen
	commit b85f7d92d7bd7e3298159e8b1eed8cb8cbbb0348 upstream. There was a BUG_ON to protect against a bad id which could be dealt with more gracefully. Reported-by: Natalie Orlin <norlin@us.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	vp7045: fix buffer setup	Florian Mickler
	commit fc61ccd35fd59d5362d37c8bf9c0526c85086c84 upstream. dvb_usb_device_init calls the frontend_attach method of this driver which uses vp7045_usb_ob. In order to have a buffer ready in vp7045_usb_op, it has to be allocated before that happens. Luckily we can use the whole private data as the buffer as it gets separately allocated on the heap via kzalloc in dvb_usb_device_init and is thus apt for use via usb_control_msg. This fixes a BUG: unable to handle kernel paging request at 0000000000001e78 reported by Tino Keitel and diagnosed by Dan Carpenter. Tested-by: Tino Keitel <tino.keitel@tikei.de> Signed-off-by: Florian Mickler <florian@mickler.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	nuvoton-cir: simplify raw IR sample handling	Jarod Wilson
	commit de4ed0c111ed078b8729a5cc49c23197740f5bad upstream. The nuvoton-cir driver was storing up consecutive pulse-pulse and space-space samples internally, for no good reason, since ir_raw_event_store_with_filter() already merges back to back like samples types for us. This should also fix a regression introduced late in 3.0 that related to a timeout change, which actually becomes correct when coupled with this change. Tested with RC6 and RC5 on my own nuvoton-cir hardware atop vanilla 3.0.0, after verifying quirky behavior in 3.0 due to the timeout change. Reported-by: Stephan Raue <sraue@openelec.tv> CC: Stephan Raue <sraue@openelec.tv> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.	NeilBrown
	commit 27a7b260f71439c40546b43588448faac01adb93 upstream. 0.90 metadata uses an unsigned 32bit number to count the number of kilobytes used from each device. This should allow up to 4TB per device. However we multiply this by 2 (to get sectors) before casting to a larger type, so sizes above 2TB get truncated. Also we allow rdev->sectors to be larger than 4TB, so it is possible for the array to be resized larger than the metadata can handle. So make sure rdev->sectors never exceeds 4TB when 0.90 metadata is in used. Also the sanity check at the end of super_90_load should include level 1 as it used ->size too. (RAID0 and Linear don't use ->size at all). Reported-by: Pim Zandbergen <P.Zandbergen@macroscoop.nl> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	Avoid dereferencing a 'request_queue' after last close.	NeilBrown
	commit 94007751bb02797ba87bac7aacee2731ac2039a3 upstream. On the last close of an 'md' device which as been stopped, the device is destroyed and in particular the request_queue is freed. The free is done in a separate thread so it might happen a short time later. __blkdev_put calls bdev_inode_switch_bdi after ->release has been called. Since commit f758eeabeb96f878c860e8f110f94ec8820822a9 bdev_inode_switch_bdi will dereference the 'old' bdi, which lives inside a request_queue, to get a spin lock. This causes the last close on an md device to sometime take a spin_lock which lives in freed memory - which results in an oops. So move the called to bdev_inode_switch_bdi before the call to ->release. Cc: Christoph Hellwig <hch@lst.de> Cc: Hugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	drm/nouveau: properly handle allocation failure in nouveau_sgdma_populate	Marcin Slusarz
	commit 17c8b960930da3599e47801a54ac0ea1070545d2 upstream. Not cleaning after alloc failure would result in crash on destroy, because nouveau_sgdma_clear assumes "ttm_alloced" to be not null when "pages" is not null. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	ARM: davinci: fix cache flush build error	Linus Walleij
	commit 897a6a1a14837d6d582bfd1fd7aba00be44b6469 upstream. The TNET variant of DaVinci compiles some code that it shares with other DaVinci variants, however it has a V6 CPU rather than an ARM926T, thus the hardcoded call to arm926_flush_kern_cache_all() in sleep.S will obviously fail, and we need to build with the v6_flush_kern_cache_all() call instead. This was triggered by manually altering the DaVinci config to build the TNET version. Cc: Dave Martin <dave.martin@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	ARM: davinci: da850 EVM: read mac address from SPI flash	Sudhakar Rajashekhara
	commit 810198bc9c109489dfadc57131c5183ce6ad2d7d upstream. DA850/OMAP-L138 EMAC driver uses random mac address instead of a fixed one because the mac address is not stuffed into EMAC platform data. This patch provides a function which reads the mac address stored in SPI flash (registered as MTD device) and populates the EMAC platform data. The function which reads the mac address is registered as a callback which gets called upon addition of MTD device. NOTE: In case the MAC address stored in SPI flash is erased, follow the instructions at [1] to restore it. [1] http://processors.wiki.ti.com/index.php/GSG:_OMAP-L138_DVEVM_Additional_Procedures#Restoring_MAC_address_on_SPI_Flash Modifications in v2: Guarded registering the mtd_notifier only when MTD is enabled. Earlier this was handled using mtd_has_partitions() call, but this has been removed in Linux v3.0. Modifications in v3: a. Guarded da850_evm_m25p80_notify_add() function and da850evm_spi_notifier structure with CONFIG_MTD macros. b. Renamed da850_evm_register_mtd_user() function to da850_evm_setup_mac_addr() and removed the struct mtd_notifier argument to this function. c. Passed the da850evm_spi_notifier structure to register_mtd_user() function. Modifications in v4: Moved the da850_evm_setup_mac_addr() function within the first CONFIG_MTD ifdef construct. Signed-off-by: Sudhakar Rajashekhara <sudhakar.raj@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	ARM: 7081/1: mach-integrator: fix the clocksource	Linus Walleij
	commit bb9ea77846620ed2b37e74c852d72c7a476b248c upstream. I was intrigued by the fact that the clock stood still on the Integrator, but it wasn't strange at all, because the timer was set up all wrong and probably has been for a while. With this patch the clock starts ticking again: make the timer periodic (reload), \|= on the divisor bit and load the timer before starting it. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	hwmon: (max16065) Fix current calculation	Guenter Roeck
	commit ff71c182f461da5ae9d2d65f8a63f5a9193b9be1 upstream. Current calculation is completely wrong. Add missing brackets to fix it. Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	xen/smp: Warn user why they keel over - nosmp or noapic and what to use instead.	Konrad Rzeszutek Wilk
	commit ed467e69f16e6b480e2face7bc5963834d025f91 upstream. We have hit a couple of customer bugs where they would like to use those parameters to run an UP kernel - but both of those options turn of important sources of interrupt information so we end up not being able to boot. The correct way is to pass in 'dom0_max_vcpus=1' on the Xen hypervisor line and the kernel will patch itself to be a UP kernel. Fixes bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637308 Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	xen: x86_32: do not enable iterrupts when returning from exception in ↵	Igor Mammedov
	interrupt context commit d198d499148a0c64a41b3aba9e7dd43772832b91 upstream. If vmalloc page_fault happens inside of interrupt handler with interrupts disabled then on exit path from exception handler when there is no pending interrupts, the following code (arch/x86/xen/xen-asm_32.S:112): cmpw $0x0001, XEN_vcpu_info_pending(%eax) sete XEN_vcpu_info_mask(%eax) will enable interrupts even if they has been previously disabled according to eflags from the bounce frame (arch/x86/xen/xen-asm_32.S:99) testb $X86_EFLAGS_IF>>8, 8+1+ESP_OFFSET(%esp) setz XEN_vcpu_info_mask(%eax) Solution is in setting XEN_vcpu_info_mask only when it should be set according to cmpw $0x0001, XEN_vcpu_info_pending(%eax) but not clearing it if there isn't any pending events. Reproducer for bug is attached to RHBZ 707552 Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	mmc: sdhci-s3c: Fix mmc card I/O problem	Girish K S
	commit 49bb1e619568ec84785ceb366f07db2a6f0b64cc upstream. This patch fixes the problem in sdhci-s3c host driver for Samsung Soc's. During the card identification stage the mmc core driver enumerates for the best bus width in combination with the highest available data rate. It starts enumerating from the highest bus width (8) to lowest width (1). In case of few MMC cards the 4-bit bus enumeration fails and tries the 1-bit bus enumeration. When switched to 1-bit bus mode the host driver has to clear the previous bus width setting and apply the new setting. The current patch will clear the previous bus mode and apply the new mode setting. Signed-off-by: Girish K S <girish.shivananjappa@linaro.org> Acked-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	mmc: core: use non-reentrant workqueue for clock gating	Mika Westerberg
	commit 50a50f9248497484c678631a9c1a719f1aaeab79 upstream. The default multithread workqueue can cause the same work to be executed concurrently on a different CPUs. This isn't really suitable for clock gating as it might already gated the clock and gating it twice results both host->clk_old and host->ios.clock to be set to 0. To prevent this from happening we use system_nrt_wq instead. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Chris Ball <cjb@laptop.org> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	mmc: core: prevent aggressive clock gating racing with ios updates	Mika Westerberg
	commit 778e277cb82411c9002ca28ccbd216c4d9eb9158 upstream. We have seen at least two different races when clock gating kicks in in a middle of ios structure update. First one happens when ios->clock is changed outside of aggressive clock gating framework, for example via mmc_set_clock(). The race might happen when we run following code: mmc_set_ios(): ... if (ios->clock > 0) mmc_set_ungated(host); Now if gating kicks in right after the condition check we end up setting host->clk_gated to false even though we have just gated the clock. Next time a request is started we try to ungate and restore the clock in mmc_host_clk_hold(). However since we have host->clk_gated set to false the original clock is not restored. This eventually will cause the host controller to hang since its clock is disabled while we are trying to issue a request. For example on Intel Medfield platform we see: [ 13.818610] mmc2: Timeout waiting for hardware interrupt. [ 13.818698] sdhci: =========== REGISTER DUMP (mmc2)=========== [ 13.818753] sdhci: Sys addr: 0x00000000 \| Version: 0x00008901 [ 13.818804] sdhci: Blk size: 0x00000000 \| Blk cnt: 0x00000000 [ 13.818853] sdhci: Argument: 0x00000000 \| Trn mode: 0x00000000 [ 13.818903] sdhci: Present: 0x1fff0000 \| Host ctl: 0x00000001 [ 13.818951] sdhci: Power: 0x0000000d \| Blk gap: 0x00000000 [ 13.819000] sdhci: Wake-up: 0x00000000 \| Clock: 0x00000000 [ 13.819049] sdhci: Timeout: 0x00000000 \| Int stat: 0x00000000 [ 13.819098] sdhci: Int enab: 0x00ff00c3 \| Sig enab: 0x00ff00c3 [ 13.819147] sdhci: AC12 err: 0x00000000 \| Slot int: 0x00000000 [ 13.819196] sdhci: Caps: 0x6bee32b2 \| Caps_1: 0x00000000 [ 13.819245] sdhci: Cmd: 0x00000000 \| Max curr: 0x00000000 [ 13.819292] sdhci: Host ctl2: 0x00000000 [ 13.819331] sdhci: ADMA Err: 0x00000000 \| ADMA Ptr: 0x00000000 [ 13.819377] sdhci: =========================================== [ 13.919605] mmc2: Reset 0x2 never completed. and it never recovers. Second race might happen while running mmc_power_off(): static void mmc_power_off(struct mmc_host host) { host->ios.clock = 0; host->ios.vdd = 0; [ clock gating kicks in here ] / * Reset ocr mask to be the highest possible voltage supported for * this mmc host. This value will be used at next power up. */ host->ocr = 1 << (fls(host->ocr_avail) - 1); if (!mmc_host_is_spi(host)) { host->ios.bus_mode = MMC_BUSMODE_OPENDRAIN; host->ios.chip_select = MMC_CS_DONTCARE; } host->ios.power_mode = MMC_POWER_OFF; host->ios.bus_width = MMC_BUS_WIDTH_1; host->ios.timing = MMC_TIMING_LEGACY; mmc_set_ios(host); } If the clock gating worker kicks in while we are only partially updated the ios structure the host controller gets incomplete ios and might not work as supposed. Again on Intel Medfield platform we get: [ 4.185349] kernel BUG at drivers/mmc/host/sdhci.c:1155! [ 4.185422] invalid opcode: 0000 [#1] PREEMPT SMP [ 4.185509] Modules linked in: [ 4.185565] [ 4.185608] Pid: 4, comm: kworker/0:0 Not tainted 3.0.0+ #240 Intel Corporation Medfield/iCDKA [ 4.185742] EIP: 0060:[<c136364e>] EFLAGS: 00010083 CPU: 0 [ 4.185827] EIP is at sdhci_set_power+0x3e/0xd0 [ 4.185891] EAX: f5ff98e0 EBX: f5ff98e0 ECX: 00000000 EDX: 00000001 [ 4.185970] ESI: f5ff977c EDI: f5ff9904 EBP: f644fe98 ESP: f644fe94 [ 4.186049] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 4.186125] Process kworker/0:0 (pid: 4, ti=f644e000 task=f644c0e0 task.ti=f644e000) [ 4.186219] Stack: [ 4.186257] f5ff98e0 f644feb0 c1365173 00000282 f5ff9460 f5ff96e0 f5ff96e0 f644feec [ 4.186418] c1355bd8 f644c0e0 c1499c3d f5ff96e0 f644fed4 00000006 f5ff96e0 00000286 [ 4.186579] f644fedc c107922b f644feec 00000286 f5ff9460 f5ff9700 f644ff10 c135839e [ 4.186739] Call Trace: [ 4.186802] [<c1365173>] sdhci_set_ios+0x1c3/0x340 [ 4.186883] [<c1355bd8>] mmc_gate_clock+0x68/0x120 [ 4.186963] [<c1499c3d>] ? _raw_spin_unlock_irqrestore+0x4d/0x60 [ 4.187052] [<c107922b>] ? trace_hardirqs_on+0xb/0x10 [ 4.187134] [<c135839e>] mmc_host_clk_gate_delayed+0xbe/0x130 [ 4.187219] [<c105ec09>] ? process_one_work+0xf9/0x5b0 [ 4.187300] [<c135841d>] mmc_host_clk_gate_work+0xd/0x10 [ 4.187379] [<c105ec82>] process_one_work+0x172/0x5b0 [ 4.187457] [<c105ec09>] ? process_one_work+0xf9/0x5b0 [ 4.187538] [<c1358410>] ? mmc_host_clk_gate_delayed+0x130/0x130 [ 4.187625] [<c105f3c8>] worker_thread+0x118/0x330 [ 4.187700] [<c1496cee>] ? preempt_schedule+0x2e/0x50 [ 4.187779] [<c105f2b0>] ? rescuer_thread+0x1f0/0x1f0 [ 4.187857] [<c1062cf4>] kthread+0x74/0x80 [ 4.187931] [<c1062c80>] ? __init_kthread_worker+0x60/0x60 [ 4.188015] [<c149acfa>] kernel_thread_helper+0x6/0xd [ 4.188079] Code: 81 fa 00 00 04 00 0f 84 a7 00 00 00 7f 21 81 fa 80 00 00 00 0f 84 92 00 00 00 81 fa 00 00 0 [ 4.188780] EIP: [<c136364e>] sdhci_set_power+0x3e/0xd0 SS:ESP 0068:f644fe94 [ 4.188898] ---[ end trace a7b23eecc71777e4 ]--- This BUG() comes from the fact that ios.power_mode was still in previous value (MMC_POWER_ON) and ios.vdd was set to zero. We prevent these by inhibiting the clock gating while we update the ios structure. Both problems can be reproduced by simply running the device in a reboot loop. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Chris Ball <cjb@laptop.org> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	mmc: rename mmc_host_clk_{ungate\|gate} to mmc_host_clk_{hold\|release}	Mika Westerberg
	commit 08c14071fda4e69abb9d5b1566651cd092b158d3 upstream. As per suggestion by Linus Walleij: > If you think the names of the functions are confusing then > you may rename them, say like this: > > mmc_host_clk_ungate() -> mmc_host_clk_hold() > mmc_host_clk_gate() -> mmc_host_clk_release() > > Which would make the usecases more clear (This is CC'd to stable@ because the next two patches, which fix observable races, depend on it.) Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	x86, perf: Check that current->mm is alive before getting user callchain	Andrey Vagin
	commit 20afc60f892d285fde179ead4b24e6a7938c2f1b upstream. An event may occur when an mm is already released. I added an event in dequeue_entity() and caught a panic with the following backtrace: [ 434.421110] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [ 434.421258] IP: [<ffffffff810464ac>] __get_user_pages_fast+0x9c/0x120 ... [ 434.421258] Call Trace: [ 434.421258] [<ffffffff8101ae81>] copy_from_user_nmi+0x51/0xf0 [ 434.421258] [<ffffffff8109a0d5>] ? sched_clock_local+0x25/0x90 [ 434.421258] [<ffffffff8101b048>] perf_callchain_user+0x128/0x170 [ 434.421258] [<ffffffff811154cd>] ? __perf_event_header__init_id+0xed/0x100 [ 434.421258] [<ffffffff81116690>] perf_prepare_sample+0x200/0x280 [ 434.421258] [<ffffffff81118da8>] __perf_event_overflow+0x1b8/0x290 [ 434.421258] [<ffffffff81065240>] ? tg_shares_up+0x0/0x670 [ 434.421258] [<ffffffff8104fe1a>] ? walk_tg_tree+0x6a/0xb0 [ 434.421258] [<ffffffff81118f44>] perf_swevent_overflow+0xc4/0xf0 [ 434.421258] [<ffffffff81119150>] do_perf_sw_event+0x1e0/0x250 [ 434.421258] [<ffffffff81119204>] perf_tp_event+0x44/0x70 [ 434.421258] [<ffffffff8105701f>] ftrace_profile_sched_block+0xdf/0x110 [ 434.421258] [<ffffffff8106121d>] dequeue_entity+0x2ad/0x2d0 [ 434.421258] [<ffffffff810614ec>] dequeue_task_fair+0x1c/0x60 [ 434.421258] [<ffffffff8105818a>] dequeue_task+0x9a/0xb0 [ 434.421258] [<ffffffff810581e2>] deactivate_task+0x42/0xe0 [ 434.421258] [<ffffffff814bc019>] thread_return+0x191/0x808 [ 434.421258] [<ffffffff81098a44>] ? switch_task_namespaces+0x24/0x60 [ 434.421258] [<ffffffff8106f4c4>] do_exit+0x464/0x910 [ 434.421258] [<ffffffff8106f9c8>] do_group_exit+0x58/0xd0 [ 434.421258] [<ffffffff8106fa57>] sys_exit_group+0x17/0x20 [ 434.421258] [<ffffffff8100b202>] system_call_fastpath+0x16/0x1b Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1314693156-24131-1-git-send-email-avagin@openvz.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	sched: Fix a memory leak in __sdt_free()	WANG Cong
	commit feff8fa0075bdfd43c841e9d689ed81adda988d6 upstream. This patch fixes the following memory leak: unreferenced object 0xffff880107266800 (size 512): comm "sched-powersave", pid 3718, jiffies 4323097853 (age 27495.450s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81133940>] create_object+0x187/0x28b [<ffffffff814ac103>] kmemleak_alloc+0x73/0x98 [<ffffffff811232ba>] __kmalloc_node+0x104/0x159 [<ffffffff81044b98>] kzalloc_node.clone.97+0x15/0x17 [<ffffffff8104cb90>] build_sched_domains+0xb7/0x7f3 [<ffffffff8104d4df>] partition_sched_domains+0x1db/0x24a [<ffffffff8109ee4a>] do_rebuild_sched_domains+0x3b/0x47 [<ffffffff810a00c7>] rebuild_sched_domains+0x10/0x12 [<ffffffff8104d5ba>] sched_power_savings_store+0x6c/0x7b [<ffffffff8104d5df>] sched_mc_power_savings_store+0x16/0x18 [<ffffffff8131322c>] sysdev_class_store+0x20/0x22 [<ffffffff81193876>] sysfs_write_file+0x108/0x144 [<ffffffff81135b10>] vfs_write+0xaf/0x102 [<ffffffff81135d23>] sys_write+0x4d/0x74 [<ffffffff814c8a42>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1313671017-4112-1-git-send-email-amwang@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	sched: Move blk_schedule_flush_plug() out of __schedule()	Thomas Gleixner
	commit 9c40cef2b799f9b5e7fa5de4d2ad3a0168ba118c upstream. There is no real reason to run blk_schedule_flush_plug() with interrupts and preemption disabled. Move it into schedule() and call it when the task is going voluntarily to sleep. There might be false positives when the task is woken between that call and actually scheduling, but that's not really different from being woken immediately after switching away. This fixes a deadlock in the scheduler where the blk_schedule_flush_plug() callchain enables interrupts and thereby allows a wakeup to happen of the task that's going to sleep. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-dwfxtra7yg1b5r65m32ywtct@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	sched: Separate the scheduler entry for preemption	Thomas Gleixner
	commit c259e01a1ec90063042f758e409cd26b2a0963c8 upstream. Block-IO and workqueues call into notifier functions from the scheduler core code with interrupts and preemption disabled. These calls should be made before entering the scheduler core. To simplify this, separate the scheduler core code into __schedule(). __schedule() is directly called from the places which set PREEMPT_ACTIVE and from schedule(). This allows us to add the work checks into schedule(), so they are only called when a task voluntary goes to sleep. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20110622174918.813258321@linutronix.de Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	rtc: Fix RTC PIE frequency limit	John Stultz
	commit 938f97bcf1bdd1b681d5d14d1d7117a2e22d4434 upstream. Thomas earlier submitted a fix to limit the RTC PIE freq, but picked 5000Hz out of the air. Willy noticed that we should instead use the 8192Hz max from the rtc man documentation. Cc: Willy Tarreau <w@1wt.eu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	alarmtimers: Avoid possible denial of service with high freq periodic timers	John Stultz
	commit 6af7e471e5a7746b8024d70b4363d3dfe41d36b8 upstream. Its possible to jam up the alarm timers by setting very small interval timers, which will cause the alarmtimer subsystem to spend all of its time firing and restarting timers. This can effectivly lock up a box. A deeper fix is needed, closely mimicking the hrtimer code, but for now just cap the interval to 100us to avoid userland hanging the system. CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	alarmtimers: Memset itimerspec passed into alarm_timer_get	John Stultz
	commit ea7802f630d356acaf66b3c0b28c00a945fc35dc upstream. Following common_timer_get, zero out the itimerspec passed in. CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	alarmtimers: Avoid possible null pointer traversal	John Stultz
	commit 971c90bfa2f0b4fe52d6d9002178d547706f1343 upstream. We don't check if old_setting is non null before assigning it, so correct this. CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	MXC: iomux-v3: correct NO_PAD_CTRL definition	Troy Kisky
	commit 425933b30b0ccfac58065bca6c853ea627443cdf upstream. iomux-v3.c uses NO_PAD_CTRL as a 32 bit value so it should not be shifted left by MUX_PAD_CTRL_SHIFT(41) Previously, anything requesting NO_PAD_CTRL would get their pad control register set to 0. Since it is a pad control mask, place it with the other mask values. Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Acked-by: Lothar Waßmann <LW@KARO-electronics.de> Tested-by: Lothar Waßmann <LW@KARO-electronics.de> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Cc: John Ogness <john.ogness@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	igb: fix WOL on second port of i350 device	Carolyn Wyborny
	commit 6d337dce664b6872ddf1655f6b1fcab76ce35b08 upstream. This patch fixes a problem where WOL would fail on second port of i350 device. Reported-by: Martin Wilck <martin.wilck@ts.fujitsu.com> Reported-by: Stefan Assmann<sassmann@redhat.com> Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	mm: page allocator: reconsider zones for allocation after direct reclaim	Mel Gorman
	commit 76d3fbf8fbf6cc78ceb63549e0e0c5bc8a88f838 upstream. With zone_reclaim_mode enabled, it's possible for zones to be considered full in the zonelist_cache so they are skipped in the future. If the process enters direct reclaim, the ZLC may still consider zones to be full even after reclaiming pages. Reconsider all zones for allocation if direct reclaim returns successfully. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	mm: page allocator: initialise ZLC for first zone eligible for zone_reclaim	Mel Gorman
	commit cd38b115d5ad79b0100ac6daa103c4fe2c50a913 upstream. There have been a small number of complaints about significant stalls while copying large amounts of data on NUMA machines reported on a distribution bugzilla. In these cases, zone_reclaim was enabled by default due to large NUMA distances. In general, the complaints have not been about the workload itself unless it was a file server (in which case the recommendation was disable zone_reclaim). The stalls are mostly due to significant amounts of time spent scanning the preferred zone for pages to free. After a failure, it might fallback to another node (as zonelists are often node-ordered rather than zone-ordered) but stall quickly again when the next allocation attempt occurs. In bad cases, each page allocated results in a full scan of the preferred zone. Patch 1 checks the preferred zone for recent allocation failure which is particularly important if zone_reclaim has failed recently. This avoids rescanning the zone in the near future and instead falling back to another node. This may hurt node locality in some cases but a failure to zone_reclaim is more expensive than a remote access. Patch 2 clears the zlc information after direct reclaim. Otherwise, zone_reclaim can mark zones full, direct reclaim can reclaim enough pages but the zone is still not considered for allocation. This was tested on a 24-thread 2-node x86_64 machine. The tests were focused on large amounts of IO. All tests were bound to the CPUs on node-0 to avoid disturbances due to processes being scheduled on different nodes. The kernels tested are 3.0-rc6-vanilla Vanilla 3.0-rc6 zlcfirst Patch 1 applied zlcreconsider Patches 1+2 applied FS-Mark ./fs_mark -d /tmp/fsmark-10813 -D 100 -N 5000 -n 208 -L 35 -t 24 -S0 -s 524288 fsmark-3.0-rc6 3.0-rc6 3.0-rc6 vanilla zlcfirs zlcreconsider Files/s min 54.90 ( 0.00%) 49.80 (-10.24%) 49.10 (-11.81%) Files/s mean 100.11 ( 0.00%) 135.17 (25.94%) 146.93 (31.87%) Files/s stddev 57.51 ( 0.00%) 138.97 (58.62%) 158.69 (63.76%) Files/s max 361.10 ( 0.00%) 834.40 (56.72%) 802.40 (55.00%) Overhead min 76704.00 ( 0.00%) 76501.00 ( 0.27%) 77784.00 (-1.39%) Overhead mean 1485356.51 ( 0.00%) 1035797.83 (43.40%) 1594680.26 (-6.86%) Overhead stddev 1848122.53 ( 0.00%) 881489.88 (109.66%) 1772354.90 ( 4.27%) Overhead max 7989060.00 ( 0.00%) 3369118.00 (137.13%) 10135324.00 (-21.18%) MMTests Statistics: duration User/Sys Time Running Test (seconds) 501.49 493.91 499.93 Total Elapsed Time (seconds) 2451.57 2257.48 2215.92 MMTests Statistics: vmstat Page Ins 46268 63840 66008 Page Outs 90821596 90671128 88043732 Swap Ins 0 0 0 Swap Outs 0 0 0 Direct pages scanned 13091697 8966863 8971790 Kswapd pages scanned 0 1830011 1831116 Kswapd pages reclaimed 0 1829068 1829930 Direct pages reclaimed 13037777 8956828 8648314 Kswapd efficiency 100% 99% 99% Kswapd velocity 0.000 810.643 826.346 Direct efficiency 99% 99% 96% Direct velocity 5340.128 3972.068 4048.788 Percentage direct scans 100% 83% 83% Page writes by reclaim 0 3 0 Slabs scanned 796672 720640 720256 Direct inode steals 7422667 7160012 7088638 Kswapd inode steals 0 1736840 2021238 Test completes far faster with a large increase in the number of files created per second. Standard deviation is high as a small number of iterations were much higher than the mean. The number of pages scanned by zone_reclaim is reduced and kswapd is used for more work. LARGE DD 3.0-rc6 3.0-rc6 3.0-rc6 vanilla zlcfirst zlcreconsider download tar 59 ( 0.00%) 59 ( 0.00%) 55 ( 7.27%) dd source files 527 ( 0.00%) 296 (78.04%) 320 (64.69%) delete source 36 ( 0.00%) 19 (89.47%) 20 (80.00%) MMTests Statistics: duration User/Sys Time Running Test (seconds) 125.03 118.98 122.01 Total Elapsed Time (seconds) 624.56 375.02 398.06 MMTests Statistics: vmstat Page Ins 3594216 439368 407032 Page Outs 23380832 23380488 23377444 Swap Ins 0 0 0 Swap Outs 0 436 287 Direct pages scanned 17482342 69315973 82864918 Kswapd pages scanned 0 519123 575425 Kswapd pages reclaimed 0 466501 522487 Direct pages reclaimed 5858054 2732949 2712547 Kswapd efficiency 100% 89% 90% Kswapd velocity 0.000 1384.254 1445.574 Direct efficiency 33% 3% 3% Direct velocity 27991.453 184832.737 208171.929 Percentage direct scans 100% 99% 99% Page writes by reclaim 0 5082 13917 Slabs scanned 17280 29952 35328 Direct inode steals 115257 1431122 332201 Kswapd inode steals 0 0 979532 This test downloads a large tarfile and copies it with dd a number of times - similar to the most recent bug report I've dealt with. Time to completion is reduced. The number of pages scanned directly is still disturbingly high with a low efficiency but this is likely due to the number of dirty pages encountered. The figures could probably be improved with more work around how kswapd is used and how dirty pages are handled but that is separate work and this result is significant on its own. Streaming Mapped Writer MMTests Statistics: duration User/Sys Time Running Test (seconds) 124.47 111.67 112.64 Total Elapsed Time (seconds) 2138.14 1816.30 1867.56 MMTests Statistics: vmstat Page Ins 90760 89124 89516 Page Outs 121028340 120199524 120736696 Swap Ins 0 86 55 Swap Outs 0 0 0 Direct pages scanned 114989363 96461439 96330619 Kswapd pages scanned 56430948 56965763 57075875 Kswapd pages reclaimed 27743219 27752044 27766606 Direct pages reclaimed 49777 46884 36655 Kswapd efficiency 49% 48% 48% Kswapd velocity 26392.541 31363.631 30561.736 Direct efficiency 0% 0% 0% Direct velocity 53780.091 53108.759 51581.004 Percentage direct scans 67% 62% 62% Page writes by reclaim 385 122 1513 Slabs scanned 43008 39040 42112 Direct inode steals 0 10 8 Kswapd inode steals 733 534 477 This test just creates a large file mapping and writes to it linearly. Time to completion is again reduced. The gains are mostly down to two things. In many cases, there is less scanning as zone_reclaim simply gives up faster due to recent failures. The second reason is that memory is used more efficiently. Instead of scanning the preferred zone every time, the allocator falls back to another zone and uses it instead improving overall memory utilisation. This patch: initialise ZLC for first zone eligible for zone_reclaim. The zonelist cache (ZLC) is used among other things to record if zone_reclaim() failed for a particular zone recently. The intention is to avoid a high cost scanning extremely long zonelists or scanning within the zone uselessly. Currently the zonelist cache is setup only after the first zone has been considered and zone_reclaim() has been called. The objective was to avoid a costly setup but zone_reclaim is itself quite expensive. If it is failing regularly such as the first eligible zone having mostly mapped pages, the cost in scanning and allocation stalls is far higher than the ZLC initialisation step. This patch initialises ZLC before the first eligible zone calls zone_reclaim(). Once initialised, it is checked whether the zone failed zone_reclaim recently. If it has, the zone is skipped. As the first zone is now being checked, additional care has to be taken about zones marked full. A zone can be marked "full" because it should not have enough unmapped pages for zone_reclaim but this is excessive as direct reclaim or kswapd may succeed where zone_reclaim fails. Only mark zones "full" after zone_reclaim fails if it failed to reclaim enough pages after scanning. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	drm/radeon/kms: make sure pci max read request size is valid on evergreen+ (v2)	Alex Deucher
	commit d054ac16eeb658bccadb06b12c39cee22243b10f upstream. If the bios or OS sets the pci max read request size to 0 or an invalid value (6,7), it can result in a hang or slowdown. Check and set it to something sane if it's invalid. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=42162 v2: use pci reg defines from include/linux/pci_regs.h Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-03	drm/radeon/kms: set a default max_pixel_clock	Dave Airlie