aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-02-22Linux 3.4.82v3.4.82Greg Kroah-Hartman
2014-02-22genirq: Add missing irq_to_desc export for CONFIG_SPARSE_IRQ=nPaul Gortmaker
commit 2c45aada341121438affc4cb8d5b4cfaa2813d3d upstream. In allmodconfig builds for sparc and any other arch which does not set CONFIG_SPARSE_IRQ, the following will be seen at modpost: CC [M] lib/cpu-notifier-error-inject.o CC [M] lib/pm-notifier-error-inject.o ERROR: "irq_to_desc" [drivers/gpio/gpio-mcp23s08.ko] undefined! make[2]: *** [__modpost] Error 1 This happens because commit 3911ff30f5 ("genirq: export handle_edge_irq() and irq_to_desc()") added one export for it, but there were actually two instances of it, in an if/else clause for CONFIG_SPARSE_IRQ. Add the second one. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Jiri Kosina <jkosina@suse.cz> Link: http://lkml.kernel.org/r/1392057610-11514-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22ring-buffer: Fix first commit on sub-buffer having non-zero deltaSteven Rostedt (Red Hat)
commit d651aa1d68a2f0a7ee65697b04c6a92f8c0a12f2 upstream. Each sub-buffer (buffer page) has a full 64 bit timestamp. The events on that page use a 27 bit delta against that timestamp in order to save on bits written to the ring buffer. If the time between events is larger than what the 27 bits can hold, a "time extend" event is added to hold the entire 64 bit timestamp again and the events after that hold a delta from that timestamp. As a "time extend" is always paired with an event, it is logical to just allocate the event with the time extend, to make things a bit more efficient. Unfortunately, when the pairing code was written, it removed the "delta = 0" from the first commit on a page, causing the events on the page to be slightly skewed. Fixes: 69d1b839f7ee "ring-buffer: Bind time extend and data events together" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22power: max17040: Fix NULL pointer dereference when there is no platform_dataKrzysztof Kozlowski
commit ac323d8d807060f7c95a685a9fe861e7b6300993 upstream. Fix NULL pointer dereference of "chip->pdata" if platform_data was not supplied to the driver. The driver during probe stored the pointer to the platform_data: chip->pdata = client->dev.platform_data; Later it was dereferenced in max17040_get_online() and max17040_get_status(). If platform_data was not supplied, the NULL pointer exception would happen: [ 6.626094] Unable to handle kernel of a at virtual address 00000000 [ 6.628557] pgd = c0004000 [ 6.632868] [00000000] *pgd=66262564 [ 6.634636] Unable to handle kernel paging request at virtual address e6262000 [ 6.642014] pgd = de468000 [ 6.644700] [e6262000] *pgd=00000000 [ 6.648265] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 6.653552] Modules linked in: [ 6.656598] CPU: 0 PID: 31 Comm: kworker/0:1 Not tainted 3.10.14-02717-gc58b4b4 #505 [ 6.664334] Workqueue: events max17040_work [ 6.668488] task: dfa11b80 ti: df9f6000 task.ti: df9f6000 [ 6.673873] PC is at show_pte+0x80/0xb8 [ 6.677687] LR is at show_pte+0x3c/0xb8 [ 6.681503] pc : [<c001b7b8>] lr : [<c001b774>] psr: 600f0113 [ 6.681503] sp : df9f7d58 ip : 600f0113 fp : 00000009 [ 6.692965] r10: 00000000 r9 : 00000000 r8 : dfa11b80 [ 6.698171] r7 : df9f7ea0 r6 : e6262000 r5 : 00000000 r4 : 00000000 [ 6.704680] r3 : 00000000 r2 : e6262000 r1 : 600f0193 r0 : c05b3750 [ 6.711194] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel [ 6.718485] Control: 10c53c7d Table: 5e46806a DAC: 00000015 [ 6.724218] Process kworker/0:1 (pid: 31, stack limit = 0xdf9f6238) [ 6.730465] Stack: (0xdf9f7d58 to 0xdf9f8000) [ 6.914325] [<c001b7b8>] (show_pte+0x80/0xb8) from [<c047107c>] (__do_kernel_fault.part.9+0x44/0x74) [ 6.923425] [<c047107c>] (__do_kernel_fault.part.9+0x44/0x74) from [<c001bb7c>] (do_page_fault+0x2c4/0x360) [ 6.933144] [<c001bb7c>] (do_page_fault+0x2c4/0x360) from [<c0008400>] (do_DataAbort+0x34/0x9c) [ 6.941825] [<c0008400>] (do_DataAbort+0x34/0x9c) from [<c000e5d8>] (__dabt_svc+0x38/0x60) [ 6.950058] Exception stack(0xdf9f7ea0 to 0xdf9f7ee8) [ 6.955099] 7ea0: df0c1790 00000000 00000002 00000000 df0c1794 df0c1790 df0c1790 00000042 [ 6.963271] 7ec0: df0c1794 00000001 00000000 00000009 00000000 df9f7ee8 c0306268 c0306270 [ 6.971419] 7ee0: a00f0113 ffffffff [ 6.974902] [<c000e5d8>] (__dabt_svc+0x38/0x60) from [<c0306270>] (max17040_work+0x8c/0x144) [ 6.983317] [<c0306270>] (max17040_work+0x8c/0x144) from [<c003f364>] (process_one_work+0x138/0x440) [ 6.992429] [<c003f364>] (process_one_work+0x138/0x440) from [<c003fa64>] (worker_thread+0x134/0x3b8) [ 7.001628] [<c003fa64>] (worker_thread+0x134/0x3b8) from [<c00454bc>] (kthread+0xa4/0xb0) [ 7.009875] [<c00454bc>] (kthread+0xa4/0xb0) from [<c000eb28>] (ret_from_fork+0x14/0x2c) [ 7.017943] Code: e1a03005 e2422480 e0826104 e59f002c (e7922104) [ 7.024017] ---[ end trace 73bc7006b9cc5c79 ]--- Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Fixes: c6f4a42de60b981dd210de01cd3e575835e3158e Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22time: Fix overflow when HZ is smaller than 60Mikulas Patocka
commit 80d767d770fd9c697e434fd080c2db7b5c60c6dd upstream. When compiling for the IA-64 ski emulator, HZ is set to 32 because the emulation is slow and we don't want to waste too many cycles processing timers. Alpha also has an option to set HZ to 32. This causes integer underflow in kernel/time/jiffies.c: kernel/time/jiffies.c:66:2: warning: large integer implicitly truncated to unsigned type [-Woverflow] .mult = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */ ^ This patch reduces the JIFFIES_SHIFT value to avoid the overflow. Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1401241639100.23871@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22md/raid5: Fix CPU hotplug callback registrationOleg Nesterov
commit 789b5e0315284463617e106baad360cb9e8db3ac upstream. Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Interestingly, the raid5 code can actually prevent double initialization and hence can use the following simplified form of callback registration: register_cpu_notifier(&foobar_cpu_notifier); get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); put_online_cpus(); A hotplug operation that occurs between registering the notifier and calling get_online_cpus(), won't disrupt anything, because the code takes care to perform the memory allocations only once. So reorganize the code in raid5 this way to fix the deadlock with callback registration. Cc: linux-raid@vger.kernel.org Fixes: 36d1c6476be51101778882897b315bd928c8c7b5 Signed-off-by: Oleg Nesterov <oleg@redhat.com> [Srivatsa: Fixed the unregister_cpu_notifier() deadlock, added the free_scratch_buffer() helper to condense code further and wrote the changelog.] Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22KVM: return an error code in kvm_vm_ioctl_register_coalesced_mmio()Dan Carpenter
commit aac5c4226e7136c331ed384c25d5560204da10a0 upstream. If kvm_io_bus_register_dev() fails then it returns success but it should return an error code. I also did a little cleanup like removing an impossible NULL test. Fixes: 2b3c246a682c ('KVM: Make coalesced mmio use a device per zone') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22IB/qib: Add missing serdes init sequenceMike Marciniszyn
commit 2f75e12c4457a9b3d042c0a0d748fa198dc2ffaf upstream. Research has shown that commit a77fcf895046 ("IB/qib: Use a single txselect module parameter for serdes tuning") missed a key serdes init sequence. This patch add that sequence. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22block: add cond_resched() to potentially long running ioctl discard loopJens Axboe
commit c8123f8c9cb517403b51aa41c3c46ff5e10b2c17 upstream. When mkfs issues a full device discard and the device only supports discards of a smallish size, we can loop in blkdev_issue_discard() for a long time. If preempt isn't enabled, this can turn into a softlock situation and the kernel will start complaining. Add an explicit cond_resched() at the end of the loop to avoid that. Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22Modpost: fixed USB alias generation for ranges including 0x9 and 0xAJan Moskyto Matejka
commit 03b56329f9bb5a1cb73d7dc659d529a9a9bf3acc upstream. Commit afe2dab4f6 ("USB: add hex/bcd detection to usb modalias generation") changed the routine that generates alias ranges. Before that change, only digits 0-9 were supported; the commit tried to fix the case when the range includes higher values than 0x9. Unfortunately, the commit didn't fix the case when the range includes both 0x9 and 0xA, meaning that the final range must look like [x-9A-y] where x <= 0x9 and y >= 0xA -- instead the [x-9A-x] range was produced. Modprobe doesn't complain as it sees no difference between no-match and bad-pattern results of fnmatch(). Fixing this simple bug to fix the aliases. Also changing the hardcoded beginning of the range to uppercase as all the other letters are also uppercase in the device version numbers. Fortunately, this affects only the dvb-usb-dib0700 module, AFAIK. Signed-off-by: Jan Moskyto Matejka <mq@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22usb: option: blacklist ZTE MF667 net interfaceRaymond Wanyoike
commit 3635c7e2d59f7861afa6fa5e87e2a58860ff514d upstream. Interface #5 of 19d2:1270 is a net interface which has been submitted to the qmi_wwan driver so consequently remove it from the option driver. Signed-off-by: Raymond Wanyoike <raymond.wanyoike@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22usb-storage: enable multi-LUN scanning when neededAlan Stern
commit 823d12c95c666fa7ab7dad208d735f6bc6afabdc upstream. People sometimes create their own custom-configured kernels and forget to enable CONFIG_SCSI_MULTI_LUN. This causes problems when they plug in a USB storage device (such as a card reader) with more than one LUN. Fortunately, we can tell fairly easily when a storage device claims to have more than one LUN. When that happens, this patch asks the SCSI layer to probe all the LUNs automatically, regardless of the config setting. The patch also updates the Kconfig help text for usb-storage, explaining that CONFIG_SCSI_MULTI_LUN may be necessary. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Thomas Raschbacher <lordvan@lordvan.com> CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net> CC: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22usb-storage: restrict bcdDevice range for Super Top in Cypress ATACBAlan Stern
commit a9c143c82608bee2a36410caa56d82cd86bdc7fa upstream. The Cypress ATACB unusual-devs entry for the Super Top SATA bridge causes problems. Although it was originally reported only for bcdDevice = 0x160, its range was much larger. This resulted in a bug report for bcdDevice 0x220, so the range was capped at 0x219. Now Milan reports errors with bcdDevice 0x150. Therefore this patch restricts the range to just 0x160. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-and-tested-by: Milan Svoboda <milan.svoboda@centrum.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22usb-storage: add unusual-devs entry for BlackBerry 9000Alan Stern
commit c5637e5119c43452a00e27c274356b072263ecbb upstream. This patch adds an unusual-devs entry for the BlackBerry 9000. This fixes Bugzilla #22442. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Moritz Moeller-Herrmann <moritz-kernel@moeller-herrmann.de> Tested-by: Moritz Moeller-Herrmann <moritz-kernel@moeller-herrmann.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22USB: ftdi_sio: add Tagsys RFID Reader IDsUlrich Hahn
commit 76f24e3f39a1a94bab0d54e98899d64abcd9f69c upstream. Adding two more IDs to the ftdi_sio usb serial driver. It now connects Tagsys RFID readers. There might be more IDs out there for other Tagsys models. Signed-off-by: Ulrich Hahn <uhahn@eanco.de> Cc: Johan Hovold <johan@hovold.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22usb: ftdi_sio: add Mindstorms EV3 console adapterBjørn Mork
commit 67847baee056892dc35efb9c3fd05ae7f075588c upstream. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22staging:iio:ad799x fix error_free_irq which was freeing an irq that may not ↵Hartmut Knaack
have been requested commit 38408d056188be29a6c4e17f3703c796551bb330 upstream. Only free an IRQ in error_free_irq, if it has been requested previously. Signed-off-by: Hartmut Knaack <knaack.h@gmx.de> Acked-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22tty: n_gsm: Fix for modems with brk in modem status controlLars Poeschel
commit 3ac06b905655b3ef2fd2196bab36e4587e1e4e4f upstream. 3GPP TS 07.10 states in section 5.4.6.3.7: "The length byte contains the value 2 or 3 ... depending on the break signal." The break byte is optional and if it is sent, the length is 3. In fact the driver was not able to work with modems that send this break byte in their modem status control message. If the modem just sends the break byte if it is really set, then weird things might happen. The code for deconding the modem status to the internal linux presentation in gsm_process_modem has already a big comment about this 2 or 3 byte length thing and it is already able to decode the brk, but the code calling the gsm_process_modem function in gsm_control_modem does not encode it and hand it over the right way. This patch fixes this. Without this fix if the modem sends the brk byte in it's modem status control message the driver will hang when opening a muxed channel. Signed-off-by: Lars Poeschel <poeschel@lemonage.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22lockd: send correct lock when granting a delayed lock.NeilBrown
commit 2ec197db1a56c9269d75e965f14c344b58b2a4f6 upstream. If an NFS client attempts to get a lock (using NLM) and the lock is not available, the server will remember the request and when the lock becomes available it will send a GRANT request to the client to provide the lock. If the client already held an adjacent lock, the GRANT callback will report the union of the existing and new locks, which can confuse the client. This happens because __posix_lock_file (called by vfs_lock_file) updates the passed-in file_lock structure when adjacent or over-lapping locks are found. To avoid this problem we take a copy of the two fields that can be changed (fl_start and fl_end) before the call and restore them afterwards. An alternate would be to allocate a 'struct file_lock', initialise it, use locks_copy_lock() to take a copy, then locks_release_private() after the vfs_lock_file() call. But that is a lot more work. Reported-by: Olaf Kirch <okir@suse.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -- v1 had a couple of issues (large on-stack struct and didn't really work properly). This version is much better tested. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-02-22raw: test against runtime value of max_raw_minorsPaul Bolle
commit 5bbb2ae3d6f896f8d2082d1eceb6131c2420b7cf upstream. bind_get() checks the device number it is called with. It uses MAX_RAW_MINORS for the upper bound. But MAX_RAW_MINORS is set at compile time while the actual number of raw devices can be set at runtime. This means the test can either be too strict or too lenient. And if the test ends up being too lenient bind_get() might try to access memory beyond what was allocated for "raw_devices". So check against the runtime value (max_raw_minors) in this function. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22spi: Fix crash with double message finalisation on error handlingGeert Uytterhoeven
commit 1f802f8249a0da536877842c43c7204064c4de8b upstream. This reverts commit e120cc0dcf2880a4c5c0a6cb27b655600a1cfa1d. It causes a NULL pointer dereference with drivers using the generic spi_transfer_one_message(), which always calls spi_finalize_current_message(), which zeroes master->cur_msg. Drivers implementing transfer_one_message() theirselves must always call spi_finalize_current_message(), even if the transfer failed: * @transfer_one_message: the subsystem calls the driver to transfer a single * message while queuing transfers that arrive in the meantime. When the * driver is finished with this message, it must call * spi_finalize_current_message() so the subsystem can issue the next * transfer Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org> Signed-off-by: Mark Brown <broonie@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22s390: fix kernel crash due to linkage stack instructionsMartin Schwidefsky
commit 8d7f6690cedb83456edd41c9bd583783f0703bf0 upstream. The kernel currently crashes with a low-address-protection exception if a user space process executes an instruction that tries to use the linkage stack. Set the base-ASTE origin and the subspace-ASTE origin of the dispatchable-unit-control-table to point to a dummy ASTE. Set up control register 15 to point to an empty linkage stack with no room left. A user space process with a linkage stack instruction will still crash but with a different exception which is correctly translated to a segmentation fault instead of a kernel oops. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22s390/dump: Fix dump memory detectionMichael Holzheu
commit d7736ff5be31edaa4fe5ab62810c64529a24b149 upstream. Dumps created by kdump or zfcpdump can contain invalid memory holes when dumping z/VM systems that have memory pressure. For example: # zgetdump -i /proc/vmcore. Memory map: 0000000000000000 - 0000000000bfffff (12 MB) 0000000000e00000 - 00000000014fffff (7 MB) 000000000bd00000 - 00000000f3bfffff (3711 MB) The memory detection function find_memory_chunks() issues tprot to find valid memory chunks. In case of CMM it can happen that pages are marked as unstable via set_page_unstable() in arch_free_page(). If z/VM has released that pages, tprot returns -EFAULT and indicates a memory hole. So fix this and switch off CMM in case of kdump or zfcpdump. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22mac80211: fix fragmentation code, particularly for encryptionJohannes Berg
commit 338f977f4eb441e69bb9a46eaa0ac715c931a67f upstream. The "new" fragmentation code (since my rewrite almost 5 years ago) erroneously sets skb->len rather than using skb_trim() to adjust the length of the first fragment after copying out all the others. This leaves the skb tail pointer pointing to after where the data originally ended, and thus causes the encryption MIC to be written at that point, rather than where it belongs: immediately after the data. The impact of this is that if software encryption is done, then a) encryption doesn't work for the first fragment, the connection becomes unusable as the first fragment will never be properly verified at the receiver, the MIC is practically guaranteed to be wrong b) we leak up to 8 bytes of plaintext (!) of the packet out into the air This is only mitigated by the fact that many devices are capable of doing encryption in hardware, in which case this can't happen as the tail pointer is irrelevant in that case. Additionally, fragmentation is not used very frequently and would normally have to be configured manually. Fix this by using skb_trim() properly. Fixes: 2de8e0d999b8 ("mac80211: rewrite fragmentation") Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmemEric W. Biederman
commit 96c7a2ff21501691587e1ae969b83cbec8b78e08 upstream. Recently due to a spike in connections per second memcached on 3 separate boxes triggered the OOM killer from accept. At the time the OOM killer was triggered there was 4GB out of 36GB free in zone 1. The problem was that alloc_fdtable was allocating an order 3 page (32KiB) to hold a bitmap, and there was sufficient fragmentation that the largest page available was 8KiB. I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious but I do agree that order 3 allocations are very likely to succeed. There are always pathologies where order > 0 allocations can fail when there are copious amounts of free memory available. Using the pigeon hole principle it is easy to show that it requires 1 page more than 50% of the pages being free to guarantee an order 1 (8KiB) allocation will succeed, 1 page more than 75% of the pages being free to guarantee an order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of the pages being free to guarantee an order 3 allocate will succeed. A server churning memory with a lot of small requests and replies like memcached is a common case that if anything can will skew the odds against large pages being available. Therefore let's not give external applications a practical way to kill linux server applications, and specify __GFP_NORETRY to the kmalloc in alloc_fdmem. Unless I am misreading the code and by the time the code reaches should_alloc_retry in __alloc_pages_slowpath (where __GFP_NORETRY becomes signification). We have already tried everything reasonable to allocate a page and the only thing left to do is wait. So not waiting and falling back to vmalloc immediately seems like the reasonable thing to do even if there wasn't a chance of triggering the OOM killer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Cong Wang <cwang@twopensource.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22xen-blkfront: handle backend CLOSED without CLOSINGDavid Vrabel
commit 3661371701e714f0cea4120f6a365340858fb4e4 upstream. Backend drivers shouldn't transistion to CLOSED unless the frontend is CLOSED. If a backend does transition to CLOSED too soon then the frontend may not see the CLOSING state and will not properly shutdown. So, treat an unexpected backend CLOSED state the same as CLOSING. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20Linux 3.4.81v3.4.81Greg Kroah-Hartman
2014-02-20nfs: tear down caches in nfs_init_writepagecache when allocation failsJeff Layton
commit 3dd4765fce04c0b4af1e0bc4c0b10f906f95fabc upstream. ...and ensure that we tear down the nfs_commit_data cache too when unloading the module. Cc: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: drop the nfs_cdata_cachep cleanup; it doesn't exist] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20lib/vsprintf.c: kptr_restrict: fix pK-error in SysRq show-all-timers(Q)Dan Rosenberg
commit 3715c5309f6d175c3053672b73fd4f73be16fd07 upstream. When using ALT+SysRq+Q all the pointers are replaced with "pK-error" like this: [23153.208033] .base: pK-error with echo h > /proc/sysrq-trigger it works: [23107.776363] .base: ffff88023e60d540 The intent behind this behavior was to return "pK-error" in cases where the %pK format specifier was used in interrupt context, because the CAP_SYSLOG check wouldn't be meaningful. Clearly this should only apply when kptr_restrict is actually enabled though. Reported-by: Stevie Trujillo <stevie.trujillo@gmail.com> Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20virtio-blk: Use block layer provided spinlockAsias He
commit 2c95a3290919541b846bee3e0fbaa75860929f53 upstream. Block layer will allocate a spinlock for the queue if the driver does not provide one in blk_init_queue(). The reason to use the internal spinlock is that blk_cleanup_queue() will switch to use the internal spinlock in the cleanup code path. if (q->queue_lock != &q->__queue_lock) q->queue_lock = &q->__queue_lock; However, processes which are in D state might have taken the driver provided spinlock, when the processes wake up, they would release the block provided spinlock. ===================================== [ BUG: bad unlock balance detected! ] 3.4.0-rc7+ #238 Not tainted ------------------------------------- fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at: [<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380 but there are no more locks to release! other info that might help us debug this: 1 lock held by fio/3587: #0: (&(&vblk->lock)->rlock){......}, at: [<ffffffff8132661a>] get_request_wait+0x19a/0x250 Other drivers use block layer provided spinlock as well, e.g. SCSI. Switching to the block layer provided spinlock saves a bit of memory and does not increase lock contention. Performance test shows no real difference is observed before and after this patch. Changes in v2: Improve commit log as Michael suggested. Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20Input: synaptics - handle out of bounds values from the hardwareSeth Forshee
commit c0394506e69b37c47d391c2a7bbea3ea236d8ec8 upstream. The touchpad on the Acer Aspire One D250 will report out of range values in the extreme lower portion of the touchpad. These appear as abrupt changes in the values reported by the hardware from very low values to very high values, which can cause unexpected vertical jumps in the position of the mouse pointer. What seems to be happening is that the value is wrapping to a two's compliment negative value of higher resolution than the 13-bit value reported by the hardware, with the high-order bits being truncated. This patch adds handling for these values by converting them to the appropriate negative values. The only tricky part about this is deciding when to treat a number as negative. It stands to reason that if out of range values can be reported on the low end then it could also happen on the high end, so not all out of range values should be treated as negative. The approach taken here is to split the difference between the maximum legitimate value for the axis and the maximum possible value that the hardware can report, treating values greater than this number as negative and all other values as positive. This can be tweaked later if hardware is found that operates outside of these parameters. BugLink: http://bugs.launchpad.net/bugs/1001251 Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Reviewed-by: Daniel Kurtz <djkurtz@chromium.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20PM / Hibernate: Hibernate/thaw fixes/improvementsBojan Smojver
commit 5a21d489fd9541a4a66b9a500659abaca1b19a51 upstream. 1. Do not allocate memory for buffers from emergency pools, unless absolutely required. Do not warn about and do not retry non-essential failed allocations. 2. Do not check the amount of free pages left on every single page write, but wait until one map is completely populated and then check. 3. Set maximum number of pages for read buffering consistently, instead of inadvertently depending on the size of the sector type. 4. Fix copyright line, which I missed when I submitted the hibernation threading patch. 5. Dispense with bit shifting arithmetic to improve readability. 6. Really recalculate the number of pages required to be free after all allocations have been done. 7. Fix calculation of pages required for read buffering. Only count in pages that do not belong to high memory. Signed-off-by: Bojan Smojver <bojan@rexursive.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20KVM: Fix buffer overflow in kvm_set_irq()Avi Kivity
commit f2ebd422f71cda9c791f76f85d2ca102ae34a1ed upstream. kvm_set_irq() has an internal buffer of three irq routing entries, allowing connecting a GSI to three IRQ chips or on MSI. However setup_routing_entry() does not properly enforce this, allowing three irqchip routes followed by an MSI route to overflow the buffer. Fix by ensuring that an MSI entry is added to an empty list. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20target/file: Re-enable optional fd_buffered_io=1 operationNicholas Bellinger
commit b32f4c7ed85c5cee2a21a55c9f59ebc9d57a2463 upstream. This patch re-adds the ability to optionally run in buffered FILEIO mode (eg: w/o O_DSYNC) for device backends in order to once again use the Linux buffered cache as a write-back storage mechanism. This logic was originally dropped with mainline v3.5-rc commit: commit a4dff3043c231d57f982af635c9d2192ee40e5ae Author: Nicholas Bellinger <nab@linux-iscsi.org> Date: Wed May 30 16:25:41 2012 -0700 target/file: Use O_DSYNC by default for FILEIO backends This difference with this patch is that fd_create_virtdevice() now forces the explicit setting of emulate_write_cache=1 when buffered FILEIO operation has been enabled. (v2: Switch to FDBD_HAS_BUFFERED_IO_WCE + add more detailed comment as requested by hch) Reported-by: Ferry <iscsitmp@bananateam.nl> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20target/file: Use O_DSYNC by default for FILEIO backendsNicholas Bellinger
commit a4dff3043c231d57f982af635c9d2192ee40e5ae upstream. Convert to use O_DSYNC for all cases at FILEIO backend creation time to avoid the extra syncing of pure timestamp updates with legacy O_SYNC during default operation as recommended by hch. Continue to do this independently of Write Cache Enable (WCE) bit, as WCE=0 is currently the default for all backend devices and enabled by user on per device basis via attrib/emulate_write_cache. This patch drops the now unnecessary fd_buffered_io= token usage that was originally signalling when to explictly disable O_SYNC at backend creation time for buffered I/O operation. This can end up being dangerous for a number of reasons during physical node failure, so go ahead and drop this option for now when O_DSYNC is used as the default. Also allow explict FUA WRITEs -> vfs_fsync_range() call to function in fd_execute_cmd() independently of WCE bit setting. Reported-by: Christoph Hellwig <hch@lst.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> [bwh: Backported to 3.2: - We have fd_do_task() and not fd_execute_cmd() - Various fields are in struct se_task rather than struct se_cmd - fd_create_virtdevice() flags initialisation hasn't been cleaned up] Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast()Jan Kara
commit 603e7729920e42b3c2f4dbfab9eef4878cb6e8fa upstream. qib_user_sdma_queue_pkts() gets called with mmap_sem held for writing. Except for get_user_pages() deep down in qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even more interestingly the function qib_user_sdma_queue_pkts() (and also qib_user_sdma_coalesce() called somewhat later) call copy_from_user() which can hit a page fault and we deadlock on trying to get mmap_sem when handling that fault. So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and leave mmap_sem locking for mm. This deadlock has actually been observed in the wild when the node is under memory pressure. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Roland Dreier <roland@purestorage.com> [Backported to 3.4: (Thank to Ben Hutchings) - Adjust context - Adjust indentation and nr_pages argument in qib_user_sdma_pin_pages()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20sched/nohz: Fix rq->cpu_load calculations some morePeter Zijlstra
commit 5aaa0b7a2ed5b12692c9ffb5222182bd558d3146 upstream. Follow up on commit 556061b00 ("sched/nohz: Fix rq->cpu_load[] calculations") since while that fixed the busy case it regressed the mostly idle case. Add a callback from the nohz exit to also age the rq->cpu_load[] array. This closes the hole where either there was no nohz load balance pass during the nohz, or there was a 'significant' amount of idle time between the last nohz balance and the nohz exit. So we'll update unconditionally from the tick to not insert any accidental 0 load periods while busy, and we try and catch up from nohz idle balance and nohz exit. Both these are still prone to missing a jiffy, but that has always been the case. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: pjt@google.com Cc: Venkatesh Pallipadi <venki@google.com> Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20sched/nohz: Fix rq->cpu_load[] calculationsPeter Zijlstra
commit 556061b00c9f2fd6a5524b6bde823ef12f299ecf upstream. While investigating why the load-balancer did funny I found that the rq->cpu_load[] tables were completely screwy.. a bit more digging revealed that the updates that got through were missing ticks followed by a catchup of 2 ticks. The catchup assumes the cpu was idle during that time (since only nohz can cause missed ticks and the machine is idle etc..) this means that esp. the higher indices were significantly lower than they ought to be. The reason for this is that its not correct to compare against jiffies on every jiffy on any other cpu than the cpu that updates jiffies. This patch cludges around it by only doing the catch-up stuff from nohz_idle_balance() and doing the regular stuff unconditionally from the tick. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: pjt@google.com Cc: Venkatesh Pallipadi <venki@google.com> Link: http://lkml.kernel.org/n/tip-tp4kj18xdd5aj4vvj0qg55s2@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20ftrace: Have function graph only trace based on global_ops filtersSteven Rostedt
commit 23a8e8441a0a74dd612edf81dc89d1600bc0a3d1 upstream. Doing some different tests, I discovered that function graph tracing, when filtered via the set_ftrace_filter and set_ftrace_notrace files, does not always keep with them if another function ftrace_ops is registered to trace functions. The reason is that function graph just happens to trace all functions that the function tracer enables. When there was only one user of function tracing, the function graph tracer did not need to worry about being called by functions that it did not want to trace. But now that there are other users, this becomes a problem. For example, one just needs to do the following: # cd /sys/kernel/debug/tracing # echo schedule > set_ftrace_filter # echo function_graph > current_tracer # cat trace [..] 0) | schedule() { ------------------------------------------ 0) <idle>-0 => rcu_pre-7 ------------------------------------------ 0) ! 2980.314 us | } 0) | schedule() { ------------------------------------------ 0) rcu_pre-7 => <idle>-0 ------------------------------------------ 0) + 20.701 us | } # echo 1 > /proc/sys/kernel/stack_tracer_enabled # cat trace [..] 1) + 20.825 us | } 1) + 21.651 us | } 1) + 30.924 us | } /* SyS_ioctl */ 1) | do_page_fault() { 1) | __do_page_fault() { 1) 0.274 us | down_read_trylock(); 1) 0.098 us | find_vma(); 1) | handle_mm_fault() { 1) | _raw_spin_lock() { 1) 0.102 us | preempt_count_add(); 1) 0.097 us | do_raw_spin_lock(); 1) 2.173 us | } 1) | do_wp_page() { 1) 0.079 us | vm_normal_page(); 1) 0.086 us | reuse_swap_page(); 1) 0.076 us | page_move_anon_rmap(); 1) | unlock_page() { 1) 0.082 us | page_waitqueue(); 1) 0.086 us | __wake_up_bit(); 1) 1.801 us | } 1) 0.075 us | ptep_set_access_flags(); 1) | _raw_spin_unlock() { 1) 0.098 us | do_raw_spin_unlock(); 1) 0.105 us | preempt_count_sub(); 1) 1.884 us | } 1) 9.149 us | } 1) + 13.083 us | } 1) 0.146 us | up_read(); When the stack tracer was enabled, it enabled all functions to be traced, which now the function graph tracer also traces. This is a side effect that should not occur. To fix this a test is added when the function tracing is changed, as well as when the graph tracer is enabled, to see if anything other than the ftrace global_ops function tracer is enabled. If so, then the graph tracer calls a test trampoline that will look at the function that is being traced and compare it with the filters defined by the global_ops. As an optimization, if there's no other function tracers registered, or if the only registered function tracers also use the global ops, the function graph infrastructure will call the registered function graph callback directly and not go through the test trampoline. Fixes: d2d45c7a03a2 "tracing: Have stack_tracer use a separate list of functions" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20ftrace: Fix synchronization location disabling and freeing ftrace_opsSteven Rostedt
commit a4c35ed241129dd142be4cadb1e5a474a56d5464 upstream. The synchronization needed after ftrace_ops are unregistered must happen after the callback is disabled from becing called by functions. The current location happens after the function is being removed from the internal lists, but not after the function callbacks were disabled, leaving the functions susceptible of being called after their callbacks are freed. This affects perf and any externel users of function tracing (LTTng and SystemTap). Fixes: cdbe61bfe704 "ftrace: Allow dynamically allocated function tracers" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20ftrace: Synchronize setting function_trace_op with ftrace_trace_functionSteven Rostedt
commit 405e1d834807e51b2ebd3dea81cb51e53fb61504 upstream. [ Partial commit backported to 3.4. The ftrace_sync() code by this is required for other fixes that 3.4 needs. ] ftrace_trace_function is a variable that holds what function will be called directly by the assembly code (mcount). If just a single function is registered and it handles recursion itself, then the assembly will call that function directly without any helper function. It also passes in the ftrace_op that was registered with the callback. The ftrace_op to send is stored in the function_trace_op variable. The ftrace_trace_function and function_trace_op needs to be coordinated such that the called callback wont be called with the wrong ftrace_op, otherwise bad things can happen if it expected a different op. Luckily, there's no callback that doesn't use the helper functions that requires this. But there soon will be and this needs to be fixed. Use a set_function_trace_op to store the ftrace_op to set the function_trace_op to when it is safe to do so (during the update function within the breakpoint or stop machine calls). Or if dynamic ftrace is not being used (static tracing) then we have to do a bit more synchronization when the ftrace_trace_function is set as that takes affect immediately (as oppose to dynamic ftrace doing it with the modification of the trampoline). Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20dm sysfs: fix a module unload raceMikulas Patocka
commit 2995fa78e423d7193f3b57835f6c1c75006a0315 upstream. This reverts commit be35f48610 ("dm: wait until embedded kobject is released before destroying a device") and provides an improved fix. The kobject release code that calls the completion must be placed in a non-module file, otherwise there is a module unload race (if the process calling dm_kobject_release is preempted and the DM module unloaded after the completion is triggered, but before dm_kobject_release returns). To fix this race, this patch moves the completion code to dm-builtin.c which is always compiled directly into the kernel if BLK_DEV_DM is selected. The patch introduces a new dm_kobject_holder structure, its purpose is to keep the completion and kobject in one place, so that it can be accessed from non-module code without the need to export the layout of struct mapped_device to that code. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20mm: setup pageblock_order before it's used by sparsememXishi Qiu
commit ca57df79d4f64e1a4886606af4289d40636189c5 upstream. On architectures with CONFIG_HUGETLB_PAGE_SIZE_VARIABLE set, such as Itanium, pageblock_order is a variable with default value of 0. It's set to the right value by set_pageblock_order() in function free_area_init_core(). But pageblock_order may be used by sparse_init() before free_area_init_core() is called along path: sparse_init() ->sparse_early_usemaps_alloc_node() ->usemap_size() ->SECTION_BLOCKFLAGS_BITS ->((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS) The uninitialized pageblock_size will cause memory wasting because usemap_size() returns a much bigger value then it's really needed. For example, on an Itanium platform, sparse_init() pageblock_order=0 usemap_size=24576 free_area_init_core() before pageblock_order=0, usemap_size=24576 free_area_init_core() after pageblock_order=12, usemap_size=8 That means 24K memory has been wasted for each section, so fix it by calling set_pageblock_order() from sparse_init(). Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Jiang Liu <liuj97@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Keping Chen <chenkeping@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20mm/page_alloc.c: remove pageblock_default_order()Andrew Morton
commit 955c1cd7401565671b064e499115344ec8067dfd upstream. This has always been broken: one version takes an unsigned int and the other version takes no arguments. This bug was hidden because one version of set_pageblock_order() was a macro which doesn't evaluate its argument. Simplify it all and remove pageblock_default_order() altogether. Reported-by: rajman mekaco <rajman.mekaco@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20drm/i915: kick any firmware framebuffers before claiming the gttDaniel Vetter
commit 9f846a16d213523fbe6daea17e20df6b8ac5a1e5 upstream. Especially vesafb likes to map everything as uc- (yikes), and if that mapping hangs around still while we try to map the gtt as wc the kernel will downgrade our request to uc-, resulting in abyssal performance. Unfortunately we can't do this as early as readon does (i.e. as the first thing we do when initializing the hw) because our fb/mmio space region moves around on a per-gen basis. So I've had to move it below the gtt initialization, but that seems to work, too. The important thing is that we do this before we set up the gtt wc mapping. Now an altogether different question is why people compile their kernels with vesafb enabled, but I guess making things just work isn't bad per se ... v2: - s/radeondrmfb/inteldrmfb/ - fix up error handling v3: Kill #ifdef X86, this is Intel after all. Noticed by Ben Widawsky. v4: Jani Nikula complained about the pointless bool primary initialization. v5: Don't oops if we can't allocate, noticed by Chris Wilson. v6: Resolve conflicts with agp rework and fixup whitespace. This is commit e188719a2891f01b3100d in drm-next. Backport to 3.5 -fixes queue requested by Dave Airlie - due to grub using vesa on fedora their initrd seems to load vesafb before loading the real kms driver. So tons more people actually experience a dead-slow gpu. Hence also the Cc: stable. Reported-and-tested-by: "Kilarski, Bernard R" <bernard.r.kilarski@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20ext4: protect group inode free counting with group lockTao Ma
commit 6f2e9f0e7d795214b9cf5a47724a273b705fd113 upstream. Now when we set the group inode free count, we don't have a proper group lock so that multiple threads may decrease the inode free count at the same time. And e2fsck will complain something like: Free inodes count wrong for group #1 (1, counted=0). Fix? no Free inodes count wrong for group #2 (3, counted=0). Fix? no Directories count wrong for group #2 (780, counted=779). Fix? no Free inodes count wrong for group #3 (2272, counted=2273). Fix? no So this patch try to protect it with the ext4_lock_group. btw, it is found by xfstests test case 269 and the volume is mkfsed with the parameter "-O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr" and I have run it 100 times and the error in e2fsck doesn't show up again. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20printk: Fix scheduling-while-atomic problem in console_cpu_notify()Paul E. McKenney
commit 85eae82a0855d49852b87deac8653e4ebc8b291f upstream. The console_cpu_notify() function runs with interrupts disabled in the CPU_DYING case. It therefore cannot block, for example, as will happen when it calls console_lock(). Therefore, remove the CPU_DYING leg of the switch statement to avoid this problem. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Guillaume Morin <guillaume@morinfr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=yPeter Oberparleiter
commit 6583327c4dd55acbbf2a6f25e775b28b3abf9a42 upstream. Commit d61931d89b, "x86: Add optimized popcnt variants" introduced compile flag -fcall-saved-rdi for lib/hweight.c. When combined with options -fprofile-arcs and -O2, this flag causes gcc to generate broken constructor code. As a result, a 64 bit x86 kernel compiled with CONFIG_GCOV_PROFILE_ALL=y prints message "gcov: could not create file" and runs into sproadic BUGs during boot. The gcc people indicate that these kinds of problems are endemic when using ad hoc calling conventions. It is therefore best to treat any file compiled with ad hoc calling conventions as an isolated environment and avoid things like profiling or coverage analysis, since those subsystems assume a "normal" calling conventions. This patch avoids the bug by excluding lib/hweight.o from coverage profiling. Reported-by: Meelis Roos <mroos@linux.ee> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/52F3A30C.7050205@linux.vnet.ibm.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irqKOSAKI Motohiro
commit 227d53b397a32a7614667b3ecaf1d89902fb6c12 upstream. To use spin_{un}lock_irq is dangerous if caller disabled interrupt. During aio buffer migration, we have a possibility to see the following call stack. aio_migratepage [disable interrupt] migrate_page_copy clear_page_dirty_for_io set_page_dirty __set_page_dirty_buffers __set_page_dirty spin_lock_irq This mean, current aio migration is a deadlockable. spin_lock_irqsave is a safer alternative and we should use it. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reported-by: David Rientjes rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of ↵KOSAKI Motohiro
spin_lock_irq() commit a85d9df1ea1d23682a0ed1e100e6965006595d06 upstream. During aio stress test, we observed the following lockdep warning. This mean AIO+numa_balancing is currently deadlockable. The problem is, aio_migratepage disable interrupt, but __set_page_dirty_nobuffers unintentionally enable it again. Generally, all helper function should use spin_lock_irqsave() instead of spin_lock_irq() because they don't know caller at all. other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&ctx->completion_lock)->rlock); <Interrupt> lock(&(&ctx->completion_lock)->rlock); *** DEADLOCK *** dump_stack+0x19/0x1b print_usage_bug+0x1f7/0x208 mark_lock+0x21d/0x2a0 mark_held_locks+0xb9/0x140 trace_hardirqs_on_caller+0x105/0x1d0 trace_hardirqs_on+0xd/0x10 _raw_spin_unlock_irq+0x2c/0x50 __set_page_dirty_nobuffers+0x8c/0xf0 migrate_page_copy+0x434/0x540 aio_migratepage+0xb1/0x140 move_to_new_page+0x7d/0x230 migrate_pages+0x5e5/0x700 migrate_misplaced_page+0xbc/0xf0 do_numa_page+0x102/0x190 handle_pte_fault+0x241/0x970 handle_mm_fault+0x265/0x370 __do_page_fault+0x172/0x5a0 do_page_fault+0x1a/0x70 page_fault+0x28/0x30 Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <jweiner@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>