linux/drivers/block, branch v3.4.96

virtio-blk: Don't free ida when disk is in use

2014-06-11T19:04:18Z

commit f4953fe6c4aeada2d5cafd78aa97587a46d2d8f9 upstream. When a file system is mounted on a virtio-blk disk, we then remove it and then reattach it, the reattached disk gets the same disk name and ids as the hot removed one. This leads to very nasty effects - mostly rendering the newly attached device completely unusable. Trying what happens when I do the same thing with a USB device, I saw that the sd node simply doesn't get free'd when a device gets forcefully removed. Imitate the same behavior for vd devices. This way broken vd devices simply are never free'd and newly attached ones keep working just fine. Signed-off-by: Alexander Graf Signed-off-by: Rusty Russell Signed-off-by: Ben Hutchings Cc: Yijing Wang Signed-off-by: Greg Kroah-Hartman

virtio-blk: Reset device after blk_cleanup_queue()

2014-06-11T19:04:17Z

commit 483001c765af6892b3fc3726576cb42f17d1d6b5 upstream. blk_cleanup_queue() will call blk_drian_queue() to drain all the requests before queue DEAD marking. If we reset the device before blk_cleanup_queue() the drain would fail. 1) if the queue is stopped in do_virtblk_request() because device is full, the q->request_fn() will not be called. blk_drain_queue() { while(true) { ... if (!list_empty(&q->queue_head)) __blk_run_queue(q) { if (queue is not stoped) q->request_fn() } ... } } Do no reset the device before blk_cleanup_queue() gives the chance to start the queue in interrupt handler blk_done(). 2) In commit b79d866c8b7014a51f611a64c40546109beaf24a, We abort requests dispatched to driver before blk_cleanup_queue(). There is a race if requests are dispatched to driver after the abort and before the queue DEAD mark. To fix this, instead of aborting the requests explicitly, we can just reset the device after after blk_cleanup_queue so that the device can complete all the requests before queue DEAD marking in the drain process. Cc: Rusty Russell Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Signed-off-by: Asias He Acked-by: Michael S. Tsirkin Signed-off-by: Rusty Russell Signed-off-by: Ben Hutchings Cc: Yijing Wang Signed-off-by: Greg Kroah-Hartman

virtio-blk: Call del_gendisk() before disable guest kick

2014-06-11T19:04:17Z

commit 02e2b124943648fba0a2ccee5c3656a5653e0151 upstream. del_gendisk() might not return due to failing to remove the /sys/block/vda/serial sysfs entry when another thread (udev) is trying to read it. virtblk_remove() vdev->config->reset() : guest will not kick us through interrupt del_gendisk() device_del() kobject_del(): got stuck, sysfs entry ref count non zero sysfs_open_file(): user space process read /sys/block/vda/serial sysfs_get_active() : got sysfs entry ref count dev_attr_show() virtblk_serial_show() blk_execute_rq() : got stuck, interrupt is disabled request cannot be finished This patch fixes it by calling del_gendisk() before we disable guest's interrupt so that the request sent in virtblk_serial_show() will be finished and del_gendisk() will success. This fixes another race in hot-unplug process. It is save to call del_gendisk(vblk->disk) before flush_work(&vblk->config_work) which might access vblk->disk, because vblk->disk is not freed until put_disk(vblk->disk). Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Signed-off-by: Asias He Acked-by: Michael S. Tsirkin Signed-off-by: Rusty Russell Signed-off-by: Ben Hutchings Cc: Yijing Wang Signed-off-by: Greg Kroah-Hartman

virtio-blk: Fix hot-unplug race in remove method

2014-06-11T19:04:17Z

commit b79d866c8b7014a51f611a64c40546109beaf24a upstream. If we reset the virtio-blk device before the requests already dispatched to the virtio-blk driver from the block layer are finised, we will stuck in blk_cleanup_queue() and the remove will fail. blk_cleanup_queue() calls blk_drain_queue() to drain all requests queued before DEAD marking. However it will never success if the device is already stopped. We'll have q->in_flight[] > 0, so the drain will not finish. How to reproduce the race: 1. hot-plug a virtio-blk device 2. keep reading/writing the device in guest 3. hot-unplug while the device is busy serving I/O Test: ~1000 rounds of hot-plug/hot-unplug test passed with this patch. Changes in v3: - Drop blk_abort_queue and blk_abort_request - Use __blk_end_request_all to complete request dispatched to driver Changes in v2: - Drop req_in_flight - Use virtqueue_detach_unused_buf to get request dispatched to driver Signed-off-by: Asias He Signed-off-by: Rusty Russell Signed-off-by: Ben Hutchings Cc: Yijing Wang Signed-off-by: Greg Kroah-Hartman

virtio_blk: Drop unused request tracking list

2014-06-11T19:04:17Z

commit f65ca1dc6a8c81c6bd72297d4399ec5f4c1f3a01 upstream. Benchmark shows small performance improvement on fusion io device. Before: seq-read : io=1,024MB, bw=19,982KB/s, iops=39,964, runt= 52475msec seq-write: io=1,024MB, bw=20,321KB/s, iops=40,641, runt= 51601msec rnd-read : io=1,024MB, bw=15,404KB/s, iops=30,808, runt= 68070msec rnd-write: io=1,024MB, bw=14,776KB/s, iops=29,552, runt= 70963msec After: seq-read : io=1,024MB, bw=20,343KB/s, iops=40,685, runt= 51546msec seq-write: io=1,024MB, bw=20,803KB/s, iops=41,606, runt= 50404msec rnd-read : io=1,024MB, bw=16,221KB/s, iops=32,442, runt= 64642msec rnd-write: io=1,024MB, bw=15,199KB/s, iops=30,397, runt= 68991msec Signed-off-by: Asias He Signed-off-by: Rusty Russell [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings Cc: Yijing Wang Signed-off-by: Greg Kroah-Hartman

nbd: fsync and kill block device on shutdown

2014-06-07T23:02:10Z

commit 3a2d63f87989e01437ba994df5f297528c353d7d upstream. There are two problems with shutdown in the NBD driver. 1: Receiving the NBD_DISCONNECT ioctl does not sync the filesystem. This patch adds the sync operation into __nbd_ioctl()'s NBD_DISCONNECT handler. This is useful because BLKFLSBUF is restricted to processes that have CAP_SYS_ADMIN, and the NBD client may not possess it (fsync of the block device does not sync the filesystem, either). 2: Once we clear the socket we have no guarantee that later reads will come from the same backing storage. The patch adds calls to kill_bdev() in __nbd_ioctl()'s socket clearing code so the page cache is cleaned, lest reads that hit on the page cache will return stale data from the previously-accessible disk. Example: # qemu-nbd -r -c/dev/nbd0 /dev/sr0 # file -s /dev/nbd0 /dev/stdin: # UDF filesystem data (version 1.5) etc. # qemu-nbd -d /dev/nbd0 # qemu-nbd -r -c/dev/nbd0 /dev/sda # file -s /dev/nbd0 /dev/stdin: # UDF filesystem data (version 1.5) etc. While /dev/sda has: # file -s /dev/sda /dev/sda: x86 boot sector; etc. Signed-off-by: Paolo Bonzini Acked-by: Paul Clements Cc: Alex Bligh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.2: - Adjusted context - s/\bnbd\b/lo/ - Incorporate export of kill_bdev() from commit ff01bb483265 ('fs: move code out of buffer.c')] Signed-off-by: Ben Hutchings [hq: Backported to 3.4: Adjusted context] Signed-off-by: Qiang Huang Signed-off-by: Greg Kroah-Hartman

floppy: properly handle failure on add_disk loop

2014-06-07T23:02:06Z

commit d60e7ec18c3fb2cbf90969ccd42889eb2d03aef9 upstream. On floppy initialization, if something failed inside the loop we call add_disk, there was no cleanup of previous iterations in the error handling. Signed-off-by: Herton Ronaldo Krzesinski Signed-off-by: Jiri Kosina Signed-off-by: Jens Axboe [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings Cc: Qiang Huang Signed-off-by: Greg Kroah-Hartman

floppy: don't write kernel-only members to FDRAWCMD ioctl output

2014-05-13T12:11:29Z

commit 2145e15e0557a01b9195d1c7199a1b92cb9be81f upstream. Do not leak kernel-only floppy_raw_cmd structure members to userspace. This includes the linked-list pointer and the pointer to the allocated DMA space. Signed-off-by: Matthew Daley Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman

floppy: ignore kernel-only members in FDRAWCMD ioctl input

2014-05-13T12:11:29Z

commit ef87dbe7614341c2e7bfe8d32fcb7028cc97442c upstream. Always clear out these floppy_raw_cmd struct members after copying the entire structure from userspace so that the in-kernel version is always valid and never left in an interdeterminate state. Signed-off-by: Matthew Daley Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman

xen/blkback: Check for insane amounts of request on the ring (v6).

2014-03-11T23:10:07Z

commit 9371cadbbcc7c00c81753b9727b19fb3bc74d458 upstream. commit 8e3f8755545cc4a7f4da8e9ef76d6d32e0dca576 upstream. Check that the ring does not have an insane amount of requests (more than there could fit on the ring). If we detect this case we will stop processing the requests and wait until the XenBus disconnects the ring. The existing check RING_REQUEST_CONS_OVERFLOW which checks for how many responses we have created in the past (rsp_prod_pvt) vs requests consumed (req_cons) and whether said difference is greater or equal to the size of the ring, does not catch this case. Wha the condition does check if there is a need to process more as we still have a backlog of responses to finish. Note that both of those values (rsp_prod_pvt and req_cons) are not exposed on the shared ring. To understand this problem a mini crash course in ring protocol response/request updates is in place. There are four entries: req_prod and rsp_prod; req_event and rsp_event to track the ring entries. We are only concerned about the first two - which set the tone of this bug. The req_prod is a value incremented by frontend for each request put on the ring. Conversely the rsp_prod is a value incremented by the backend for each response put on the ring (rsp_prod gets set by rsp_prod_pvt when pushing the responses on the ring). Both values can wrap and are modulo the size of the ring (in block case that is 32). Please see RING_GET_REQUEST and RING_GET_RESPONSE for the more details. The culprit here is that if the difference between the req_prod and req_cons is greater than the ring size we have a problem. Fortunately for us, the '__do_block_io_op' loop: rc = blk_rings->common.req_cons; rp = blk_rings->common.sring->req_prod; while (rc != rp) { .. blk_rings->common.req_cons = ++rc; /* before make_response() */ } will loop up to the point when rc == rp. The macros inside of the loop (RING_GET_REQUEST) is smart and is indexing based on the modulo of the ring size. If the frontend has provided a bogus req_prod value we will loop until the 'rc == rp' - which means we could be processing already processed requests (or responses) often. The reason the RING_REQUEST_CONS_OVERFLOW is not helping here is b/c it only tracks how many responses we have internally produced and whether we would should process more. The astute reader will notice that the macro RING_REQUEST_CONS_OVERFLOW provides two arguments - more on this later. For example, if we were to enter this function with these values: blk_rings->common.sring->req_prod = X+31415 (X is the value from the last time __do_block_io_op was called). blk_rings->common.req_cons = X blk_rings->common.rsp_prod_pvt = X The RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, blk_rings->common.req_cons) is doing: req_cons - rsp_prod_pvt >= 32 Which is, X - X >= 32 or 0 >= 32 And that is false, so we continue on looping (this bug). If we re-use said macro RING_REQUEST_CONS_OVERFLOW and pass in the rp instead (sring->req_prod) of rc, the this macro can do the check: req_prod - rsp_prov_pvt >= 32 Which is, X + 31415 - X >= 32 , or 31415 >= 32 which is true, so we can error out and break out of the function. Unfortunatly the difference between rsp_prov_pvt and req_prod can be at 32 (which would error out in the macro). This condition exists when the backend is lagging behind with the responses and still has not finished responding to all of them (so make_response has not been called), and the rsp_prov_pvt + 32 == req_cons. This ends up with us not being able to use said macro. Hence introducing a new macro called RING_REQUEST_PROD_OVERFLOW which does a simple check of: req_prod - rsp_prod_pvt > RING_SIZE And with the X values from above: X + 31415 - X > 32 Returns true. Also not that if the ring is full (which is where the RING_REQUEST_CONS_OVERFLOW triggered), we would not hit the same condition: X + 32 - X > 32 Which is false. Lets use that macro. Note that in v5 of this patchset the macro was different - we used an earlier version. [v1: Move the check outside the loop] [v2: Add a pr_warn as suggested by David] [v3: Use RING_REQUEST_CONS_OVERFLOW as suggested by Jan] [v4: Move wake_up after kthread_stop as suggested by Jan] [v5: Use RING_REQUEST_PROD_OVERFLOW instead] [v6: Use RING_REQUEST_PROD_OVERFLOW - Jan's version] Signed-off-by: Konrad Rzeszutek Wilk Reviewed-by: Jan Beulich [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings Cc: Yijing Wang Signed-off-by: Greg Kroah-Hartman