linux/drivers/md, branch v3.10.43

md: always set MD_RECOVERY_INTR when interrupting a reshape thread.

2014-06-11T19:03:24Z

commit 2ac295a544dcae9299cba13ce250419117ae7fd1 upstream. Commit 8313b8e57f55b15e5b7f7fc5d1630bbf686a9a97 md: fix problem when adding device to read-only array with bitmap. added a called to md_reap_sync_thread() which cause a reshape thread to be interrupted (in particular, it could cause md_thread() to never even call md_do_sync()). However it didn't set MD_RECOVERY_INTR so ->finish_reshape() would not know that the reshape didn't complete. This only happens when mddev->ro is set and normally reshape threads don't run in that situation. But raid5 and raid10 can start a reshape thread during "run" is the array is in the middle of a reshape. They do this even if ->ro is set. So it is best to set MD_RECOVERY_INTR before abortingg the sync thread, just in case. Though it rare for this to trigger a problem it can cause data corruption because the reshape isn't finished properly. So it is suitable for any stable which the offending commit was applied to. (3.2 or later) Fixes: 8313b8e57f55b15e5b7f7fc5d1630bbf686a9a97 Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman

md: always set MD_RECOVERY_INTR when aborting a reshape or other "resync".

2014-06-11T19:03:24Z

commit 3991b31ea072b070081ca3bfa860a077eda67de5 upstream. If mddev->ro is set, md_to_sync will (correctly) abort. However in that case MD_RECOVERY_INTR isn't set. If a RESHAPE had been requested, then ->finish_reshape() will be called and it will think the reshape was successful even though nothing happened. Normally a resync will not be requested if ->ro is set, but if an array is stopped while a reshape is on-going, then when the array is started, the reshape will be restarted. If the array is also set read-only at this point, the reshape will instantly appear to success, resulting in data corruption. Consequently, this patch is suitable for any -stable kernel. Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman

dm cache: always split discards on cache block boundaries

2014-06-11T19:03:23Z

commit f1daa838e861ae1a0fb7cd9721a21258430fcc8c upstream. The DM cache target cannot cope with discards that span multiple cache blocks, so each discard bio that spans more than one cache block must get split by the DM core. Signed-off-by: Heinz Mauelshagen Acked-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm crypt: fix cpu hotplug crash by removing per-cpu structure

2014-06-07T20:25:39Z

commit 610f2de3559c383caf8fbbf91e9968102dff7ca0 upstream. The DM crypt target used per-cpu structures to hold pointers to a ablkcipher_request structure. The code assumed that the work item keeps executing on a single CPU, so it didn't use synchronization when accessing this structure. If a CPU is disabled by writing 0 to /sys/devices/system/cpu/cpu*/online, the work item could be moved to another CPU. This causes dm-crypt crashes, like the following, because the code starts using an incorrect ablkcipher_request: smpboot: CPU 7 is now offline BUG: unable to handle kernel NULL pointer dereference at 0000000000000130 IP: [] crypt_convert+0x12d/0x3c0 [dm_crypt] ... Call Trace: [] ? kcryptd_crypt+0x305/0x470 [dm_crypt] [] ? finish_task_switch+0x40/0xc0 [] ? process_one_work+0x168/0x470 [] ? worker_thread+0x10b/0x390 [] ? manage_workers.isra.26+0x290/0x290 [] ? kthread+0xaf/0xc0 [] ? kthread_create_on_node+0x120/0x120 [] ? ret_from_fork+0x7c/0xb0 [] ? kthread_create_on_node+0x120/0x120 Fix this bug by removing the per-cpu definition. The structure ablkcipher_request is accessed via a pointer from convert_context. Consequently, if the work item is rescheduled to a different CPU, the thread still uses the same ablkcipher_request. This change may undermine performance improvements intended by commit c0297721 ("dm crypt: scale to multiple cpus") on select hardware. In practice no performance difference was observed on recent hardware. But regardless, correctness is more important than performance. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

md: avoid possible spinning md thread at shutdown.

2014-06-07T20:25:32Z

commit 0f62fb220aa4ebabe8547d3a9ce4a16d3c045f21 upstream. If an md array with externally managed metadata (e.g. DDF or IMSM) is in use, then we should not set safemode==2 at shutdown because: 1/ this is ineffective: user-space need to be involved in any 'safemode' handling, 2/ The safemode management code doesn't cope with safemode==2 on external metadata and md_check_recover enters an infinite loop. Even at shutdown, an infinite-looping process can be problematic, so this could cause shutdown to hang. Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman

md/raid1: r1buf_pool_alloc: free allocate pages when subsequent allocation fails.

2014-05-31T04:52:12Z

commit da1aab3dca9aa88ae34ca392470b8943159e25fe upstream. When performing a user-request check/repair (MD_RECOVERY_REQUEST is set) on a raid1, we allocate multiple bios each with their own set of pages. If the page allocations for one bio fails, we currently do *not* free the pages allocated for the previous bios, nor do we free the bio itself. This patch frees all the already-allocate pages, and makes sure that all the bios are freed as well. This bug can cause a memory leak which can ultimately OOM a machine. It was introduced in 3.10-rc1. Fixes: a07876064a0b73ab5ef1ebcf14b1cf0231c07858 Cc: Kent Overstreet Reported-by: Russell King - ARM Linux Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman

dm thin: fix dangling bio in process_deferred_bios error path

2014-05-13T11:59:46Z

commit fe76cd88e654124d1431bb662a0fc6e99ca811a5 upstream. If unable to ensure_next_mapping() we must add the current bio, which was removed from the @bios list via bio_list_pop, back to the deferred_bios list before all the remaining @bios. Signed-off-by: Mike Snitzer Acked-by: Joe Thornber Signed-off-by: Greg Kroah-Hartman

dm transaction manager: fix corruption due to non-atomic transaction commit

2014-05-13T11:59:45Z

commit a9d45396f5956d0b615c7ae3b936afd888351a47 upstream. The persistent-data library used by dm-thin, dm-cache, etc is transactional. If anything goes wrong, such as an io error when writing new metadata or a power failure, then we roll back to the last transaction. Atomicity when committing a transaction is achieved by: a) Never overwriting data from the previous transaction. b) Writing the superblock last, after all other metadata has hit the disk. This commit and the following commit ("dm: take care to copy the space map roots before locking the superblock") fix a bug associated with (b). When committing it was possible for the superblock to still be written in spite of an io error occurring during the preceeding metadata flush. With these commits we're careful not to take the write lock out on the superblock until after the metadata flush has completed. Change the transaction manager's semantics for dm_tm_commit() to assume all data has been flushed _before_ the single superblock that is passed in. As a prerequisite, split the block manager's block unlocking and flushing by simplifying dm_bm_flush_and_unlock() to dm_bm_flush(). Now the unlocking must be done separately. This issue was discovered by forcing io errors at the crucial time using dm-flakey. Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm cache: fix access beyond end of origin device

2014-03-24T04:38:18Z

commit e893fba90c09f9b57fb97daae204ea9cc2c52fa5 upstream. In order to avoid wasting cache space a partial block at the end of the origin device is not cached. Unfortunately, the check for such a partial block at the end of the origin device was flawed. Fix accesses beyond the end of the origin device that occured due to attempted promotion of an undetected partial block by: - initializing the per bio data struct to allow cache_end_io to work properly - recognizing access to the partial block at the end of the origin device - avoiding out of bounds access to the discard bitset Otherwise, users can experience errors like the following: attempt to access beyond end of device dm-5: rw=0, want=20971520, limit=20971456 ... device-mapper: cache: promotion failed; couldn't copy block Signed-off-by: Heinz Mauelshagen Acked-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm cache: fix truncation bug when copying a block to/from >2TB fast device

2014-03-24T04:38:18Z

commit 8b9d96666529a979acf4825391efcc7c8a3e9f12 upstream. During demotion or promotion to a cache's >2TB fast device we must not truncate the cache block's associated sector to 32bits. The 32bit temporary result of from_cblock() caused a 32bit multiplication when calculating the sector of the fast device in issue_copy_real(). Use an intermediate 64bit type to store the 32bit from_cblock() to allow for proper 64bit multiplication. Here is an example of how this bug manifests on an ext4 filesystem: EXT4-fs error (device dm-0): ext4_mb_generate_buddy:756: group 17136, 32768 clusters in bitmap, 30688 in gd; block bitmap corrupt. JBD2: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0). There's a risk of filesystem corruption in case of system crash. Signed-off-by: Heinz Mauelshagen Acked-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman