linux/drivers/md, branch v3.12.10

bcache: Data corruption fix

2014-02-06T19:22:19Z

commit ef71ec00002d92a08eb27e9d036e3d48835b6597 upstream. The code that handles overlapping extents that we've just read back in from disk was depending on the behaviour of the code that handles overlapping extents as we're inserting into a btree node in the case of an insert that forced an existing extent to be split: on insert, if we had to split we'd also insert a new extent to represent the top part of the old extent - and then that new extent would get written out. The code that read the extents back in thus not bother with splitting extents - if it saw an extent that ovelapped in the middle of an older extent, it would trim the old extent to only represent the bottom part, assuming that the original insert would've inserted a new extent to represent the top part. I still haven't figured out _how_ it can happen, but I'm now pretty convinced (and testing has confirmed) that there's some kind of an obscure corner case (probably involving extent merging, and multiple overwrites in different sets) that breaks this. The fix is to change the mergesort fixup code to split extents itself when required. Signed-off-by: Kent Overstreet Signed-off-by: Greg Kroah-Hartman

md/raid5: fix long-standing problem with bitmap handling on write failure.

2014-02-06T19:22:13Z

commit 9f97e4b128d2ea90a5f5063ea0ee3b0911f4c669 upstream. Before a write starts we set a bit in the write-intent bitmap. When the write completes we clear that bit if the write was successful to all devices. However if the write wasn't fully successful we should not clear the bit. If the faulty drive is subsequently re-added, the fact that the bit is still set ensure that we will re-write the data that is missing. This logic is mediated by the STRIPE_DEGRADED flag - we only clear the bitmap bit when this flag is not set. Currently we correctly set the flag if a write starts when some devices are failed or missing. But we do *not* set the flag if some device failed during the write attempt. This is wrong and can result in clearing the bit inappropriately. So: set the flag when a write fails. This bug has been present since bitmaps were introduces, so the fix is suitable for any -stable kernel. Reported-by: Ethan Wilson Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman

md/raid5: Fix possible confusion when multiple write errors occur.

2014-01-25T16:49:29Z

commit 1cc03eb93245e63b0b7a7832165efdc52e25b4e6 upstream. commit 5d8c71f9e5fbdd95650be00294d238e27a363b5c md: raid5 crash during degradation Fixed a crash in an overly simplistic way which could leave R5_WriteError or R5_MadeGood set in the stripe cache for devices for which it is no longer relevant. When those devices are removed and spares added the flags are still set and can cause incorrect behaviour. commit 14a75d3e07c784c004b4b44b34af996b8e4ac453 md/raid5: preferentially read from replacement device if possible. Fixed the same bug if a more effective way, so we can now revert the original commit. Reported-and-tested-by: Alexander Lyakas Fixes: 5d8c71f9e5fbdd95650be00294d238e27a363b5c Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman

md/raid10: fix two bugs in handling of known-bad-blocks.

2014-01-25T16:49:29Z

commit b50c259e25d9260b9108dc0c2964c26e5ecbe1c1 upstream. If we discover a bad block when reading we split the request and potentially read some of it from a different device. The code path of this has two bugs in RAID10. 1/ we get a spin_lock with _irq, but unlock without _irq!! 2/ The calculation of 'sectors_handled' is wrong, as can be clearly seen by comparison with raid1.c This leads to at least 2 warnings and a probable crash is a RAID10 ever had known bad blocks. Fixes: 856e08e23762dfb92ffc68fd0a8d228f9e152160 Reported-by: Damian Nowak URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181 Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman

md/raid10: fix bug when raid10 recovery fails to recover a block.

2014-01-25T16:49:29Z

commit e8b849158508565e0cd6bc80061124afc5879160 upstream. commit e875ecea266a543e643b19e44cf472f1412708f9 md/raid10 record bad blocks as needed during recovery. added code to the "cannot recover this block" path to record a bad block rather than fail the whole recovery. Unfortunately this new case was placed *after* r10bio was freed rather than *before*, yet it still uses r10bio. This is will crash with a null dereference. So move the freeing of r10bio down where it is safe. Fixes: e875ecea266a543e643b19e44cf472f1412708f9 Reported-by: Damian Nowak URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181 Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman

md: fix problem when adding device to read-only array with bitmap.

2014-01-25T16:49:29Z

commit 8313b8e57f55b15e5b7f7fc5d1630bbf686a9a97 upstream. If an array is started degraded, and then the missing device is found it can be re-added and a minimal bitmap-based recovery will bring it fully up-to-date. If the array is read-only a recovery would not be allowed. But also if the array is read-only and the missing device was present very recently, then there could be no need for any recovery at all, so we simply include the device in the read-only array without any recovery. However... if the missing device was removed a little longer ago it could be missing some updates, but if a bitmap is present it will be conditionally accepted pending a bitmap-based update. We don't currently detect this case properly and will include that old device into the read-only array with no recovery even though it really needs a recovery. This patch keeps track of whether a bitmap-based-recovery is really needed or not in the new Bitmap_sync rdev flag. If that is set, then the device will not be added to a read-only array. Cc: Andrei Warkentin Fixes: d70ed2e4fafdbef0800e73942482bb075c21578b Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman

dm thin: allow pool in read-only mode to transition to read-write mode

2013-12-20T15:49:05Z

commit 9b7aaa64f96f7ca280d75326fca42f42017b89ef upstream. A thin-pool may be in read-only mode because the pool's data or metadata space was exhausted. To allow for recovery, by adding more space to the pool, we must allow a pool to transition from PM_READ_ONLY to PM_WRITE mode. Otherwise, running out of space will render the pool permanently read-only. Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm thin: re-establish read-only state when switching to fail mode

2013-12-20T15:49:05Z

commit 5383ef3a929a1366e2ced45cd6d74be7aa2a2281 upstream. If the thin-pool transitioned to fail mode and the thin-pool's table were reloaded for some reason: the new table's default pool mode would be read-write, though it will transition to fail mode during resume. When the pool mode transitions directly from PM_WRITE to PM_FAIL we need to re-establish the intermediate read-only state in both the metadata and persistent-data block manager (as is usually done with the normal pool mode transition sequence: PM_WRITE -> PM_READ_ONLY -> PM_FAIL). Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm thin: always fallback the pool mode if commit fails

2013-12-20T15:49:05Z

commit 020cc3b5e28c2e24f59f53a9154faf08564f308e upstream. Rename commit_or_fallback() to commit(). Now all previous calls to commit() will trigger the pool mode to fallback if the commit fails. Also, check the error returned from commit() in alloc_data_block(). Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm thin: switch to read-only mode if metadata space is exhausted

2013-12-20T15:49:05Z

commit 4a02b34e0cf1d0d0dd3737702841da4bf615a50a upstream. Switch the thin pool to read-only mode in alloc_data_block() if dm_pool_alloc_data_block() fails because the pool's metadata space is exhausted. Differentiate between data and metadata space in messages about no free space available. This issue was noticed with the device-mapper-test-suite using: dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/ The quantity of errors logged in this case must be reduced. before patch: device-mapper: thin: 253:4: reached low water mark for metadata device: sending event. device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map common: dm_tm_shadow_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map common: dm_tm_shadow_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map common: dm_tm_shadow_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map common: dm_tm_shadow_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map common: dm_tm_shadow_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: 253:4: commit failed: error = -28 device-mapper: thin: 253:4: switching pool to read-only mode after patch: device-mapper: thin: 253:4: reached low water mark for metadata device: sending event. device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: 253:4: no free metadata space available. device-mapper: thin: 253:4: switching pool to read-only mode Signed-off-by: Mike Snitzer Acked-by: Joe Thornber Signed-off-by: Greg Kroah-Hartman