<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/md, branch v2.6.35</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/drivers/md?h=v2.6.35</id>
<link rel='self' href='https://git.amat.us/linux/atom/drivers/md?h=v2.6.35'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2010-06-24T03:36:04Z</updated>
<entry>
<title>md/raid5: don't include 'spare' drives when reshaping to fewer devices.</title>
<updated>2010-06-24T03:36:04Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-06-17T07:48:26Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3424bf6a772cff606fc4bc24a3639c937afb547f'/>
<id>urn:sha1:3424bf6a772cff606fc4bc24a3639c937afb547f</id>
<content type='text'>
There are few situations where it would make any sense to add a spare
when reducing the number of devices in an array, but it is
conceivable:  A 6 drive RAID6 with two missing devices could be
reshaped to a 5 drive RAID6, and a spare could become available
just in time for the reshape, but not early enough to have been
recovered first.  'freezing' recovery can make this easy to
do without any races.

However doing such a thing is a bad idea.  md will not record the
partially-recovered state of the 'spare', and when the reshape
finishes it will think that the spare is still a spare.
The easiest way to avoid this confusion is simply to disallow it.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
<entry>
<title>md/raid5: add a missing 'continue' in a loop.</title>
<updated>2010-06-24T03:35:49Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-06-17T07:41:03Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=2f115882499f3e5eca33d1df07b8876cc752a1ff'/>
<id>urn:sha1:2f115882499f3e5eca33d1df07b8876cc752a1ff</id>
<content type='text'>
As the comment says, the tail of this loop only applies to devices
that are not fully in sync, so if In_sync was set, we should avoid
the rest of the loop.

This bug will hardly ever cause an actual problem.  The worst it
can do is allow an array that is dirty and degraded to be assembled
without first warning the sysadmin, which is not generally a good
idea.

This will only happen if the array is RAID4 or a RAID5/6 in an
intermediate state during a reshape and so has one drive that is
all 'parity' - no data - while some other device has failed.

This is certainly possible, but not at all common.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
<entry>
<title>md/raid5: Allow recovered part of partially recovered devices to be in-sync</title>
<updated>2010-06-24T03:35:39Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-06-17T07:25:21Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=415e72d034c50520ddb7ff79e7d1792c1306f0c9'/>
<id>urn:sha1:415e72d034c50520ddb7ff79e7d1792c1306f0c9</id>
<content type='text'>
During a recovery or reshape the early part of some devices might be
in-sync while the later parts are not.
When we know we are looking at an early part, it is good to treat that
part as in-sync for stripe calculations.

This is particularly important for a reshape which suffers device
failure.  Treating the data as in-sync can mean the difference between
data-safety and data-loss.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
<entry>
<title>md/raid5: More careful check for "has array failed".</title>
<updated>2010-06-24T03:35:27Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-06-16T07:17:53Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=674806d62fb02a22eea948c9f1b5e58e0947b728'/>
<id>urn:sha1:674806d62fb02a22eea948c9f1b5e58e0947b728</id>
<content type='text'>
When we are reshaping an array, the device failure combinations
that cause us to decide that the array has failed are more subtle.

In particular, any 'spare' will be fully in-sync in the section
of the array that has already been reshaped, thus failures that
affect only that section are less critical.

So encode this subtlety in a new function and call it as appropriate.

The case that showed this problem was a 4 drive RAID5 to 8 drive RAID6
conversion, where the last two devices failed while the intermediate
5-drive RAID6 was being reshaped to 8 drives.
This resulted in:

  good good good good incomplete good good failed failed

The incomplete device made the whole array look bad.  But as the array
was actually good for the section that had already been converted to
8 drives, all the data was actually safe.

Reported-by: Terry Morris &lt;tbmorris@tbmorris.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
<entry>
<title>md: Don't update -&gt;recovery_offset when reshaping an array to fewer devices.</title>
<updated>2010-06-24T03:35:18Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-06-16T07:01:25Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=70fffd0bfab1558a8c64c5e903dea1fb84cd9f6b'/>
<id>urn:sha1:70fffd0bfab1558a8c64c5e903dea1fb84cd9f6b</id>
<content type='text'>
When an array is reshaped to have fewer devices, the reshape proceeds
from the end of the devices to the beginning.

If a device happens to be non-In_sync (which is possible but rare)
we would normally update the -&gt;recovery_offset as the reshape
progresses. However that would be wrong, as the recovery_offset records
that the early part of the device is in_sync, while in fact it would
only be the later part that is in_sync, and in any case the offset
number would be measured from the wrong end of the device.

Relatedly, if after a reshape a spare is discovered not to be
recovered all the way to the end, do not allow spare_active
to incorporate it in the array.

This becomes relevant in the following sample scenario:

A 4 drive RAID5 is converted to a 6 drive RAID6 in a combined
operation.
The RAID5-&gt;RAID6 conversion will cause a 5th drive to be included as a
spare, then the 5-drive -&gt; 6-drive reshape will effectively rebuild
that spare as it progresses.  The 6th drive is treated as in_sync the
whole time, as there is never any case where we might consider reading
from it but must not because there is no valid data.

If we interrupt this reshape part-way through and reverse it to return
to a 5-drive RAID6 (or even a 4-drive RAID5), we don't want to update
the recovery_offset - as that would be wrong - and we don't want to
include that spare as active in the 5-drive RAID6 when the reversed
reshape completes, as it will still be mostly out-of-sync.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
<entry>
<title>md/raid5: avoid oops when number of devices is reduced then increased.</title>
<updated>2010-06-24T03:35:02Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-06-16T06:45:16Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=e4e11e385d1e5516ac76c956d6c25e6c2fa1b8d0'/>
<id>urn:sha1:e4e11e385d1e5516ac76c956d6c25e6c2fa1b8d0</id>
<content type='text'>
The entries in the stripe_cache maintained by raid5 are enlarged
when we increase the number of devices in the array, but not
shrunk when we reduce the number of devices.
So if entries are added after reducing the number of devices, we
must ensure that we initialise the whole entry, not just the part that
is currently relevant.  Otherwise if we enlarge the array again,
we will reference uninitialised values.

As grow_buffers/shrink_buffers now want to use a count that is stored
explicitly in the raid_conf, they should get it from there rather than
being passed it as a parameter.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
<entry>
<title>md: enable raid4-&gt;raid0 takeover</title>
<updated>2010-06-24T03:34:57Z</updated>
<author>
<name>Maciej Trela</name>
<email>maciej.trela@intel.com</email>
</author>
<published>2010-06-16T10:56:12Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=049d6c1ef983c9ac43aa423dfd752071a5b0002d'/>
<id>urn:sha1:049d6c1ef983c9ac43aa423dfd752071a5b0002d</id>
<content type='text'>
Currently only level 5 with layout=PARITY_N can be taken over to raid0.
Let's allow level 4 as well.

Signed-off-by: Maciej Trela &lt;maciej.trela@intel.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
<entry>
<title>md: clear layout after -&gt;raid0 takeover</title>
<updated>2010-06-24T03:34:45Z</updated>
<author>
<name>Maciej Trela</name>
<email>maciej.trela@intel.com</email>
</author>
<published>2010-06-16T10:55:14Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=001048a318d48e93cb6a1246f3b20335b2a7c855'/>
<id>urn:sha1:001048a318d48e93cb6a1246f3b20335b2a7c855</id>
<content type='text'>
After a takeover from raid5/10 -&gt; raid0, mddev-&gt;layout is not cleared.

Signed-off-by: Maciej Trela &lt;maciej.trela@intel.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
<entry>
<title>md: fix raid10 takeover: use new_layout for setup_conf</title>
<updated>2010-06-24T03:33:51Z</updated>
<author>
<name>Maciej Trela</name>
<email>maciej.trela@intel.com</email>
</author>
<published>2010-06-16T10:46:29Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f73ea87375a1b2bf6c0be82bb9a3cb9d5ee7a407'/>
<id>urn:sha1:f73ea87375a1b2bf6c0be82bb9a3cb9d5ee7a407</id>
<content type='text'>
Use mddev-&gt;new_layout in setup_conf.
Also use new_chunk, and don't set -&gt;degraded in takeover(); that
gets set in run().

Signed-off-by: Maciej Trela &lt;maciej.trela@intel.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
<entry>
<title>md: fix handling of array level takeover that re-arranges devices.</title>
<updated>2010-06-24T03:33:24Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-06-15T08:36:03Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=e93f68a1fc6244c05ad8fae28e75835ec74ab34e'/>
<id>urn:sha1:e93f68a1fc6244c05ad8fae28e75835ec74ab34e</id>
<content type='text'>
Most array level changes leave the list of devices largely unchanged,
possibly causing one at the end to become redundant.
However conversions between RAID0 and RAID10 need to renumber
all devices (except 0).

This renumbering is currently being done in the -&gt;run method when the
new personality takes over.  However this is too late, as the common
code in md.c might already have invalidated some of the devices if
they had a -&gt;raid_disk number that appeared too high.

Moving it into the -&gt;takeover method is too early as the array is
still active at that time and wrong -&gt;raid_disk numbers could cause
confusion.

So add a -&gt;new_raid_disk field to mdk_rdev_s and use it to communicate
the new raid_disk number.
Now the common code knows exactly which devices need to be renumbered
and which can be invalidated, and can do it all at a convenient time
when the array is suspended.
It can also update some symlinks in sysfs which previously were not
updated correctly.

Reported-by: Maciej Trela &lt;maciej.trela@intel.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
</feed>
