linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2011-10-31	dm kcopyd: add dm_kcopyd_zero to zero an area	Mikulas Patocka
	This patch introduces dm_kcopyd_zero() to make it easy to use kcopyd to write zeros into the requested areas instead instead of copying. It is implemented by passing a NULL copying source to dm_kcopyd_copy(). The forthcoming thin provisioning target uses this. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-10-31	dm: remove superfluous smp_mb	Namhyung Kim
	Since set_current_state() contains a memory barrier in it, an additional barrier isn't needed. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-10-31	dm: use local printk ratelimit	Namhyung Kim
	printk_ratelimit() shares global ratelimiting state with all other subsystems, so its usage is discouraged. Instead, define and use dm's local state. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-10-31	dm table: propagate non rotational flag	Mandeep Singh Baines
	Allow QUEUE_FLAG_NONROT to propagate up the device stack if all underlying devices are non-rotational. Tools like ureadahead will schedule IOs differently based on the rotational flag. With this patch, I see boot time go from 7.75 s to 7.46 s on my device. Suggested-by: J. Richard Barnette <jrbarnette@chromium.org> Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: dm-devel@redhat.com Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-10-31	md/raid10: Fix bug when activating a hot-spare.	NeilBrown
	This is a fairly serious bug in RAID10. When a RAID10 array is degraded and a hot-spare is activated, the spare does not take up the empty slot, but rather replaces the first working device. This is likely to make the array non-functional. It would normally be possible to recover the data, but that would need care and is not guaranteed. This bug was introduced in commit 2bb77736ae5dca0a189829fbb7379d43364a9dac which first appeared in 3.1. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-26	Merge branch 'for-linus' of git://neil.brown.name/md	Linus Torvalds
	* 'for-linus' of git://neil.brown.name/md: (34 commits) md: Fix some bugs in recovery_disabled handling. md/raid5: fix bug that could result in reads from a failed device. lib/raid6: Fix filename emitted in generated code md.c: trivial comment fix MD: Allow restarting an interrupted incremental recovery. md: clear In_sync bit on devices added to an active array. md: add proper write-congestion reporting to RAID1 and RAID10. md: rename "mdk_personality" to "md_personality" md/bitmap remove fault injection options. md/raid5: typedef removal: raid5_conf_t -> struct r5conf md/raid1: typedef removal: conf_t -> struct r1conf md/raid10: typedef removal: conf_t -> struct r10conf md/raid0: typedef removal: raid0_conf_t -> struct r0conf md/multipath: typedef removal: multipath_conf_t -> struct mpconf md/linear: typedef removal: linear_conf_t -> struct linear_conf md/faulty: remove typedef: conf_t -> struct faulty_conf md/linear: remove typedefs: dev_info_t -> struct dev_info md: remove typedefs: mirror_info_t -> struct mirror_info md: remove typedefs: r10bio_t -> struct r10bio and r1bio_t -> struct r1bio md: remove typedefs: mdk_thread_t -> struct md_thread ...
2011-10-26	md: Fix some bugs in recovery_disabled handling.	NeilBrown
	In 3.0 we changed the way recovery_disabled was handle so that instead of testing against zero, we test an mddev-> value against a conf-> value. Two problems: 1/ one place in raid1 was missed and still sets to '1'. 2/ We didn't explicitly set the conf-> value at array creation time. It defaulted to '0' just like the mddev value does so they could appear equal and thus disable recovery. This did not affect normal 'md' as it calls bind_rdev_to_array which changes the mddev value. However the dmraid interface doesn't call this and so doesn't change ->recovery_disabled; so at array start all recovery is incorrectly disabled. So initialise the 'conf' value to one less that the mddev value, so the will only be the same when explicitly set that way. Reported-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-26	md/raid5: fix bug that could result in reads from a failed device.	NeilBrown
	This bug was introduced in 415e72d034c50520ddb7ff79e7d1792c1306f0c9 which was in 2.6.36. There is a small window of time between when a device fails and when it is removed from the array. During this time we might still read from it, but we won't write to it - so it is possible that we could read stale data. We didn't need the test of 'Faulty' before because the test on In_sync is sufficient. Since we started allowing reads from the early part of non-In_sync devices we need a test on Faulty too. This is suitable for any kernel from 2.6.36 onwards, though the patch might need a bit of tweaking in 3.0 and earlier. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-24	block: Remove the control of complete cpu from bio.	Tao Ma
	bio originally has the functionality to set the complete cpu, but it is broken. Chirstoph said that "This code is unused, and from the all the discussions lately pretty obviously broken. The only thing keeping it serves is creating more confusion and possibly more bugs." And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine with leaving cpu control to the request based drivers, they are the only ones that can toggle the setting anyway". So this patch tries to remove all the work of controling complete cpu from a bio. Cc: Shaohua Li <shaohua.li@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-23	dm kcopyd: fix job_pool leak	Alasdair G Kergon
	Fix memory leak introduced by commit a6e50b409d3f9e0833e69c3c9cca822e8fa4adbb (dm snapshot: skip reading origin when overwriting complete chunk). When allocating a set of jobs from kc->job_pool, job->master_job must be set (to point to itself) so that the mempool item gets freed when the master_job completes. master_job was introduced by commit c6ea41fbbe08f270a8edef99dc369faf809d1bd6 (dm kcopyd: preallocate sub jobs to avoid deadlock) Reported-by: Michael Leun <ml@newton.leun.net> Cc: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-10-19	Merge branch 'v3.1-rc10' into for-3.2/core	Jens Axboe
	Conflicts: block/blk-core.c include/linux/blkdev.h Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19	md.c: trivial comment fix	Chris Dunlop
	Trivial comment fix Signed-off-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-18	MD: Allow restarting an interrupted incremental recovery.	Andrei Warkentin
	If an incremental recovery was interrupted, a subsequent re-add will result in a full recovery, even though an incremental should be possible (seen with raid1). Solve this problem by not updating the superblock on the recovering device until array is not degraded any longer. Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrei Warkentin <andreiw@vmware.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-18	md: clear In_sync bit on devices added to an active array.	NeilBrown
	When we add a device to an active array it can be meaningful to set the 'insync' flag. This indicates that the device is in-sync with the array except for locations recorded in the bitmap. A bitmap-based recovery can then bring it completely in-sync. Internally we move that flag to 'saved_raid_disk' but forgot to clear In_sync like we do in add_new_disk. So clear In_sync after moving its value to saved_raid_disk. Reported-by: Andrei Warkentin <andreiw@vmware.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md: add proper write-congestion reporting to RAID1 and RAID10.	NeilBrown
	RAID1 and RAID10 handle write requests by queuing them for handling by a separate thread. This is because when a write-intent-bitmap is active we might need to update the bitmap first, so it is good to queue a lot of writes, then do one big bitmap update for them all. However writeback request devices to appear to be congested after a while so it can make some guesstimate of throughput. The infinite queue defeats that (note that RAID5 has already has a finite queue so it doesn't suffer from this problem). So impose a limit on the number of pending write requests. By default it is 1024 which seems to be generally suitable. Make it configurable via module option just in case someone finds a regression. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md: rename "mdk_personality" to "md_personality"	NeilBrown
	"mdk" doesn't mean anything any more. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md/bitmap remove fault injection options.	NeilBrown
	These are too hard to use to be much more than noise. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md/raid5: typedef removal: raid5_conf_t -> struct r5conf	NeilBrown
	Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md/raid1: typedef removal: conf_t -> struct r1conf	NeilBrown
	Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md/raid10: typedef removal: conf_t -> struct r10conf	NeilBrown
	Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md/raid0: typedef removal: raid0_conf_t -> struct r0conf	NeilBrown
	Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md/multipath: typedef removal: multipath_conf_t -> struct mpconf	NeilBrown
	Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md/linear: typedef removal: linear_conf_t -> struct linear_conf	NeilBrown
	Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md/faulty: remove typedef: conf_t -> struct faulty_conf	NeilBrown
	Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md/linear: remove typedefs: dev_info_t -> struct dev_info	NeilBrown
	Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md: remove typedefs: mirror_info_t -> struct mirror_info	NeilBrown
	Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md: remove typedefs: r10bio_t -> struct r10bio and r1bio_t -> struct r1bio	NeilBrown
	Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md: remove typedefs: mdk_thread_t -> struct md_thread	NeilBrown
	Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md: remove typedefs: mddev_t -> struct mddev	NeilBrown
	Having mddev_t and 'struct mddev_s' is ugly and not preferred Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11	md: removing typedefs: mdk_rdev_t -> struct md_rdev	NeilBrown
	The typedefs are just annoying. 'mdk' probably refers to 'md_k.h' which used to be an include file that defined this thing. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07	md/raid0: convert some printks to pr_debug.	NeilBrown
	When md assembles a RAID0 array it prints out lots of info which is really just for debugging, so convert that to pr_debug. It also prints out the resulting configuration which could be interesting, so keep that as 'printk' but tidy it up a bit. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07	md: remove PRINTK and dprintk debugging and use pr_debug	NeilBrown
	Being able to dynamically enable these make them much more useful. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07	md: remove some old DEBUGging code.	NeilBrown
	This code is not really helpful and is hard to maintain, so just discard it. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07	md/raid5: convert to macros into inline functions.	NeilBrown
	More type-safety. Easier to read. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07	md/raid1/ avoid bio search in end_sync_read()	NeilBrown
	We know which device we just read from so we don't need to search the bios to find out. Just use ->read_disk. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07	md/raid1: factor out common bio handling code	Namhyung Kim
	When normal-write and sync-read/write bio completes, we should find out the disk number the bio belongs to. Factor those common code out to a separate function. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07	md/raid5: remove pointless NULL test.	NeilBrown
	In the 'abort' branch of run(), 'conf' cannot possibly be NULL, so remove the test. Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07	md/raid1: add documentation to r1_private_data_s data structure.	NeilBrown
	There wasn't much and it is inconsistent. Also rearrange fields to keep related fields together. Reported-by: Aapo Laine <aapo.laine@shiftmail.org> Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-06	Merge branch 'for-linus' of http://people.redhat.com/agk/git/linux-dm	Linus Torvalds
	* 'for-linus' of http://people.redhat.com/agk/git/linux-dm: dm crypt: always disable discard_zeroes_data dm: raid fix write_mostly arg validation dm table: avoid crash if integrity profile changes dm: flakey fix corrupt_bio_byte error path
2011-09-25	dm crypt: always disable discard_zeroes_data	Milan Broz
	If optional discard support in dm-crypt is enabled, discards requests bypass the crypt queue and blocks of the underlying device are discarded. For the read path, discarded blocks are handled the same as normal ciphertext blocks, thus decrypted. So if the underlying device announces discarded regions return zeroes, dm-crypt must disable this flag because after decryption there is just random noise instead of zeroes. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-09-25	dm: raid fix write_mostly arg validation	Jonthan Brassow
	Fix off-by-one error in validation of write_mostly. The user-supplied value given for the 'write_mostly' argument must be an index starting at 0. The validation of the supplied argument failed to check for 'N' ('>' vs '>='), which would have caused an access beyond the end of the array. Reported-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-09-25	dm table: avoid crash if integrity profile changes	Mike Snitzer
	Commit a63a5cf (dm: improve block integrity support) introduced a two-phase initialization of a DM device's integrity profile. This patch avoids dereferencing a NULL 'template_disk' pointer in blk_integrity_register() if there is an integrity profile mismatch in dm_table_set_integrity(). This can occur if the integrity profiles for stacked devices in a DM table are changed between the call to dm_table_prealloc_integrity() and dm_table_set_integrity(). Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@kernel.org # 2.6.39
2011-09-25	dm: flakey fix corrupt_bio_byte error path	Mike Snitzer
	If no arguments were provided to the corrupt_bio_byte feature an error should be returned immediately. Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-09-23	md: don't delay reboot by 1 second if no MD devices exist	Daniel P. Berrange
	The md_notify_reboot() method includes a call to mdelay(1000), to deal with "exotic SCSI devices" which are too volatile on reboot. The delay is unconditional. Even if the machine does not have any block devices, let alone MD devices, the kernel shutdown sequence is slowed down. 1 second does not matter much with physical hardware, but with certain virtualization use cases any wasted time in the bootup & shutdown sequence counts for alot. * drivers/md/md.c: md_notify_reboot() - only impose a delay if there was at least one MD device to be stopped during reboot Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-21	trival: md_k.h should be md.h in the beginning comment of file md.h	Wang Sheng-Hui
	Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-21	md/bitmap: improve handling of 'allclean'.	NeilBrown
	The 'allclean' flag is used to cache the fact that there is nothing to do, so we can avoid waking up and scanning the bitmap regularly. The two sorts of pages that might need the attention of the bitmap daemon are BITMAP_PAGE_PENDING and BITMAP_PAGE_NEEDWRITE pages. So make sure allclean reflects exactly when there are none of those. So: set it before scanning all pages with either bit set. clear it whenever these bits are set clear it when we desire not to clear one of these bits. don't clear it any other time. Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-21	md/bitmap: rename and tidy up BITMAP_PAGE_CLEAN	NeilBrown
	The flag 'BITMAP_PAGE_CLEAN' has a confusing name as it doesn't mean that the page is clean, but rather that there are counters in the page which allow bits in the bitmap to be cleared - i.e. maybe cleaning can happen. So change it to BITMAP_PAGE_PENDING and fix some irregularities: - Don't set it in bitmap_init_from_disk as bitmap_set_memory_bits sets it when needed - in bitmap_daemon_work, if we find a counter that is '1', but need_sync is set, then set BITMAP_PAGE_PENDING again (it was recently cleared) to ensure we don't forget about this bit. Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-21	md: Avoid waking up a thread after it has been freed.	NeilBrown
	Two related problems: 1/ some error paths call "md_unregister_thread(mddev->thread)" without subsequently clearing ->thread. A subsequent call to mddev_unlock will try to wake the thread, and crash. 2/ Most calls to md_wakeup_thread are protected against the thread disappeared either by: - holding the ->mutex - having an active request, so something else must be keeping the array active. However mddev_unlock calls md_wakeup_thread after dropping the mutex and without any certainty of an active request, so the ->thread could theoretically disappear. So we need a spinlock to provide some protections. So change md_unregister_thread to take a pointer to the thread pointer, and ensure that it always does the required locking, and clears the pointer properly. Reported-by: "Moshe Melnikov" <moshe@zadarastorage.com> Signed-off-by: NeilBrown <neilb@suse.de> cc: stable@kernel.org
2011-09-12	block: remove support for bio remapping from ->make_request	Christoph Hellwig
	There is very little benefit in allowing to let a ->make_request instance update the bios device and sector and loop around it in __generic_make_request when we can archive the same through calling generic_make_request from the driver and letting the loop in generic_make_request handle it. Note that various drivers got the return value from ->make_request and returned non-zero values for errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-09-12	block: rename __make_request() to blk_queue_bio()	Jens Axboe
	Now that it's exported, lets put it in a more sane namespace. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>