From 490da40d82b31c0562d3f5edb37810f492ca1c34 Mon Sep 17 00:00:00 2001 From: Tao Ma Date: Wed, 19 Jan 2011 10:51:44 +0800 Subject: blktrace: Don't output messages if NOTIFY isn't set. Now if we enable blktrace, cfq has too many messages output to the trace buffer. It is fine if we don't specify any action mask. But if I do like this: blktrace /dev/sdb -a issue -a complete -o - | blkparse -i - I only want to see 'D' and 'C', while with the following command dd if=/mnt/ocfs2/test of=/dev/null bs=4k count=1 iflag=direct I will get(with a 2.6.37 vanilla kernel): 8,16 0 0 0.000000000 0 m N cfq3805 alloced 8,16 0 0 0.000004126 0 m N cfq3805 insert_request 8,16 0 0 0.000004884 0 m N cfq3805 add_to_rr 8,16 0 0 0.000008417 0 m N cfq workload slice:300 8,16 0 0 0.000009557 0 m N cfq3805 set_active wl_prio:0 wl_type:2 8,16 0 0 0.000010640 0 m N cfq3805 fifo= (null) 8,16 0 0 0.000011193 0 m N cfq3805 dispatch_insert 8,16 0 0 0.000012221 0 m N cfq3805 dispatched a request 8,16 0 0 0.000012802 0 m N cfq3805 activate rq, drv=1 8,16 0 1 0.000013181 3805 D R 114759 + 8 [dd] 8,16 0 2 0.000164244 0 C R 114759 + 8 [0] 8,16 0 0 0.000167997 0 m N cfq3805 complete rqnoidle 0 8,16 0 0 0.000168782 0 m N cfq3805 set_slice=100 8,16 0 0 0.000169874 0 m N cfq3805 arm_idle: 8 group_idle: 0 8,16 0 0 0.000170189 0 m N cfq schedule dispatch 8,16 0 0 0.000397938 0 m N cfq3805 slice expired t=0 8,16 0 0 0.000399763 0 m N cfq3805 sl_used=1 disp=1 charge=1 iops=0 sect=8 8,16 0 0 0.000400227 0 m N cfq3805 del_from_rr 8,16 0 0 0.000400882 0 m N cfq3805 put_queue See, there are 19 lines while I only need 2. I don't think it is appropriate for a user. So this patch will disable any messages if the BLK_TC_NOTIFY isn't set. Now the output for the same command will look like: 8,16 0 1 0.000000000 4908 D R 114759 + 8 [dd] 8,16 0 2 0.000146827 0 C R 114759 + 8 [0] Yes, it is what I want to see. Cc: Steven Rostedt Cc: Jeff Moyer Signed-off-by: Tao Ma Signed-off-by: Jens Axboe --- kernel/trace/blktrace.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 153562d0b93..d95721f3370 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -138,6 +138,13 @@ void __trace_note_message(struct blk_trace *bt, const char *fmt, ...) !blk_tracer_enabled)) return; + /* + * If the BLK_TC_NOTIFY action mask isn't set, don't send any note + * message to the trace. + */ + if (!(bt->act_mask & BLK_TC_NOTIFY)) + return; + local_irq_save(flags); buf = per_cpu_ptr(bt->msg_data, smp_processor_id()); va_start(args, fmt); -- cgit v1.2.3-18-g5258 From 04de96c9c6981c5957aa5db39bbdc4d958d07efa Mon Sep 17 00:00:00 2001 From: Tracey Dent Date: Wed, 19 Jan 2011 08:25:02 -0700 Subject: drivers/block/Makefile: replace the use of -objs with -y Change Makefile to use -y instead of -objs because -objs is deprecated and should now be switched. According to (documentation/kbuild/makefiles.txt). Signed-off-by: Tracey Dent Signed-off-by: Andrew Morton Signed-off-by: Jens Axboe --- drivers/block/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/Makefile b/drivers/block/Makefile index d7f463d6312..40528ba56d1 100644 --- a/drivers/block/Makefile +++ b/drivers/block/Makefile @@ -39,4 +39,4 @@ obj-$(CONFIG_XEN_BLKDEV_FRONTEND) += xen-blkfront.o obj-$(CONFIG_BLK_DEV_DRBD) += drbd/ obj-$(CONFIG_BLK_DEV_RBD) += rbd.o -swim_mod-objs := swim.o swim_asm.o +swim_mod-y := swim.o swim_asm.o -- cgit v1.2.3-18-g5258 From ee71a968672a9951aee6014c55511007596425bc Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Wed, 19 Jan 2011 08:25:02 -0700 Subject: loop: queue_lock NULL pointer derefence in blk_throtl_exit Performing $ sudo mount -o loop -o umask=0 /dev/sdb1 /mnt/ mount: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so $ sudo modprobe -r loop results in oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [] do_raw_spin_lock+0x14/0x122 Process modprobe (pid: 6189, threadinfo ffff88009a898000, task ffff880154a88000) Call Trace: [] _raw_spin_lock_irq+0x4a/0x51 [] ? blk_throtl_exit+0x3b/0xa0 [] ? cancel_delayed_work_sync+0xd/0xf [] blk_throtl_exit+0x3b/0xa0 [] blk_release_queue+0x21/0x65 [] kobject_release+0x51/0x66 [] ? kobject_release+0x0/0x66 [] kref_put+0x43/0x4d [] kobject_put+0x47/0x4b [] blk_cleanup_queue+0x56/0x5b [] loop_exit+0x68/0x844 [loop] [] sys_delete_module+0x1e8/0x25b [] ? trace_hardirqs_on_thunk+0x3a/0x3f [] system_call_fastpath+0x16/0x1b because of an attempt to acquire NULL queue_lock. I added the same lines as in blk_queue_make_request - index 44e18c0..49e6a54 100644`fall back to embedded per-queue lock'. Signed-off-by: Sergey Senozhatsky Signed-off-by: Andrew Morton Signed-off-by: Jens Axboe --- drivers/block/loop.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 44e18c073c4..49e6a545eb6 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1641,6 +1641,9 @@ out: static void loop_free(struct loop_device *lo) { + if (!lo->lo_queue->queue_lock) + lo->lo_queue->queue_lock = &lo->lo_queue->__queue_lock; + blk_cleanup_queue(lo->lo_queue); put_disk(lo->lo_disk); list_del(&lo->lo_list); -- cgit v1.2.3-18-g5258 From a0700bdd0b0150ea445159b1dee587f1507c272f Mon Sep 17 00:00:00 2001 From: Tracey Dent Date: Wed, 19 Jan 2011 08:25:02 -0700 Subject: drivers/block/aoe/Makefile: replace the use of -objs with -y Change Makefile to use -y instead of -objs because -objs is deprecated and should now be switched. According to (documentation/kbuild/makefiles.txt). Signed-off-by: Tracey Dent Cc: "Ed L. Cashin" Signed-off-by: Andrew Morton Signed-off-by: Jens Axboe --- drivers/block/aoe/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/aoe/Makefile b/drivers/block/aoe/Makefile index e76d997183c..06ea82cdf27 100644 --- a/drivers/block/aoe/Makefile +++ b/drivers/block/aoe/Makefile @@ -3,4 +3,4 @@ # obj-$(CONFIG_ATA_OVER_ETH) += aoe.o -aoe-objs := aoeblk.o aoechr.o aoecmd.o aoedev.o aoemain.o aoenet.o +aoe-y := aoeblk.o aoechr.o aoecmd.o aoedev.o aoemain.o aoenet.o -- cgit v1.2.3-18-g5258 From 68264e9d6781f7163e92c517769bb470fa43f6cd Mon Sep 17 00:00:00 2001 From: "Stephen M. Cameron" Date: Wed, 19 Jan 2011 08:25:02 -0700 Subject: cciss: make cciss_revalidate not loop through CISS_MAX_LUNS volumes unnecessarily. Signed-off-by: Stephen M. Cameron Signed-off-by: Andrew Morton Signed-off-by: Jens Axboe --- drivers/block/cciss.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c index 516d5bbec2b..9279272b373 100644 --- a/drivers/block/cciss.c +++ b/drivers/block/cciss.c @@ -2833,7 +2833,7 @@ static int cciss_revalidate(struct gendisk *disk) sector_t total_size; InquiryData_struct *inq_buff = NULL; - for (logvol = 0; logvol < CISS_MAX_LUN; logvol++) { + for (logvol = 0; logvol <= h->highest_lun; logvol++) { if (!h->drv[logvol]) continue; if (memcmp(h->drv[logvol]->LunID, drv->LunID, -- cgit v1.2.3-18-g5258 From ba5bd520f679c450fb6efa439618703bd0956daa Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Wed, 19 Jan 2011 08:25:02 -0700 Subject: cfq: rename a function to give it more appropriate name o Rename a function to give it more approprate name. We are calculating cfq queue slice and function name gives the impression as if cfq group slice length is being calculated. Signed-off-by: Vivek Goyal Signed-off-by: Jens Axboe --- block/cfq-iosched.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 501ffdf0399..ace16865713 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -599,7 +599,7 @@ cfq_group_slice(struct cfq_data *cfqd, struct cfq_group *cfqg) } static inline unsigned -cfq_scaled_group_slice(struct cfq_data *cfqd, struct cfq_queue *cfqq) +cfq_scaled_cfqq_slice(struct cfq_data *cfqd, struct cfq_queue *cfqq) { unsigned slice = cfq_prio_to_slice(cfqd, cfqq); if (cfqd->cfq_latency) { @@ -631,7 +631,7 @@ cfq_scaled_group_slice(struct cfq_data *cfqd, struct cfq_queue *cfqq) static inline void cfq_set_prio_slice(struct cfq_data *cfqd, struct cfq_queue *cfqq) { - unsigned slice = cfq_scaled_group_slice(cfqd, cfqq); + unsigned slice = cfq_scaled_cfqq_slice(cfqd, cfqq); cfqq->slice_start = jiffies; cfqq->slice_end = jiffies + slice; @@ -1671,7 +1671,7 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq, */ if (timed_out) { if (cfq_cfqq_slice_new(cfqq)) - cfqq->slice_resid = cfq_scaled_group_slice(cfqd, cfqq); + cfqq->slice_resid = cfq_scaled_cfqq_slice(cfqd, cfqq); else cfqq->slice_resid = cfqq->slice_end - jiffies; cfq_log_cfqq(cfqd, cfqq, "resid=%ld", cfqq->slice_resid); -- cgit v1.2.3-18-g5258 From be2c6b1990904dbd43f3d9b90fa2c530504375cd Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Wed, 19 Jan 2011 08:25:02 -0700 Subject: blkio-throttle: Avoid calling blkiocg_lookup_group() for root group o Jeff Moyer was doing some testing on a RAM backed disk and blkiocg_lookup_group() showed up high overhead after memcpy(). Similarly somebody else reported that blkiocg_lookup_group() is eating 6% extra cpu. Though looking at the code I can't think why the overhead of this function is so high. One thing is that it is called with very high frequency (once for every IO). o For lot of folks blkio controller will be compiled in but they might not have actually created cgroups. Hence optimize the case of root cgroup where we can avoid calling blkiocg_lookup_group() if IO is happening in root group (common case). Reported-by: Jeff Moyer Signed-off-by: Vivek Goyal Acked-by: Jeff Moyer Signed-off-by: Jens Axboe --- block/blk-throttle.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 381b09bb562..a89043a3caa 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -168,7 +168,15 @@ static struct throtl_grp * throtl_find_alloc_tg(struct throtl_data *td, * tree of blkg (instead of traversing through hash list all * the time. */ - tg = tg_of_blkg(blkiocg_lookup_group(blkcg, key)); + + /* + * This is the common case when there are no blkio cgroups. + * Avoid lookup in this case + */ + if (blkcg == &blkio_root_cgroup) + tg = &td->root_tg; + else + tg = tg_of_blkg(blkiocg_lookup_group(blkcg, key)); /* Fill in device details for root group */ if (tg && !tg->blkg.dev && bdi->dev && dev_name(bdi->dev)) { -- cgit v1.2.3-18-g5258 From 02a8f01b5a9f396d0327977af4c232d0f94c45fd Mon Sep 17 00:00:00 2001 From: Justin TerAvest Date: Wed, 9 Feb 2011 14:20:03 +0100 Subject: cfq-iosched: Don't wait if queue already has requests. Commit 7667aa0630407bc07dc38dcc79d29cc0a65553c1 added logic to wait for the last queue of the group to become busy (have at least one request), so that the group does not lose out for not being continuously backlogged. The commit did not check for the condition that the last queue already has some requests. As a result, if the queue already has requests, wait_busy is set. Later on, cfq_select_queue() checks the flag, and decides that since the queue has a request now and wait_busy is set, the queue is expired. This results in early expiration of the queue. This patch fixes the problem by adding a check to see if queue already has requests. If it does, wait_busy is not set. As a result, time slices do not expire early. The queues with more than one request are usually buffered writers. Testing shows improvement in isolation between buffered writers. Cc: stable@kernel.org Signed-off-by: Justin TerAvest Reviewed-by: Gui Jianfeng Acked-by: Vivek Goyal Signed-off-by: Jens Axboe --- block/cfq-iosched.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index ace16865713..7be4c795962 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -3432,6 +3432,10 @@ static bool cfq_should_wait_busy(struct cfq_data *cfqd, struct cfq_queue *cfqq) { struct cfq_io_context *cic = cfqd->active_cic; + /* If the queue already has requests, don't wait */ + if (!RB_EMPTY_ROOT(&cfqq->sort_list)) + return false; + /* If there are other queues in the group, don't wait */ if (cfqq->cfqg->nr_cfqq > 1) return false; -- cgit v1.2.3-18-g5258 From b8cf0e0e552ca48e9a00f518aeb4f5e03984022b Mon Sep 17 00:00:00 2001 From: Simon Arlott Date: Wed, 9 Feb 2011 14:21:07 +0100 Subject: cdrom: support devices that have check_events but not media_changed Commit 93aae17af1172c40c6f74b7294e93a90c3cfaa5d ("sr: implement sr_check_events()") replaced the media_changed op with the check_events op in drivers/scsi/sr.c All users that check for the CDC_MEDIA_CHANGED capbility try both the check_events op and the media_changed op, but register_cdrom() was requiring media_changed. This patch fixes the capability checking. The cdrom_select_disc ioctl is also using the two operations, so they should be required for CDC_SELECT_DISC too. Signed-off-by: Simon Arlott Cc: Tejun Heo Cc: Kay Sievers Tested-by: Chris Clayton Signed-off-by: Jens Axboe --- drivers/cdrom/cdrom.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index 14033a36bcd..e2c48a7eccf 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -409,7 +409,8 @@ int register_cdrom(struct cdrom_device_info *cdi) } ENSURE(drive_status, CDC_DRIVE_STATUS ); - ENSURE(media_changed, CDC_MEDIA_CHANGED); + if (cdo->check_events == NULL && cdo->media_changed == NULL) + *change_capability = ~(CDC_MEDIA_CHANGED | CDC_SELECT_DISC); ENSURE(tray_move, CDC_CLOSE_TRAY | CDC_OPEN_TRAY); ENSURE(lock_door, CDC_LOCK); ENSURE(select_speed, CDC_SELECT_SPEED); -- cgit v1.2.3-18-g5258