<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/block, branch v3.7-rc5</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/block?h=v3.7-rc5</id>
<link rel='self' href='https://git.amat.us/linux/atom/block?h=v3.7-rc5'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2012-10-25T19:58:17Z</updated>
<entry>
<title>block: Add blk_rq_pos(rq) to sort rq when plushing</title>
<updated>2012-10-25T19:58:17Z</updated>
<author>
<name>Jianpeng Ma</name>
<email>majianpeng@gmail.com</email>
</author>
<published>2012-10-25T19:58:17Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=975927b942c932bd839ed07e5d40b4037d816844'/>
<id>urn:sha1:975927b942c932bd839ed07e5d40b4037d816844</id>
<content type='text'>
My workload is a raid5 which had 16 disks. And used our filesystem to
write using direct-io mode.

I used the blktrace to find those message:
8,16   0     6647     2.453665504  2579  M   W 7493152 + 8 [md0_raid5]
8,16   0     6648     2.453672411  2579  Q   W 7493160 + 8 [md0_raid5]
8,16   0     6649     2.453672606  2579  M   W 7493160 + 8 [md0_raid5]
8,16   0     6650     2.453679255  2579  Q   W 7493168 + 8 [md0_raid5]
8,16   0     6651     2.453679441  2579  M   W 7493168 + 8 [md0_raid5]
8,16   0     6652     2.453685948  2579  Q   W 7493176 + 8 [md0_raid5]
8,16   0     6653     2.453686149  2579  M   W 7493176 + 8 [md0_raid5]
8,16   0     6654     2.453693074  2579  Q   W 7493184 + 8 [md0_raid5]
8,16   0     6655     2.453693254  2579  M   W 7493184 + 8 [md0_raid5]
8,16   0     6656     2.453704290  2579  Q   W 7493192 + 8 [md0_raid5]
8,16   0     6657     2.453704482  2579  M   W 7493192 + 8 [md0_raid5]
8,16   0     6658     2.453715016  2579  Q   W 7493200 + 8 [md0_raid5]
8,16   0     6659     2.453715247  2579  M   W 7493200 + 8 [md0_raid5]
8,16   0     6660     2.453721730  2579  Q   W 7493208 + 8 [md0_raid5]
8,16   0     6661     2.453721974  2579  M   W 7493208 + 8 [md0_raid5]
8,16   0     6662     2.453728202  2579  Q   W 7493216 + 8 [md0_raid5]
8,16   0     6663     2.453728436  2579  M   W 7493216 + 8 [md0_raid5]
8,16   0     6664     2.453734782  2579  Q   W 7493224 + 8 [md0_raid5]
8,16   0     6665     2.453735019  2579  M   W 7493224 + 8 [md0_raid5]
8,16   0     6666     2.453741401  2579  Q   W 7493232 + 8 [md0_raid5]
8,16   0     6667     2.453741632  2579  M   W 7493232 + 8 [md0_raid5]
8,16   0     6668     2.453748148  2579  Q   W 7493240 + 8 [md0_raid5]
8,16   0     6669     2.453748386  2579  M   W 7493240 + 8 [md0_raid5]
8,16   0     6670     2.453851843  2579  I   W 7493144 + 104 [md0_raid5]
8,16   0        0     2.453853661     0  m   N cfq2579 insert_request
8,16   0     6671     2.453854064  2579  I   W 7493120 + 24 [md0_raid5]
8,16   0        0     2.453854439     0  m   N cfq2579 insert_request
8,16   0     6672     2.453854793  2579  U   N [md0_raid5] 2
8,16   0        0     2.453855513     0  m   N cfq2579 Not idling.st-&gt;count:1
8,16   0        0     2.453855927     0  m   N cfq2579 dispatch_insert
8,16   0        0     2.453861771     0  m   N cfq2579 dispatched a request
8,16   0        0     2.453862248     0  m   N cfq2579 activate rq,drv=1
8,16   0     6673     2.453862332  2579  D   W 7493120 + 24 [md0_raid5]
8,16   0        0     2.453865957     0  m   N cfq2579 Not idling.st-&gt;count:1
8,16   0        0     2.453866269     0  m   N cfq2579 dispatch_insert
8,16   0        0     2.453866707     0  m   N cfq2579 dispatched a request
8,16   0        0     2.453867061     0  m   N cfq2579 activate rq,drv=2
8,16   0     6674     2.453867145  2579  D   W 7493144 + 104 [md0_raid5]
8,16   0     6675     2.454147608     0  C   W 7493120 + 24 [0]
8,16   0        0     2.454149357     0  m   N cfq2579 complete rqnoidle 0
8,16   0     6676     2.454791505     0  C   W 7493144 + 104 [0]
8,16   0        0     2.454794803     0  m   N cfq2579 complete rqnoidle 0
8,16   0        0     2.454795160     0  m   N cfq schedule dispatch

From above messages,we can find rq[W 7493144 + 104] and rq[W
7493120 + 24] do not merge.
Because the bio order is:
  8,16   0     6638     2.453619407  2579  Q   W 7493144 + 8 [md0_raid5]
  8,16   0     6639     2.453620460  2579  G   W 7493144 + 8 [md0_raid5]
  8,16   0     6640     2.453639311  2579  Q   W 7493120 + 8 [md0_raid5]
  8,16   0     6641     2.453639842  2579  G   W 7493120 + 8 [md0_raid5]
The bio(7493144) first and bio(7493120) later.So the subsequent
bios will be divided into two parts.
When flushing plug-list,because elv_attempt_insert_merge only support
backmerge,not supporting frontmerge.
So rq[7493120 + 24] can't merge with rq[7493144 + 104].

From my test,i found those situation can count 25% in our system.
Using this patch, there is no this situation.

Signed-off-by: Jianpeng Ma &lt;majianpeng@gmail.com&gt;
CC:Shaohua Li &lt;shli@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: remove CONFIG_EXPERIMENTAL</title>
<updated>2012-10-23T20:30:34Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2012-10-23T20:01:46Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=8e42e0a23d30ba84d8e946042ee82aac4934048a'/>
<id>urn:sha1:8e42e0a23d30ba84d8e946042ee82aac4934048a</id>
<content type='text'>
This config item has not carried much meaning for a while now and is
almost always enabled by default. As agreed during the Linux kernel
summit, remove it.

CC: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blkcg: stop iteration early if root_rl is the only request list</title>
<updated>2012-10-22T20:00:26Z</updated>
<author>
<name>Jun'ichi Nomura</name>
<email>j-nomura@ce.jp.nec.com</email>
</author>
<published>2012-10-22T01:15:37Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=65c77fd9e8a1c8c3da0bbbea6b7efa3d6ef265f8'/>
<id>urn:sha1:65c77fd9e8a1c8c3da0bbbea6b7efa3d6ef265f8</id>
<content type='text'>
__blk_queue_next_rl() finds next request list based on blkg_list
while skipping root_blkg in the list.
OTOH, root_rl is special as it may exist even without root_blkg.

Though the later part of the function handles such a case correctly,
exiting early is good for readability of the code.

Signed-off-by: Jun'ichi Nomura &lt;j-nomura@ce.jp.nec.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blkcg: Fix use-after-free of q-&gt;root_blkg and q-&gt;root_rl.blkg</title>
<updated>2012-10-22T20:00:26Z</updated>
<author>
<name>Jun'ichi Nomura</name>
<email>j-nomura@ce.jp.nec.com</email>
</author>
<published>2012-10-17T08:45:36Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=65635cbc37e011e71b208257a25e7c1078cd039b'/>
<id>urn:sha1:65635cbc37e011e71b208257a25e7c1078cd039b</id>
<content type='text'>
blk_put_rl() does not call blkg_put() for q-&gt;root_rl because we
don't take request list reference on q-&gt;root_blkg.
However, if root_blkg is once attached then detached (freed),
blk_put_rl() is confused by the bogus pointer in q-&gt;root_blkg.

For example, with !CONFIG_BLK_DEV_THROTTLING &amp;&amp;
CONFIG_CFQ_GROUP_IOSCHED,
switching IO scheduler from cfq to deadline will cause system stall
after the following warning with 3.6:

&gt; WARNING: at /work/build/linux/block/blk-cgroup.h:250
&gt; blk_put_rl+0x4d/0x95()
&gt; Modules linked in: bridge stp llc sunrpc acpi_cpufreq freq_table mperf
&gt; ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
&gt; Pid: 0, comm: swapper/0 Not tainted 3.6.0 #1
&gt; Call Trace:
&gt;  &lt;IRQ&gt;  [&lt;ffffffff810453bd&gt;] warn_slowpath_common+0x85/0x9d
&gt;  [&lt;ffffffff810453ef&gt;] warn_slowpath_null+0x1a/0x1c
&gt;  [&lt;ffffffff811d5f8d&gt;] blk_put_rl+0x4d/0x95
&gt;  [&lt;ffffffff811d614a&gt;] __blk_put_request+0xc3/0xcb
&gt;  [&lt;ffffffff811d71a3&gt;] blk_finish_request+0x232/0x23f
&gt;  [&lt;ffffffff811d76c3&gt;] ? blk_end_bidi_request+0x34/0x5d
&gt;  [&lt;ffffffff811d76d1&gt;] blk_end_bidi_request+0x42/0x5d
&gt;  [&lt;ffffffff811d7728&gt;] blk_end_request+0x10/0x12
&gt;  [&lt;ffffffff812cdf16&gt;] scsi_io_completion+0x207/0x4d5
&gt;  [&lt;ffffffff812c6fcf&gt;] scsi_finish_command+0xfa/0x103
&gt;  [&lt;ffffffff812ce2f8&gt;] scsi_softirq_done+0xff/0x108
&gt;  [&lt;ffffffff811dcea5&gt;] blk_done_softirq+0x8d/0xa1
&gt;  [&lt;ffffffff810915d5&gt;] ?
&gt;  generic_smp_call_function_single_interrupt+0x9f/0xd7
&gt;  [&lt;ffffffff8104cf5b&gt;] __do_softirq+0x102/0x213
&gt;  [&lt;ffffffff8108a5ec&gt;] ? lock_release_holdtime+0xb6/0xbb
&gt;  [&lt;ffffffff8104d2b4&gt;] ? raise_softirq_irqoff+0x9/0x3d
&gt;  [&lt;ffffffff81424dfc&gt;] call_softirq+0x1c/0x30
&gt;  [&lt;ffffffff81011beb&gt;] do_softirq+0x4b/0xa3
&gt;  [&lt;ffffffff8104cdb0&gt;] irq_exit+0x53/0xd5
&gt;  [&lt;ffffffff8102d865&gt;] smp_call_function_single_interrupt+0x34/0x36
&gt;  [&lt;ffffffff8142486f&gt;] call_function_single_interrupt+0x6f/0x80
&gt;  &lt;EOI&gt;  [&lt;ffffffff8101800b&gt;] ? mwait_idle+0x94/0xcd
&gt;  [&lt;ffffffff81018002&gt;] ? mwait_idle+0x8b/0xcd
&gt;  [&lt;ffffffff81017811&gt;] cpu_idle+0xbb/0x114
&gt;  [&lt;ffffffff81401fbd&gt;] rest_init+0xc1/0xc8
&gt;  [&lt;ffffffff81401efc&gt;] ? csum_partial_copy_generic+0x16c/0x16c
&gt;  [&lt;ffffffff81cdbd3d&gt;] start_kernel+0x3d4/0x3e1
&gt;  [&lt;ffffffff81cdb79e&gt;] ? kernel_init+0x1f7/0x1f7
&gt;  [&lt;ffffffff81cdb2dd&gt;] x86_64_start_reservations+0xb8/0xbd
&gt;  [&lt;ffffffff81cdb3e3&gt;] x86_64_start_kernel+0x101/0x110

This patch clears q-&gt;root_blkg and q-&gt;root_rl.blkg when root blkg
is destroyed.

Signed-off-by: Jun'ichi Nomura &lt;j-nomura@ce.jp.nec.com&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-block</title>
<updated>2012-10-11T00:04:23Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-10-11T00:04:23Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ce40be7a820bb393ac4ac69865f018d2f4038cf0'/>
<id>urn:sha1:ce40be7a820bb393ac4ac69865f018d2f4038cf0</id>
<content type='text'>
Pull block IO update from Jens Axboe:
 "Core block IO bits for 3.7.  Not a huge round this time, it contains:

   - First series from Kent cleaning up and generalizing bio allocation
     and freeing.

   - WRITE_SAME support from Martin.

   - Mikulas patches to prevent O_DIRECT crashes when someone changes
     the block size of a device.

   - Make bio_split() work on data-less bio's (like trim/discards).

   - A few other minor fixups."

Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew
Morton.  It is due to the VM no longer using a prio-tree (see commit
6b2dbba8b6ac: "mm: replace vma prio_tree with an interval tree").

So make set_blocksize() use mapping_mapped() instead of open-coding the
internal VM knowledge that has changed.

* 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits)
  block: makes bio_split support bio without data
  scatterlist: refactor the sg_nents
  scatterlist: add sg_nents
  fs: fix include/percpu-rwsem.h export error
  percpu-rw-semaphore: fix documentation typos
  fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared
  blockdev: turn a rw semaphore into a percpu rw semaphore
  Fix a crash when block device is read and block size is changed at the same time
  block: fix request_queue-&gt;flags initialization
  block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
  block: ioctl to zero block ranges
  block: Make blkdev_issue_zeroout use WRITE SAME
  block: Implement support for WRITE SAME
  block: Consolidate command flag and queue limit checks for merges
  block: Clean up special command handling logic
  block/blk-tag.c: Remove useless kfree
  block: remove the duplicated setting for congestion_threshold
  block: reject invalid queue attribute values
  block: Add bio_clone_bioset(), bio_clone_kmalloc()
  block: Consolidate bio_alloc_bioset(), bio_kmalloc()
  ...
</content>
</entry>
<entry>
<title>Merge branch 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2012-10-02T17:52:28Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-10-02T17:52:28Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=68d47a137c3bef754923bccf73fb639c9b0bbd5e'/>
<id>urn:sha1:68d47a137c3bef754923bccf73fb639c9b0bbd5e</id>
<content type='text'>
Pull cgroup hierarchy update from Tejun Heo:
 "Currently, different cgroup subsystems handle nested cgroups
  completely differently.  There's no consistency among subsystems and
  the behaviors often are outright broken.

  People at least seem to agree that the broken hierarhcy behaviors need
  to be weeded out if any progress is gonna be made on this front and
  that the fallouts from deprecating the broken behaviors should be
  acceptable especially given that the current behaviors don't make much
  sense when nested.

  This patch makes cgroup emit warning messages if cgroups for
  subsystems with broken hierarchy behavior are nested to prepare for
  fixing them in the future.  This was put in a separate branch because
  more related changes were expected (didn't make it this round) and the
  memory cgroup wanted to pull in this and make changes on top."

* 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
</content>
</entry>
<entry>
<title>Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq</title>
<updated>2012-10-02T16:54:49Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-10-02T16:54:49Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=033d9959ed2dc1029217d4165f80a71702dc578e'/>
<id>urn:sha1:033d9959ed2dc1029217d4165f80a71702dc578e</id>
<content type='text'>
Pull workqueue changes from Tejun Heo:
 "This is workqueue updates for v3.7-rc1.  A lot of activities this
  round including considerable API and behavior cleanups.

   * delayed_work combines a timer and a work item.  The handling of the
     timer part has always been a bit clunky leading to confusing
     cancelation API with weird corner-case behaviors.  delayed_work is
     updated to use new IRQ safe timer and cancelation now works as
     expected.

   * Another deficiency of delayed_work was lack of the counterpart of
     mod_timer() which led to cancel+queue combinations or open-coded
     timer+work usages.  mod_delayed_work[_on]() are added.

     These two delayed_work changes make delayed_work provide interface
     and behave like timer which is executed with process context.

   * A work item could be executed concurrently on multiple CPUs, which
     is rather unintuitive and made flush_work() behavior confusing and
     half-broken under certain circumstances.  This problem doesn't
     exist for non-reentrant workqueues.  While non-reentrancy check
     isn't free, the overhead is incurred only when a work item bounces
     across different CPUs and even in simulated pathological scenario
     the overhead isn't too high.

     All workqueues are made non-reentrant.  This removes the
     distinction between flush_[delayed_]work() and
     flush_[delayed_]_work_sync().  The former is now as strong as the
     latter and the specified work item is guaranteed to have finished
     execution of any previous queueing on return.

   * In addition to the various bug fixes, Lai redid and simplified CPU
     hotplug handling significantly.

   * Joonsoo introduced system_highpri_wq and used it during CPU
     hotplug.

  There are two merge commits - one to pull in IRQ safe timer from
  tip/timers/core and the other to pull in CPU hotplug fixes from
  wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."

Fixed a number of trivial conflicts, but the more interesting conflicts
were silent ones where the deprecated interfaces had been used by new
code in the merge window, and thus didn't cause any real data conflicts.

Tejun pointed out a few of them, I fixed a couple more.

* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
  workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
  workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
  workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
  workqueue: remove @delayed from cwq_dec_nr_in_flight()
  workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
  workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
  workqueue: use __cpuinit instead of __devinit for cpu callbacks
  workqueue: rename manager_mutex to assoc_mutex
  workqueue: WORKER_REBIND is no longer necessary for idle rebinding
  workqueue: WORKER_REBIND is no longer necessary for busy rebinding
  workqueue: reimplement idle worker rebinding
  workqueue: deprecate __cancel_delayed_work()
  workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
  workqueue: use mod_delayed_work() instead of __cancel + queue
  workqueue: use irqsafe timer for delayed_work
  workqueue: clean up delayed_work initializers and add missing one
  workqueue: make deferrable delayed_work initializer names consistent
  workqueue: cosmetic whitespace updates for macro definitions
  workqueue: deprecate system_nrt[_freezable]_wq
  workqueue: deprecate flush[_delayed]_work_sync()
  ...
</content>
</entry>
<entry>
<title>s390/partitions: make partition detection independent from DASD ioctls</title>
<updated>2012-09-26T13:45:05Z</updated>
<author>
<name>Stefan Weinhuber</name>
<email>wein@de.ibm.com</email>
</author>
<published>2012-08-31T08:52:13Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=46e8894786327caf52cf686e27ba0795bddfcd63'/>
<id>urn:sha1:46e8894786327caf52cf686e27ba0795bddfcd63</id>
<content type='text'>
In some usage scenarios it is desireable to work with disk images or
virtualized DASD devices. One problem that prevents such applications
is the partition detection in ibm.c. Currently it works only for
devices that support the BIODASDINFO2 ioctl, in other words, it only
works for devices that belong to the DASD device driver.

The information gained from the BIODASDINFO2 ioctl is only for a small
set of legacy cases abolutely necessary. All current VOL1, LNX1 and
CMS1 type of disk labels can be interpreted correctly without this
information, as long as the generic HDIO_GETGEO ioctl works and
provides a correct disk geometry.

This patch makes the ibm.c partition detection as independent as
possible from the BIODASDINFO2 ioctl. Only the following two cases are
still restricted to real DASDs:
- An FBA DASD, or LDL formatted ECKD DASD without any disk label.
- An old style LNX1 label (without large volume support) on a disk
  with inconsistent device geometry.

Signed-off-by: Stefan Weinhuber &lt;wein@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>block: fix request_queue-&gt;flags initialization</title>
<updated>2012-09-21T13:33:12Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-09-20T21:09:30Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=60ea8226cbd5c8301f9a39edc574ddabcb8150e0'/>
<id>urn:sha1:60ea8226cbd5c8301f9a39edc574ddabcb8150e0</id>
<content type='text'>
A queue newly allocated with blk_alloc_queue_node() has only
QUEUE_FLAG_BYPASS set.  For request-based drivers,
blk_init_allocated_queue() is called and q-&gt;queue_flags is overwritten
with QUEUE_FLAG_DEFAULT which doesn't include BYPASS even though the
initial bypass is still in effect.

In blk_init_allocated_queue(), or QUEUE_FLAG_DEFAULT to q-&gt;queue_flags
instead of overwriting.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: stable@vger.kernel.org
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()</title>
<updated>2012-09-21T13:32:57Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-09-20T21:08:52Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=749fefe6778e98dfefe3b8bb72a93875196ec554'/>
<id>urn:sha1:749fefe6778e98dfefe3b8bb72a93875196ec554</id>
<content type='text'>
b82d4b197c ("blkcg: make request_queue bypassing on allocation") made
request_queues bypassed on allocation to avoid switching on and off
bypass mode on a queue being initialized.  Some drivers allocate and
then destroy a lot of queues without fully initializing them and
incurring bypass latency overhead on each of them could add upto
significant overhead.

Unfortunately, blk_init_allocated_queue() is never used by queues of
bio-based drivers, which means that all bio-based driver queues are in
bypass mode even after initialization and registration complete
successfully.

Due to the limited way request_queues are used by bio drivers, this
problem is hidden pretty well but it shows up when blk-throttle is
used in combination with a bio-based driver.  Trying to configure
(echoing to cgroupfs file) blk-throttle for a bio-based driver hangs
indefinitely in blkg_conf_prep() waiting for bypass mode to end.

This patch moves the initial blk_queue_bypass_end() call from
blk_init_allocated_queue() to blk_register_queue() which is called for
any userland-visible queues regardless of its type.

I believe this is correct because I don't think there is any block
driver which needs or wants working elevator and blk-cgroup on a queue
which isn't visible to userland.  If there are such users, we need a
different solution.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Joseph Glanville &lt;joseph.glanville@orionvm.com.au&gt;
Cc: stable@vger.kernel.org
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
