<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/block, branch v2.6.35</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/block?h=v2.6.35</id>
<link rel='self' href='https://git.amat.us/linux/atom/block?h=v2.6.35'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2010-06-24T06:14:22Z</updated>
<entry>
<title>block: Don't count_vm_events for discard bio in submit_bio.</title>
<updated>2010-06-24T06:14:22Z</updated>
<author>
<name>Tao Ma</name>
<email>tao.ma@oracle.com</email>
</author>
<published>2010-06-23T23:43:57Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1b99973f1c82707e46e8cb9416865a1e955e8f8c'/>
<id>urn:sha1:1b99973f1c82707e46e8cb9416865a1e955e8f8c</id>
<content type='text'>
In submit_bio, we count vm events by checking READ/WRITE.
But DISCARD_NOBARRIER also has the WRITE flag set.
It looks as if blkdev_issue_discard also adds a page as the
payload, so the bio_has_data check isn't enough.
So add another check for discard bios.
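
For illustration only (this is not the patch itself), the intended
accounting condition can be modelled in userspace roughly as follows.
The flag values, the struct and the function names are hypothetical
stand-ins, not the kernel's definitions:

<![CDATA[
```c
/* Hypothetical flag bits standing in for the kernel's rw flags. */
#define SKETCH_WRITE   (1u << 0)
#define SKETCH_DISCARD (1u << 1)

/* Minimal stand-in for a bio: only what the check needs. */
struct sketch_bio {
	unsigned int rw;        /* request flags */
	unsigned int nr_pages;  /* payload pages attached to the bio */
};

/* Models the idea of bio_has_data(): true if a payload is attached. */
int sketch_has_data(const struct sketch_bio *bio)
{
	return bio->nr_pages != 0;
}

/*
 * Returns 1 if the bio should be counted as a vm event, 0 otherwise.
 * A discard has WRITE set and carries a payload page, so the payload
 * check alone would count it; it has to be excluded explicitly.
 */
int sketch_should_count(const struct sketch_bio *bio)
{
	return sketch_has_data(bio) && !(bio->rw & SKETCH_DISCARD);
}
```
]]>

In this model an ordinary write with a payload is counted, while a
discard with WRITE set and one payload page is not.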

Signed-off-by: Tao Ma &lt;tao.ma@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>cfq: fix recursive call in cfq_blkiocg_update_completion_stats()</title>
<updated>2010-06-21T07:10:55Z</updated>
<author>
<name>Jens Axboe</name>
<email>jaxboe@fusionio.com</email>
</author>
<published>2010-06-21T07:10:55Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9e495db1a1f931e82c9edccd677dd171be5b85d2'/>
<id>urn:sha1:9e495db1a1f931e82c9edccd677dd171be5b85d2</id>
<content type='text'>
e98ef89b has a typo, causing cfq_blkiocg_update_completion_stats()
to call itself instead of blkiocg_update_completion_stats().

Reported-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>cfq-iosched: Fixed boot warning with BLK_CGROUP=y and CFQ_GROUP_IOSCHED=n</title>
<updated>2010-06-18T17:57:47Z</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2010-06-18T14:39:47Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=e98ef89b30b8a2e882b11d4965347015770f3627'/>
<id>urn:sha1:e98ef89b30b8a2e882b11d4965347015770f3627</id>
<content type='text'>
Hi Jens,

A few days back Ingo noticed a CFQ boot time warning. This patch fixes it.
The issue here is that with CFQ_GROUP_IOSCHED=n, CFQ should not really
be making blkio stat related calls.

&gt; Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e. With some
&gt; configs i get bad spinlock warnings during bootup:
&gt;
&gt; [   28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750 usecs
&gt; [   28.972003] calling  b44_init+0x0/0x55 @ 1
&gt; [   28.976009] bus: 'pci': add driver b44
&gt; [   28.976374]  sda:
&gt; [   28.978157] BUG: spinlock bad magic on CPU#1, async/0/117
&gt; [   28.980000]  lock: 7e1c5bbc, .magic: 00000000, .owner: &lt;none&gt;/-1, .owner_cpu: 0
&gt; [   28.980000] Pid: 117, comm: async/0 Not tainted 2.6.35-rc2-tip-01092-g010e7ef-dirty #8183
&gt; [   28.980000] Call Trace:
&gt; [   28.980000]  [&lt;41ba6d55&gt;] ? printk+0x20/0x24
&gt; [   28.980000]  [&lt;4134b7b7&gt;] spin_bug+0x7c/0x87
&gt; [   28.980000]  [&lt;4134b853&gt;] do_raw_spin_lock+0x1e/0x123
&gt; [   28.980000]  [&lt;41ba92ca&gt;] ? _raw_spin_lock_irqsave+0x12/0x20
&gt; [   28.980000]  [&lt;41ba92d2&gt;] _raw_spin_lock_irqsave+0x1a/0x20
&gt; [   28.980000]  [&lt;4133476f&gt;] blkiocg_update_io_add_stats+0x25/0xfb
&gt; [   28.980000]  [&lt;41335dae&gt;] ? cfq_prio_tree_add+0xb1/0xc1
&gt; [   28.980000]  [&lt;41337bc7&gt;] cfq_insert_request+0x8c/0x425

Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>cfq: Don't allow queue merges for queues that have no process references</title>
<updated>2010-06-17T18:17:35Z</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2010-06-17T14:19:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c10b61f0910466b4b99c266a7d76ac4390743fb5'/>
<id>urn:sha1:c10b61f0910466b4b99c266a7d76ac4390743fb5</id>
<content type='text'>
Hi,

A user reported a kernel bug when running a particular program that did
the following:

created 32 threads
- each thread took a mutex, grabbed a global offset, added a buffer size
  to that offset, released the lock
- read from the given offset in the file
- created a new thread to do the same
- exited

The result is that cfq's close cooperator logic would trigger, as the
threads were issuing I/O within the mean seek distance of one another.
This workload managed to routinely trigger a use after free bug when
walking the list of merge candidates for a particular cfqq
(cfqq-&gt;new_cfqq).  The logic used for merging queues looks like this:

static void cfq_setup_merge(struct cfq_queue *cfqq, struct cfq_queue *new_cfqq)
{
	int process_refs, new_process_refs;
	struct cfq_queue *__cfqq;

	/* Avoid a circular list and skip interim queue merges */
	while ((__cfqq = new_cfqq-&gt;new_cfqq)) {
		if (__cfqq == cfqq)
			return;
		new_cfqq = __cfqq;
	}

	process_refs = cfqq_process_refs(cfqq);
	/*
	 * If the process for the cfqq has gone away, there is no
	 * sense in merging the queues.
	 */
	if (process_refs == 0)
		return;

	/*
	 * Merge in the direction of the lesser amount of work.
	 */
	new_process_refs = cfqq_process_refs(new_cfqq);
	if (new_process_refs &gt;= process_refs) {
		cfqq-&gt;new_cfqq = new_cfqq;
		atomic_add(process_refs, &amp;new_cfqq-&gt;ref);
	} else {
		new_cfqq-&gt;new_cfqq = cfqq;
		atomic_add(new_process_refs, &amp;cfqq-&gt;ref);
	}
}

When a merge candidate is found, we add the process references for the
queue with fewer references to the queue with more.  The actual merging
of queues happens when a new request is issued for a given cfqq.  In the
case of the test program, it only does a single pread call to read in
1MB, so the actual merge never happens.

Normally, this is fine, as when the queue exits, we simply drop the
references we took on the other cfqqs in the merge chain:

	/*
	 * If this queue was scheduled to merge with another queue, be
	 * sure to drop the reference taken on that queue (and others in
	 * the merge chain).  See cfq_setup_merge and cfq_merge_cfqqs.
	 */
	__cfqq = cfqq-&gt;new_cfqq;
	while (__cfqq) {
		if (__cfqq == cfqq) {
			WARN(1, "cfqq-&gt;new_cfqq loop detected\n");
			break;
		}
		next = __cfqq-&gt;new_cfqq;
		cfq_put_queue(__cfqq);
		__cfqq = next;
	}

However, there is a hole in this logic.  Consider the following (and
keep in mind that each I/O keeps a reference to the cfqq):

q1-&gt;new_cfqq = q2   // q2 now has 2 process references
q3-&gt;new_cfqq = q2   // q2 now has 3 process references

// the process associated with q2 exits
// q2 now has 2 process references

// q1 exits, drops its reference on q2
// q2 now has 1 process reference

// q3 exits, so has 0 process references, and hence drops its references
// to q2, which leaves q2 also with 0 process references

q4 comes along and wants to merge with q3

q3-&gt;new_cfqq still points at q2!  We follow that link and end up at an
already freed cfqq.

So, the fix is to not follow a merge chain if the top-most queue does
not have a process reference, otherwise any queue in the chain could be
already freed.  I also changed the logic to disallow merging with a
queue that does not have any process references.  Previously, we did
this check for one of the merge candidates, but not the other.  That
doesn't really make sense.
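
To make the new rule concrete, here is a simplified userspace model
(an illustration, not the attached patch).  The struct and function
names are made up for the sketch, and reference counting on the merge
target is left out; only the two checks follow the description above:

<![CDATA[
```c
#include <stddef.h>

/* Simplified stand-in for struct cfq_queue. */
struct sketch_queue {
	int process_refs;               /* references held by processes */
	struct sketch_queue *new_q;     /* pending merge target, or NULL */
};

/*
 * Try to schedule a merge between q and cand.
 * Returns 1 if a merge was set up, 0 if it was refused.
 */
int sketch_setup_merge(struct sketch_queue *q, struct sketch_queue *cand)
{
	struct sketch_queue *next;

	/*
	 * Do not follow the chain of a queue that has no process
	 * references: anything it points at may already be freed.
	 */
	if (cand->process_refs == 0)
		return 0;

	/* Avoid a circular list and skip interim queue merges. */
	while ((next = cand->new_q) != NULL) {
		if (next == q)
			return 0;
		cand = next;
	}

	/* Refuse to merge if either side has lost its processes. */
	if (q->process_refs == 0 || cand->process_refs == 0)
		return 0;

	/* Merge in the direction of the lesser amount of work. */
	if (cand->process_refs >= q->process_refs)
		q->new_q = cand;
	else
		cand->new_q = q;
	return 1;
}
```
]]>

With q3 in the state from the example (no process references, but its
merge link still pointing at q2), a call for q4 and q3 returns before
the stale link is ever read.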

Without the attached patch, my system would BUG within a couple of
seconds of running the reproducer program.  With the patch applied, my
system ran the program for over an hour without issues.

This addresses the following bugzilla:
    https://bugzilla.kernel.org/show_bug.cgi?id=16217

Thanks a ton to Phil Carns for providing the bug report and an excellent
reproducer.

[ Note for stable: this applies to 2.6.32/33/34 ].

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Reported-by: Phil Carns &lt;carns@mcs.anl.gov&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>block: fix DISCARD_BARRIER requests</title>
<updated>2010-06-17T08:10:53Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2010-06-17T07:54:32Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=fbbf055692aeb25c54c49d9ca84532de836fbba0'/>
<id>urn:sha1:fbbf055692aeb25c54c49d9ca84532de836fbba0</id>
<content type='text'>
Filesystems assume that DISCARD_BARRIER requests are full barriers, so that
they don't have to track in-progress discard operations when submitting new
I/O.  But currently we only treat them as elevator barriers, which don't
actually do the necessary queue drains.

Also remove the unlikely around both the DISCARD and BARRIER requests -
they happen far too often for a static mispredict.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>block: make blk_init_free_list and elevator_init idempotent</title>
<updated>2010-06-04T11:47:06Z</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2010-05-25T17:15:15Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1abec4fdbb142e3ccb6ce99832fae42129134a96'/>
<id>urn:sha1:1abec4fdbb142e3ccb6ce99832fae42129134a96</id>
<content type='text'>
blk_init_allocated_queue_node may fail and the caller _could_ retry.
Accommodate the unlikely event that blk_init_allocated_queue_node is
called on an already initialized (possibly partially) request_queue.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>block: avoid unconditionally freeing previously allocated request_queue</title>
<updated>2010-06-04T11:47:06Z</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2010-06-03T17:34:52Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c86d1b8ae622e1ea5d20e98bd72fbd7d9dd69016'/>
<id>urn:sha1:c86d1b8ae622e1ea5d20e98bd72fbd7d9dd69016</id>
<content type='text'>
On blk_init_allocated_queue_node failure, only free the request_queue if
it wasn't previously allocated outside the block layer
(e.g. blk_init_queue_node was the caller of blk_init_allocated_queue_node).

This addresses an interface bug introduced by the following commit:
01effb0 block: allow initialization of previously allocated
request_queue

Otherwise the request_queue may be freed out from underneath a caller
that is managing the request_queue directly (e.g. caller uses
blk_alloc_queue + blk_init_allocated_queue_node).

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>cfq-iosched: fix an oops caused by slab leak</title>
<updated>2010-05-25T08:17:26Z</updated>
<author>
<name>Shaohua Li</name>
<email>shaohua.li@intel.com</email>
</author>
<published>2010-05-25T08:16:53Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d02a2c077fb81f3224c770be62a318165b23b486'/>
<id>urn:sha1:d02a2c077fb81f3224c770be62a318165b23b486</id>
<content type='text'>
I got the oops below when unloading cfq-iosched. Consider this scenario:
queue A merges to B, C merges to D, and B will be merged to D. If we split B
before B is merged to D, we should put B's reference on D.

[  807.768536] =============================================================================
[  807.768539] BUG cfq_queue: Objects remaining on kmem_cache_close()
[  807.768541] -----------------------------------------------------------------------------
[  807.768543]
[  807.768546] INFO: Slab 0xffffea0003e6b4e0 objects=26 used=1 fp=0xffff88011d584fd8 flags=0x200000000004082
[  807.768550] Pid: 5946, comm: rmmod Tainted: G        W   2.6.34-07097-gf4b87de-dirty #724
[  807.768552] Call Trace:
[  807.768560]  [&lt;ffffffff81104e8d&gt;] slab_err+0x8f/0x9d
[  807.768564]  [&lt;ffffffff811059e1&gt;] ? flush_cpu_slab+0x0/0x93
[  807.768569]  [&lt;ffffffff8164be52&gt;] ? add_preempt_count+0xe/0xca
[  807.768572]  [&lt;ffffffff8164bd9c&gt;] ? sub_preempt_count+0xe/0xb6
[  807.768577]  [&lt;ffffffff81648871&gt;] ? _raw_spin_unlock+0x15/0x30
[  807.768580]  [&lt;ffffffff8164bd9c&gt;] ? sub_preempt_count+0xe/0xb6
[  807.768584]  [&lt;ffffffff811061bc&gt;] list_slab_objects+0x9b/0x19f
[  807.768588]  [&lt;ffffffff8164bf0a&gt;] ? add_preempt_count+0xc6/0xca
[  807.768591]  [&lt;ffffffff81109e27&gt;] kmem_cache_destroy+0x13f/0x21d
[  807.768597]  [&lt;ffffffffa000ff13&gt;] cfq_slab_kill+0x1a/0x43 [cfq_iosched]
[  807.768601]  [&lt;ffffffffa000ffcf&gt;] cfq_exit+0x93/0x9e [cfq_iosched]
[  807.768606]  [&lt;ffffffff810973a2&gt;] sys_delete_module+0x1b1/0x219
[  807.768612]  [&lt;ffffffff8102fb5b&gt;] system_call_fastpath+0x16/0x1b
[  807.768618] INFO: Object 0xffff88011d584618 @offset=1560
[  807.768622] INFO: Allocated in cfq_get_queue+0x11e/0x274 [cfq_iosched] age=7173 cpu=1 pid=5496
[  807.768626] =============================================================================

Cc: stable@kernel.org
Signed-off-by: Shaohua Li &lt;shaohua.li@intel.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block: Adjust elv_iosched_show to return "none" for bio-based DM</title>
<updated>2010-05-24T07:07:32Z</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2010-05-24T07:07:32Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=e36f724b4ae70e443a7d152929b60059cbfa1a26'/>
<id>urn:sha1:e36f724b4ae70e443a7d152929b60059cbfa1a26</id>
<content type='text'>
Bio-based DM doesn't use an elevator (queue is !blk_queue_stackable()).

Longer-term, DM will not allocate an elevator for bio-based DM.  But even
then there will be a small potential for an elevator to be allocated for
a request-based DM table only to have a bio-based table be loaded in the
end.

Displaying "none" for bio-based DM will help avoid user confusion.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>cfq-iosched: compact io_context radix_tree</title>
<updated>2010-05-24T07:06:59Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@openvz.org</email>
</author>
<published>2010-05-20T19:21:41Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=80b15c7389caa81a3860f9fc2ee47ec0ea572a63'/>
<id>urn:sha1:80b15c7389caa81a3860f9fc2ee47ec0ea572a63</id>
<content type='text'>
Use small consecutive indexes as radix tree keys instead of sparse cfqd addresses.

This change will reduce radix tree depth from 11 (6 for 32-bit hosts)
to 1 if the host has &lt;=64 disks under cfq control, or to 0 if there is only one disk.
So, this patch saves 10*560 bytes for each process (5*296 for 32-bit hosts).
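
The depth figures can be checked with a small calculation.  The sketch
below assumes 6 index bits per radix tree level (64 slots per node) and
that a key of 0 needs no levels at all; the macro and function names
are illustrative only:

<![CDATA[
```c
/* Assumed number of index bits consumed per radix tree level. */
#define SKETCH_MAP_SHIFT 6

/* Number of tree levels needed to address the given key. */
int sketch_radix_depth(unsigned long long key)
{
	int depth = 0;

	/* Each level consumes SKETCH_MAP_SHIFT bits of the key. */
	while (key != 0) {
		key >>= SKETCH_MAP_SHIFT;
		depth++;
	}
	return depth;
}
```
]]>

Under these assumptions a 64-bit address with its top bits set needs
11 levels, a 32-bit one needs 6, indexes 1..63 need 1 and index 0
needs none.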

For each cfqd, allocate a cic index from ida.
To unlink a dead cic from the tree without cfqd access, store the index in -&gt;key.
(bit 0 -- dead mark, bits 1..30 -- index: ida produces ids in range 0..2^31-1)
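
The key layout described above can be sketched like this.  The macro
and function names are hypothetical and chosen for the example; only
the bit layout (bit 0 as the dead mark, the index in the bits above
it) comes from the description:

<![CDATA[
```c
/* Bit 0 marks the key as belonging to a dead cic. */
#define SKETCH_DEAD_MARK   1ul
/* The index is stored starting at bit 1. */
#define SKETCH_INDEX_SHIFT 1

/* Build the key stored in a dead cic from its cfqd's index. */
unsigned long sketch_dead_key(unsigned int index)
{
	return ((unsigned long)index << SKETCH_INDEX_SHIFT) | SKETCH_DEAD_MARK;
}

/* True if the key carries the dead mark. */
int sketch_key_is_dead(unsigned long key)
{
	return (key & SKETCH_DEAD_MARK) != 0;
}

/* Recover the radix tree index from a dead key. */
unsigned int sketch_key_index(unsigned long key)
{
	return (unsigned int)(key >> SKETCH_INDEX_SHIFT);
}
```
]]>

This relies on the assumption that a live key never has bit 0 set
(for example, because it is an aligned pointer), so the mark alone
tells the two kinds of key apart.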

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
</feed>
