<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/block, branch v3.4.96</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/block?h=v3.4.96</id>
<link rel='self' href='https://git.amat.us/linux/atom/block?h=v3.4.96'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2014-05-18T12:25:55Z</updated>
<entry>
<title>blktrace: fix accounting of partially completed requests</title>
<updated>2014-05-18T12:25:55Z</updated>
<author>
<name>Roman Pen</name>
<email>r.peniaev@gmail.com</email>
</author>
<published>2014-03-04T14:13:10Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=14eee5bd065d6aac0acbdc6092a25ba68c55b9c8'/>
<id>urn:sha1:14eee5bd065d6aac0acbdc6092a25ba68c55b9c8</id>
<content type='text'>
commit af5040da01ef980670b3741b3e10733ee3e33566 upstream.

trace_block_rq_complete does not take into account that request can
be partially completed, so we can get the following incorrect output
of blkparser:

  C   R 232 + 240 [0]
  C   R 240 + 232 [0]
  C   R 248 + 224 [0]
  C   R 256 + 216 [0]

but should be:

  C   R 232 + 8 [0]
  C   R 240 + 8 [0]
  C   R 248 + 8 [0]
  C   R 256 + 8 [0]

Also, the whole output summary statistics of completed requests and
final throughput will be incorrect.

This patch takes into account real completion size of the request and
fixes wrong completion accounting.

Signed-off-by: Roman Pen &lt;r.peniaev@gmail.com&gt;
CC: Steven Rostedt &lt;rostedt@goodmis.org&gt;
CC: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
CC: Ingo Molnar &lt;mingo@redhat.com&gt;
CC: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: Don't access request after it might be freed</title>
<updated>2014-03-11T23:10:06Z</updated>
<author>
<name>Roland Dreier</name>
<email>roland@purestorage.com</email>
</author>
<published>2012-11-22T10:00:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=0ef4c881e4ade7a45174d9cf45eca6f9e10f5ccc'/>
<id>urn:sha1:0ef4c881e4ade7a45174d9cf45eca6f9e10f5ccc</id>
<content type='text'>
commit 893d290f1d7496db97c9471bc352ad4a11dc8a25 upstream.

After we've done __elv_add_request() and __blk_run_queue() in
blk_execute_rq_nowait(), the request might finish and be freed
immediately.  Therefore checking if the type is REQ_TYPE_PM_RESUME
isn't safe afterwards, because if it isn't, rq might be gone.
Instead, check beforehand and stash the result in a temporary.

This fixes crashes in blk_execute_rq_nowait() I get occasionally when
running with lots of memory debugging options enabled -- I think this
race is usually harmless because the window for rq to be reallocated
is so small.

Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
[xr: Backported to 3.4: adjust context]
Signed-off-by: Rui Xiang &lt;rui.xiang@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: add cond_resched() to potentially long running ioctl discard loop</title>
<updated>2014-02-22T18:32:46Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2014-02-12T16:34:01Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=c5bdb3a52d40d1ef3b3f6fb3617d20bb47cee46c'/>
<id>urn:sha1:c5bdb3a52d40d1ef3b3f6fb3617d20bb47cee46c</id>
<content type='text'>
commit c8123f8c9cb517403b51aa41c3c46ff5e10b2c17 upstream.

When mkfs issues a full device discard and the device only
supports discards of a smallish size, we can loop in
blkdev_issue_discard() for a long time. If preempt isn't enabled,
this can turn into a softlock situation and the kernel will
start complaining.

Add an explicit cond_resched() at the end of the loop to avoid
that.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>blk-core: Fix memory corruption if blkcg_init_queue fails</title>
<updated>2013-12-08T15:29:43Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2013-10-14T16:11:36Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=78530a1aaf9274a3fb6f958f27f7fab302c4e961'/>
<id>urn:sha1:78530a1aaf9274a3fb6f958f27f7fab302c4e961</id>
<content type='text'>
commit fff4996b7db7955414ac74386efa5e07fd766b50 upstream.

If blkcg_init_queue fails, blk_alloc_queue_node doesn't call bdi_destroy
to clean up structures allocated by the backing dev.

------------[ cut here ]------------
WARNING: at lib/debugobjects.c:260 debug_print_object+0x85/0xa0()
ODEBUG: free active (active state 0) object type: percpu_counter hint:           (null)
Modules linked in: dm_loop dm_mod ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev ipt_MASQUERADE iptable_nat nf_nat_ipv4 msr nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand cpufreq_conservative spadfs fuse hid_generic usbhid hid raid0 md_mod dmi_sysfs nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack lm85 hwmon_vid snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep snd_usbmidi_lib snd_rawmidi snd soundcore acpi_cpufreq freq_table mperf sata_svw serverworks kvm_amd ide_core ehci_pci ohci_hcd libata ehci_hcd kvm usbcore tg3 usb_common libphy k10temp pcspkr ptp i2c_piix4 i2c_core evdev microcode hwmon rtc_cmos pps_core e100 skge floppy mii processor button unix
CPU: 0 PID: 2739 Comm: lvchange Tainted: G        W
3.10.15-devel #14
Hardware name: empty empty/S3992-E, BIOS 'V1.06   ' 06/09/2009
 0000000000000009 ffff88023c3c1ae8 ffffffff813c8fd4 ffff88023c3c1b20
 ffffffff810399eb ffff88043d35cd58 ffffffff81651940 ffff88023c3c1bf8
 ffffffff82479d90 0000000000000005 ffff88023c3c1b80 ffffffff81039a67
Call Trace:
 [&lt;ffffffff813c8fd4&gt;] dump_stack+0x19/0x1b
 [&lt;ffffffff810399eb&gt;] warn_slowpath_common+0x6b/0xa0
 [&lt;ffffffff81039a67&gt;] warn_slowpath_fmt+0x47/0x50
 [&lt;ffffffff8122aaaf&gt;] ? debug_check_no_obj_freed+0xcf/0x250
 [&lt;ffffffff81229a15&gt;] debug_print_object+0x85/0xa0
 [&lt;ffffffff8122abe3&gt;] debug_check_no_obj_freed+0x203/0x250
 [&lt;ffffffff8113c4ac&gt;] kmem_cache_free+0x20c/0x3a0
 [&lt;ffffffff811f6709&gt;] blk_alloc_queue_node+0x2a9/0x2c0
 [&lt;ffffffff811f672e&gt;] blk_alloc_queue+0xe/0x10
 [&lt;ffffffffa04c0093&gt;] dm_create+0x1a3/0x530 [dm_mod]
 [&lt;ffffffffa04c6bb0&gt;] ? list_version_get_info+0xe0/0xe0 [dm_mod]
 [&lt;ffffffffa04c6c07&gt;] dev_create+0x57/0x2b0 [dm_mod]
 [&lt;ffffffffa04c6bb0&gt;] ? list_version_get_info+0xe0/0xe0 [dm_mod]
 [&lt;ffffffffa04c6bb0&gt;] ? list_version_get_info+0xe0/0xe0 [dm_mod]
 [&lt;ffffffffa04c6528&gt;] ctl_ioctl+0x268/0x500 [dm_mod]
 [&lt;ffffffff81097662&gt;] ? get_lock_stats+0x22/0x70
 [&lt;ffffffffa04c67ce&gt;] dm_ctl_ioctl+0xe/0x20 [dm_mod]
 [&lt;ffffffff81161aad&gt;] do_vfs_ioctl+0x2ed/0x520
 [&lt;ffffffff8116cfc7&gt;] ? fget_light+0x377/0x4e0
 [&lt;ffffffff81161d2b&gt;] SyS_ioctl+0x4b/0x90
 [&lt;ffffffff813cff16&gt;] system_call_fastpath+0x1a/0x1f
---[ end trace 4b5ff0d55673d986 ]---
------------[ cut here ]------------

This fix should be backported to stable kernels starting with 2.6.37. Note
that in the kernels prior to 3.5 the affected code is different, but the
bug is still there - bdi_init is called and bdi_destroy isn't.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>elevator: acquire q-&gt;sysfs_lock in elevator_change()</title>
<updated>2013-12-08T15:29:43Z</updated>
<author>
<name>Tomoki Sekiyama</name>
<email>tomoki.sekiyama@hds.com</email>
</author>
<published>2013-10-15T22:42:19Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9e23d8bd64e49062faf4aa4abcedd3943cf1d09d'/>
<id>urn:sha1:9e23d8bd64e49062faf4aa4abcedd3943cf1d09d</id>
<content type='text'>
commit 7c8a3679e3d8e9d92d58f282161760a0e247df97 upstream.

Add locking of q-&gt;sysfs_lock into elevator_change() (an exported function)
to ensure it is held to protect q-&gt;elevator from elevator_init(), even if
elevator_change() is called from non-sysfs paths.
sysfs path (elv_iosched_store) uses __elevator_change(), non-locking
version, as the lock is already taken by elv_iosched_store().

Signed-off-by: Tomoki Sekiyama &lt;tomoki.sekiyama@hds.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Josh Boyer &lt;jwboyer@fedoraproject.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: properly stack underlying max_segment_size to DM device</title>
<updated>2013-11-29T18:50:36Z</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2013-10-18T15:44:49Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=801327994060b0ae3749257d264a7bfea119b46c'/>
<id>urn:sha1:801327994060b0ae3749257d264a7bfea119b46c</id>
<content type='text'>
commit d82ae52e68892338068e7559a0c0657193341ce4 upstream.

Without this patch all DM devices will default to BLK_MAX_SEGMENT_SIZE
(65536) even if the underlying device(s) have a larger value -- this is
due to blk_stack_limits() using min_not_zero() when stacking the
max_segment_size limit.

1073741824

before patch:
65536

after patch:
1073741824

Reported-by: Lukasz Flis &lt;l.flis@cyfronet.pl&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: fix race between request completion and timeout handling</title>
<updated>2013-11-29T18:50:35Z</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2013-10-08T18:36:41Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=5413d6c032afde960c16c735eef40f9b85fa9132'/>
<id>urn:sha1:5413d6c032afde960c16c735eef40f9b85fa9132</id>
<content type='text'>
commit 4912aa6c11e6a5d910264deedbec2075c6f1bb73 upstream.

crocode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma dca be2net sg ses enclosure ext4 mbcache jbd2 sd_mod crc_t10dif ahci megaraid_sas(U) dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 491, comm: scsi_eh_0 Tainted: G        W  ----------------   2.6.32-220.13.1.el6.x86_64 #1 IBM  -[8722PAX]-/00D1461
RIP: 0010:[&lt;ffffffff8124e424&gt;]  [&lt;ffffffff8124e424&gt;] blk_requeue_request+0x94/0xa0
RSP: 0018:ffff881057eefd60  EFLAGS: 00010012
RAX: ffff881d99e3e8a8 RBX: ffff881d99e3e780 RCX: ffff881d99e3e8a8
RDX: ffff881d99e3e8a8 RSI: ffff881d99e3e780 RDI: ffff881d99e3e780
RBP: ffff881057eefd80 R08: ffff881057eefe90 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff881057f92338
R13: 0000000000000000 R14: ffff881057f92338 R15: ffff883058188000
FS:  0000000000000000(0000) GS:ffff880040200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006d3ec0 CR3: 000000302cd7d000 CR4: 00000000000406b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_eh_0 (pid: 491, threadinfo ffff881057eee000, task ffff881057e29540)
Stack:
 0000000000001057 0000000000000286 ffff8810275efdc0 ffff881057f16000
&lt;0&gt; ffff881057eefdd0 ffffffff81362323 ffff881057eefe20 ffffffff8135f393
&lt;0&gt; ffff881057e29af8 ffff8810275efdc0 ffff881057eefe78 ffff881057eefe90
Call Trace:
 [&lt;ffffffff81362323&gt;] __scsi_queue_insert+0xa3/0x150
 [&lt;ffffffff8135f393&gt;] ? scsi_eh_ready_devs+0x5e3/0x850
 [&lt;ffffffff81362a23&gt;] scsi_queue_insert+0x13/0x20
 [&lt;ffffffff8135e4d4&gt;] scsi_eh_flush_done_q+0x104/0x160
 [&lt;ffffffff8135fb6b&gt;] scsi_error_handler+0x35b/0x660
 [&lt;ffffffff8135f810&gt;] ? scsi_error_handler+0x0/0x660
 [&lt;ffffffff810908c6&gt;] kthread+0x96/0xa0
 [&lt;ffffffff8100c14a&gt;] child_rip+0xa/0x20
 [&lt;ffffffff81090830&gt;] ? kthread+0x0/0xa0
 [&lt;ffffffff8100c140&gt;] ? child_rip+0x0/0x20
Code: 00 00 eb d1 4c 8b 2d 3c 8f 97 00 4d 85 ed 74 bf 49 8b 45 00 49 83 c5 08 48 89 de 4c 89 e7 ff d0 49 8b 45 00 48 85 c0 75 eb eb a4 &lt;0f&gt; 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00
RIP  [&lt;ffffffff8124e424&gt;] blk_requeue_request+0x94/0xa0
 RSP &lt;ffff881057eefd60&gt;

The RIP is this line:
        BUG_ON(blk_queued_rq(rq));

After digging through the code, I think there may be a race between the
request completion and the timer handler running.

A timer is started for each request put on the device's queue (see
blk_start_request-&gt;blk_add_timer).  If the request does not complete
before the timer expires, the timer handler (blk_rq_timed_out_timer)
will mark the request complete atomically:

static inline int blk_mark_rq_complete(struct request *rq)
{
        return test_and_set_bit(REQ_ATOM_COMPLETE, &amp;rq-&gt;atomic_flags);
}

and then call blk_rq_timed_out.  The latter function will call
scsi_times_out, which will return one of BLK_EH_HANDLED,
BLK_EH_RESET_TIMER or BLK_EH_NOT_HANDLED.  If BLK_EH_RESET_TIMER is
returned, blk_clear_rq_complete is called, and blk_add_timer is again
called to simply wait longer for the request to complete.

Now, if the request happens to complete while this is going on, what
happens?  Given that we know the completion handler will bail if it
finds the REQ_ATOM_COMPLETE bit set, we need to focus on the completion
handler running after that bit is cleared.  So, from the above
paragraph, after the call to blk_clear_rq_complete.  If the completion
sets REQ_ATOM_COMPLETE before the BUG_ON in blk_add_timer, we go boom
there (I haven't seen this in the cores).  Next, if we get the
completion before the call to list_add_tail, then the timer will
eventually fire for an old req, which may either be freed or reallocated
(there is evidence that this might be the case).  Finally, if the
completion comes in *after* the addition to the timeout list, I think
it's harmless.  The request will be removed from the timeout list,
req_atom_complete will be set, and all will be well.

This will only actually explain the coredumps *IF* the request
structure was freed, reallocated *and* queued before the error handler
thread had a chance to process it.  That is possible, but it may make
sense to keep digging for another race.  I think that if this is what
was happening, we would see other instances of this problem showing up
as null pointer or garbage pointer dereferences, for example when the
request structure was not re-used.  It looks like we actually do run
into that situation in other reports.

This patch moves the BUG_ON(test_bit(REQ_ATOM_COMPLETE,
&amp;req-&gt;atomic_flags)); from blk_add_timer to the only caller that could
trip over it (blk_start_request).  It then inverts the calls to
blk_clear_rq_complete and blk_add_timer in blk_rq_timed_out to address
the race.  I've boot tested this patch, but nothing more.

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Acked-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: do not pass disk names as format strings</title>
<updated>2013-07-13T18:03:41Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2013-07-03T22:01:14Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a8139b5b8b1355c4909d90afa58b055aabe1a272'/>
<id>urn:sha1:a8139b5b8b1355c4909d90afa58b055aabe1a272</id>
<content type='text'>
commit ffc8b30866879ed9ba62bd0a86fecdbd51cd3d19 upstream.

Disk names may contain arbitrary strings, so they must not be
interpreted as format strings.  It seems that only md allows arbitrary
strings to be used for disk names, but this could allow for a local
memory corruption from uid 0 into ring 0.

CVE-2013-2851

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: avoid using uninitialized value in from queue_var_store</title>
<updated>2013-04-12T16:38:46Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2013-04-03T19:53:57Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=2570f532003a1e9b955799e3c48041e08e9a189e'/>
<id>urn:sha1:2570f532003a1e9b955799e3c48041e08e9a189e</id>
<content type='text'>
commit c678ef5286ddb5cf70384ad5af286b0afc9b73e1 upstream.

As found by gcc-4.8, the QUEUE_SYSFS_BIT_FNS macro creates functions
that use a value generated by queue_var_store independent of whether
that value was set or not.

block/blk-sysfs.c: In function 'queue_store_nonrot':
block/blk-sysfs.c:244:385: warning: 'val' may be used uninitialized in this function [-Wmaybe-uninitialized]

Unlike most other such warnings, this one is not a false positive,
writing any non-number string into the sysfs files indeed has
an undefined result, rather than returning an error.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>block: fix synchronization and limit check in blk_alloc_devt()</title>
<updated>2013-03-03T22:06:41Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-02-28T01:03:56Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=303ee54c72f488b90c2355977c1619a08db6ed9c'/>
<id>urn:sha1:303ee54c72f488b90c2355977c1619a08db6ed9c</id>
<content type='text'>
commit ce23bba842aee98092225d9576dba47c82352521 upstream.

idr allocation in blk_alloc_devt() wasn't synchronized against lookup
and removal, and its limit check was off by one - 1 &lt;&lt; MINORBITS is
the number of minors allowed, not the maximum allowed minor.

Add locking and rename MAX_EXT_DEVT to NR_EXT_DEVT and fix limit
checking.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
