<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/infiniband, branch v3.2.2</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/drivers/infiniband?h=v3.2.2</id>
<link rel='self' href='https://git.amat.us/linux/atom/drivers/infiniband?h=v3.2.2'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2012-01-12T19:29:24Z</updated>
<entry>
<title>IB/uverbs: Protect QP multicast list</title>
<updated>2012-01-12T19:29:24Z</updated>
<author>
<name>Eli Cohen</name>
<email>eli@dev.mellanox.co.il</email>
</author>
<published>2012-01-04T04:36:48Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=585ea9bc6752c41d669ea0f4cee8f3954df326e0'/>
<id>urn:sha1:585ea9bc6752c41d669ea0f4cee8f3954df326e0</id>
<content type='text'>
commit e214a0fe2b382fa302c036ecd6e6ffe99e3b9875 upstream.

Userspace verbs multicast attach/detach operations on a QP are done
while holding the rwsem of the QP for reading.  That's not sufficient
since a reader lock allows more than one reader to acquire the
lock.  However, multicast attach/detach does list manipulation that
can corrupt the list if multiple threads run in parallel.

Fix this by acquiring the rwsem as a writer to serialize attach/detach
operations.  Add idr_write_qp() and put_qp_write() to encapsulate
this.

This fixes oops seen when running applications that perform multicast
joins/leaves.

Reported-by: Mike Dubman &lt;miked@mellanox.com&gt;
Signed-off-by: Eli Cohen &lt;eli@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>IB/qib: Fix a possible data corruption when receiving packets</title>
<updated>2012-01-12T19:29:23Z</updated>
<author>
<name>Ram Vepa</name>
<email>ram.vepa@qlogic.com</email>
</author>
<published>2011-12-23T13:01:43Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=0d878668d0a4663a4807d413532f3a99496eebf4'/>
<id>urn:sha1:0d878668d0a4663a4807d413532f3a99496eebf4</id>
<content type='text'>
commit eddfb675256f49d14e8c5763098afe3eb2c93701 upstream.

Prevent a receive data corruption by ensuring that the write to update
the rcvhdrheadn register to generate an interrupt is at the very end
of the receive processing.

Signed-off-by: Ramkrishna Vepa &lt;ram.vepa@qlogic.com&gt;
Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@qlogic.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>Merge branches 'cma', 'mlx4' and 'qib' into for-next</title>
<updated>2011-12-19T17:19:49Z</updated>
<author>
<name>Roland Dreier</name>
<email>roland@purestorage.com</email>
</author>
<published>2011-12-19T17:19:49Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=480390c8f393b3c770b7b71faa094c733bd0ae09'/>
<id>urn:sha1:480390c8f393b3c770b7b71faa094c733bd0ae09</id>
<content type='text'>
</content>
</entry>
<entry>
<title>IB/qib: Correct sense on freectxts increment and decrement</title>
<updated>2011-12-19T17:19:34Z</updated>
<author>
<name>Mike Marciniszyn</name>
<email>mike.marciniszyn@qlogic.com</email>
</author>
<published>2011-12-02T17:41:30Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=29d1b16145e78e0f4af54751965c4a09e83bd872'/>
<id>urn:sha1:29d1b16145e78e0f4af54751965c4a09e83bd872</id>
<content type='text'>
Commit 53ab1c64983 ("IB/qib: Correct nfreectxts for multiple HCAs")
reversed the increments and decrements of dd-&gt;nfreectxts.  Fix it.

Reviewed-by: Ram Vepa &lt;ram.vepa@qlogic.com&gt;
Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@qlogic.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
</entry>
<entry>
<title>RDMA/cma: Verify private data length</title>
<updated>2011-12-19T17:15:33Z</updated>
<author>
<name>Sean Hefty</name>
<email>sean.hefty@intel.com</email>
</author>
<published>2011-12-06T21:17:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=04ded1672402577cd3f390c764f3046cc704a42a'/>
<id>urn:sha1:04ded1672402577cd3f390c764f3046cc704a42a</id>
<content type='text'>
private_data_len is defined as a u8.  If the user specifies a large
private_data size (&gt; 220 bytes), we will calculate a total length that
exceeds 255, resulting in private_data_len wrapping back to 0.  This
can lead to overwriting random kernel memory.  Avoid this by verifying
that the resulting size fits into a u8.

Reported-by: B. Thery &lt;benjamin.thery@bull.net&gt;
Addresses: &lt;http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2335&gt;
Signed-off-by: Sean Hefty &lt;sean.hefty@intel.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
</entry>
<entry>
<title>IB/mlx4: Fix shutdown crash accessing a non-existent bitmap</title>
<updated>2011-12-06T18:47:37Z</updated>
<author>
<name>Roland Dreier</name>
<email>roland@purestorage.com</email>
</author>
<published>2011-12-06T18:47:37Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=4af3ce0de0c12e5c17811eaefad36ab8e146c0fd'/>
<id>urn:sha1:4af3ce0de0c12e5c17811eaefad36ab8e146c0fd</id>
<content type='text'>
Commit cfcde11c3d7a ("IB/mlx4: Use flow counters on IBoE ports") added
code that sets elements of counters[] to -1 if no counter is allocated,
but then goes ahead and passes every entry to mlx4_counter_free() on
shutdown.  This is a bad idea, especially if MLX4_DEV_CAP_FLAG_COUNTERS
isn't set so there isn't even an underlying bitmap to free from.

Tested-by: Sean Hefty &lt;sean.hefty@intel.com&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
</entry>
<entry>
<title>Merge branches 'cxgb4', 'ipoib', 'misc' and 'qib' into for-next</title>
<updated>2011-11-30T02:01:53Z</updated>
<author>
<name>Roland Dreier</name>
<email>roland@purestorage.com</email>
</author>
<published>2011-11-30T02:01:53Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a493f1a24a496711d96b91c4dc0a1bd35eb6954b'/>
<id>urn:sha1:a493f1a24a496711d96b91c4dc0a1bd35eb6954b</id>
<content type='text'>
</content>
</entry>
<entry>
<title>IB: Fix RCU lockdep splats</title>
<updated>2011-11-29T21:37:11Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-11-29T21:31:23Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=580da35a31f91a594f3090b7a2c39b85cb051a12'/>
<id>urn:sha1:580da35a31f91a594f3090b7a2c39b85cb051a12</id>
<content type='text'>
Commit f2c31e32b37 ("net: fix NULL dereferences in check_peer_redir()")
forgot to take care of infiniband uses of dst neighbours.

Many thanks to Marc Aurele who provided a nice bug report and feedback.

Reported-by: Marc Aurele La France &lt;tsi@ualberta.ca&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
</entry>
<entry>
<title>IB/ipoib: Prevent hung task or softlockup processing multicast response</title>
<updated>2011-11-29T21:20:02Z</updated>
<author>
<name>Mike Marciniszyn</name>
<email>mike.marciniszyn@qlogic.com</email>
</author>
<published>2011-11-21T13:43:54Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=3874397c0bdec3c21ce071711cd105165179b8eb'/>
<id>urn:sha1:3874397c0bdec3c21ce071711cd105165179b8eb</id>
<content type='text'>
The following can occur with ipoib when processing a multicast response:

    BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982]
    Modules linked in: ...
    CPU 0:
    Modules linked in: ...
    Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5
    RIP: 0010:[&lt;ffffffff814ddb27&gt;]  [&lt;ffffffff814ddb27&gt;] _spin_unlock_irqrestore+0x17/0x20
    RSP: 0018:ffff8802119ed860  EFLAGS: 00000246
    RAX: 0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299
    RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246
    RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000
    R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860
    R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003
    FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
    [&lt;ffffffffa032c247&gt;] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib]
    [&lt;ffffffff8100bc8e&gt;] ? apic_timer_interrupt+0xe/0x20
    [&lt;ffffffff8100bc8e&gt;] ? apic_timer_interrupt+0xe/0x20
    [&lt;ffffffffa03283d4&gt;] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib]
    [&lt;ffffffffa03286fc&gt;] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib]
    [&lt;ffffffff8141e758&gt;] ? dev_hard_start_xmit+0x2c8/0x3f0
    [&lt;ffffffff81439d0a&gt;] ? sch_direct_xmit+0x15a/0x1c0
    [&lt;ffffffff81423098&gt;] ? dev_queue_xmit+0x388/0x4d0
    [&lt;ffffffffa032d6b7&gt;] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib]
    [&lt;ffffffffa032dab8&gt;] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib]
    [&lt;ffffffffa02a0946&gt;] ? mcast_work_handler+0x1a6/0x710 [ib_sa]
    [&lt;ffffffffa015f01e&gt;] ? ib_send_mad+0xfe/0x3c0 [ib_mad]
    [&lt;ffffffffa00f6c93&gt;] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core]
    [&lt;ffffffffa02a0f9b&gt;] ? join_handler+0xeb/0x200 [ib_sa]
    [&lt;ffffffffa029e4fc&gt;] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa]
    [&lt;ffffffffa029e79c&gt;] ? recv_handler+0x3c/0x70 [ib_sa]
    [&lt;ffffffffa01603a4&gt;] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad]
    [&lt;ffffffffa015fb60&gt;] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad]
    [&lt;ffffffff81088830&gt;] ? worker_thread+0x170/0x2a0
    [&lt;ffffffff8108e160&gt;] ? autoremove_wake_function+0x0/0x40
    [&lt;ffffffff810886c0&gt;] ? worker_thread+0x0/0x2a0
    [&lt;ffffffff8108ddf6&gt;] ? kthread+0x96/0xa0
    [&lt;ffffffff8100c1ca&gt;] ? child_rip+0xa/0x20

Coinciding with the stack trace is the following message:

    ib0: ib_address_create failed

The code below in ipoib_mcast_join_finish() will note the above
failure in the address handle but otherwise continue:

                ah = ipoib_create_ah(dev, priv-&gt;pd, &amp;av);
                if (!ah) {
                        ipoib_warn(priv, "ib_address_create failed\n");
                } else {

The while loop at the bottom of ipoib_mcast_join_finish() will attempt
to send queued multicast packets in mcast-&gt;pkt_queue and eventually
end up in ipoib_mcast_send():

        if (!mcast-&gt;ah) {
                if (skb_queue_len(&amp;mcast-&gt;pkt_queue) &lt; IPOIB_MAX_MCAST_QUEUE)
                        skb_queue_tail(&amp;mcast-&gt;pkt_queue, skb);
                else {
                        ++dev-&gt;stats.tx_dropped;
                        dev_kfree_skb_any(skb);
                }

My read is that the code will requeue the packet and return to the
ipoib_mcast_join_finish() while loop, which sets the stage for the
"hung" task diagnostic: the while loop never sees a non-NULL ah and
does nothing to resolve the failure.

There are GFP_ATOMIC allocations in the provider routines, so this
failure is possible and should be dealt with.

The test that induced the failure is associated with a host SM on the
same server during a shutdown.

This patch causes ipoib_mcast_join_finish() to exit with an error
which will flush the queued mcast packets.  Nothing is done to unwind
the QP attached state so that subsequent sends from above will retry
the join.

Reviewed-by: Ram Vepa &lt;ram.vepa@qlogic.com&gt;
Reviewed-by: Gary Leshner &lt;gary.leshner@qlogic.com&gt;
Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@qlogic.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
</entry>
<entry>
<title>IB/qib: Fix over-scheduling of QSFP work</title>
<updated>2011-11-28T20:17:33Z</updated>
<author>
<name>Mike Marciniszyn</name>
<email>mike.marciniszyn@qlogic.com</email>
</author>
<published>2011-11-09T22:07:22Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=8ee887d74b3d741991edaa1836d22636c28926d9'/>
<id>urn:sha1:8ee887d74b3d741991edaa1836d22636c28926d9</id>
<content type='text'>
Don't over-schedule QSFP work on driver initialization.  It could end
up being run simultaneously on two different CPUs resulting in bad
EEPROM reads.  In combination with setting the physical IB link state
prior to the IBC being brought out of reset, this can cause the link
state machine to start training early with wrong settings.

Signed-off-by: Mitko Haralanov &lt;mitko@qlogic.com&gt;
Signed-off-by: Mike Marciniszyn &lt;mike.marciniszyn@qlogic.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
</entry>
</feed>
