linux/net/ceph, branch v3.4.25

libceph: check for invalid mapping

2012-11-26T19:38:44Z

(cherry picked from commit d63b77f4c552cc3a20506871046ab0fcbc332609) If we encounter an invalid (e.g., zeroed) mapping, return an error and avoid a divide by zero. Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman

libceph: avoid NULL kref_put when osd reset races with alloc_msg

2012-11-26T19:38:43Z

(cherry picked from commit 9bd952615a42d7e2ce3fa2c632e808e804637a1a) The ceph_on_in_msg_alloc() method drops con->mutex while it allocates a message. If that races with a timeout that resends a zillion messages and resets the connection, and the ->alloc_msg() method returns a NULL message, it will call ceph_msg_put(NULL) and BUG. Fix by only calling put if msg is non-NULL. Fixes http://tracker.newdream.net/issues/3142 Signed-off-by: Sage Weil Signed-off-by: Greg Kroah-Hartman

rbd: reset BACKOFF if unable to re-queue

2012-11-26T19:38:43Z

(cherry picked from commit 588377d6199034c36d335e7df5818b731fea072c) If ceph_fault() is unable to queue work after a delay, it sets the BACKOFF connection flag so con_work() will attempt to do so. In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't result in newly-queued work, it simply ignores this condition and proceeds as if no backoff delay were desired. There are two problems with this--one of which is a bug. The first problem is simply that the intended behavior is to back off, and if we aren't able queue the work item to run after a delay we're not doing that. The only reason queue_delayed_work() won't queue work is if the provided work item is already queued. In the messenger, this means that con_work() is already scheduled to be run again. So if we simply set the BACKOFF flag again when this occurs, we know the next con_work() call will again attempt to hold off activity on the connection until after the delay. The second problem--the bug--is a leak of a reference count. If queue_delayed_work() returns 0 in con_work(), con->ops->put() drops the connection reference held on entry to con_work(). However, processing is (was) allowed to continue, and at the end of the function a second con->ops->put() is called. This patch fixes both problems. Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman

libceph: only kunmap kmapped pages

2012-11-26T19:38:43Z

(cherry picked from commit 5ce765a540f34d1e2005e1210f49f67fdf11e997) In write_partial_msg_pages(), pages need to be kmapped in order to perform a CRC-32c calculation on them. As an artifact of the way this code used to be structured, the kunmap() call was separated from the kmap() call and both were done conditionally. But the conditions under which the kmap() and kunmap() calls were made differed, so there was a chance a kunmap() call would be done on a page that had not been mapped. The symptom of this was tripping a BUG() in kunmap_high() when pkmap_count[nr] became 0. Reported-by: Bryan K. Wright Signed-off-by: Alex Elder Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman

libceph: avoid truncation due to racing banners

2012-11-26T19:38:43Z

(cherry picked from commit 6d4221b53707486dfad3f5bfe568d2ce7f4c9863) Because the Ceph client messenger uses a non-blocking connect, it is possible for the sending of the client banner to race with the arrival of the banner sent by the peer. When ceph_sock_state_change() notices the connect has completed, it schedules work to process the socket via con_work(). During this time the peer is writing its banner, and arrival of the peer banner races with con_work(). If con_work() calls try_read() before the peer banner arrives, there is nothing for it to do, after which con_work() calls try_write() to send the client's banner. In this case Ceph's protocol negotiation can complete succesfully. The server-side messenger immediately sends its banner and addresses after accepting a connect request, *before* actually attempting to read or verify the banner from the client. As a result, it is possible for the banner from the server to arrive before con_work() calls try_read(). If that happens, try_read() will read the banner and prepare protocol negotiation info via prepare_write_connect(). prepare_write_connect() calls con_out_kvec_reset(), which discards the as-yet-unsent client banner. Next, con_work() calls try_write(), which sends the protocol negotiation info rather than the banner that the peer is expecting. The result is that the peer sees an invalid banner, and the client reports "negotiation failed". Fix this by moving con_out_kvec_reset() out of prepare_write_connect() to its callers at all locations except the one where the banner might still need to be sent. [elder@inktak.com: added note about server-side behavior] Signed-off-by: Jim Schutt Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman

libceph: delay debugfs initialization until we learn global_id

2012-11-26T19:38:43Z

(cherry picked from commit d1c338a509cea5378df59629ad47382810c38623) The debugfs directory includes the cluster fsid and our unique global_id. We need to delay the initialization of the debug entry until we have learned both the fsid and our global_id from the monitor or else the second client can't create its debugfs entry and will fail (and multiple client instances aren't properly reflected in debugfs). Reported by: Yan, Zheng Signed-off-by: Sage Weil Reviewed-by: Yehuda Sadeh Signed-off-by: Greg Kroah-Hartman

libceph: fix crypto key null deref, memory leak

2012-11-26T19:38:42Z

(cherry picked from commit f0666b1ac875ff32fe290219b150ec62eebbe10e) Avoid crashing if the crypto key payload was NULL, as when it was not correctly allocated and initialized. Also, avoid leaking it. Signed-off-by: Sylvain Munaut Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman

libceph: recheck con state after allocating incoming message

2012-11-26T19:38:42Z

(cherry picked from commit 6139919133377652992a5fe134e22abce3e9c25e) We drop the lock when calling the ->alloc_msg() con op, which means we need to (a) not clobber con->in_msg without the mutex held, and (b) we need to verify that we are still in the OPEN state when we retake it to avoid causing any mayhem. If the state does change, -EAGAIN will get us back to con_work() and loop. Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman

libceph: change ceph_con_in_msg_alloc convention to be less weird

2012-11-26T19:38:42Z

(cherry picked from commit 4740a623d20c51d167da7f752b63e2b8714b2543) This function's calling convention is very limiting. In particular, we can't return any error other than ENOMEM (and only implicitly), which is a problem (see next patch). Instead, return an normal 0 or error code, and make the skip a pointer output parameter. Drop the useless in_hdr argument (we have the con pointer). Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman

libceph: avoid dropping con mutex before fault

2012-11-26T19:38:42Z

(cherry picked from commit 8636ea672f0c5ab7478c42c5b6705ebd1db7eb6a) The ceph_fault() function takes the con mutex, so we should avoid dropping it before calling it. This fixes a potential race with another thread calling ceph_con_close(), or _open(), or similar (we don't reverify con->state after retaking the lock). Add annotation so that lockdep realizes we will drop the mutex before returning. Signed-off-by: Sage Weil Reviewed-by: Alex Elder Signed-off-by: Greg Kroah-Hartman