path: root/net/rds/ib.h
Age | Commit message | Author
2012-05-29 | rds_rdma: don't assume infiniband device is PCI | Thadeu Lima de Souza Cascardo
RDS code assumes that the struct ib_device dma_device member, which is a pointer, points to a struct device embedded in a struct pci_dev. This is not the case for ehca, for example, which is an OF driver, and makes dma_device point to a struct device embedded in a struct platform_device. This will make the system crash when rds_rdma is loaded in a system with ehca, since it will try to access the bus member of a non-existent struct pci_dev. The only reason rds_rdma uses the struct pci_dev is to get the NUMA node the device is attached to. Using dev_to_node for that is much better, since it won't assume which bus the InfiniBand device is attached to. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Cc: dledford@redhat.com Cc: Jes.Sorensen@redhat.com Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
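The problem described in this commit can be sketched in userspace C. The types below are simplified stand-ins for the kernel's struct device, struct pci_dev and struct platform_device (an assumption: only the fields needed for illustration are present, and the real definitions differ), and dev_node() is a hypothetical stand-in for dev_to_node().

```c
#include <stddef.h>

/* Simplified stand-ins for kernel types; not the real definitions. */
struct device { int numa_node; };
struct bus { int number; };
struct pci_dev { struct bus *bus; struct device dev; };
struct platform_device { const char *name; struct device dev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Only valid when 'dev' really is embedded in a struct pci_dev. Calling this
 * on a device embedded in a platform_device yields a bogus pointer. */
struct pci_dev *pci_dev_of(struct device *dev)
{
    return container_of(dev, struct pci_dev, dev);
}

/* Bus-agnostic lookup: reads the node from the generic device itself. */
int dev_node(const struct device *dev)
{
    return dev->numa_node;
}
```

The bus-agnostic lookup works for both embeddings because it never steps outside the generic device structure.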
2011-06-06 | net: remove interrupt.h inclusion from netdevice.h | Alexey Dobriyan
* remove interrupt.h inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-01 | rds/ib: use system_wq instead of rds_ib_fmr_wq | Tejun Heo
With cmwq, there's no reason to use dedicated rds_ib_fmr_wq - it's not in the memory reclaim path and the maximum number of concurrent work items is bound by the number of devices. Drop it and use system_wq instead. This makes rds_ib_fmr_init/exit() noops. Both removed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andy Grover <andy.grover@oracle.com>
2010-10-21 | rds: make local functions/variables static | stephen hemminger
The RDS protocol has lots of functions that should be declared static. rds_message_get/add_version_extension is removed since it is defined but never used. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 | RDS/IB: print string constants in more places | Zach Brown
This prints the constant identifier for work completion status and rdma cm event types, like we already do for IB event types. A core string array helper is added that each string type uses. Signed-off-by: Zach Brown <zach.brown@oracle.com>
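A string array helper of the kind this commit mentions might look like the sketch below. The function names and the table contents are made up for illustration and are not taken from the RDS source.

```c
#include <stddef.h>

/* Hypothetical helper: map an index to a name, tolerating gaps and
 * out-of-range values instead of reading past the table. */
const char *str_array(const char *const *array, size_t elements, size_t index)
{
    if (index < elements && array[index] != NULL)
        return array[index];
    return "unknown";
}

/* Illustrative table; the names are invented for this sketch. */
static const char *const demo_status[] = {
    [0] = "DEMO_SUCCESS",
    [2] = "DEMO_FLUSH_ERR",   /* index 1 is intentionally left NULL */
};

/* Each string type wraps the shared helper with its own table. */
const char *demo_status_str(unsigned int status)
{
    return str_array(demo_status,
                     sizeof(demo_status) / sizeof(demo_status[0]), status);
}
```

Keeping the bounds and NULL checks in one shared helper means each new table only needs a thin wrapper.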
2010-09-08 | RDS/IB: protect the list of IB devices | Zach Brown
The RDS IB device list wasn't protected by any locking. Traversal in both the get_mr and FMR flushing paths could race with addition and removal. List manipulation is done with RCU primitives and is protected by the write side of a rwsem. The list traversal in the get_mr fast path is protected by an RCU read-side critical section. The FMR list traversal is more problematic because it can block while traversing the list. We protect this with the read side of the rwsem. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08 | RDS/IB: track signaled sends | Zach Brown
We're seeing bugs today where IB connection shutdown clears the send ring while the tasklet is processing completed sends. Implementation details cause this to dereference a null pointer. Shutdown needs to wait for send completion to stop before tearing down the connection. We can't simply wait for the ring to empty because it may contain unsignaled sends that will never be processed. This patch tracks the number of signaled sends that we've posted and waits for them to complete. It also makes sure that the tasklet has finished executing. Signed-off-by: Zach Brown <zach.brown@oracle.com>
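The counting scheme can be sketched with C11 atomics. This is a minimal userspace illustration, not the kernel code: struct conn and the function names are hypothetical, and a real implementation would sleep on a wait queue rather than expose a bare predicate.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-connection state for this sketch. */
struct conn {
    atomic_int signaled_sends;   /* posted sends that will generate a completion */
};

/* Post a send; unsignaled sends never complete visibly, so they are not counted. */
void post_send(struct conn *c, bool signaled)
{
    if (signaled)
        atomic_fetch_add(&c->signaled_sends, 1);
}

/* Completion handler: one signaled send has finished. */
void send_completed(struct conn *c)
{
    atomic_fetch_sub(&c->signaled_sends, 1);
}

/* Shutdown may proceed only once no signaled sends are outstanding. */
bool sends_drained(struct conn *c)
{
    return atomic_load(&c->signaled_sends) == 0;
}
```

Counting only signaled sends is what lets shutdown terminate even when unsignaled entries remain in the ring.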
2010-09-08 | RDS: remove __init and __exit annotation | Zach Brown
The trivial amount of memory saved isn't worth the cost of dealing with section mismatches. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08 | RDS/IB: create a work queue for FMR flushing | Zach Brown
This patch moves the FMR flushing work into its own multi-threaded work queue. This is to maintain performance in preparation for returning the main krdsd work queue back to a single threaded work queue to avoid deep-rooted concurrency bugs. This is also good because it further separates FMRs, which might be removed some day, from the rest of the code base. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08 | RDS/IB: destroy connections on rmmod | Zach Brown
IB connections were not being destroyed during rmmod. First, the IB device removal callback was recently changed to disconnect connections that used the removing device rather than destroying them. So connections with devices during rmmod were not being destroyed. Second, rds_ib_destroy_nodev_conns() was being called before connections are disassociated with devices. It would almost never find connections in the nodev list. We first get rid of rds_ib_destroy_conns(), which is no longer called, and refactor the existing caller into the main body of the function and get rid of the list and lock wrappers. Then we call rds_ib_destroy_nodev_conns() *after* ib_unregister_client() has removed the IB device from all the conns and put the conns on the nodev list. The result is that IB connections are destroyed by rmmod. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08 | RDS/IB: Make ib_recv_refill return void | Andy Grover
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 | rds: more FMRs are faster | Chris Mason
When we add more FMRs, we flush them less often and so we go faster. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-09-08 | RDS/IB: Add caching of frags and incs | Chris Mason
This patch is based heavily on an initial patch by Chris Mason. Instead of freeing slab memory and pages, it keeps them, and funnels them back to be reused. The lock minimization strategy uses xchg and cmpxchg atomic ops for manipulation of pointers to list heads. We anchor the lists with a pointer to a list_head struct instead of a static list_head struct. We just have to carefully use the existing primitives with the difference between a pointer and a static head struct. For example, 'list_empty()' means that our anchor pointer points to a list with a single item instead of meaning that our static head element doesn't point to any list items. Original patch by Chris, with significant mods and fixes by Andy and Zach. Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: Zach Brown <zach.brown@oracle.com>
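The general idea of manipulating an anchor pointer with exchange and compare-and-swap can be sketched in userspace C11. This is a simplification under stated assumptions: it uses a singly linked chain instead of the kernel's list_head, and every name below is hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct cache_item {
    struct cache_item *next;
};

/* The anchor is a pointer to the first item, not a static head element. */
struct cache {
    _Atomic(struct cache_item *) head;
};

/* Return an item to the cache with a compare-and-swap loop. */
void cache_put(struct cache *c, struct cache_item *item)
{
    struct cache_item *old = atomic_load(&c->head);
    do {
        item->next = old;   /* 'old' is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak(&c->head, &old, item));
}

/* Detach the whole chain in one atomic exchange; the caller then owns it. */
struct cache_item *cache_take_all(struct cache *c)
{
    return atomic_exchange(&c->head, (struct cache_item *)NULL);
}

/* Walk a detached chain; safe because no one else can reach it any more. */
size_t chain_length(const struct cache_item *item)
{
    size_t n = 0;
    for (; item != NULL; item = item->next)
        n++;
    return n;
}
```

Taking the entire chain at once, rather than popping single items, sidesteps the ABA hazard that a CAS-based pop would have to handle.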
2010-09-08 | RDS: Use page_remainder_alloc() for recv bufs | Andy Grover
Instead of splitting up a page into RDS_FRAG_SIZE chunks ourselves, ask rds_page_remainder_alloc() to do it. While it is possible PAGE_SIZE > FRAG_SIZE, on x86en it isn't, so having duplicate "carve up a page into buffers" code seems excessive. The other modification this spawns is the use of a single struct scatterlist in rds_page_frag instead of a bare page ptr. This causes verbosity to increase in some places, and decrease in others. Finally, I decided to unify the lifetimes and alloc/free of rds_page_frag and its page. This is a nice simplification in itself, but will be extra-nice once we come to adding cmason's recycling patch. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 | RDS/IB: add refcount tracking to struct rds_ib_device | Zach Brown
The RDS IB client .remove callback used to free the rds_ibdev for the given device unconditionally. This could race other users of the struct. This patch adds refcounting so that we only free the rds_ibdev once all of its users are done. Many rds_ibdev users are tied to connections. We give the connection a reference and change these users to reference the device in the connection instead of looking it up in the IB client data. The only user of the IB client data remaining is the first lookup of the device as connections are built up. Incrementing the reference count of a device found in the IB client data could race with final freeing so we use an RCU grace period to make sure that freeing won't happen until those lookups are done. MRs need the rds_ibdev to get at the pool that they're freed in to. They exist outside a connection and many MRs can reference different devices from one socket, so it was natural to have each MR hold a reference. MR refs can be dropped from interrupt handlers and final device teardown can block so we push it off to a work struct. Pool teardown had to be fixed to cancel its pending work instead of deadlocking waiting for all queued work, including itself, to finish. MRs get their reference from the global device list, which gets a reference. It is left unprotected by locks and remains racy. A simple global lock would be a significant bottleneck. More scalable (complicated) locking should be done carefully in a later patch. Signed-off-by: Zach Brown <zach.brown@oracle.com>
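The core of the refcounting scheme can be sketched as follows. This is a userspace illustration only: struct dev_obj and its functions are hypothetical, the freed flag stands in for real teardown, and the RCU grace period and deferred work item described above are not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device object; 'freed' stands in for the real teardown. */
struct dev_obj {
    atomic_int refcount;
    bool freed;
};

void dev_init(struct dev_obj *d)
{
    atomic_init(&d->refcount, 1);   /* reference held by the creator */
    d->freed = false;
}

/* Take a reference, e.g. on behalf of a connection or an MR. */
void dev_get(struct dev_obj *d)
{
    atomic_fetch_add(&d->refcount, 1);
}

/* Drop a reference; only the last put performs teardown. */
void dev_put(struct dev_obj *d)
{
    if (atomic_fetch_sub(&d->refcount, 1) == 1)
        d->freed = true;   /* real code would defer this to a work item */
}
```

With this shape, the remove callback simply drops its own reference, and the object survives until the remaining users drop theirs.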
2010-09-08 | RDS/IB: add _to_node() macros for numa and use {k,v}malloc_node() | Andy Grover
Allocate send/recv rings in memory that is node-local to the HCA. This significantly helps performance. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 | RDS: Move atomic stats from general to ib-specific area | Andy Grover
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 | RDS: Refill recv ring directly from tasklet | Andy Grover
Performance is better if we use allocations that don't block to refill the receive ring. Since the whole reason we were kicking out to the worker thread was so we could do blocking allocs, we no longer need to do this. Remove gfp params from rds_ib_recv_refill(); we always use GFP_NOWAIT. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 | RDS: Perform unmapping ops in stages | Andy Grover
Previously, RDS would wait until the final send WR had completed and then handle cleanup. With silent ops, we do not know if an atomic, rdma, or data op will be last. This patch handles any of these cases by keeping a pointer to the last op in the message in m_last_op. When the TX completion event fires, rds dispatches to per-op-type cleanup functions, and then does whole-message cleanup, if the last op equalled m_last_op. This patch also moves towards having op-specific functions take the op struct, instead of the overall rm struct. rds_ib_connection has a pointer to keep track of a partially-completed data send operation. This patch changes it from an rds_message pointer to the narrower rm_data_op pointer, and modifies places that use this pointer as needed. Signed-off-by: Andy Grover <andy.grover@oracle.com>
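The staged cleanup can be sketched as below. All types and names here are hypothetical simplifications; only the idea of comparing the completed op against a stored last-op pointer comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

enum op_type { OP_ATOMIC, OP_RDMA, OP_DATA };

struct op {
    enum op_type type;
    bool unmapped;
};

/* Hypothetical message holding up to three ops and a pointer to the last one. */
struct message {
    struct op atomic, rdma, data;
    struct op *last_op;
    bool cleaned_up;
};

/* Per-op cleanup; real code would release different resources per type. */
static void unmap_op(struct op *op)
{
    op->unmapped = true;
}

/* Completion handler: clean up this op, then the whole message if this op
 * was the one recorded as last. */
void op_completed(struct message *m, struct op *op)
{
    unmap_op(op);
    if (op == m->last_op)
        m->cleaned_up = true;
}
```

Because the handler takes the op rather than the whole message, it does not need to know in advance which op type finishes the message.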
2010-09-08 | RDS: Remove struct rds_rdma_op | Andy Grover
A big changeset, but it's all pretty dumb. struct rds_rdma_op was already embedded in struct rm_rdma_op. Remove rds_rdma_op and put its members in rm_rdma_op. Rename members with "op_" prefix instead of "r_", for consistency. Of course this breaks a lot, so fixup the code accordingly. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 | RDS: Implement silent atomics | Andy Grover
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 | RDS: Remove unsignaled_bytes sysctl | Andy Grover
Removed unsignaled_bytes sysctl and code to signal based on it. I believe unsignaled_wrs is more than sufficient for our purposes. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 | RDS/IB: Remove ib_[header/data]_sge() functions | Andy Grover
These functions were to cope with differently ordered sg entries depending on RDS 3.0 or 3.1+. Now that we've dropped 3.0 compatibility we no longer need them. Also, modify usage sites for these to refer to sge[0] or [1] directly. Reorder code to initialize header sgs first. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 | RDS: inc_purge() transport function unused - remove it | Andy Grover
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 | RDS: Base init_depth and responder_resources on hw values | Andy Grover
Instead of using a constant for initiator_depth and responder_resources, read the per-QP values when the device is enumerated, and then use these values when creating the connection. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 | RDS: Implement atomic operations | Andy Grover
Implement a CMSG-based interface to do FADD and CSWP ops. Alter send routines to handle atomic ops. Add atomic counters to stats. Add xmit_atomic() to struct rds_transport. Inline rds_ib_send_unmap_rdma into unmap_rm. Signed-off-by: Andy Grover <andy.grover@oracle.com>
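For readers unfamiliar with the two operations, the sketch below shows fetch-and-add and compare-and-swap semantics on a local 64-bit value using C11 atomics. It is only a local analogue: the RDS interface performs these on remote memory through CMSGs, and the function names here are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Fetch-and-add: add 'delta' and return the value seen before the add. */
uint64_t fadd(_Atomic uint64_t *target, uint64_t delta)
{
    return atomic_fetch_add(target, delta);
}

/* Compare-and-swap: store 'swap' only if the target equals 'compare'.
 * Returns the value seen before the operation in either case. */
uint64_t cswp(_Atomic uint64_t *target, uint64_t compare, uint64_t swap)
{
    uint64_t expected = compare;
    /* On failure, 'expected' is overwritten with the current value. */
    atomic_compare_exchange_strong(target, &expected, swap);
    return expected;
}
```

Both operations hand back the prior value, which lets a caller tell whether a compare-and-swap took effect by comparing the result with the value it expected.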
2009-10-30 | RDS/IB+IW: Move recv processing to a tasklet | Andy Grover
Move receive processing from event handler to a tasklet. This should help prevent hangcheck timer from going off when RDS is under heavy load. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-20 | RDS/IB: Always use PAGE_SIZE for FMR page size | Andy Grover
While FMRs allow significant flexibility in what size of pages they can use, we really just want FMR pages to match CPU page size. Roland says we can count on this always being supported, so this simplifies things. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-20 | RDS/IB: Handle connections using RDS 3.0 wire protocol | Andy Grover
The big differences between RDS 3.0 and 3.1 are protocol-level flow control, and with 3.1 the header is in front of the data. The header always ends up in the header buffer, and the data goes in the data page. In 3.0 our "header" is a trailer, and will end up either in the data page, the header buffer, or split across the two. Since 3.1 is backwards-compatible with 3.0, we need to continue to support these cases. This patch does that -- if using RDS 3.0 wire protocol, it will copy the header from wherever it ended up into the header buffer. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
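The three trailer cases can be illustrated with a small userspace sketch. Several things here are assumptions made for illustration rather than facts from the commit: the buffer layout (data page filled first, spilling into the header buffer), the tiny sizes, the function name, and the use of a separate output buffer instead of copying in place.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sizes only; not the real RDS constants. */
#define FRAG_SIZE 16
#define HDR_SIZE   4

/*
 * Assumed layout: received bytes fill the data page first and spill into the
 * header buffer. A 3.0-style message is data_len bytes of data followed by
 * HDR_SIZE bytes of trailer, so the trailer starts at offset data_len of the
 * combined area. Requires data_len <= FRAG_SIZE.
 */
void gather_trailer(const unsigned char *page, const unsigned char *hdr_buf,
                    size_t data_len, unsigned char *out)
{
    size_t room = FRAG_SIZE - data_len;              /* page bytes after the data */
    size_t in_page = room < HDR_SIZE ? room : HDR_SIZE;

    /* Part of the trailer that landed in the data page (may be zero bytes). */
    memcpy(out, page + data_len, in_page);
    /* Remainder that spilled into the header buffer (may be zero bytes). */
    memcpy(out + in_page, hdr_buf, HDR_SIZE - in_page);
}
```

With data_len at 14 the trailer is split across both buffers, at 16 it sits entirely in the header buffer, and at 12 or less it sits entirely in the data page.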
2009-07-20 | RDS: Set retry_count to 2 and make modifiable via modparam | Andy Grover
This will by default cause IB connections to fail over faster, but allow a longer retry count to be used if desired. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-09 | RDS/IW+IB: Allow max credit advertise window. | Steve Wise
Fix hack that restricts the credit advertisement to 127. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-02 | RDS: Use spinlock to protect 64b value update on 32b archs | Andy Grover
We have a 64bit value that needs to be set atomically. This is easy and quick on all 64bit archs, and can also be done on x86/32 with set_64bit() (uses cmpxchg8b). However other 32b archs don't have this. I actually changed this to the current state in preparation for mainline because the old way (using a spinlock on 32b) resulted in unsightly #ifdefs in the code. But obviously, being correct takes precedence. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
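The spinlock approach can be sketched in userspace with a C11 atomic_flag standing in for a kernel spinlock. The struct and function names are hypothetical, and this sketch takes the lock unconditionally rather than only on 32-bit targets.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical holder for a 64-bit value updated under a tiny spinlock. */
struct guarded_u64 {
    atomic_flag lock;
    uint64_t value;
};

void guarded_init(struct guarded_u64 *g)
{
    atomic_flag_clear(&g->lock);
    g->value = 0;
}

static void guarded_lock(struct guarded_u64 *g)
{
    while (atomic_flag_test_and_set_explicit(&g->lock, memory_order_acquire))
        ;   /* spin until the holder releases the lock */
}

static void guarded_unlock(struct guarded_u64 *g)
{
    atomic_flag_clear_explicit(&g->lock, memory_order_release);
}

/* The plain store may compile to two 32-bit stores on a 32-bit target;
 * the lock keeps readers from observing a half-written value. */
void guarded_set(struct guarded_u64 *g, uint64_t v)
{
    guarded_lock(g);
    g->value = v;
    guarded_unlock(g);
}

uint64_t guarded_get(struct guarded_u64 *g)
{
    guarded_lock(g);
    uint64_t v = g->value;
    guarded_unlock(g);
    return v;
}
```

Readers must take the same lock as writers; otherwise a read could still land between the two halves of a store.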
2009-04-02 | RDS: Rewrite connection cleanup, fixing oops on rmmod | Andy Grover
This fixes a bug where a connection was unexpectedly not on *any* list while being destroyed. It also cleans up some code duplication and regularizes some function names.
* Grab appropriate lock in conn_free() and explain in comment
* Ensure via locking that a conn is never not on either a dev's list or the nodev list
* Add rds_xx_remove_conn() to match rds_xx_add_conn()
* Make rds_xx_add_conn() return void
* Rename remove_{,nodev_}conns() to destroy_{,nodev_}conns() and unify their implementation in a helper function
* Document lock ordering as nodev conn_lock before dev_conn_lock
Reported-by: Yosef Etigin <yosefe@voltaire.com> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 | RDS/IB: Infiniband transport | Andy Grover
Registers as an RDS transport and an IB client, and uses IB CM API to allocate ids, queue pairs, and the rest of that fun stuff. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>