Commit 26d2177e authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull inifiniband/rdma updates from Doug Ledford:
 "This is a fairly sizeable set of changes.  I've put them through a
  decent amount of testing prior to sending the pull request due to
  that.

  There are still a few fixups that I know are coming, but I wanted to
  go ahead and get the big, sizable chunk into your hands sooner rather
  than waiting for those last few fixups.

  Of note is the fact that this creates what is intended to be a
  temporary area in the drivers/staging tree specifically for some
  cleanups and additions that are coming for the RDMA stack.  We
  deprecated two drivers (ipath and amso1100) and are waiting to hear
  back if we can deprecate another one (ehca).  We also put Intel's new
  hfi1 driver into this area because it needs to be refactored and a
  transfer library created out of the factored out code, and then it and
  the qib driver and the soft-roce driver should all be modified to use
  that library.

  I expect drivers/staging/rdma to be around for three or four kernel
  releases and then to go away as all of the work is completed and final
  deletions of deprecated drivers are done.

  Summary of changes for 4.3:

   - Create drivers/staging/rdma
   - Move amso1100 driver to staging/rdma and schedule for deletion
   - Move ipath driver to staging/rdma and schedule for deletion
   - Add hfi1 driver to staging/rdma and set TODO for move to regular
     tree
   - Initial support for namespaces to be used on RDMA devices
   - Add RoCE GID table handling to the RDMA core caching code
   - Infrastructure to support handling of devices with differing read
     and write scatter gather capabilities
   - Various iSER updates
   - Kill off unsafe usage of global mr registrations
   - Update SRP driver
   - Misc  mlx4 driver updates
   - Support for the mr_alloc verb
   - Support for a netlink interface between kernel and user space cache
     daemon to speed path record queries and route resolution
   - Ininitial support for safe hot removal of verbs devices"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits)
  IB/ipoib: Suppress warning for send only join failures
  IB/ipoib: Clean up send-only multicast joins
  IB/srp: Fix possible protection fault
  IB/core: Move SM class defines from ib_mad.h to ib_smi.h
  IB/core: Remove unnecessary defines from ib_mad.h
  IB/hfi1: Add PSM2 user space header to header_install
  IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY
  mlx5: Fix incorrect wc pkey_index assignment for GSI messages
  IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow
  IB/uverbs: reject invalid or unknown opcodes
  IB/cxgb4: Fix if statement in pick_local_ip6adddrs
  IB/sa: Fix rdma netlink message flags
  IB/ucma: HW Device hot-removal support
  IB/mlx4_ib: Disassociate support
  IB/uverbs: Enable device removal when there are active user space applications
  IB/uverbs: Explicitly pass ib_dev to uverbs commands
  IB/uverbs: Fix race between ib_uverbs_open and remove_one
  IB/uverbs: Fix reference counting usage of event files
  IB/core: Make ib_dealloc_pd return void
  IB/srp: Create an insecure all physical rkey only if needed
  ...
parents a794b4f3 d1178cbc
......@@ -64,3 +64,23 @@ MTHCA
fw_ver - Firmware version
hca_type - HCA type: "MT23108", "MT25208 (MT23108 compat mode)",
or "MT25208"
HFI1
The hfi1 driver also creates these additional files:
hw_rev - hardware revision
board_id - manufacturing board id
tempsense - thermal sense information
serial - board serial number
nfreectxts - number of free user contexts
nctxts - number of allowed contexts (PSM2)
chip_reset - diagnostic (root only)
boardversion - board version
ports/1/
CMgtA/
cc_settings_bin - CCA tables used by PSM2
cc_table_bin
sc2v/ - 32 files (0 - 31) used to translate sl->vl
sl2sc/ - 32 files (0 - 31) used to translate sl->sc
vl2mtu/ - 16 (0 - 15) files used to determine MTU for vl
......@@ -5341,6 +5341,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git
S: Supported
F: Documentation/infiniband/
F: drivers/infiniband/
F: drivers/staging/rdma/
F: include/uapi/linux/if_infiniband.h
F: include/uapi/rdma/
F: include/rdma/
......@@ -5598,7 +5599,7 @@ IPATH DRIVER
M: Mike Marciniszyn <infinipath@intel.com>
L: linux-rdma@vger.kernel.org
S: Maintained
F: drivers/infiniband/hw/ipath/
F: drivers/staging/rdma/ipath/
IPMI SUBSYSTEM
M: Corey Minyard <minyard@acm.org>
......@@ -9976,6 +9977,12 @@ M: Arnaud Patard <arnaud.patard@rtp-net.org>
S: Odd Fixes
F: drivers/staging/xgifb/
HFI1 DRIVER
M: Mike Marciniszyn <infinipath@intel.com>
L: linux-rdma@vger.kernel.org
S: Supported
F: drivers/staging/rdma/hfi1
STARFIRE/DURALAN NETWORK DRIVER
M: Ion Badulescu <ionut@badula.org>
S: Odd Fixes
......
......@@ -55,10 +55,8 @@ config INFINIBAND_ADDR_TRANS
default y
source "drivers/infiniband/hw/mthca/Kconfig"
source "drivers/infiniband/hw/ipath/Kconfig"
source "drivers/infiniband/hw/qib/Kconfig"
source "drivers/infiniband/hw/ehca/Kconfig"
source "drivers/infiniband/hw/amso1100/Kconfig"
source "drivers/infiniband/hw/cxgb3/Kconfig"
source "drivers/infiniband/hw/cxgb4/Kconfig"
source "drivers/infiniband/hw/mlx4/Kconfig"
......
......@@ -9,7 +9,8 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \
$(user_access-y)
ib_core-y := packer.o ud_header.o verbs.o sysfs.o \
device.o fmr_pool.o cache.o netlink.o
device.o fmr_pool.o cache.o netlink.o \
roce_gid_mgmt.o
ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -43,12 +43,58 @@ int ib_device_register_sysfs(struct ib_device *device,
u8, struct kobject *));
void ib_device_unregister_sysfs(struct ib_device *device);
int ib_sysfs_setup(void);
void ib_sysfs_cleanup(void);
int ib_cache_setup(void);
void ib_cache_setup(void);
void ib_cache_cleanup(void);
int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
struct ib_qp_attr *qp_attr, int *qp_attr_mask);
typedef void (*roce_netdev_callback)(struct ib_device *device, u8 port,
struct net_device *idev, void *cookie);
typedef int (*roce_netdev_filter)(struct ib_device *device, u8 port,
struct net_device *idev, void *cookie);
void ib_enum_roce_netdev(struct ib_device *ib_dev,
roce_netdev_filter filter,
void *filter_cookie,
roce_netdev_callback cb,
void *cookie);
void ib_enum_all_roce_netdevs(roce_netdev_filter filter,
void *filter_cookie,
roce_netdev_callback cb,
void *cookie);
int ib_cache_gid_find_by_port(struct ib_device *ib_dev,
const union ib_gid *gid,
u8 port, struct net_device *ndev,
u16 *index);
enum ib_cache_gid_default_mode {
IB_CACHE_GID_DEFAULT_MODE_SET,
IB_CACHE_GID_DEFAULT_MODE_DELETE
};
void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
struct net_device *ndev,
enum ib_cache_gid_default_mode mode);
int ib_cache_gid_add(struct ib_device *ib_dev, u8 port,
union ib_gid *gid, struct ib_gid_attr *attr);
int ib_cache_gid_del(struct ib_device *ib_dev, u8 port,
union ib_gid *gid, struct ib_gid_attr *attr);
int ib_cache_gid_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
struct net_device *ndev);
int roce_gid_mgmt_init(void);
void roce_gid_mgmt_cleanup(void);
int roce_rescan_device(struct ib_device *ib_dev);
int ib_cache_setup_one(struct ib_device *device);
void ib_cache_cleanup_one(struct ib_device *device);
void ib_cache_release_one(struct ib_device *device);
#endif /* _CORE_PRIV_H */
This diff is collapsed.
......@@ -338,13 +338,6 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
goto error1;
}
mad_agent_priv->agent.mr = ib_get_dma_mr(port_priv->qp_info[qpn].qp->pd,
IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(mad_agent_priv->agent.mr)) {
ret = ERR_PTR(-ENOMEM);
goto error2;
}
if (mad_reg_req) {
reg_req = kmemdup(mad_reg_req, sizeof *reg_req, GFP_KERNEL);
if (!reg_req) {
......@@ -429,8 +422,6 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
spin_unlock_irqrestore(&port_priv->reg_lock, flags);
kfree(reg_req);
error3:
ib_dereg_mr(mad_agent_priv->agent.mr);
error2:
kfree(mad_agent_priv);
error1:
return ret;
......@@ -590,7 +581,6 @@ static void unregister_mad_agent(struct ib_mad_agent_private *mad_agent_priv)
wait_for_completion(&mad_agent_priv->comp);
kfree(mad_agent_priv->reg_req);
ib_dereg_mr(mad_agent_priv->agent.mr);
kfree(mad_agent_priv);
}
......@@ -1038,7 +1028,7 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
mad_send_wr->mad_agent_priv = mad_agent_priv;
mad_send_wr->sg_list[0].length = hdr_len;
mad_send_wr->sg_list[0].lkey = mad_agent->mr->lkey;
mad_send_wr->sg_list[0].lkey = mad_agent->qp->pd->local_dma_lkey;
/* OPA MADs don't have to be the full 2048 bytes */
if (opa && base_version == OPA_MGMT_BASE_VERSION &&
......@@ -1047,7 +1037,7 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent,
else
mad_send_wr->sg_list[1].length = mad_size - hdr_len;
mad_send_wr->sg_list[1].lkey = mad_agent->mr->lkey;
mad_send_wr->sg_list[1].lkey = mad_agent->qp->pd->local_dma_lkey;
mad_send_wr->send_wr.wr_id = (unsigned long) mad_send_wr;
mad_send_wr->send_wr.sg_list = mad_send_wr->sg_list;
......@@ -2885,7 +2875,7 @@ static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
struct ib_mad_queue *recv_queue = &qp_info->recv_queue;
/* Initialize common scatter list fields */
sg_list.lkey = (*qp_info->port_priv->mr).lkey;
sg_list.lkey = qp_info->port_priv->pd->local_dma_lkey;
/* Initialize common receive WR fields */
recv_wr.next = NULL;
......@@ -3201,13 +3191,6 @@ static int ib_mad_port_open(struct ib_device *device,
goto error4;
}
port_priv->mr = ib_get_dma_mr(port_priv->pd, IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(port_priv->mr)) {
dev_err(&device->dev, "Couldn't get ib_mad DMA MR\n");
ret = PTR_ERR(port_priv->mr);
goto error5;
}
if (has_smi) {
ret = create_mad_qp(&port_priv->qp_info[0], IB_QPT_SMI);
if (ret)
......@@ -3248,8 +3231,6 @@ static int ib_mad_port_open(struct ib_device *device,
error7:
destroy_mad_qp(&port_priv->qp_info[0]);
error6:
ib_dereg_mr(port_priv->mr);
error5:
ib_dealloc_pd(port_priv->pd);
error4:
ib_destroy_cq(port_priv->cq);
......@@ -3284,7 +3265,6 @@ static int ib_mad_port_close(struct ib_device *device, int port_num)
destroy_workqueue(port_priv->wq);
destroy_mad_qp(&port_priv->qp_info[1]);
destroy_mad_qp(&port_priv->qp_info[0]);
ib_dereg_mr(port_priv->mr);
ib_dealloc_pd(port_priv->pd);
ib_destroy_cq(port_priv->cq);
cleanup_recv_queue(&port_priv->qp_info[1]);
......@@ -3335,7 +3315,7 @@ static void ib_mad_init_device(struct ib_device *device)
}
}
static void ib_mad_remove_device(struct ib_device *device)
static void ib_mad_remove_device(struct ib_device *device, void *client_data)
{
int i;
......
......@@ -199,7 +199,6 @@ struct ib_mad_port_private {
int port_num;
struct ib_cq *cq;
struct ib_pd *pd;
struct ib_mr *mr;
spinlock_t reg_lock;
struct ib_mad_mgmt_version_table version[MAX_MGMT_VERSION];
......
......@@ -43,7 +43,7 @@
#include "sa.h"
static void mcast_add_one(struct ib_device *device);
static void mcast_remove_one(struct ib_device *device);
static void mcast_remove_one(struct ib_device *device, void *client_data);
static struct ib_client mcast_client = {
.name = "ib_multicast",
......@@ -840,13 +840,12 @@ static void mcast_add_one(struct ib_device *device)
ib_register_event_handler(&dev->event_handler);
}
static void mcast_remove_one(struct ib_device *device)
static void mcast_remove_one(struct ib_device *device, void *client_data)
{
struct mcast_device *dev;
struct mcast_device *dev = client_data;
struct mcast_port *port;
int i;
dev = ib_get_client_data(device, &mcast_client);
if (!dev)
return;
......
......@@ -49,6 +49,14 @@ static DEFINE_MUTEX(ibnl_mutex);
static struct sock *nls;
static LIST_HEAD(client_list);
int ibnl_chk_listeners(unsigned int group)
{
if (netlink_has_listeners(nls, group) == 0)
return -1;
return 0;
}
EXPORT_SYMBOL(ibnl_chk_listeners);
int ibnl_add_client(int index, int nops,
const struct ibnl_client_cbs cb_table[])
{
......@@ -151,6 +159,23 @@ static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
!client->cb_table[op].dump)
return -EINVAL;
/*
* For response or local service set_timeout request,
* there is no need to use netlink_dump_start.
*/
if (!(nlh->nlmsg_flags & NLM_F_REQUEST) ||
(index == RDMA_NL_LS &&
op == RDMA_NL_LS_OP_SET_TIMEOUT)) {
struct netlink_callback cb = {
.skb = skb,
.nlh = nlh,
.dump = client->cb_table[op].dump,
.module = client->cb_table[op].module,
};
return cb.dump(skb, &cb);
}
{
struct netlink_dump_control c = {
.dump = client->cb_table[op].dump,
......@@ -165,9 +190,39 @@ static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
return -EINVAL;
}
static void ibnl_rcv_reply_skb(struct sk_buff *skb)
{
struct nlmsghdr *nlh;
int msglen;
/*
* Process responses until there is no more message or the first
* request. Generally speaking, it is not recommended to mix responses
* with requests.
*/
while (skb->len >= nlmsg_total_size(0)) {
nlh = nlmsg_hdr(skb);
if (nlh->nlmsg_len < NLMSG_HDRLEN || skb->len < nlh->nlmsg_len)
return;
/* Handle response only */
if (nlh->nlmsg_flags & NLM_F_REQUEST)
return;
ibnl_rcv_msg(skb, nlh);
msglen = NLMSG_ALIGN(nlh->nlmsg_len);
if (msglen > skb->len)
msglen = skb->len;
skb_pull(skb, msglen);
}
}
static void ibnl_rcv(struct sk_buff *skb)
{
mutex_lock(&ibnl_mutex);
ibnl_rcv_reply_skb(skb);
netlink_rcv_skb(skb, &ibnl_rcv_msg);
mutex_unlock(&ibnl_mutex);
}
......
This diff is collapsed.
This diff is collapsed.
......@@ -457,29 +457,6 @@ static struct kobj_type port_type = {
.default_attrs = port_default_attrs
};
static void ib_device_release(struct device *device)
{
struct ib_device *dev = container_of(device, struct ib_device, dev);
kfree(dev->port_immutable);
kfree(dev);
}
static int ib_device_uevent(struct device *device,
struct kobj_uevent_env *env)
{
struct ib_device *dev = container_of(device, struct ib_device, dev);
if (add_uevent_var(env, "NAME=%s", dev->name))
return -ENOMEM;
/*
* It would be nice to pass the node GUID with the event...
*/
return 0;
}
static struct attribute **
alloc_group_attrs(ssize_t (*show)(struct ib_port *,
struct port_attribute *, char *buf),
......@@ -702,12 +679,6 @@ static struct device_attribute *ib_class_attributes[] = {
&dev_attr_node_desc
};
static struct class ib_class = {
.name = "infiniband",
.dev_release = ib_device_release,
.dev_uevent = ib_device_uevent,
};
/* Show a given an attribute in the statistics group */
static ssize_t show_protocol_stat(const struct device *device,
struct device_attribute *attr, char *buf,
......@@ -846,14 +817,12 @@ int ib_device_register_sysfs(struct ib_device *device,
int ret;
int i;
class_dev->class = &ib_class;
class_dev->parent = device->dma_device;
dev_set_name(class_dev, "%s", device->name);
dev_set_drvdata(class_dev, device);
INIT_LIST_HEAD(&device->port_list);
device->dev.parent = device->dma_device;
ret = dev_set_name(class_dev, "%s", device->name);
if (ret)
return ret;
ret = device_register(class_dev);
ret = device_add(class_dev);
if (ret)
goto err;
......@@ -916,13 +885,3 @@ void ib_device_unregister_sysfs(struct ib_device *device)
device_unregister(&device->dev);
}
int ib_sysfs_setup(void)
{
return class_register(&ib_class);
}
void ib_sysfs_cleanup(void)
{
class_unregister(&ib_class);
}
......@@ -109,7 +109,7 @@ enum {
#define IB_UCM_BASE_DEV MKDEV(IB_UCM_MAJOR, IB_UCM_BASE_MINOR)
static void ib_ucm_add_one(struct ib_device *device);
static void ib_ucm_remove_one(struct ib_device *device);
static void ib_ucm_remove_one(struct ib_device *device, void *client_data);
static struct ib_client ucm_client = {
.name = "ucm",
......@@ -658,8 +658,7 @@ static ssize_t ib_ucm_listen(struct ib_ucm_file *file,
if (result)
goto out;
result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask,
NULL);
result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask);
out:
ib_ucm_ctx_put(ctx);
return result;
......@@ -1310,9 +1309,9 @@ static void ib_ucm_add_one(struct ib_device *device)
return;
}
static void ib_ucm_remove_one(struct ib_device *device)
static void ib_ucm_remove_one(struct ib_device *device, void *client_data)
{
struct ib_ucm_device *ucm_dev = ib_get_client_data(device, &ucm_client);
struct ib_ucm_device *ucm_dev = client_data;
if (!ucm_dev)
return;
......
......@@ -74,6 +74,7 @@ struct ucma_file {
struct list_head ctx_list;
struct list_head event_list;
wait_queue_head_t poll_wait;
struct workqueue_struct *close_wq;
};
struct ucma_context {
......@@ -89,6 +90,13 @@ struct ucma_context {
struct list_head list;
struct list_head mc_list;
/* mark that device is in process of destroying the internal HW
* resources, protected by the global mut
*/
int closing;
/* sync between removal event and id destroy, protected by file mut */
int destroying;
struct work_struct close_work;
};
struct ucma_multicast {
......@@ -107,6 +115,7 @@ struct ucma_event {
struct list_head list;
struct rdma_cm_id *cm_id;
struct rdma_ucm_event_resp resp;
struct work_struct close_work;
};
static DEFINE_MUTEX(mut);
......@@ -132,8 +141,12 @@ static struct ucma_context *ucma_get_ctx(struct ucma_file *file, int id)
mutex_lock(&mut);
ctx = _ucma_find_context(id, file);
if (!IS_ERR(ctx))
atomic_inc(&ctx->ref);
if (!IS_ERR(ctx)) {
if (ctx->closing)
ctx = ERR_PTR(-EIO);
else
atomic_inc(&ctx->ref);
}
mutex_unlock(&mut);
return ctx;
}
......@@ -144,6 +157,28 @@ static void ucma_put_ctx(struct ucma_context *ctx)
complete(&ctx->comp);
}
static void ucma_close_event_id(struct work_struct *work)
{
struct ucma_event *uevent_close = container_of(work, struct ucma_event, close_work);
rdma_destroy_id(uevent_close->cm_id);
kfree(uevent_close);
}
static void ucma_close_id(struct work_struct *work)
{
struct ucma_context *ctx = container_of(work, struct ucma_context, close_work);
/* once all inflight tasks are finished, we close all underlying
* resources. The context is still alive till its explicit destryoing
* by its creator.
*/
ucma_put_ctx(ctx);
wait_for_completion(&ctx->comp);
/* No new events will be generated after destroying the id. */
rdma_destroy_id(ctx->cm_id);
}
static struct ucma_context *ucma_alloc_ctx(struct ucma_file *file)
{
struct ucma_context *ctx;
......@@ -152,6 +187,7 @@ static struct ucma_context *ucma_alloc_ctx(struct ucma_file *file)
if (!ctx)
return NULL;
INIT_WORK(&ctx->close_work, ucma_close_id);
atomic_set(&ctx->ref, 1);
init_completion(&ctx->comp);
INIT_LIST_HEAD(&ctx->mc_list);
......@@ -242,6 +278,44 @@ static void ucma_set_event_context(struct ucma_context *ctx,
}
}
/* Called with file->mut locked for the relevant context. */
static void ucma_removal_event_handler(struct rdma_cm_id *cm_id)
{
struct ucma_context *ctx = cm_id->context;
struct ucma_event *con_req_eve;
int event_found = 0;
if (ctx->destroying)
return;
/* only if context is pointing to cm_id that it owns it and can be
* queued to be closed, otherwise that cm_id is an inflight one that
* is part of that context event list pending to be detached and
* reattached to its new context as part of ucma_get_event,
* handled separately below.
*/
if (ctx->cm_id == cm_id) {
mutex_lock(&mut);
ctx->closing = 1;
mutex_unlock(&mut);
queue_work(ctx->file->close_wq, &ctx->close_work);
return;
}
list_for_each_entry(con_req_eve, &ctx->file->event_list, list) {
if (con_req_eve->cm_id == cm_id &&
con_req_eve->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) {
list_del(&con_req_eve->list);
INIT_WORK(&con_req_eve->close_work, ucma_close_event_id);
queue_work(ctx->file->close_wq, &con_req_eve->close_work);
event_found = 1;
break;
}
}
if (!event_found)
printk(KERN_ERR "ucma_removal_event_handler: warning: connect request event wasn't found\n");
}
static int ucma_event_handler(struct rdma_cm_id *cm_id,
struct rdma_cm_event *event)
{
......@@ -276,14 +350,21 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id,
* We ignore events for new connections until userspace has set
* their context. This can only happen if an error occurs on a
* new connection before the user accepts it. This is okay,
* since the accept will just fail later.
* since the accept will just fail later. However, we do need
* to release the underlying HW resources in case of a device
* removal event.
*/
if (event->event == RDMA_CM_EVENT_DEVICE_REMOVAL)
ucma_removal_event_handler(cm_id);
kfree(uevent);
goto out;
}
list_add_tail(&uevent->list, &ctx->file->event_list);
wake_up_interruptible(&ctx->file->poll_wait);
if (event->event == RDMA_CM_EVENT_DEVICE_REMOVAL)
ucma_removal_event_handler(cm_id);
out:
mutex_unlock(&ctx->file->mut);
return ret;
......@@ -442,9 +523,15 @@ static void ucma_cleanup_mc_events(struct ucma_multicast *mc)
}
/*
* We cannot hold file->mut when calling rdma_destroy_id() or we can
* deadlock. We also acquire file->mut in ucma_event_handler(), and
* rdma_destroy_id() will wait until all callbacks have completed.
* ucma_free_ctx is called after the underlying rdma CM-ID is destroyed. At
* this point, no new events will be reported from the hardware. However, we
* still need to cleanup the UCMA context for this ID. Specifically, there
* might be events that have not yet been consumed by the user space software.
* These might include pending connect requests which we have not completed
* processing. We cannot call rdma_destroy_id while holding the lock of the
* context (file->mut), as it might cause a deadlock. We therefore extract all
* relevant events from the context pending events list while holding the
* mutex. After that we release them as needed.
*/
static int ucma_free_ctx(struct ucma_context *ctx)
{
......@@ -452,8 +539,6 @@ static int ucma_free_ctx(struct ucma_context *ctx)
struct ucma_event *uevent, *tmp;
LIST_HEAD(list);
/* No new events will be generated after destroying the id. */
rdma_destroy_id(ctx->cm_id);
ucma_cleanup_multicast(ctx);
......@@ -501,10 +586,24 @@ static ssize_t ucma_destroy_id(struct ucma_file *file, const char __user *inbuf,
if (IS_ERR(ctx))
return PTR_ERR(ctx);
ucma_put_ctx(ctx);
wait_for_completion(&ctx->comp);
resp.events_reported = ucma_free_ctx(ctx);
mutex_lock(&ctx->file->mut);
ctx->destroying = 1;
mutex_unlock(&ctx->file->mut);
flush_workqueue(ctx->file->close_wq);
/* At this point it's guaranteed that there is no inflight
* closing task */
mutex_lock(&mut);
if (!ctx->closing) {
mutex_unlock(&mut);
ucma_put_ctx(ctx);
wait_for_completion(&ctx->comp);
rdma_destroy_id(ctx->cm_id);
} else {
mutex_unlock(&mut);
}
resp.events_reported = ucma_free_ctx(ctx);
if (copy_to_user((void __user *)(unsigned long)cmd.response,
&resp, sizeof(resp)))
ret = -EFAULT;
......@@ -1321,10 +1420,10 @@ static ssize_t ucma_leave_multicast(struct ucma_file *file,
mc = ERR_PTR(-ENOENT);
else if (mc->ctx->file != file)
mc = ERR_PTR(-EINVAL);
else {
else if (!atomic_inc_not_zero(&mc->ctx->ref))
mc = ERR_PTR(-ENXIO);
else
idr_remove(&multicast_idr, mc->id);
atomic_inc(&mc->ctx->ref);
}
mutex_unlock(&mut);
if (IS_ERR(mc)) {
......@@ -1529,6 +1628,7 @@ static int ucma_open(struct inode *inode, struct file *filp)
INIT_LIST_HEAD(&file->ctx_list);
init_waitqueue_head(&file->poll_wait);
mutex_init(&file->mut);
file->close_wq = create_singlethread_workqueue("ucma_close_id");
filp->private_data = file;
file->filp = filp;
......@@ -1543,16 +1643,34 @@ static int ucma_close(struct inode *inode, struct file *filp)
mutex_lock(&file->mut);
list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
ctx->destroying = 1;
mutex_unlock(&file->mut);
mutex_lock(&mut);
idr_remove(&ctx_idr, ctx->id);
mutex_unlock(&mut);
flush_workqueue(file->close_wq);
/* At that step once ctx was marked as destroying and workqueue
* was flushed we are safe from any inflights handlers that
* might put other closing task.
*/
mutex_lock(&mut);
if (!ctx->closing) {
mutex_unlock(&mut);
/* rdma_destroy_id ensures that no event handlers are
* inflight for that id before releasing it.
*/
rdma_destroy_id(ctx->cm_id);
} else {
mutex_unlock(&mut);
}
ucma_free_ctx(ctx);
mutex_lock(&file->mut);
}
mutex_unlock(&file->mut);
destroy_workqueue(file->close_wq);
kfree(file);
return 0;
}
......
......@@ -133,7 +133,7 @@ static DEFINE_SPINLOCK(port_lock);
static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS);
static void ib_umad_add_one(struct ib_device *device);
static void ib_umad_remove_one(struct ib_device *device);
static void ib_umad_remove_one(struct ib_device *device, void *client_data);
static void ib_umad_release_dev(struct kobject *kobj)
{
......@@ -1322,9 +1322,9 @@ static void ib_umad_add_one(struct ib_device *device)
kobject_put(&umad_dev->kobj);
}
static void ib_umad_remove_one(struct ib_device *device)
static void ib_umad_remove_one(struct ib_device *device, void *client_data)
{
struct ib_umad_device *umad_dev = ib_get_client_data(device, &umad_client);
struct ib_umad_device *umad_dev = client_data;
int i;
if (!umad_dev)
......
......@@ -85,15 +85,20 @@
*/
struct ib_uverbs_device {
struct kref ref;
atomic_t refcount;
int num_comp_vectors;
struct completion comp;
struct device *dev;
struct ib_device *ib_dev;
struct ib_device __rcu *ib_dev;
int devnum;
struct cdev cdev;
struct rb_root xrcd_tree;
struct mutex xrcd_tree_mutex;
struct kobject kobj;
struct srcu_struct disassociate_srcu;
struct mutex lists_mutex; /* protect lists */
struct list_head uverbs_file_list;
struct list_head uverbs_events_file_list;
};
struct ib_uverbs_event_file {
......@@ -105,6 +110,7 @@ struct ib_uverbs_event_file {
wait_queue_head_t poll_wait;
struct fasync_struct *async_queue;
struct list_head event_list;
struct list_head list;
};
struct ib_uverbs_file {
......@@ -114,6 +120,8 @@ struct ib_uverbs_file {
struct ib_ucontext *ucontext;
struct ib_event_handler event_handler;
struct ib_uverbs_event_file *async_file;
struct list_head list;
int is_closed;
};
struct ib_uverbs_event {
......@@ -177,7 +185,9 @@ extern struct idr ib_uverbs_rule_idr;
void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file *uverbs_file,
struct ib_device *ib_dev,
int is_async);
void ib_uverbs_free_async_event_file(struct ib_uverbs_file *uverbs_file);
struct ib_uverbs_event_file *ib_uverbs_lookup_comp_file(int fd);
void ib_uverbs_release_ucq(struct ib_uverbs_file *file,
......@@ -212,6 +222,7 @@ struct ib_uverbs_flow_spec {
#define IB_UVERBS_DECLARE_CMD(name) \
ssize_t ib_uverbs_##name(struct ib_uverbs_file *file, \
struct ib_device *ib_dev, \
const char __user *buf, int in_len, \
int out_len)
......@@ -253,6 +264,7 @@ IB_UVERBS_DECLARE_CMD(close_xrcd);
#define IB_UVERBS_DECLARE_EX_CMD(name) \
int ib_uverbs_ex_##name(struct ib_uverbs_file *file, \
struct ib_device *ib_dev, \
struct ib_udata *ucore, \
struct ib_udata *uhw)
......
This diff is collapsed.
This diff is collapsed.
......@@ -213,28 +213,79 @@ EXPORT_SYMBOL(rdma_port_get_link_layer);
/* Protection domains */
/**
* ib_alloc_pd - Allocates an unused protection domain.
* @device: The device on which to allocate the protection domain.
*
* A protection domain object provides an association between QPs, shared
* receive queues, address handles, memory regions, and memory windows.
*
* Every PD has a local_dma_lkey which can be used as the lkey value for local
* memory operations.
*/
struct ib_pd *ib_alloc_pd(struct ib_device *device)
{
struct ib_pd *pd;
struct ib_device_attr devattr;
int rc;
rc = ib_query_device(device, &devattr);
if (rc)
return ERR_PTR(rc);
pd = device->alloc_pd(device, NULL, NULL);
if (IS_ERR(pd))
return pd;
pd->device = device;
pd->uobject = NULL;
pd->local_mr = NULL;
atomic_set(&pd->usecnt, 0);
if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
pd->local_dma_lkey = device->local_dma_lkey;
else {
struct ib_mr *mr;
mr = ib_get_dma_mr(pd, IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(mr)) {
ib_dealloc_pd(pd);
return (struct ib_pd *)mr;
}
if (!IS_ERR(pd)) {
pd->device = device;
pd->uobject = NULL;
atomic_set(&pd->usecnt, 0);
pd->local_mr = mr;
pd->local_dma_lkey = pd->local_mr->lkey;
}
return pd;
}
EXPORT_SYMBOL(ib_alloc_pd);
int ib_dealloc_pd(struct ib_pd *pd)
/**
* ib_dealloc_pd - Deallocates a protection domain.
* @pd: The protection domain to deallocate.
*
* It is an error to call this function while any resources in the pd still
* exist. The caller is responsible to synchronously destroy them and
* guarantee no new allocations will happen.
*/
void ib_dealloc_pd(struct ib_pd *pd)
{
if (atomic_read(&pd->usecnt))
return -EBUSY;
int ret;
if (pd->local_mr) {
ret = ib_dereg_mr(pd->local_mr);
WARN_ON(ret);
pd->local_mr = NULL;
}
return pd->device->dealloc_pd(pd);
/* uverbs manipulates usecnt with proper locking, while the kabi
requires the caller to guarantee we can't race here. */
WARN_ON(atomic_read(&pd->usecnt));
/* Making delalloc_pd a void return is a WIP, no driver should return
an error here. */
ret = pd->device->dealloc_pd(pd);
WARN_ONCE(ret, "Infiniband HW driver failed dealloc_pd");
}
EXPORT_SYMBOL(ib_dealloc_pd);
......@@ -1168,54 +1219,28 @@ int ib_dereg_mr(struct ib_mr *mr)
}
EXPORT_SYMBOL(ib_dereg_mr);
struct ib_mr *ib_create_mr(struct ib_pd *pd,
struct ib_mr_init_attr *mr_init_attr)
{
struct ib_mr *mr;
if (!pd->device->create_mr)
return ERR_PTR(-ENOSYS);
mr = pd->device->create_mr(pd, mr_init_attr);
if (!IS_ERR(mr)) {
mr->device = pd->device;
mr->pd = pd;
mr->uobject = NULL;
atomic_inc(&pd->usecnt);
atomic_set(&mr->usecnt, 0);
}
return mr;
}
EXPORT_SYMBOL(ib_create_mr);
int ib_destroy_mr(struct ib_mr *mr)
{
struct ib_pd *pd;
int ret;
if (atomic_read(&mr->usecnt))
return -EBUSY;
pd = mr->pd;
ret = mr->device->destroy_mr(mr);
if (!ret)
atomic_dec(&pd->usecnt);
return ret;
}
EXPORT_SYMBOL(ib_destroy_mr);
struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
/**
* ib_alloc_mr() - Allocates a memory region
* @pd: protection domain associated with the region
* @mr_type: memory region type
* @max_num_sg: maximum sg entries available for registration.
*
* Notes:
* Memory registeration page/sg lists must not exceed max_num_sg.
* For mr_type IB_MR_TYPE_MEM_REG, the total length cannot exceed
* max_num_sg * used_page_size.
*
*/
struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg)
{
struct ib_mr *mr;
if (!pd->device->alloc_fast_reg_mr)
if (!pd->device->alloc_mr)
return ERR_PTR(-ENOSYS);
mr = pd->device->alloc_fast_reg_mr(pd, max_page_list_len);
mr = pd->device->alloc_mr(pd, mr_type, max_num_sg);
if (!IS_ERR(mr)) {
mr->device = pd->device;
mr->pd = pd;
......@@ -1226,7 +1251,7 @@ struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
return mr;
}
EXPORT_SYMBOL(ib_alloc_fast_reg_mr);
EXPORT_SYMBOL(ib_alloc_mr);
struct ib_fast_reg_page_list *ib_alloc_fast_reg_page_list(struct ib_device *device,
int max_page_list_len)
......
obj-$(CONFIG_INFINIBAND_MTHCA) += mthca/
obj-$(CONFIG_INFINIBAND_IPATH) += ipath/
obj-$(CONFIG_INFINIBAND_QIB) += qib/
obj-$(CONFIG_INFINIBAND_EHCA) += ehca/
obj-$(CONFIG_INFINIBAND_AMSO1100) += amso1100/
obj-$(CONFIG_INFINIBAND_CXGB3) += cxgb3/
obj-$(CONFIG_INFINIBAND_CXGB4) += cxgb4/
obj-$(CONFIG_MLX4_INFINIBAND) += mlx4/
......
This diff is collapsed.
This diff is collapsed.
......@@ -970,7 +970,9 @@ void c4iw_free_fastreg_pbl(struct ib_fast_reg_page_list *page_list);
struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(
struct ib_device *device,
int page_list_len);
struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth);
struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
u32 max_num_sg);
int c4iw_dealloc_mw(struct ib_mw *mw);
struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start,
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
# Entries for RDMA_STAGING tree
obj-$(CONFIG_INFINIBAND_AMSO1100) += amso1100/
obj-$(CONFIG_INFINIBAND_HFI1) += hfi1/
obj-$(CONFIG_INFINIBAND_IPATH) += ipath/
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment