Commit 2a3c389a authored by Linus Torvalds

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A smaller cycle this time. Notably we see another new driver, 'Soft
  iWarp', and the deletion of an ancient unused driver for nes.

   - Revise and simplify the signature offload RDMA MR APIs

   - More progress on hoisting object allocation boiler plate code out
     of the drivers

   - Driver bug fixes and revisions for hns, hfi1, efa, cxgb4, qib,
     i40iw

   - Tree wide cleanups: struct_size, put_user_page, xarray, rst doc
     conversion

   - Removal of obsolete ib_ucm chardev and nes driver

   - netlink based discovery of chardevs and autoloading of the modules
     providing them

   - Move more of the rdmavt/hfi1 uapi to include/uapi/rdma

   - New driver 'siw' for software based iWarp running on top of netdev,
     much like rxe's software RoCE.

   - mlx5 feature to report events in their raw devx format to userspace

   - Expose per-object counters through rdma tool

   - Adaptive interrupt moderation for RDMA (DIM), sharing the DIM core
     from netdev"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (194 commits)
  RDMA/siw: Require a 64 bit arch
  RDMA/siw: Mark expected switch fall-throughs
  RDMA/core: Fix -Wunused-const-variable warnings
  rdma/siw: Remove set but not used variable 's'
  rdma/siw: Add missing dependencies on LIBCRC32C and DMA_VIRT_OPS
  RDMA/siw: Add missing rtnl_lock around access to ifa
  rdma/siw: Use proper enumerated type in map_cqe_status
  RDMA/siw: Remove unnecessary kthread create/destroy printouts
  IB/rdmavt: Fix variable shadowing issue in rvt_create_cq
  RDMA/core: Fix race when resolving IP address
  RDMA/core: Make rdma_counter.h compile stand alone
  IB/core: Work on the caller socket net namespace in nldev_newlink()
  RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM
  RDMA/mlx5: Set RDMA DIM to be enabled by default
  RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink
  RDMA/core: Provide RDMA DIM support for ULPs
  linux/dim: Implement RDMA adaptive moderation (DIM)
  IB/mlx5: Report correctly tag matching rendezvous capability
  docs: infiniband: add it to the driver-api bookset
  IB/mlx5: Implement VHCA tunnel mechanism in DEVX
  ...
parents 8de26253 0b043644
...@@ -423,23 +423,6 @@ Description:
		(e.g. driver restart on the VM which owns the VF).
sysfs interface for NetEffect RNIC Low-Level iWARP driver (nes)
---------------------------------------------------------------
What: /sys/class/infiniband/nesX/hw_rev
What: /sys/class/infiniband/nesX/hca_type
What: /sys/class/infiniband/nesX/board_id
Date: Feb, 2008
KernelVersion: v2.6.25
Contact: linux-rdma@vger.kernel.org
Description:
hw_rev: (RO) Hardware revision number
hca_type: (RO) Host Channel Adapter type (NEX020)
board_id: (RO) Manufacturing board id
sysfs interface for Chelsio T4/T5 RDMA driver (cxgb4)
-----------------------------------------------------
......
...@@ -90,6 +90,7 @@ needed).
   driver-api/index
   core-api/index
   infiniband/index
   media/index
   networking/index
   input/index
......
===========================
InfiniBand Midlayer Locking
===========================

  This guide is an attempt to make explicit the locking assumptions
  made by the InfiniBand midlayer. It describes the requirements on
...@@ -6,45 +8,47 @@ INFINIBAND MIDLAYER LOCKING
  protocols that use the midlayer.

Sleeping and interrupt context
==============================

  With the following exceptions, a low-level driver implementation of
  all of the methods in struct ib_device may sleep. The exceptions
  are any methods from the list:

    - create_ah
    - modify_ah
    - query_ah
    - destroy_ah
    - post_send
    - post_recv
    - poll_cq
    - req_notify_cq
    - map_phys_fmr

  which may not sleep and must be callable from any context.

  The corresponding functions exported to upper level protocol
  consumers:

    - ib_create_ah
    - ib_modify_ah
    - ib_query_ah
    - ib_destroy_ah
    - ib_post_send
    - ib_post_recv
    - ib_req_notify_cq
    - ib_map_phys_fmr

  are therefore safe to call from any context.

  In addition, the function

    - ib_dispatch_event

  used by low-level drivers to dispatch asynchronous events through
  the midlayer is also safe to call from any context.
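
  A minimal sketch (not part of the original document) of what this
  permits, assuming a hypothetical driver "mydrv": because
  ib_dispatch_event() may be called from any context, a low-level
  driver can report an asynchronous event straight from its hard-IRQ
  handler::

    #include <linux/interrupt.h>
    #include <rdma/ib_verbs.h>

    static irqreturn_t mydrv_irq(int irq, void *data)
    {
            struct ib_device *ibdev = data;
            struct ib_event ev = {
                    .device           = ibdev,
                    .event            = IB_EVENT_PORT_ACTIVE,
                    .element.port_num = 1,
            };

            /* Allowed in interrupt context per the rule above. */
            ib_dispatch_event(&ev);
            return IRQ_HANDLED;
    }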
Reentrancy
----------

  All of the methods in struct ib_device exported by a low-level
  driver must be fully reentrant. The low-level driver is required to
...@@ -62,6 +66,7 @@ Reentrancy
  information between different calls of ib_poll_cq() is not defined.

Callbacks
---------

  A low-level driver must not perform a callback directly from the
  same callchain as an ib_device method call. For example, it is not
...@@ -74,7 +79,7 @@ Callbacks
  completion event handlers for the same CQ are not called
  simultaneously. The driver must guarantee that only one CQ event
  handler for a given CQ is running at a time. In other words, the
  following situation is not allowed::

        CPU1                                    CPU2
...@@ -93,6 +98,7 @@ Callbacks
  Upper level protocol consumers may not sleep in a callback.

Hot-plug
--------

  A low-level driver announces that a device is ready for use by
  consumers when it calls ib_register_device(), all initialization
......
.. SPDX-License-Identifier: GPL-2.0
==========
InfiniBand
==========
.. toctree::
   :maxdepth: 1

   core_locking
   ipoib
   opa_vnic
   sysfs
   tag_matching
   user_mad
   user_verbs

.. only:: subproject and html

   Indices
   =======

   * :ref:`genindex`
==================
IP over InfiniBand
==================

  The ib_ipoib driver is an implementation of the IP over InfiniBand
  protocol as specified by RFC 4391 and 4392, issued by the IETF ipoib
...@@ -8,16 +10,17 @@ IP OVER INFINIBAND
  masqueraded to the kernel as ethernet interfaces).

Partitions and P_Keys
=====================

  When the IPoIB driver is loaded, it creates one interface for each
  port using the P_Key at index 0. To create an interface with a
  different P_Key, write the desired P_Key into the main interface's
  /sys/class/net/<intf name>/create_child file. For example::

    echo 0x8001 > /sys/class/net/ib0/create_child

  This will create an interface named ib0.8001 with P_Key 0x8001. To
  remove a subinterface, use the "delete_child" file::

    echo 0x8001 > /sys/class/net/ib0/delete_child
...@@ -28,6 +31,7 @@ Partitions and P_Keys
  rtnl_link_ops, where children created using either way behave the same.

Datagram vs Connected modes
===========================

  The IPoIB driver supports two modes of operation: datagram and
  connected. The mode is set and read through an interface's
...@@ -51,6 +55,7 @@ Datagram vs Connected modes
  networking stack to use the smaller UD MTU for these neighbours.

Stateless offloads
==================

  If the IB HW supports IPoIB stateless offloads, IPoIB advertises
  TCP/IP checksum and/or Large Send (LSO) offloading capability to the
...@@ -63,6 +68,7 @@ Stateless offloads
  Stateless offloads are supported only in datagram mode.

Interrupt moderation
====================

  If the underlying IB device supports CQ event moderation, one can
  use ethtool to set interrupt mitigation parameters and thus reduce
...@@ -71,6 +77,7 @@ Interrupt moderation
  moderation is supported.

Debugging Information
=====================

  By compiling the IPoIB driver with CONFIG_INFINIBAND_IPOIB_DEBUG set
  to 'y', tracing messages are compiled into the driver. They are
...@@ -79,7 +86,7 @@ Debugging Information
  runtime through files in /sys/module/ib_ipoib/.

  CONFIG_INFINIBAND_IPOIB_DEBUG also enables files in the debugfs
  virtual filesystem. By mounting this filesystem, for example with::

    mount -t debugfs none /sys/kernel/debug
...@@ -96,10 +103,13 @@ Debugging Information
  performance, because it adds tests to the fast path.

References
==========

  Transmission of IP over InfiniBand (IPoIB) (RFC 4391)
    http://ietf.org/rfc/rfc4391.txt

  IP over InfiniBand (IPoIB) Architecture (RFC 4392)
    http://ietf.org/rfc/rfc4392.txt

  IP over InfiniBand: Connected Mode (RFC 4755)
    http://ietf.org/rfc/rfc4755.txt
=================================================================
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
=================================================================
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC) feature
supports Ethernet functionality over Omni-Path fabric by encapsulating
the Ethernet packets between HFI nodes.
...@@ -17,7 +21,7 @@ an independent Ethernet network. The configuration is performed by an
Ethernet Manager (EM) which is part of the trusted Fabric Manager (FM)
application. HFI nodes can have multiple VNICs each connected to a
different virtual Ethernet switch. The below diagram presents a case
of two virtual Ethernet switches with two HFI nodes::

                      +-------------------+
                      |      Subnet/      |
...@@ -28,11 +32,11 @@ of two virtual Ethernet switches with two HFI nodes.
                        /                /
                       /                /
                      /                /
  +-----------------------------+    +------------------------------+
  |   Virtual Ethernet Switch   |    |   Virtual Ethernet Switch    |
  |  +---------+   +---------+  |    |  +---------+   +---------+   |
  |  |  VPORT  |   |  VPORT  |  |    |  |  VPORT  |   |  VPORT  |   |
  +--+---------+---+---------+--+    +--+---------+---+---------+--+
     |   \          /       |           |   \          /       |
     |     \      /         |           |     \      /         |
     |       \  /           |           |       \/             |
...@@ -47,8 +51,9 @@ of two virtual Ethernet switches with two HFI nodes.

The Omni-Path encapsulated Ethernet packet format is as described below.

==================== ================================
Bits                 Field
==================== ================================
Quad Word 0:
0-19                 SLID (lower 20 bits)
20-30                Length (in Quad Words)
...@@ -81,6 +86,7 @@ Quad Word N (last):
24-55                ICRC
56-61                Tail
62-63                LT (=01, Link Transfer Tail Flit)
==================== ================================

Ethernet packet is padded on the transmit side to ensure that the VNIC OPA
packet is quad word aligned. The 'Tail' field contains the number of bytes
...@@ -123,7 +129,7 @@ operation. It also handles the encapsulation of Ethernet packets with an
Omni-Path header in the transmit path. For each VNIC interface, the
information required for encapsulation is configured by the EM via VEMA MAD
interface. It also passes any control information to the HW dependent driver
by invoking the RDMA netdev control operations::

        +-------------------+    +----------------------+
        |                   |    |        Linux         |
......
===========
Sysfs files
===========

The sysfs interface has moved to
Documentation/ABI/stable/sysfs-class-infiniband.
==================
Tag matching logic
==================
The MPI standard defines a set of rules, known as tag-matching, for matching
source send operations to destination receives. For a send to match a
receive, the following source and destination parameters must match:

* Communicator
* User tag - wild card may be specified by the receiver
* Source rank - wild card may be specified by the receiver
* Destination rank - wild

The ordering rules require that when more than one pair of send and receive
message envelopes may match, the pair that includes the earliest posted-send
and the earliest posted-receive is the pair that must be used to satisfy the
...@@ -35,6 +39,7 @@ the header to initiate an RDMA READ operation directly to the matching buffer.
A fin message needs to be received in order for the buffer to be reused.
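
As an illustration only (not part of the kernel documentation), the
matching rule from the list above can be sketched as follows, with
ANY_SOURCE/ANY_TAG standing in for the receiver-side wild cards::

    #include <stdbool.h>

    #define ANY_SOURCE (-1)  /* receiver wild card for source rank */
    #define ANY_TAG    (-1)  /* receiver wild card for user tag */

    struct envelope {
            int comm;        /* communicator id */
            int src_rank;    /* sender rank */
            int tag;         /* user tag */
    };

    /* True when a posted receive matches an incoming send envelope. */
    static bool tm_match(const struct envelope *recv,
                         const struct envelope *send)
    {
            if (recv->comm != send->comm)
                    return false;
            if (recv->src_rank != ANY_SOURCE &&
                recv->src_rank != send->src_rank)
                    return false;
            return recv->tag == ANY_TAG || recv->tag == send->tag;
    }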
Tag matching implementation
===========================

There are two types of matching objects used, the posted receive list and the
unexpected message list. The application posts receive buffers through calls
......
====================
Userspace MAD access
====================

Device files
============

  Each port of each InfiniBand device has a "umad" device and an
  "issm" device attached. For example, a two-port HCA will have two
...@@ -8,12 +11,13 @@ Device files
  device of each type (for switch port 0).

Creating MAD agents
===================

  A MAD agent can be created by filling in a struct ib_user_mad_reg_req
  and then calling the IB_USER_MAD_REGISTER_AGENT ioctl on a file
  descriptor for the appropriate device file. If the registration
  request succeeds, a 32-bit id will be returned in the structure.
  For example::

	struct ib_user_mad_reg_req req = { /* ... */ };
	ret = ioctl(fd, IB_USER_MAD_REGISTER_AGENT, (char *) &req);
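
  A fuller sketch (illustrative, not from the original text), assuming
  the uapi header is installed as <rdma/ib_user_mad.h> and picking
  Subnet Administration (management class 0x03) on the GSI QP as an
  arbitrary example::

	#include <fcntl.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <rdma/ib_user_mad.h>

	int register_sa_agent(void)
	{
		struct ib_user_mad_reg_req req;
		int fd = open("/dev/infiniband/umad0", O_RDWR);

		if (fd < 0)
			return -1;
		memset(&req, 0, sizeof(req));
		req.qpn = 1;                 /* GSI; must be 0 or 1 */
		req.mgmt_class = 0x03;       /* Subnet Administration */
		req.mgmt_class_version = 2;
		if (ioctl(fd, IB_USER_MAD_REGISTER_AGENT, (char *) &req) < 0)
			return -1;
		return fd;                   /* req.id now holds the agent id */
	}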
...@@ -26,12 +30,14 @@ Creating MAD agents
  ioctl. Also, all agents registered through a file descriptor will
  be unregistered when the descriptor is closed.

  2014
       a new registration ioctl is now provided which allows additional
       fields to be provided during registration.
       Users of this registration call are implicitly setting the use of
       pkey_index (see below).

Receiving MADs
==============

  MADs are received using read(). The receive side now supports
  RMPP. The buffer passed to read() must be at least one
...@@ -41,7 +47,8 @@ Receiving MADs
  MAD (RMPP), the errno is set to ENOSPC and the length of the
  buffer needed is set in mad.length.

  Example for normal MAD (non RMPP) reads::

	struct ib_user_mad *mad;
	mad = malloc(sizeof *mad + 256);
	ret = read(fd, mad, sizeof *mad + 256);
...@@ -50,7 +57,8 @@ Receiving MADs
		free(mad);
	}

  Example for RMPP reads::

	struct ib_user_mad *mad;
	mad = malloc(sizeof *mad + 256);
	ret = read(fd, mad, sizeof *mad + 256);
...@@ -76,11 +84,12 @@ Receiving MADs
  poll()/select() may be used to wait until a MAD can be read.

Sending MADs
============

  MADs are sent using write(). The agent ID for sending should be
  filled into the id field of the MAD, the destination LID should be
  filled into the lid field, and so on. The send side does support
  RMPP, so MADs of arbitrary length can be sent. For example::

	struct ib_user_mad *mad;
...@@ -97,6 +106,7 @@ Sending MADs
	perror("write");

Transaction IDs
===============

  Users of the umad devices can use the lower 32 bits of the
  transaction ID field (that is, the least significant half of the
...@@ -105,6 +115,7 @@ Transaction IDs
  the kernel and will be overwritten before a MAD is sent.

P_Key Index Handling
====================

  The old ib_umad interface did not allow setting the P_Key index for
  MADs that are sent and did not provide a way for obtaining the P_Key
...@@ -119,6 +130,7 @@ P_Key Index Handling
  default, and the IB_USER_MAD_ENABLE_PKEY ioctl will be removed.

Setting IsSM Capability Bit
===========================

  To set the IsSM capability bit for a port, simply open the
  corresponding issm device file. If the IsSM bit is already set,
...@@ -129,25 +141,26 @@ Setting IsSM Capability Bit
  the issm file.

/dev files
==========

  To create the appropriate character device files automatically with
  udev, a rule like::

    KERNEL=="umad*", NAME="infiniband/%k"
    KERNEL=="issm*", NAME="infiniband/%k"

  can be used. This will create device nodes named::

    /dev/infiniband/umad0
    /dev/infiniband/issm0

  for the first port, and so on. The InfiniBand device and port
  associated with these devices can be determined from the files::

    /sys/class/infiniband_mad/umad0/ibdev
    /sys/class/infiniband_mad/umad0/port

  and::

    /sys/class/infiniband_mad/issm0/ibdev
    /sys/class/infiniband_mad/issm0/port
======================
Userspace verbs access
======================

  The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
  enables direct userspace access to IB hardware via "verbs," as
...@@ -13,6 +15,7 @@ USERSPACE VERBS ACCESS
  libmthca userspace driver be installed.

User-kernel communication
=========================

  Userspace communicates with the kernel for slow path, resource
  management operations via the /dev/infiniband/uverbsN character
...@@ -28,6 +31,7 @@ User-kernel communication
  system call.

Resource management
===================

  Since creation and destruction of all IB resources is done by
  commands passed through a file descriptor, the kernel can keep track
...@@ -41,6 +45,7 @@ Resource management
  prevent one process from touching another process's resources.

Memory pinning
==============

  Direct userspace I/O requires that memory regions that are potential
  I/O targets be kept resident at the same physical address. The
...@@ -54,13 +59,14 @@ Memory pinning
  number of pages pinned by a process.

/dev files
==========

  To create the appropriate character device files automatically with
  udev, a rule like::

    KERNEL=="uverbs*", NAME="infiniband/%k"

  can be used. This will create device nodes named::

    /dev/infiniband/uverbs0
......
...@@ -11018,14 +11018,6 @@ F: driver/net/net_failover.c
F:	include/net/net_failover.h
F:	Documentation/networking/net_failover.rst
NETEFFECT IWARP RNIC DRIVER (IW_NES)
M: Faisal Latif <faisal.latif@intel.com>
L: linux-rdma@vger.kernel.org
W: http://www.intel.com/Products/Server/Adapters/Server-Cluster/Server-Cluster-overview.htm
S: Supported
F: drivers/infiniband/hw/nes/
F: include/uapi/rdma/nes-abi.h
NETEM NETWORK EMULATOR
M:	Stephen Hemminger <stephen@networkplumber.org>
L:	netem@lists.linux-foundation.org (moderated for non-subscribers)
...@@ -14755,6 +14747,13 @@ M: Chris Boot <bootc@bootc.net>
S:	Maintained
F:	drivers/leds/leds-net48xx.c
SOFT-IWARP DRIVER (siw)
M: Bernard Metzler <bmt@zurich.ibm.com>
L: linux-rdma@vger.kernel.org
S: Supported
F: drivers/infiniband/sw/siw/
F: include/uapi/rdma/siw-abi.h
SOFT-ROCE DRIVER (rxe)
M:	Moni Shoua <monis@mellanox.com>
L:	linux-rdma@vger.kernel.org
......
...@@ -7,6 +7,7 @@ menuconfig INFINIBAND
	depends on m || IPV6 != m
	depends on !ALPHA
	select IRQ_POLL
	select DIMLIB
	---help---
	  Core support for InfiniBand (IB). Make sure to also select
	  any protocols you wish to use as well as drivers for your
...@@ -36,17 +37,6 @@ config INFINIBAND_USER_ACCESS
	  libibverbs, libibcm and a hardware driver library from
	  rdma-core <https://github.com/linux-rdma/rdma-core>.
config INFINIBAND_USER_ACCESS_UCM
tristate "Userspace CM (UCM, DEPRECATED)"
depends on BROKEN || COMPILE_TEST
depends on INFINIBAND_USER_ACCESS
help
The UCM module has known security flaws, which no one is
interested to fix. The user-space part of this code was
dropped from the upstream a long time ago.
This option is DEPRECATED and planned to be removed.
config INFINIBAND_EXP_LEGACY_VERBS_NEW_UAPI
	bool "Allow experimental legacy verbs in new ioctl uAPI (EXPERIMENTAL)"
	depends on INFINIBAND_USER_ACCESS
...@@ -98,7 +88,6 @@ source "drivers/infiniband/hw/efa/Kconfig"
source "drivers/infiniband/hw/i40iw/Kconfig"
source "drivers/infiniband/hw/mlx4/Kconfig"
source "drivers/infiniband/hw/mlx5/Kconfig"
source "drivers/infiniband/hw/nes/Kconfig"
source "drivers/infiniband/hw/ocrdma/Kconfig"
source "drivers/infiniband/hw/vmw_pvrdma/Kconfig"
source "drivers/infiniband/hw/usnic/Kconfig"
...@@ -108,6 +97,7 @@ source "drivers/infiniband/hw/hfi1/Kconfig"
source "drivers/infiniband/hw/qedr/Kconfig"
source "drivers/infiniband/sw/rdmavt/Kconfig"
source "drivers/infiniband/sw/rxe/Kconfig"
source "drivers/infiniband/sw/siw/Kconfig"
endif

source "drivers/infiniband/ulp/ipoib/Kconfig"
......
...@@ -6,13 +6,12 @@ obj-$(CONFIG_INFINIBAND) += ib_core.o ib_cm.o iw_cm.o \
				$(infiniband-y)
obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o
obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o $(user_access-y)
obj-$(CONFIG_INFINIBAND_USER_ACCESS_UCM) += ib_ucm.o $(user_access-y)
ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
				device.o fmr_pool.o cache.o netlink.o \
				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
				multicast.o mad.o smi.o agent.o mad_rmpp.o \
				nldev.o restrack.o counters.o
ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
...@@ -29,8 +28,6 @@ rdma_ucm-y := ucma.o
ib_umad-y :=			user_mad.o
ib_ucm-y := ucm.o
ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
				rdma_core.o uverbs_std_types.o uverbs_ioctl.o \
				uverbs_std_types_cq.o \
......
...@@ -337,7 +337,7 @@ static int dst_fetch_ha(const struct dst_entry *dst,
		neigh_event_send(n, NULL);
		ret = -ENODATA;
	} else {
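		/*
		 * Hedged annotation, not part of the patch:
		 * neigh_ha_snapshot() copies n->ha under the neighbour's
		 * seqlock, so a concurrent address update cannot be seen
		 * torn, unlike the raw MAX_ADDR_LEN memcpy it replaces.
		 */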
		neigh_ha_snapshot(dev_addr->dst_dev_addr, n, dst->dev);
	}

	neigh_release(n);
......
...@@ -60,6 +60,7 @@ extern bool ib_devices_shared_netns;
int ib_device_register_sysfs(struct ib_device *device);
void ib_device_unregister_sysfs(struct ib_device *device);
int ib_device_rename(struct ib_device *ibdev, const char *name);
int ib_device_set_dim(struct ib_device *ibdev, u8 use_dim);
typedef void (*roce_netdev_callback)(struct ib_device *device, u8 port,
				     struct net_device *idev, void *cookie);
...@@ -88,6 +89,15 @@ typedef int (*nldev_callback)(struct ib_device *device,
int ib_enum_all_devs(nldev_callback nldev_cb, struct sk_buff *skb,
		     struct netlink_callback *cb);
struct ib_client_nl_info {
struct sk_buff *nl_msg;
struct device *cdev;
unsigned int port;
u64 abi;
};
int ib_get_client_nl_info(struct ib_device *ibdev, const char *client_name,
struct ib_client_nl_info *res);
enum ib_cache_gid_default_mode {
	IB_CACHE_GID_DEFAULT_MODE_SET,
	IB_CACHE_GID_DEFAULT_MODE_DELETE
......
......
...@@ -18,6 +18,53 @@
#define IB_POLL_FLAGS \
	(IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
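
/*
 * Hedged annotation, not part of the patch: each entry below is a
 * struct dim_cq_moder initializer, i.e. {usec, pkts, comps,
 * cq_period_mode}; the DIM core walks this table to trade completion
 * latency against interrupt rate.
 */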
static const struct dim_cq_moder
rdma_dim_prof[RDMA_DIM_PARAMS_NUM_PROFILES] = {
{1, 0, 1, 0},
{1, 0, 4, 0},
{2, 0, 4, 0},
{2, 0, 8, 0},
{4, 0, 8, 0},
{16, 0, 8, 0},
{16, 0, 16, 0},
{32, 0, 16, 0},
{32, 0, 32, 0},
};
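
/*
 * Hedged annotation, not part of the patch: this work item runs when
 * the DIM core picks a new profile; it applies the selected moderation
 * parameters through the driver's modify_cq() hook and restarts the
 * measurement cycle.
 */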
static void ib_cq_rdma_dim_work(struct work_struct *w)
{
struct dim *dim = container_of(w, struct dim, work);
struct ib_cq *cq = dim->priv;
u16 usec = rdma_dim_prof[dim->profile_ix].usec;
u16 comps = rdma_dim_prof[dim->profile_ix].comps;
dim->state = DIM_START_MEASURE;
cq->device->ops.modify_cq(cq, comps, usec);
}
static void rdma_dim_init(struct ib_cq *cq)
{
struct dim *dim;
if (!cq->device->ops.modify_cq || !cq->device->use_cq_dim ||
cq->poll_ctx == IB_POLL_DIRECT)
return;
dim = kzalloc(sizeof(struct dim), GFP_KERNEL);
if (!dim)
return;
dim->state = DIM_START_MEASURE;
dim->tune_state = DIM_GOING_RIGHT;
dim->profile_ix = RDMA_DIM_START_PROFILE;
dim->priv = cq;
cq->dim = dim;
INIT_WORK(&dim->work, ib_cq_rdma_dim_work);
}
static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *wcs,
			   int batch)
{
...@@ -78,6 +125,7 @@ static void ib_cq_completion_direct(struct ib_cq *cq, void *private)
static int ib_poll_handler(struct irq_poll *iop, int budget)
{
	struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
struct dim *dim = cq->dim;
	int completed;

	completed = __ib_process_cq(cq, budget, cq->wc, IB_POLL_BATCH);
...@@ -87,6 +135,9 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
		irq_poll_sched(&cq->iop);
	}
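	/*
	 * Hedged annotation, not part of the patch: feed the number of
	 * completions handled in this poll round to the DIM core, which
	 * may schedule dim->work to retune the CQ moderation.
	 */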
if (dim)
rdma_dim(dim, completed);
	return completed;
}
...@@ -105,6 +156,8 @@ static void ib_cq_poll_work(struct work_struct *work)
	if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
	    ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
		queue_work(cq->comp_wq, &cq->work);
else if (cq->dim)
rdma_dim(cq->dim, completed);
}

static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
...@@ -113,7 +166,7 @@ static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
}

/**
 * __ib_alloc_cq_user - allocate a completion queue
 * @dev:		device to allocate the CQ for
 * @private:		driver private data, accessible from cq->cq_context
 * @nr_cqe:		number of CQEs to allocate
...@@ -139,25 +192,30 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
	struct ib_cq *cq;
	int ret = -ENOMEM;

	cq = rdma_zalloc_drv_obj(dev, ib_cq);
	if (!cq)
		return ERR_PTR(ret);

	cq->device = dev;
cq->uobject = NULL;
cq->event_handler = NULL;
	cq->cq_context = private;
	cq->poll_ctx = poll_ctx;
	atomic_set(&cq->usecnt, 0);

	cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL);
	if (!cq->wc)
		goto out_free_cq;

	cq->res.type = RDMA_RESTRACK_CQ;
	rdma_restrack_set_task(&cq->res, caller);
ret = dev->ops.create_cq(cq, &cq_attr, NULL);
if (ret)
goto out_free_wc;
	rdma_restrack_kadd(&cq->res);
rdma_dim_init(cq);
	switch (cq->poll_ctx) {
	case IB_POLL_DIRECT:
		cq->comp_handler = ib_cq_completion_direct;
...@@ -178,29 +236,29 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
		break;
	default:
		ret = -EINVAL;
		goto out_destroy_cq;
	}

	return cq;
out_free_wc:
kfree(cq->wc);
rdma_restrack_del(&cq->res);
out_destroy_cq:
rdma_restrack_del(&cq->res);
	cq->device->ops.destroy_cq(cq, udata);
out_free_wc:
kfree(cq->wc);
out_free_cq:
kfree(cq);
	return ERR_PTR(ret);
}
EXPORT_SYMBOL(__ib_alloc_cq_user);

/**
 * ib_free_cq_user - free a completion queue
 * @cq: completion queue to free.
 * @udata: User data or NULL for kernel object
 */
void ib_free_cq_user(struct ib_cq *cq, struct ib_udata *udata)
{
int ret;
	if (WARN_ON_ONCE(atomic_read(&cq->usecnt)))
		return;
...@@ -218,9 +276,12 @@ void ib_free_cq_user(struct ib_cq *cq, struct ib_udata *udata)
		WARN_ON_ONCE(1);
	}
kfree(cq->wc);
	rdma_restrack_del(&cq->res);
	cq->device->ops.destroy_cq(cq, udata);
	if (cq->dim)
cancel_work_sync(&cq->dim->work);
kfree(cq->dim);
kfree(cq->wc);
kfree(cq);
}
EXPORT_SYMBOL(ib_free_cq_user);
...@@ -46,6 +46,7 @@
#include <rdma/rdma_netlink.h>
#include <rdma/ib_addr.h>
#include <rdma/ib_cache.h>
#include <rdma/rdma_counter.h>

#include "core_priv.h"
#include "restrack.h"
...@@ -270,7 +271,7 @@ struct ib_port_data_rcu {
	struct ib_port_data pdata[];
};
static void ib_device_check_mandatory(struct ib_device *device)
{
#define IB_MANDATORY_FUNC(x) { offsetof(struct ib_device_ops, x), #x }
	static const struct {
...@@ -305,8 +306,6 @@ static int ib_device_check_mandatory(struct ib_device *device)
			break;
		}
	}
return 0;
}

/*
...@@ -375,7 +374,7 @@ struct ib_device *ib_device_get_by_name(const char *name,
	down_read(&devices_rwsem);
	device = __ib_device_get_by_name(name);
	if (device && driver_id != RDMA_DRIVER_UNKNOWN &&
	    device->ops.driver_id != driver_id)
		device = NULL;

	if (device) {
...@@ -449,6 +448,15 @@ int ib_device_rename(struct ib_device *ibdev, const char *name)
	return 0;
}
int ib_device_set_dim(struct ib_device *ibdev, u8 use_dim)
{
if (use_dim > 1)
return -EINVAL;
ibdev->use_cq_dim = use_dim;
return 0;
}
static int alloc_name(struct ib_device *ibdev, const char *name)
{
	struct ib_device *device;
...@@ -494,10 +502,12 @@ static void ib_device_release(struct device *device)
	if (dev->port_data) {
		ib_cache_release_one(dev);
		ib_security_release_port_pkey_list(dev);
rdma_counter_release(dev);
		kfree_rcu(container_of(dev->port_data, struct ib_port_data_rcu,
				       pdata[0]),
			  rcu_head);
	}

	xa_destroy(&dev->compat_devs);
	xa_destroy(&dev->client_data);
	kfree_rcu(dev, rcu_head);
...@@ -1193,10 +1203,7 @@ static int setup_device(struct ib_device *device)
	int ret;

	setup_dma_device(device);
ib_device_check_mandatory(device);
ret = ib_device_check_mandatory(device);
if (ret)
return ret;
	ret = setup_port_data(device);
	if (ret) {
...@@ -1321,6 +1328,8 @@ int ib_register_device(struct ib_device *device, const char *name)

	ib_device_register_rdmacg(device);
rdma_counter_init(device);
	/*
	 * Ensure that ADD uevent is not fired because it
	 * is too early and device is not initialized yet.
...@@ -1479,7 +1488,7 @@ void ib_unregister_driver(enum rdma_driver_id driver_id)

	down_read(&devices_rwsem);
	xa_for_each (&devices, index, ib_dev) {
		if (ib_dev->ops.driver_id != driver_id)
			continue;

		get_device(&ib_dev->dev);
...@@ -1749,6 +1758,104 @@ void ib_unregister_client(struct ib_client *client)
}
EXPORT_SYMBOL(ib_unregister_client);
static int __ib_get_global_client_nl_info(const char *client_name,
struct ib_client_nl_info *res)
{
struct ib_client *client;
unsigned long index;
int ret = -ENOENT;
down_read(&clients_rwsem);
xa_for_each_marked (&clients, index, client, CLIENT_REGISTERED) {
if (strcmp(client->name, client_name) != 0)
continue;
if (!client->get_global_nl_info) {
ret = -EOPNOTSUPP;
break;
}
ret = client->get_global_nl_info(res);
if (WARN_ON(ret == -ENOENT))
ret = -EINVAL;
if (!ret && res->cdev)
get_device(res->cdev);
break;
}
up_read(&clients_rwsem);
return ret;
}
static int __ib_get_client_nl_info(struct ib_device *ibdev,
const char *client_name,
struct ib_client_nl_info *res)
{
unsigned long index;
void *client_data;
int ret = -ENOENT;
down_read(&ibdev->client_data_rwsem);
xan_for_each_marked (&ibdev->client_data, index, client_data,
CLIENT_DATA_REGISTERED) {
struct ib_client *client = xa_load(&clients, index);
if (!client || strcmp(client->name, client_name) != 0)
continue;
if (!client->get_nl_info) {
ret = -EOPNOTSUPP;
break;
}
ret = client->get_nl_info(ibdev, client_data, res);
if (WARN_ON(ret == -ENOENT))
ret = -EINVAL;
/*
* The cdev is guaranteed valid as long as we are inside the
* client_data_rwsem as remove_one can't be called. Keep it
* valid for the caller.
*/
if (!ret && res->cdev)
get_device(res->cdev);
break;
}
up_read(&ibdev->client_data_rwsem);
return ret;
}
/**
* ib_get_client_nl_info - Fetch the nl_info from a client
* @device - IB device
* @client_name - Name of the client
* @res - Result of the query
*/
int ib_get_client_nl_info(struct ib_device *ibdev, const char *client_name,
struct ib_client_nl_info *res)
{
int ret;
if (ibdev)
ret = __ib_get_client_nl_info(ibdev, client_name, res);
else
ret = __ib_get_global_client_nl_info(client_name, res);
#ifdef CONFIG_MODULES
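	/*
	 * Hedged annotation, not part of the patch: when no registered
	 * client matched, try loading a module advertising a matching
	 * "rdma-client-<name>" alias, then retry the lookup; this is the
	 * chardev module autoloading mentioned in the merge summary.
	 */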
if (ret == -ENOENT) {
request_module("rdma-client-%s", client_name);
if (ibdev)
ret = __ib_get_client_nl_info(ibdev, client_name, res);
else
ret = __ib_get_global_client_nl_info(client_name, res);
}
#endif
if (ret) {
if (ret == -ENOENT)
return -EOPNOTSUPP;
return ret;
}
if (WARN_ON(!res->cdev))
return -EINVAL;
return 0;
}
/**
 * ib_set_client_data - Set IB client context
 * @device:Device to set context for
...@@ -2039,7 +2146,7 @@ struct ib_device *ib_device_get_by_netdev(struct net_device *ndev,
			    (uintptr_t)ndev) {
		if (rcu_access_pointer(cur->netdev) == ndev &&
		    (driver_id == RDMA_DRIVER_UNKNOWN ||
		     cur->ib_dev->ops.driver_id == driver_id) &&
		    ib_device_try_get(cur->ib_dev)) {
			res = cur->ib_dev;
			break;
...@@ -2344,12 +2451,28 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
#define SET_OBJ_SIZE(ptr, name) SET_DEVICE_OP(ptr, size_##name)
if (ops->driver_id != RDMA_DRIVER_UNKNOWN) {
WARN_ON(dev_ops->driver_id != RDMA_DRIVER_UNKNOWN &&
dev_ops->driver_id != ops->driver_id);
dev_ops->driver_id = ops->driver_id;
}
if (ops->owner) {
WARN_ON(dev_ops->owner && dev_ops->owner != ops->owner);
dev_ops->owner = ops->owner;
}
if (ops->uverbs_abi_ver)
dev_ops->uverbs_abi_ver = ops->uverbs_abi_ver;
dev_ops->uverbs_no_driver_id_binding |=
ops->uverbs_no_driver_id_binding;
	SET_DEVICE_OP(dev_ops, add_gid);
	SET_DEVICE_OP(dev_ops, advise_mr);
	SET_DEVICE_OP(dev_ops, alloc_dm);
	SET_DEVICE_OP(dev_ops, alloc_fmr);
	SET_DEVICE_OP(dev_ops, alloc_hw_stats);
	SET_DEVICE_OP(dev_ops, alloc_mr);
SET_DEVICE_OP(dev_ops, alloc_mr_integrity);
	SET_DEVICE_OP(dev_ops, alloc_mw);
	SET_DEVICE_OP(dev_ops, alloc_pd);
	SET_DEVICE_OP(dev_ops, alloc_rdma_netdev);
...@@ -2357,6 +2480,11 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
	SET_DEVICE_OP(dev_ops, alloc_xrcd);
	SET_DEVICE_OP(dev_ops, attach_mcast);
	SET_DEVICE_OP(dev_ops, check_mr_status);
SET_DEVICE_OP(dev_ops, counter_alloc_stats);
SET_DEVICE_OP(dev_ops, counter_bind_qp);
SET_DEVICE_OP(dev_ops, counter_dealloc);
SET_DEVICE_OP(dev_ops, counter_unbind_qp);
SET_DEVICE_OP(dev_ops, counter_update_stats);
	SET_DEVICE_OP(dev_ops, create_ah);
	SET_DEVICE_OP(dev_ops, create_counters);
	SET_DEVICE_OP(dev_ops, create_cq);
...@@ -2409,6 +2537,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
	SET_DEVICE_OP(dev_ops, iw_reject);
	SET_DEVICE_OP(dev_ops, iw_rem_ref);
	SET_DEVICE_OP(dev_ops, map_mr_sg);
SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
	SET_DEVICE_OP(dev_ops, map_phys_fmr);
	SET_DEVICE_OP(dev_ops, mmap);
	SET_DEVICE_OP(dev_ops, modify_ah);
...@@ -2445,6 +2574,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
	SET_DEVICE_OP(dev_ops, unmap_fmr);

	SET_OBJ_SIZE(dev_ops, ib_ah);
SET_OBJ_SIZE(dev_ops, ib_cq);
	SET_OBJ_SIZE(dev_ops, ib_pd);
	SET_OBJ_SIZE(dev_ops, ib_srq);
	SET_OBJ_SIZE(dev_ops, ib_ucontext);
......
...@@ -34,13 +34,17 @@ void ib_mr_pool_put(struct ib_qp *qp, struct list_head *list, struct ib_mr *mr)
EXPORT_SYMBOL(ib_mr_pool_put);

int ib_mr_pool_init(struct ib_qp *qp, struct list_head *list, int nr,
		    enum ib_mr_type type, u32 max_num_sg, u32 max_num_meta_sg)
{
	struct ib_mr *mr;
	unsigned long flags;
	int ret, i;

	for (i = 0; i < nr; i++) {
if (type == IB_MR_TYPE_INTEGRITY)
mr = ib_alloc_mr_integrity(qp->pd, max_num_sg,
max_num_meta_sg);
else
			mr = ib_alloc_mr(qp->pd, type, max_num_sg);
		if (IS_ERR(mr)) {
			ret = PTR_ERR(mr);
......
......
...@@ -6,6 +6,7 @@
#include <rdma/rdma_cm.h>
#include <rdma/ib_verbs.h>
#include <rdma/restrack.h>
#include <rdma/rdma_counter.h>
#include <linux/mutex.h>
#include <linux/sched/task.h>
#include <linux/pid_namespace.h>
...@@ -45,6 +46,7 @@ static const char *type2str(enum rdma_restrack_type type)
		[RDMA_RESTRACK_CM_ID] = "CM_ID",
		[RDMA_RESTRACK_MR] = "MR",
		[RDMA_RESTRACK_CTX] = "CTX",
[RDMA_RESTRACK_COUNTER] = "COUNTER",
	};

	return names[type];
...@@ -169,6 +171,8 @@ static struct ib_device *res_to_dev(struct rdma_restrack_entry *res)
		return container_of(res, struct ib_mr, res)->device;
	case RDMA_RESTRACK_CTX:
		return container_of(res, struct ib_ucontext, res)->device;
case RDMA_RESTRACK_COUNTER:
return container_of(res, struct rdma_counter, res)->device;
	default:
		WARN_ONCE(true, "Wrong resource tracking type %u\n", res->type);
		return NULL;
...@@ -190,6 +194,20 @@ void rdma_restrack_set_task(struct rdma_restrack_entry *res,
}
EXPORT_SYMBOL(rdma_restrack_set_task);
/**
* rdma_restrack_attach_task() - attach the task onto this resource
* @res: resource entry
* @task: the task to attach, the current task will be used if it is NULL.
*/
void rdma_restrack_attach_task(struct rdma_restrack_entry *res,
struct task_struct *task)
{
if (res->task)
put_task_struct(res->task);
get_task_struct(task);
res->task = task;
}
static void rdma_restrack_add(struct rdma_restrack_entry *res)
{
	struct ib_device *dev = res_to_dev(res);
...@@ -203,15 +221,22 @@ static void rdma_restrack_add(struct rdma_restrack_entry *res)
	kref_init(&res->kref);
	init_completion(&res->comp);
	if (res->type == RDMA_RESTRACK_QP) {
ret = xa_alloc_cyclic(&rt->xa, &res->id, res, xa_limit_32b,
&rt->next_id, GFP_KERNEL);
else {
		/* Special case to ensure that LQPN points to right QP */
		struct ib_qp *qp = container_of(res, struct ib_qp, res);

		ret = xa_insert(&rt->xa, qp->qp_num, res, GFP_KERNEL);
		res->id = ret ? 0 : qp->qp_num;
} else if (res->type == RDMA_RESTRACK_COUNTER) {
/* Special case to ensure that cntn points to right counter */
struct rdma_counter *counter;
counter = container_of(res, struct rdma_counter, res);
ret = xa_insert(&rt->xa, counter->id, res, GFP_KERNEL);
res->id = ret ? 0 : counter->id;
} else {
ret = xa_alloc_cyclic(&rt->xa, &res->id, res, xa_limit_32b,
&rt->next_id, GFP_KERNEL);
} }
if (!ret) if (!ret)
...@@ -237,7 +262,8 @@ EXPORT_SYMBOL(rdma_restrack_kadd); ...@@ -237,7 +262,8 @@ EXPORT_SYMBOL(rdma_restrack_kadd);
*/ */
void rdma_restrack_uadd(struct rdma_restrack_entry *res) void rdma_restrack_uadd(struct rdma_restrack_entry *res)
{ {
if (res->type != RDMA_RESTRACK_CM_ID) if ((res->type != RDMA_RESTRACK_CM_ID) &&
(res->type != RDMA_RESTRACK_COUNTER))
res->task = NULL; res->task = NULL;
if (!res->task) if (!res->task)
...@@ -323,3 +349,16 @@ void rdma_restrack_del(struct rdma_restrack_entry *res) ...@@ -323,3 +349,16 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
} }
} }
EXPORT_SYMBOL(rdma_restrack_del); EXPORT_SYMBOL(rdma_restrack_del);
bool rdma_is_visible_in_pid_ns(struct rdma_restrack_entry *res)
{
/*
* 1. Kern resources should be visible in init
* namespace only
* 2. Present only resources visible in the current
* namespace
*/
if (rdma_is_kernel_res(res))
return task_active_pid_ns(current) == &init_pid_ns;
return task_active_pid_ns(current) == task_active_pid_ns(res->task);
}
@@ -25,4 +25,7 @@ struct rdma_restrack_root {

 int rdma_restrack_init(struct ib_device *dev);
 void rdma_restrack_clean(struct ib_device *dev);
+void rdma_restrack_attach_task(struct rdma_restrack_entry *res,
+			       struct task_struct *task);
+bool rdma_is_visible_in_pid_ns(struct rdma_restrack_entry *res);
 #endif /* _RDMA_CORE_RESTRACK_H_ */
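
The restrack hunks above rely on two XArray insertion modes: a resource that already owns a stable number (a QPN, a counter id) is inserted at that fixed index, and everything else gets a cyclically allocated 32-bit id. A standalone sketch of the two modes, with illustrative names (demo_* is not the restrack API):

#include <linux/xarray.h>

static DEFINE_XARRAY_ALLOC(demo_xa);
static u32 demo_next_id;

/* Fixed index: the caller owns the id; -EBUSY means it is taken. */
static int demo_track_fixed(u32 id, void *entry)
{
	return xa_insert(&demo_xa, id, entry, GFP_KERNEL);
}

/* Cyclic allocation: the XArray hands out the next free 32-bit id,
 * wrapping around instead of immediately reusing freed low ids.
 */
static int demo_track_cyclic(u32 *id, void *entry)
{
	return xa_alloc_cyclic(&demo_xa, id, entry, xa_limit_32b,
			       &demo_next_id, GFP_KERNEL);
}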
@@ -43,6 +43,7 @@
 #include <rdma/ib_mad.h>
 #include <rdma/ib_pma.h>
 #include <rdma/ib_cache.h>
+#include <rdma/rdma_counter.h>

 struct ib_port;
@@ -800,9 +801,12 @@ static int update_hw_stats(struct ib_device *dev, struct rdma_hw_stats *stats,
 	return 0;
 }

-static ssize_t print_hw_stat(struct rdma_hw_stats *stats, int index, char *buf)
+static ssize_t print_hw_stat(struct ib_device *dev, int port_num,
+			     struct rdma_hw_stats *stats, int index, char *buf)
 {
-	return sprintf(buf, "%llu\n", stats->value[index]);
+	u64 v = rdma_counter_get_hwstat_value(dev, port_num, index);
+
+	return sprintf(buf, "%llu\n", stats->value[index] + v);
 }

 static ssize_t show_hw_stats(struct kobject *kobj, struct attribute *attr,
@@ -828,7 +832,7 @@ static ssize_t show_hw_stats(struct kobject *kobj, struct attribute *attr,
 	ret = update_hw_stats(dev, stats, hsa->port_num, hsa->index);
 	if (ret)
 		goto unlock;
-	ret = print_hw_stat(stats, hsa->index, buf);
+	ret = print_hw_stat(dev, hsa->port_num, stats, hsa->index, buf);
 unlock:
 	mutex_unlock(&stats->lock);

@@ -999,6 +1003,8 @@ static void setup_hw_stats(struct ib_device *device, struct ib_port *port,
 			goto err;
 		port->hw_stats_ag = hsag;
 		port->hw_stats = stats;
+		if (device->port_data)
+			device->port_data[port_num].hw_stats = stats;
 	} else {
 		struct kobject *kobj = &device->dev.kobj;
 		ret = sysfs_create_group(kobj, hsag);
@@ -1289,6 +1295,8 @@ const struct attribute_group ib_dev_attr_group = {

 void ib_free_port_attrs(struct ib_core_device *coredev)
 {
+	struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
+	bool is_full_dev = &device->coredev == coredev;
 	struct kobject *p, *t;

 	list_for_each_entry_safe(p, t, &coredev->port_list, entry) {
@@ -1298,6 +1306,8 @@ void ib_free_port_attrs(struct ib_core_device *coredev)
 		if (port->hw_stats_ag)
 			free_hsag(&port->kobj, port->hw_stats_ag);
 		kfree(port->hw_stats);
+		if (device->port_data && is_full_dev)
+			device->port_data[port->port_num].hw_stats = NULL;

 		if (port->pma_table)
 			sysfs_remove_group(p, port->pma_table);
...
@@ -52,6 +52,8 @@
 #include <rdma/rdma_cm_ib.h>
 #include <rdma/ib_addr.h>
 #include <rdma/ib.h>
+#include <rdma/rdma_netlink.h>
+#include "core_priv.h"

 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("RDMA Userspace Connection Manager Access");
@@ -81,7 +83,7 @@ struct ucma_file {
 };

 struct ucma_context {
-	int id;
+	u32 id;
 	struct completion comp;
 	atomic_t ref;
 	int events_reported;
@@ -94,7 +96,7 @@ struct ucma_context {
 	struct list_head list;
 	struct list_head mc_list;
 	/* mark that device is in process of destroying the internal HW
-	 * resources, protected by the global mut
+	 * resources, protected by the ctx_table lock
 	 */
 	int closing;
 	/* sync between removal event and id destroy, protected by file mut */
@@ -104,7 +106,7 @@ struct ucma_context {

 struct ucma_multicast {
 	struct ucma_context *ctx;
-	int id;
+	u32 id;
 	int events_reported;

 	u64 uid;
@@ -122,9 +124,8 @@ struct ucma_event {
 	struct work_struct close_work;
 };

-static DEFINE_MUTEX(mut);
-static DEFINE_IDR(ctx_idr);
-static DEFINE_IDR(multicast_idr);
+static DEFINE_XARRAY_ALLOC(ctx_table);
+static DEFINE_XARRAY_ALLOC(multicast_table);

 static const struct file_operations ucma_fops;
@@ -133,7 +134,7 @@ static inline struct ucma_context *_ucma_find_context(int id,
 {
 	struct ucma_context *ctx;

-	ctx = idr_find(&ctx_idr, id);
+	ctx = xa_load(&ctx_table, id);
 	if (!ctx)
 		ctx = ERR_PTR(-ENOENT);
 	else if (ctx->file != file || !ctx->cm_id)
@@ -145,7 +146,7 @@ static struct ucma_context *ucma_get_ctx(struct ucma_file *file, int id)
 {
 	struct ucma_context *ctx;

-	mutex_lock(&mut);
+	xa_lock(&ctx_table);
 	ctx = _ucma_find_context(id, file);
 	if (!IS_ERR(ctx)) {
 		if (ctx->closing)
@@ -153,7 +154,7 @@ static struct ucma_context *ucma_get_ctx(struct ucma_file *file, int id)
 		else
 			atomic_inc(&ctx->ref);
 	}
-	mutex_unlock(&mut);
+	xa_unlock(&ctx_table);
 	return ctx;
 }
@@ -216,10 +217,7 @@ static struct ucma_context *ucma_alloc_ctx(struct ucma_file *file)
 	INIT_LIST_HEAD(&ctx->mc_list);
 	ctx->file = file;

-	mutex_lock(&mut);
-	ctx->id = idr_alloc(&ctx_idr, ctx, 0, 0, GFP_KERNEL);
-	mutex_unlock(&mut);
-	if (ctx->id < 0)
+	if (xa_alloc(&ctx_table, &ctx->id, ctx, xa_limit_32b, GFP_KERNEL))
 		goto error;

 	list_add_tail(&ctx->list, &file->ctx_list);
@@ -238,13 +236,10 @@ static struct ucma_multicast* ucma_alloc_multicast(struct ucma_context *ctx)
 	if (!mc)
 		return NULL;

-	mutex_lock(&mut);
-	mc->id = idr_alloc(&multicast_idr, NULL, 0, 0, GFP_KERNEL);
-	mutex_unlock(&mut);
-	if (mc->id < 0)
+	mc->ctx = ctx;
+	if (xa_alloc(&multicast_table, &mc->id, NULL, xa_limit_32b, GFP_KERNEL))
 		goto error;

-	mc->ctx = ctx;
 	list_add_tail(&mc->list, &ctx->mc_list);
 	return mc;
@@ -319,9 +314,9 @@ static void ucma_removal_event_handler(struct rdma_cm_id *cm_id)
 	 * handled separately below.
 	 */
 	if (ctx->cm_id == cm_id) {
-		mutex_lock(&mut);
+		xa_lock(&ctx_table);
 		ctx->closing = 1;
-		mutex_unlock(&mut);
+		xa_unlock(&ctx_table);
 		queue_work(ctx->file->close_wq, &ctx->close_work);
 		return;
 	}
@@ -523,9 +518,7 @@ static ssize_t ucma_create_id(struct ucma_file *file, const char __user *inbuf,
 err2:
 	rdma_destroy_id(cm_id);
 err1:
-	mutex_lock(&mut);
-	idr_remove(&ctx_idr, ctx->id);
-	mutex_unlock(&mut);
+	xa_erase(&ctx_table, ctx->id);
 	mutex_lock(&file->mut);
 	list_del(&ctx->list);
 	mutex_unlock(&file->mut);
@@ -537,13 +530,13 @@ static void ucma_cleanup_multicast(struct ucma_context *ctx)
 {
 	struct ucma_multicast *mc, *tmp;

-	mutex_lock(&mut);
+	mutex_lock(&ctx->file->mut);
 	list_for_each_entry_safe(mc, tmp, &ctx->mc_list, list) {
 		list_del(&mc->list);
-		idr_remove(&multicast_idr, mc->id);
+		xa_erase(&multicast_table, mc->id);
 		kfree(mc);
 	}
-	mutex_unlock(&mut);
+	mutex_unlock(&ctx->file->mut);
 }

 static void ucma_cleanup_mc_events(struct ucma_multicast *mc)
@@ -614,11 +607,11 @@ static ssize_t ucma_destroy_id(struct ucma_file *file, const char __user *inbuf,
 	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
 		return -EFAULT;

-	mutex_lock(&mut);
+	xa_lock(&ctx_table);
 	ctx = _ucma_find_context(cmd.id, file);
 	if (!IS_ERR(ctx))
-		idr_remove(&ctx_idr, ctx->id);
-	mutex_unlock(&mut);
+		__xa_erase(&ctx_table, ctx->id);
+	xa_unlock(&ctx_table);

 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
@@ -630,14 +623,14 @@ static ssize_t ucma_destroy_id(struct ucma_file *file, const char __user *inbuf,
 	flush_workqueue(ctx->file->close_wq);
 	/* At this point it's guaranteed that there is no inflight
 	 * closing task */
-	mutex_lock(&mut);
+	xa_lock(&ctx_table);
 	if (!ctx->closing) {
-		mutex_unlock(&mut);
+		xa_unlock(&ctx_table);
 		ucma_put_ctx(ctx);
 		wait_for_completion(&ctx->comp);
 		rdma_destroy_id(ctx->cm_id);
 	} else {
-		mutex_unlock(&mut);
+		xa_unlock(&ctx_table);
 	}

 	resp.events_reported = ucma_free_ctx(ctx);
@@ -951,8 +944,7 @@ static ssize_t ucma_query_path(struct ucma_context *ctx,
 		}
 	}

-	if (copy_to_user(response, resp,
-			 sizeof(*resp) + (i * sizeof(struct ib_path_rec_data))))
+	if (copy_to_user(response, resp, struct_size(resp, path_data, i)))
 		ret = -EFAULT;

 	kfree(resp);
@@ -1432,9 +1424,7 @@ static ssize_t ucma_process_join(struct ucma_file *file,
 		goto err3;
 	}

-	mutex_lock(&mut);
-	idr_replace(&multicast_idr, mc, mc->id);
-	mutex_unlock(&mut);
+	xa_store(&multicast_table, mc->id, mc, 0);

 	mutex_unlock(&file->mut);
 	ucma_put_ctx(ctx);
@@ -1444,9 +1434,7 @@ static ssize_t ucma_process_join(struct ucma_file *file,
 	rdma_leave_multicast(ctx->cm_id, (struct sockaddr *) &mc->addr);
 	ucma_cleanup_mc_events(mc);
 err2:
-	mutex_lock(&mut);
-	idr_remove(&multicast_idr, mc->id);
-	mutex_unlock(&mut);
+	xa_erase(&multicast_table, mc->id);
 	list_del(&mc->list);
 	kfree(mc);
 err1:
@@ -1508,8 +1496,8 @@ static ssize_t ucma_leave_multicast(struct ucma_file *file,
 	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
 		return -EFAULT;

-	mutex_lock(&mut);
-	mc = idr_find(&multicast_idr, cmd.id);
+	xa_lock(&multicast_table);
+	mc = xa_load(&multicast_table, cmd.id);
 	if (!mc)
 		mc = ERR_PTR(-ENOENT);
 	else if (mc->ctx->file != file)
@@ -1517,8 +1505,8 @@ static ssize_t ucma_leave_multicast(struct ucma_file *file,
 	else if (!atomic_inc_not_zero(&mc->ctx->ref))
 		mc = ERR_PTR(-ENXIO);
 	else
-		idr_remove(&multicast_idr, mc->id);
-	mutex_unlock(&mut);
+		__xa_erase(&multicast_table, mc->id);
+	xa_unlock(&multicast_table);

 	if (IS_ERR(mc)) {
 		ret = PTR_ERR(mc);
@@ -1615,14 +1603,14 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
 	 * events being added before existing events.
 	 */
 	ucma_lock_files(cur_file, new_file);
-	mutex_lock(&mut);
+	xa_lock(&ctx_table);

 	list_move_tail(&ctx->list, &new_file->ctx_list);
 	ucma_move_events(ctx, new_file);
 	ctx->file = new_file;
 	resp.events_reported = ctx->events_reported;

-	mutex_unlock(&mut);
+	xa_unlock(&ctx_table);
 	ucma_unlock_files(cur_file, new_file);

response:
@@ -1757,18 +1745,15 @@ static int ucma_close(struct inode *inode, struct file *filp)
 		ctx->destroying = 1;
 		mutex_unlock(&file->mut);

-		mutex_lock(&mut);
-		idr_remove(&ctx_idr, ctx->id);
-		mutex_unlock(&mut);
+		xa_erase(&ctx_table, ctx->id);
 		flush_workqueue(file->close_wq);
 		/* At that step once ctx was marked as destroying and workqueue
 		 * was flushed we are safe from any inflights handlers that
 		 * might put other closing task.
 		 */
-		mutex_lock(&mut);
+		xa_lock(&ctx_table);
 		if (!ctx->closing) {
-			mutex_unlock(&mut);
+			xa_unlock(&ctx_table);
 			ucma_put_ctx(ctx);
 			wait_for_completion(&ctx->comp);
 			/* rdma_destroy_id ensures that no event handlers are
@@ -1776,7 +1761,7 @@ static int ucma_close(struct inode *inode, struct file *filp)
 			 */
 			rdma_destroy_id(ctx->cm_id);
 		} else {
-			mutex_unlock(&mut);
+			xa_unlock(&ctx_table);
 		}

 		ucma_free_ctx(ctx);
@@ -1805,6 +1790,19 @@ static struct miscdevice ucma_misc = {
 	.fops = &ucma_fops,
 };

+static int ucma_get_global_nl_info(struct ib_client_nl_info *res)
+{
+	res->abi = RDMA_USER_CM_ABI_VERSION;
+	res->cdev = ucma_misc.this_device;
+	return 0;
+}
+
+static struct ib_client rdma_cma_client = {
+	.name = "rdma_cm",
+	.get_global_nl_info = ucma_get_global_nl_info,
+};
+MODULE_ALIAS_RDMA_CLIENT("rdma_cm");
+
 static ssize_t show_abi_version(struct device *dev,
 				struct device_attribute *attr,
 				char *buf)
@@ -1833,7 +1831,14 @@ static int __init ucma_init(void)
 		ret = -ENOMEM;
 		goto err2;
 	}
+
+	ret = ib_register_client(&rdma_cma_client);
+	if (ret)
+		goto err3;
+
 	return 0;
+err3:
+	unregister_net_sysctl_table(ucma_ctl_table_hdr);
 err2:
 	device_remove_file(ucma_misc.this_device, &dev_attr_abi_version);
 err1:
@@ -1843,11 +1848,10 @@ static int __init ucma_init(void)

 static void __exit ucma_cleanup(void)
 {
+	ib_unregister_client(&rdma_cma_client);
 	unregister_net_sysctl_table(ucma_ctl_table_hdr);
 	device_remove_file(ucma_misc.this_device, &dev_attr_abi_version);
 	misc_deregister(&ucma_misc);
-	idr_destroy(&ctx_idr);
-	idr_destroy(&multicast_idr);
 }

 module_init(ucma_init);
...
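
The ucma conversion above is mechanical once the pattern is visible: the global mutex-plus-IDR pair becomes an allocating XArray, and every lookup-then-erase that used to hold the mutex now sits under the XArray's own spinlock, using the __xa_* variants while the lock is held. A condensed sketch of that shape (demo_table is illustrative, not the ucma code):

#include <linux/xarray.h>

static DEFINE_XARRAY_ALLOC(demo_table);

/* Old shape: mutex_lock(&mut); idr_find(); idr_remove(); mutex_unlock();
 * New shape: one lock, owned by the XArray itself.
 */
static void *demo_lookup_and_erase(u32 id)
{
	void *entry;

	xa_lock(&demo_table);
	entry = xa_load(&demo_table, id);
	if (entry)
		__xa_erase(&demo_table, id);	/* xa_lock already held */
	xa_unlock(&demo_table);

	return entry;
}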
@@ -54,9 +54,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
 		page = sg_page_iter_page(&sg_iter);
-		if (!PageDirty(page) && umem->writable && dirty)
-			set_page_dirty_lock(page);
-		put_page(page);
+		if (umem->writable && dirty)
+			put_user_pages_dirty_lock(&page, 1);
+		else
+			put_user_page(page);
 	}

 	sg_free_table(&umem->sg_head);
@@ -244,7 +245,6 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
 	umem->context = context;
 	umem->length = size;
 	umem->address = addr;
-	umem->page_shift = PAGE_SHIFT;
 	umem->writable = ib_access_writable(access);
 	umem->owning_mm = mm = current->mm;
 	mmgrab(mm);
@@ -361,6 +361,9 @@ static void __ib_umem_release_tail(struct ib_umem *umem)
  */
 void ib_umem_release(struct ib_umem *umem)
 {
+	if (!umem)
+		return;
+
 	if (umem->is_odp) {
 		ib_umem_odp_release(to_ib_umem_odp(umem));
 		__ib_umem_release_tail(umem);
@@ -385,7 +388,7 @@ int ib_umem_page_count(struct ib_umem *umem)
 	n = 0;
 	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i)
-		n += sg_dma_len(sg) >> umem->page_shift;
+		n += sg_dma_len(sg) >> PAGE_SHIFT;

 	return n;
 }
...
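
The umem hunk is part of this cycle's tree-wide put_user_page conversion: pages that came from get_user_pages() are no longer dropped with a bare put_page(), so pinned-page accounting gains a single choke point. A minimal sketch of the release discipline, assuming the 5.3-era helper signatures used in the hunk above:

#include <linux/mm.h>

/* Release one page that came from get_user_pages().  If the page may
 * have been written through the mapping, mark it dirty and unpin in
 * one step; otherwise just drop the pin.
 */
static void demo_release_user_page(struct page *page, bool wrote)
{
	if (wrote)
		put_user_pages_dirty_lock(&page, 1);
	else
		put_user_page(page);
}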
@@ -54,6 +54,7 @@
 #include <rdma/ib_mad.h>
 #include <rdma/ib_user_mad.h>
+#include <rdma/rdma_netlink.h>

 #include "core_priv.h"
@@ -744,7 +745,7 @@ static int ib_umad_reg_agent(struct ib_umad_file *file, void __user *arg,
 				"process %s did not enable P_Key index support.\n",
 				current->comm);
 			dev_warn(&file->port->dev,
-				"   Documentation/infiniband/user_mad.txt has info on the new ABI.\n");
+				"   Documentation/infiniband/user_mad.rst has info on the new ABI.\n");
 		}
 	}
@@ -1124,11 +1125,48 @@ static const struct file_operations umad_sm_fops = {
 	.llseek = no_llseek,
 };

+static int ib_umad_get_nl_info(struct ib_device *ibdev, void *client_data,
+			       struct ib_client_nl_info *res)
+{
+	struct ib_umad_device *umad_dev = client_data;
+
+	if (!rdma_is_port_valid(ibdev, res->port))
+		return -EINVAL;
+
+	res->abi = IB_USER_MAD_ABI_VERSION;
+	res->cdev = &umad_dev->ports[res->port - rdma_start_port(ibdev)].dev;
+
+	return 0;
+}
+
 static struct ib_client umad_client = {
 	.name   = "umad",
 	.add    = ib_umad_add_one,
-	.remove = ib_umad_remove_one
+	.remove = ib_umad_remove_one,
+	.get_nl_info = ib_umad_get_nl_info,
 };
+MODULE_ALIAS_RDMA_CLIENT("umad");
+
+static int ib_issm_get_nl_info(struct ib_device *ibdev, void *client_data,
+			       struct ib_client_nl_info *res)
+{
+	struct ib_umad_device *umad_dev =
+		ib_get_client_data(ibdev, &umad_client);
+
+	if (!rdma_is_port_valid(ibdev, res->port))
+		return -EINVAL;
+
+	res->abi = IB_USER_MAD_ABI_VERSION;
+	res->cdev = &umad_dev->ports[res->port - rdma_start_port(ibdev)].sm_dev;
+
+	return 0;
+}
+
+static struct ib_client issm_client = {
+	.name = "issm",
+	.get_nl_info = ib_issm_get_nl_info,
+};
+MODULE_ALIAS_RDMA_CLIENT("issm");

 static ssize_t ibdev_show(struct device *dev, struct device_attribute *attr,
 			  char *buf)
@@ -1387,13 +1425,17 @@ static int __init ib_umad_init(void)
 	}

 	ret = ib_register_client(&umad_client);
-	if (ret) {
-		pr_err("couldn't register ib_umad client\n");
+	if (ret)
 		goto out_class;
-	}
+
+	ret = ib_register_client(&issm_client);
+	if (ret)
+		goto out_client;

 	return 0;

+out_client:
+	ib_unregister_client(&umad_client);
 out_class:
 	class_unregister(&umad_class);
@@ -1411,6 +1453,7 @@ static int __init ib_umad_init(void)

 static void __exit ib_umad_cleanup(void)
 {
+	ib_unregister_client(&issm_client);
 	ib_unregister_client(&umad_client);
 	class_unregister(&umad_class);
 	unregister_chrdev_region(base_umad_dev,
...
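
The netlink discovery hooks added above all follow one template: the client reports its ABI version and the struct device behind its char device, and MODULE_ALIAS_RDMA_CLIENT lets nldev autoload the module by client name. A skeletal client with placeholder names and values (demo, the ABI number and the cdev lookup are illustrative, not a real client):

/* Skeletal ib_client wired for netlink chardev discovery. */
static int demo_get_nl_info(struct ib_device *ibdev, void *client_data,
			    struct ib_client_nl_info *res)
{
	if (!rdma_is_port_valid(ibdev, res->port))
		return -EINVAL;

	res->abi = 1;			/* placeholder ABI version */
	res->cdev = client_data;	/* placeholder: device of our cdev */
	return 0;
}

static struct ib_client demo_client = {
	.name = "demo",
	.get_nl_info = demo_get_nl_info,
};
MODULE_ALIAS_RDMA_CLIENT("demo");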
@@ -756,7 +756,9 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)

 	mr->device = pd->device;
 	mr->pd = pd;
+	mr->type = IB_MR_TYPE_USER;
 	mr->dm = NULL;
+	mr->sig_attrs = NULL;
 	mr->uobject = uobj;
 	atomic_inc(&pd->usecnt);
 	mr->res.type = RDMA_RESTRACK_MR;
@@ -1021,12 +1023,11 @@ static struct ib_ucq_object *create_cq(struct uverbs_attr_bundle *attrs,
 	attr.comp_vector = cmd->comp_vector;
 	attr.flags = cmd->flags;

-	cq = ib_dev->ops.create_cq(ib_dev, &attr, &attrs->driver_udata);
-	if (IS_ERR(cq)) {
-		ret = PTR_ERR(cq);
+	cq = rdma_zalloc_drv_obj(ib_dev, ib_cq);
+	if (!cq) {
+		ret = -ENOMEM;
 		goto err_file;
 	}
-
 	cq->device = ib_dev;
 	cq->uobject = &obj->uobject;
 	cq->comp_handler = ib_uverbs_comp_handler;
@@ -1034,6 +1035,10 @@ static struct ib_ucq_object *create_cq(struct uverbs_attr_bundle *attrs,
 	cq->cq_context = ev_file ? &ev_file->ev_queue : NULL;
 	atomic_set(&cq->usecnt, 0);

+	ret = ib_dev->ops.create_cq(cq, &attr, &attrs->driver_udata);
+	if (ret)
+		goto err_free;
+
 	obj->uobject.object = cq;
 	memset(&resp, 0, sizeof resp);
 	resp.base.cq_handle = obj->uobject.id;
@@ -1054,7 +1059,9 @@ static struct ib_ucq_object *create_cq(struct uverbs_attr_bundle *attrs,

 err_cb:
 	ib_destroy_cq_user(cq, uverbs_get_cleared_udata(attrs));
-
+	cq = NULL;
+err_free:
+	kfree(cq);
 err_file:
 	if (ev_file)
 		ib_uverbs_release_ucq(attrs->ufile, ev_file, obj);
@@ -2541,7 +2548,7 @@ static int ib_uverbs_detach_mcast(struct uverbs_attr_bundle *attrs)
 	struct ib_uqp_object *obj;
 	struct ib_qp *qp;
 	struct ib_uverbs_mcast_entry *mcast;
-	int ret = -EINVAL;
+	int ret;
 	bool found = false;

 	ret = uverbs_request(attrs, &cmd, sizeof(cmd));
@@ -3715,9 +3722,6 @@ static int ib_uverbs_ex_modify_cq(struct uverbs_attr_bundle *attrs)
  * trailing driver_data flex array. In this case the size of the base struct
  * cannot be changed.
  */
-#define offsetof_after(_struct, _member) \
-	(offsetof(_struct, _member) + sizeof(((_struct *)NULL)->_member))
-
 #define UAPI_DEF_WRITE_IO(req, resp)                                          \
 	.write.has_resp = 1 +                                                 \
 		BUILD_BUG_ON_ZERO(offsetof(req, response) != 0) +             \
@@ -3748,11 +3752,11 @@ static int ib_uverbs_ex_modify_cq(struct uverbs_attr_bundle *attrs)
  */
 #define UAPI_DEF_WRITE_IO_EX(req, req_last_member, resp, resp_last_member)    \
 	.write.has_resp = 1,                                                  \
-	.write.req_size = offsetof_after(req, req_last_member),              \
-	.write.resp_size = offsetof_after(resp, resp_last_member)
+	.write.req_size = offsetofend(req, req_last_member),                 \
+	.write.resp_size = offsetofend(resp, resp_last_member)

 #define UAPI_DEF_WRITE_I_EX(req, req_last_member)                             \
-	.write.req_size = offsetof_after(req, req_last_member)
+	.write.req_size = offsetofend(req, req_last_member)

 const struct uapi_definition uverbs_def_write_intf[] = {
 	DECLARE_UVERBS_OBJECT(
...
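
Two generic helpers replace open-coded size arithmetic in this file: struct_size() computes "header plus N trailing elements" with overflow saturation, and offsetofend() gives the size of a struct up to and including a member, which is exactly what the write-ABI tables want. A small sketch around a hypothetical struct:

#include <linux/overflow.h>	/* struct_size() */
#include <linux/stddef.h>	/* offsetofend() */
#include <linux/slab.h>
#include <linux/types.h>

struct demo_resp {
	u32 count;
	u64 data[];		/* flexible array member */
};

static struct demo_resp *demo_alloc(u32 n)
{
	struct demo_resp *resp;

	/* sizeof(*resp) + n * sizeof(resp->data[0]), saturating on overflow */
	resp = kzalloc(struct_size(resp, data, n), GFP_KERNEL);
	if (resp)
		resp->count = n;
	return resp;
}

/* offsetofend(T, m) == offsetof(T, m) + sizeof of that member */
static const size_t demo_req_size = offsetofend(struct demo_resp, count);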
@@ -51,6 +51,7 @@
 #include <rdma/ib.h>
 #include <rdma/uverbs_std_types.h>
+#include <rdma/rdma_netlink.h>

 #include "uverbs.h"
 #include "core_priv.h"
@@ -198,7 +199,7 @@ void ib_uverbs_release_file(struct kref *ref)
 	ib_dev = srcu_dereference(file->device->ib_dev,
 				  &file->device->disassociate_srcu);
 	if (ib_dev && !ib_dev->ops.disassociate_ucontext)
-		module_put(ib_dev->owner);
+		module_put(ib_dev->ops.owner);
 	srcu_read_unlock(&file->device->disassociate_srcu, srcu_key);

 	if (atomic_dec_and_test(&file->device->refcount))
@@ -1065,7 +1066,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 	module_dependent = !(ib_dev->ops.disassociate_ucontext);

 	if (module_dependent) {
-		if (!try_module_get(ib_dev->owner)) {
+		if (!try_module_get(ib_dev->ops.owner)) {
 			ret = -ENODEV;
 			goto err;
 		}
@@ -1100,7 +1101,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 	return stream_open(inode, filp);

 err_module:
-	module_put(ib_dev->owner);
+	module_put(ib_dev->ops.owner);

 err:
 	mutex_unlock(&dev->lists_mutex);
@@ -1148,12 +1149,41 @@ static const struct file_operations uverbs_mmap_fops = {
 	.compat_ioctl = ib_uverbs_ioctl,
 };

+static int ib_uverbs_get_nl_info(struct ib_device *ibdev, void *client_data,
+				 struct ib_client_nl_info *res)
+{
+	struct ib_uverbs_device *uverbs_dev = client_data;
+	int ret;
+
+	if (res->port != -1)
+		return -EINVAL;
+
+	res->abi = ibdev->ops.uverbs_abi_ver;
+	res->cdev = &uverbs_dev->dev;
+
+	/*
+	 * To support DRIVER_ID binding in userspace some of the driver need
+	 * upgrading to expose their PCI dependent revision information
+	 * through get_context instead of relying on modalias matching. When
+	 * the drivers are fixed they can drop this flag.
+	 */
+	if (!ibdev->ops.uverbs_no_driver_id_binding) {
+		ret = nla_put_u32(res->nl_msg, RDMA_NLDEV_ATTR_UVERBS_DRIVER_ID,
+				  ibdev->ops.driver_id);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 static struct ib_client uverbs_client = {
 	.name   = "uverbs",
 	.no_kverbs_req = true,
 	.add    = ib_uverbs_add_one,
-	.remove = ib_uverbs_remove_one
+	.remove = ib_uverbs_remove_one,
+	.get_nl_info = ib_uverbs_get_nl_info,
 };
+MODULE_ALIAS_RDMA_CLIENT("uverbs");

 static ssize_t ibdev_show(struct device *device, struct device_attribute *attr,
 			  char *buf)
@@ -1186,7 +1216,7 @@ static ssize_t abi_version_show(struct device *device,
 	srcu_key = srcu_read_lock(&dev->disassociate_srcu);
 	ib_dev = srcu_dereference(dev->ib_dev, &dev->disassociate_srcu);
 	if (ib_dev)
-		ret = sprintf(buf, "%d\n", ib_dev->uverbs_abi_ver);
+		ret = sprintf(buf, "%u\n", ib_dev->ops.uverbs_abi_ver);
 	srcu_read_unlock(&dev->disassociate_srcu, srcu_key);

 	return ret;
...
@@ -128,6 +128,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
 	mr->device = pd->device;
 	mr->pd = pd;
+	mr->type = IB_MR_TYPE_DM;
 	mr->dm = dm;
 	mr->uobject = uobj;
 	atomic_inc(&pd->usecnt);
...
@@ -22,6 +22,8 @@ static void *uapi_add_elm(struct uverbs_api *uapi, u32 key, size_t alloc_size)
 		return ERR_PTR(-EOVERFLOW);

 	elm = kzalloc(alloc_size, GFP_KERNEL);
+	if (!elm)
+		return ERR_PTR(-ENOMEM);
 	rc = radix_tree_insert(&uapi->radix, key, elm);
 	if (rc) {
 		kfree(elm);
@@ -645,7 +647,7 @@ struct uverbs_api *uverbs_alloc_api(struct ib_device *ibdev)
 		return ERR_PTR(-ENOMEM);

 	INIT_RADIX_TREE(&uapi->radix, GFP_KERNEL);
-	uapi->driver_id = ibdev->driver_id;
+	uapi->driver_id = ibdev->ops.driver_id;

 	rc = uapi_merge_def(uapi, ibdev, uverbs_core_api, false);
 	if (rc)
...
@@ -7,7 +7,6 @@ obj-$(CONFIG_INFINIBAND_EFA) += efa/
 obj-$(CONFIG_INFINIBAND_I40IW)		+= i40iw/
 obj-$(CONFIG_MLX4_INFINIBAND)		+= mlx4/
 obj-$(CONFIG_MLX5_INFINIBAND)		+= mlx5/
-obj-$(CONFIG_INFINIBAND_NES)		+= nes/
 obj-$(CONFIG_INFINIBAND_OCRDMA)		+= ocrdma/
 obj-$(CONFIG_INFINIBAND_VMWARE_PVRDMA)	+= vmw_pvrdma/
 obj-$(CONFIG_INFINIBAND_USNIC)		+= usnic/
...
@@ -45,7 +45,6 @@ struct efa_com_admin_sq {

 /* Don't use anything other than atomic64 */
 struct efa_com_stats_admin {
-	atomic64_t aborted_cmd;
 	atomic64_t submitted_cmd;
 	atomic64_t completed_cmd;
 	atomic64_t no_completion;
...