- 21 Sep, 2018 3 commits
-
-
Jason Gunthorpe authored
This no longer has any use; we can use container_of to get to the umem_odp, and a simple flag to indicate if this is an ODP MR. Remove the few remaining references to it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Jason Gunthorpe authored
These two structures are linked together, use the container_of pattern instead of a double allocation to make the code simpler and easier to follow. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
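The container_of() pattern described here can be shown with a minimal, standalone C sketch; the struct names below are hypothetical stand-ins for ib_umem/ib_umem_odp, not the kernel definitions:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct umem {
	int is_odp;		/* simple flag instead of a back-pointer */
};

struct umem_odp {
	struct umem umem;	/* base structure embedded, single allocation */
	int odp_private_state;
};

/* Recover the outer ODP structure from a pointer to the embedded base. */
static struct umem_odp *to_umem_odp(struct umem *u)
{
	return container_of(u, struct umem_odp, umem);
}

int main(void)
{
	struct umem_odp *odp = calloc(1, sizeof(*odp));

	odp->umem.is_odp = 1;
	odp->odp_private_state = 42;

	if (odp->umem.is_odp)
		printf("odp state = %d\n", to_umem_odp(&odp->umem)->odp_private_state);

	free(odp);
	return 0;
}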
-
Jason Gunthorpe authored
All of these functions already require the ODP version of the umem struct, make this very clear by having the signature require it. This paves the way to using the container_of() pattern to link umem_odp and umem together. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 20 Sep, 2018 7 commits
-
-
Jason Gunthorpe authored
Update this driver to match the code it copies from umem.c which no longer uses tgid. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Jason Gunthorpe authored
This is just wrong: the process that calls into reg_mr is the process associated with the umem, and that does not have to be the same process that created the context. When this code was first written mmgrab() didn't exist; however, these days we can just directly hold the mm_struct pointer in the umem and have no ambiguity, when it comes to releasing the umem, as to which mm it was associated with. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
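A hedged sketch of the idea (not the actual umem.c code; struct and function names are illustrative): pin the mm of the process that calls reg_mr and release it when the umem is freed.

#include <linux/sched/mm.h>

struct my_umem {			/* hypothetical stand-in for ib_umem */
	struct mm_struct *owning_mm;
	/* ... */
};

static void my_umem_grab_mm(struct my_umem *umem)
{
	/*
	 * The caller of reg_mr owns the mapping, so record *its* mm.
	 * mmgrab() keeps the mm_struct itself alive (not the page tables),
	 * which is enough to release the umem against the right mm later.
	 */
	umem->owning_mm = current->mm;
	mmgrab(umem->owning_mm);
}

static void my_umem_drop_mm(struct my_umem *umem)
{
	mmdrop(umem->owning_mm);	/* pairs with mmgrab() above */
}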
-
Jason Gunthorpe authored
The disassociate_ucontext function in every driver is now empty, so we don't need this ugly and wrong code that was messing with tgids. rdma_user_mmap_io does this same work in a better way. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Jason Gunthorpe authored
Rely on the new core code helper to map BAR memory from the driver. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Jason Gunthorpe authored
Rely on the new core code helper to map BAR memory from the driver. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Jason Gunthorpe authored
Rely on the new core code helper to map BAR memory from the driver. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
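A hedged sketch of what a driver ->mmap() built on the new helper might look like; the driver struct, fields and BAR layout below are hypothetical, only the rdma_user_mmap_io() call reflects the core API added in this series:

static int mydrv_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
{
	struct mydrv_ucontext *uctx = to_mydrv_ucontext(ucontext);	/* hypothetical */
	phys_addr_t bar = uctx->doorbell_bar_pa;			/* hypothetical */

	if (vma->vm_end - vma->vm_start != PAGE_SIZE)
		return -EINVAL;

	/*
	 * The core helper validates the VMA, sets up the PFN mapping and
	 * keeps the bookkeeping needed to revoke it on disassociation, so
	 * the driver no longer carries its own tracking code.
	 */
	return rdma_user_mmap_io(ucontext, vma, bar >> PAGE_SHIFT, PAGE_SIZE,
				 pgprot_noncached(vma->vm_page_prot));
}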
-
Jason Gunthorpe authored
To support disassociation and PCI hot unplug, we have to track all the VMAs that refer to the device IO memory. When disassociation occurs the VMAs have to be revised to point to the zero page, not the IO memory, to allow the physical HW to be unplugged.

The three drivers supporting this implemented three different versions of this algorithm, all leaving something to be desired. This new common implementation has a few differences from the driver versions:
- Track all VMAs, including splitting/truncating/etc. Tie the lifetime of the private data allocation to the lifetime of the vma. This avoids any tricks with setting vm_ops which Linus didn't like. (see link)
- Support multiple mms, and properly track mmaps triggered by processes other than the one that first opened the uverbs fd. This makes the fork behavior of disassociation-enabled drivers the same as fork support in normal drivers.
- Don't use crazy get_task stuff.
- Simplify the approach to racing between vm_ops close and disassociation, fixing the related bugs most of the driver implementations had.

Since we are in core code the tracking list can be placed in struct ib_uverbs_ufile, which has a lifetime strictly longer than any VMAs created by mmap on the uverbs FD.

Link: https://www.spinics.net/lists/stable/msg248747.html
Link: https://lkml.kernel.org/r/CA+55aFxJTV_g46AQPoPXen-UPiqR1HGMZictt7VpC-SMFbm3Cw@mail.gmail.com
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 19 Sep, 2018 5 commits
-
-
liuyixian authored
Under some configurations, printing inside the AEQ handler causes timeouts that trigger unnecessary interrupts. Thus, move all prints out of the AEQ handler and into a work queue. Signed-off-by: liuyixian <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Jason Gunthorpe authored
The error path has several mistakes:
- cdev_del should not be called if cdev_device_add fails
- We must call put_device on all the goto exit paths, as that is what frees the uapi, SRCU and the struct itself.
While we are here, consolidate all the uvdev_dev init that cannot fail at the top. Fixes: c5c4d92e ("RDMA/uverbs: Use cdev_device_add() instead of cdev_add()") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
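A hedged sketch of the general pattern being fixed (generic cdev code, not the uverbs functions themselves): once device_initialize() has run, every failure path must drop the reference with put_device(), and cdev_del() is only valid after cdev_device_add() has succeeded.

static int mydev_register(struct mydev *d)		/* hypothetical driver struct */
{
	int ret;

	device_initialize(&d->dev);		/* from here on, the refcount owns d */
	cdev_init(&d->cdev, &mydev_fops);	/* mydev_fops is hypothetical */

	ret = cdev_device_add(&d->cdev, &d->dev);
	if (ret)
		goto err_put;			/* no cdev_del(): the add failed */

	return 0;

err_put:
	put_device(&d->dev);			/* release callback frees d */
	return ret;
}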
-
YueHaibing authored
rdma_set_src_addr_rcu should check whether copy_src_l2_addr() fails, rather than always returning 0. Also, copy_src_l2_addr() should return 'ret' as its return value when rdma_translate_ip() fails. Fixes: c31d4b2d ("RDMA/core: Protect against changing dst->dev during destination resolve") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Håkon Bugge authored
Commit f27b4746 ("i40iw: add connection management code") uses an incorrect rcu iterator, whilst holding the rtnl_lock. Since the critical region invokes i40iw_manage_qhash(), which is a sleeping function, the rcu locking and traversal cannot be used. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
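A hedged sketch of the safe pattern (helper and call site simplified): under rtnl_lock() the non-RCU netdev iterator may be used, so sleeping inside the loop is legal, unlike inside an RCU read-side critical section.

static void walk_netdevs_may_sleep(struct net *net)
{
	struct net_device *dev;

	ASSERT_RTNL();			/* caller must hold rtnl_lock() */

	for_each_netdev(net, dev) {	/* rtnl-protected, not for_each_netdev_rcu() */
		/*
		 * Safe to call a sleeping helper here (in the driver this
		 * would be something like i40iw_manage_qhash()).
		 */
	}
}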
-
Jason Gunthorpe authored
This does nothing but indicate whether the uverbs_file is in the device's list; use list_del_init instead. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
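A hedged sketch of the substitution: list_del_init() leaves the node self-linked, so list_empty() on the node itself answers "is this entry still on the device's list?" and the separate flag becomes redundant (struct name is illustrative; the usual locking still applies).

#include <linux/list.h>
#include <linux/types.h>

struct file_entry {			/* hypothetical */
	struct list_head node;
};

static void detach_entry(struct file_entry *e)
{
	list_del_init(&e->node);	/* safe even if already detached */
}

static bool entry_is_attached(struct file_entry *e)
{
	return !list_empty(&e->node);	/* replaces the boolean flag */
}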
-
- 17 Sep, 2018 1 commit
-
-
Jason Gunthorpe authored
This enum has become part of the uABI, as both RXE and the ib_uverbs_post_send() command expect userspace to supply values from this enum. So it should be properly placed in include/uapi/rdma.

In userspace this enum is called 'enum ibv_wr_opcode' as part of libibverbs.h. That enum defines different values for IB_WR_LOCAL_INV, IB_WR_SEND_WITH_INV, and IB_WR_LSO. These were introduced (incorrectly, it turns out) into libibverbs in 2015.

The kernel has changed its mind on the numbering for several of the IB_WC values over the years, but has remained stable on IB_WR_LOCAL_INV and below. Based on this we can conclude that there is no real userspace user of the values beyond IB_WR_ATOMIC_FETCH_AND_ADD, as they have never worked via rdma-core. This is confirmed by inspection: only rxe uses the kernel enum and implements the latter operations. rxe has clearly never worked with these attributes from userspace. Other drivers that support these opcodes implement the functionality without calling out to the kernel.

To make IB_WR_SEND_WITH_INV and related opcodes work for RXE in userspace, we choose to renumber the IB_WR enum in the kernel to match the uABI that userspace has been using since before Soft RoCE was merged. This is an overall simpler configuration for the whole software stack, and obviously can't break anything existing.

Reported-by: Seth Howell <seth.howell@intel.com> Tested-by: Seth Howell <seth.howell@intel.com> Fixes: 8700e3e7 ("Soft RoCE driver") Cc: <stable@vger.kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 13 Sep, 2018 4 commits
-
-
YueHaibing authored
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Arseny Maslennikov authored
Some tools may currently be using only the deprecated attribute; let's print an elaborate and clear deprecation notice to kmsg. To do that, we have to replace the whole sysfs file, since we inherit the original one from netdev. Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru> Signed-off-by: Doug Ledford <dledford@redhat.com>
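A hedged sketch of the approach (message text, helper choice and field names are illustrative, not the exact patch): the replacement attribute still reports the value but logs a one-time notice pointing readers at dev_port.

static ssize_t dev_id_show(struct device *dev,
			   struct device_attribute *attr, char *buf)
{
	struct net_device *ndev = to_net_dev(dev);

	/* Log the deprecation notice only once per device. */
	dev_info_once(dev,
		      "\"%s\" read dev_id; use dev_port to obtain the port number instead\n",
		      current->comm);

	return sprintf(buf, "%#x\n", ndev->dev_id);
}
static DEVICE_ATTR_RO(dev_id);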
-
Arseny Maslennikov authored
Some InfiniBand network devices have multiple ports on the same PCI function. This initializes the `dev_port' sysfs field of those network interfaces with their port number. Prior to this the kernel erroneously used the `dev_id' sysfs field of those network interfaces to convey the port number to userspace. The use of `dev_id' was considered correct until Linux 3.15, when another field, `dev_port', was defined for this particular purpose and `dev_id' was reserved for distinguishing stacked ifaces (e.g: VLANs) with the same hardware address as their parent device. Similar fixes to net/mlx4_en and many other drivers, which started exporting this information through `dev_id' before 3.15, were accepted into the kernel 4 years ago. See 76a066f2 (`net/mlx4_en: Expose port number through sysfs'). Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Arseny Maslennikov authored
The sysfs field was introduced 4 years ago along with fixes to various drivers that erroneously used `dev_id' for that purpose, but it was not properly documented anywhere. See commit v3.14-rc3-739-g3f85944f. Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 12 Sep, 2018 12 commits
-
-
Parav Pandit authored
When resolving destination address or route, when net namespace is unavailable, refer to the net namespace of the netdevice of the SGID attribute. This is typically the case for requests arriving from the network for RoCE ports. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Parav Pandit authored
Introduce an API, rdma_read_gid_attr_ndev_rcu(), to return the GID attribute's netdevice when it is in UP state, for accessing the netdevice's fields such as net namespace and ifindex. This is useful for users who intend to access netdevice fields under the rcu lock. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
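A hedged usage sketch (the exact error-return convention of the helper is an assumption here, modeled as an ERR_PTR for the missing/not-UP case):

static int read_gid_ndev_fields(const struct ib_gid_attr *attr,
				struct net **net_out, int *ifindex_out)
{
	struct net_device *ndev;
	int ret = 0;

	rcu_read_lock();
	ndev = rdma_read_gid_attr_ndev_rcu(attr);
	if (IS_ERR(ndev)) {
		ret = PTR_ERR(ndev);
	} else {
		/* Fields are only stable while rcu_read_lock() is held. */
		*net_out = dev_net(ndev);
		*ifindex_out = ndev->ifindex;
	}
	rcu_read_unlock();
	return ret;
}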
-
Parav Pandit authored
Currently RoCE route resolve functionality is split between two functions: roce_resolve_route_from_path() and its helper function rdma_resolve_ip_route(). Due to this, multiple sockaddr src structures are created in both functions, with rdma_dev_addr as an interface between the two for checks. Since RoCE is the only user of rdma_resolve_ip_route(), combine the functionality of both functions into roce_resolve_route_from_path() and further reduce the scope of rdma_dev_addr to core/addr.c. This also allows extending addr_resolve() in a subsequent patch to consider netdev properties of the GID in a safer way under the rcu lock. Additionally, src and dst addresses were always provided, so skip the src addr NULL pointer check as they are present on the stack now. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Parav Pandit authored
During the address resolution process, during route lookup and while performing src address translation in loopback mode, hold the rcu lock so that if the netdevice is moving to a different net namespace, or being unregistered, it can be synchronized with net/core/dev.c, i.e. change_net_namespace() -> dev_close_many() -> rt6_uncached_list_flush_dev(), which would change dst->dev to the loopback device of the given net namespace. Therefore, hold the rcu lock and sync with synchronize_net() of change_net_namespace() to ensure that the netdevice cannot get freed while dst->dev is being used. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Parav Pandit authored
Set and refer to rdma_dev_addr network type instead of dst->ndev to reduce dependency on accessing dst netdevice. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Parav Pandit authored
Use common code flow for resolving neighbour and for finding source addresses. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Parav Pandit authored
Now that rdma_copy_addr() only copies the source addresses and all callers are interested in copying only source addresses, simplify it to drop the destination address argument. Given that it only copies source layer2 addresses, rename it to rdma_copy_src_l2_addr for better code readability. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Parav Pandit authored
rdma_translate_ip() is done while resolving addresses for loopback addresses. The current flow is convoluted, with resolving the neighbour being optional. This patch simplifies the code in the following ways:
(a) Use common code between IPv4 and IPv6 for address translation, loopback checks and acquiring the netdevice.
(b) During neigh resolve in addr_resolve_neigh(), only copy the destination address.
(c) Always resolve the source address before the destination address, because it doesn't depend on whether resolving the neigh was requested or not.
This helps to reduce three calls of rdma_copy_addr and rdma_translate_ip to one and makes it easier to follow the code flow. Now that ib_nl_fetch_ha() doesn't depend on dst, drop the dst argument from ib_nl_fetch_ha(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Parav Pandit authored
Current code typecasts the destination address using an extra variable but uses the source address as is. Even though the compiler optimizes such code well, just let each protocol-specific function typecast both src and dst and keep the code symmetric. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Parav Pandit authored
addr4_resolve() and addr6_resolve() are called by checking the value of sa_family. Both above functions overwrite the value after typecasting, this is not necessary. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Parav Pandit authored
This fixes two issues:
1. When the address family is other than IPv4 or IPv6, rdma_translate_ip() returns success, which is incorrect.
2. When the address family is AF_INET6 and the source address is not found, it returns success, which is also incorrect.
Therefore, introduce and use the rdma_find_ndev_for_src_ip_rcu() helper function, which returns a correct success or error status and is also useful for future code refactoring in addr_resolve(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Moni Shoua authored
The transition is allowed from any state and the attribute mask must be IB_QP_STATE. Fixes: c32a4f29 ("IB/mlx5: Add support for DC Initiator QP") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 11 Sep, 2018 8 commits
-
-
Michael J. Ruhl authored
HFI IRQ enable bits are not being set correctly. Send context error and DC IRQs are not being enabled correctly. In addition, send context error IRQs are not being delivered. Because of this, send context errors are not being handled correctly when they occur. When setting the IRQ bits, if an IRQ range is used, and the last bit is on a register boundary (bit 63), the calculated index for the final register modification is incorrect (index + 1 vs. index). The incorrect index calculation causes incorrect IRQ bits to be set. In this case the send context error IRQ is NOT enabled. Fix by using the 'last' value rather than the counted 'src' value to determine the final index to use. This satisfies all cases. Fixes: a2f7bbdc ("IB/hfi1: Rework the IRQ API to be more flexible") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
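The index arithmetic can be illustrated with a small standalone program (not driver code): when a bit range ends exactly at bit 63, the counted value 'src' has already advanced past 'last', so dividing by the register width lands on the wrong register.

#include <stdint.h>
#include <stdio.h>

#define BITS_PER_REG 64

int main(void)
{
	unsigned int first = 0, last = 63;	/* range ends on a register boundary */
	unsigned int src = first;
	uint64_t bits = 0;

	while (src <= last)
		bits |= 1ull << (src++ % BITS_PER_REG);

	/* After the loop src == last + 1 == 64. */
	printf("bits=%#llx from_src=%u from_last=%u\n",
	       (unsigned long long)bits,
	       src / BITS_PER_REG,	/* 64/64 = 1: wrong final register */
	       last / BITS_PER_REG);	/* 63/64 = 0: correct final register */
	return 0;
}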
-
Michael J. Ruhl authored
If the set_txreq_header_agh() function returns an error, the exit path is chosen. In this path, the code fails to set the return value. This will cause the caller to not realize an error has occurred. Set the return value correctly in the error path. Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
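A hedged sketch of the fix pattern only (labels and the cleanup helper are hypothetical; the callee name is taken verbatim from the changelog):

static int build_one_txreq(struct tx_request *tx)	/* hypothetical */
{
	int ret;

	ret = set_txreq_header_agh(tx);
	if (ret)
		goto free_txreq;	/* before the fix, ret was lost on this path */

	return 0;

free_txreq:
	release_txreq(tx);		/* hypothetical cleanup */
	return ret;			/* propagate the real error code */
}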
-
Michael J. Ruhl authored
Hardware limits the maximum number of packets to u16 packets. Match that size for all relevant sequence numbers in the user_sdma engine. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Michael J. Ruhl authored
Packet queue state is overused to determine SDMA descriptor availability and packet queue request state.

cpu 0: ret = user_sdma_send_pkts(req, pcount);
cpu 0: if (atomic_read(&pq->n_reqs))
cpu 1: IRQ user_sdma_txreq_cb calls pq_update() (state to _INACTIVE)
cpu 0:         xchg(&pq->state, SDMA_PKT_Q_ACTIVE);

At this point pq->n_reqs == 0 and pq->state is incorrectly SDMA_PKT_Q_ACTIVE. The close path will hang waiting for the state to return to _INACTIVE. This can also change the state from _DEFERRED to _ACTIVE. However, this is a mostly benign race.

Remove the racy code path. Use n_reqs to determine if a packet queue is active or not. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
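A hedged sketch of the resulting scheme (struct and field names simplified): the request count alone decides whether the queue is active, so there is no separate state word to race on.

static void pq_submit_start(struct pkt_q *pq)		/* hypothetical struct */
{
	atomic_inc(&pq->n_reqs);	/* before handing packets to SDMA */
}

static void pq_request_done(struct pkt_q *pq)		/* completion side */
{
	if (atomic_dec_and_test(&pq->n_reqs))
		wake_up(&pq->wait);	/* last outstanding request finished */
}

static void pq_wait_idle(struct pkt_q *pq)		/* close path */
{
	/* No _ACTIVE/_INACTIVE flag left to get out of sync with n_reqs. */
	wait_event(pq->wait, !atomic_read(&pq->n_reqs));
}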
-
Michael J. Ruhl authored
pq_update() can only be called in two places: from the completion function when the complete (npkts) sequence of packets has been submitted and processed, or from the setup function if a subset of the packets were submitted (i.e. the error path). Currently both paths can call pq_update() if an error occurs. This race will cause the n_req value to go negative, hanging file_close(), or cause a crash by freeing the txlist more than once.

Several variables are used to determine SDMA send state. Most of these are unnecessary, and have code-inspectable races between the setup function and the completion function, in both the send path and the error path.

The request 'status' value can be set by the setup or by the completion function. This is code-inspectably racy. Since the status is not needed in the completion code or by the caller, it has been removed.

The request 'done' value races between usage by the setup and the completion function. The completion function does not need this. When the number of processed packets matches npkts, it is done.

The 'has_error' value races between usage of the setup and the completion function. This can cause incorrect error handling and leave the n_req in an incorrect value (i.e. negative).

Simplify the code by removing all of the unneeded state checks and variables. Clean up the iovs node when it is freed.

Eliminate race conditions in the error path: If all packets are submitted, the completion handler will set the completion status correctly (ok or aborted). If all packets are not submitted, the caller must wait until the submitted packets have completed, and then set the completion status. These two changes eliminate the race condition in the error path.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Dan Carpenter authored
The error code isn't set on this path. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Michael J. Ruhl authored
The post_send() path determines if it should post directly or schedule the post for later. The current logic is:

if the swqe ring is empty or (for hfi1) wqe->length <= piothreshold
        post the send
else
        schedule

This can allow large requests to call the send engine directly. Large requests can potentially produce a large number of packets prior to returning to the caller, blocking the caller from posting more requests and losing the chance for better parallel processing.

Allow the driver(s) more say in this logic (pass call_send to the driver, rather than examining a return value). Update hfi1/qib logic to schedule the send engine if an RC or UC message is larger than the QP MTU size. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
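A hedged sketch of the reworked decision; field access is simplified and the driver hook names are made up, so this shows the shape of the logic rather than the real rdmavt interface:

static void post_one_send(struct rvt_qp *qp, struct rvt_swqe *wqe)
{
	/* Core suggests a direct send only when the swqe ring was empty. */
	bool call_send = (qp->s_last == qp->s_head);

	/* hfi1/qib veto: schedule large RC/UC messages instead of posting. */
	if ((qp->ibqp.qp_type == IB_QPT_RC || qp->ibqp.qp_type == IB_QPT_UC) &&
	    wqe->length > qp->pmtu)
		call_send = false;

	if (call_send)
		driver_do_send(qp);		/* hypothetical: run the engine now */
	else
		driver_schedule_send(qp);	/* hypothetical: defer the work */
}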
-
zhong jiang authored
debugfs_remove() already takes IS_ERR_OR_NULL into account, so just remove the unnecessary condition. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
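For illustration only (the dentry field name is hypothetical), the cleanup amounts to dropping the guard, since debugfs_remove() is a no-op for NULL and error pointers:

	/* before */
	if (!IS_ERR_OR_NULL(dev->dbg_root))
		debugfs_remove(dev->dbg_root);

	/* after */
	debugfs_remove(dev->dbg_root);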
-