Commit b6836230 authored by Aharon Landau's avatar Aharon Landau Committed by Jason Gunthorpe

RDMA/mlx5: Avoid taking MRs from larger MR cache pools when a pool is empty

Currently, if a cache entry is empty, the driver will try to take MRs
from larger cache entries. This behavior consumes a lot of memory.
In addition, when searching for an mkey in an entry, the entry is locked.
When using a multithreaded application with the old behavior, the threads
will block each other more often, which can hurt performance as can be
seen in the table below.

Therefore, avoid it by creating a new mkey when the requested cache entry
is empty.

The test was performed on a machine with
Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 44 cores.

Here are the time measures for allocating MRs of 2^6 pages. The search in
the cache started from entry 6.

+------------+---------------------+---------------------+
|            |     Old behavior    |     New behavior    |
|            +----------+----------+----------+----------+
|            | 1 thread | 5 thread | 1 thread | 5 thread |
+============+==========+==========+==========+==========+
|  1,000 MRs |   14 ms  |   30 ms  |   14 ms  |   80 ms  |
+------------+----------+----------+----------+----------+
| 10,000 MRs |  135 ms  |   6 sec  |  173 ms  |  880 ms  |
+------------+----------+----------+----------+----------+
|100,000 MRs | 11.2 sec |  57 sec  | 1.74 sec |  8.8 sec |
+------------+----------+----------+----------+----------+

Link: https://lore.kernel.org/r/71af2770c737b936f7b10f457f0ef303ffcf7ad7.1632644527.git.leonro@nvidia.comSigned-off-by: default avatarAharon Landau <aharonl@nvidia.com>
Signed-off-by: default avatarLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: default avatarJason Gunthorpe <jgg@nvidia.com>
parent 3f3fe682
...@@ -600,29 +600,21 @@ struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev, ...@@ -600,29 +600,21 @@ struct mlx5_ib_mr *mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev,
/* Return a MR already available in the cache */ /* Return a MR already available in the cache */
static struct mlx5_ib_mr *get_cache_mr(struct mlx5_cache_ent *req_ent) static struct mlx5_ib_mr *get_cache_mr(struct mlx5_cache_ent *req_ent)
{ {
struct mlx5_ib_dev *dev = req_ent->dev;
struct mlx5_ib_mr *mr = NULL; struct mlx5_ib_mr *mr = NULL;
struct mlx5_cache_ent *ent = req_ent; struct mlx5_cache_ent *ent = req_ent;
/* Try larger MR pools from the cache to satisfy the allocation */ spin_lock_irq(&ent->lock);
for (; ent != &dev->cache.ent[MR_CACHE_LAST_STD_ENTRY + 1]; ent++) { if (!list_empty(&ent->head)) {
mlx5_ib_dbg(dev, "order %u, cache index %zu\n", ent->order, mr = list_first_entry(&ent->head, struct mlx5_ib_mr, list);
ent - dev->cache.ent); list_del(&mr->list);
ent->available_mrs--;
spin_lock_irq(&ent->lock);
if (!list_empty(&ent->head)) {
mr = list_first_entry(&ent->head, struct mlx5_ib_mr,
list);
list_del(&mr->list);
ent->available_mrs--;
queue_adjust_cache_locked(ent);
spin_unlock_irq(&ent->lock);
mlx5_clear_mr(mr);
return mr;
}
queue_adjust_cache_locked(ent); queue_adjust_cache_locked(ent);
spin_unlock_irq(&ent->lock); spin_unlock_irq(&ent->lock);
mlx5_clear_mr(mr);
return mr;
} }
queue_adjust_cache_locked(ent);
spin_unlock_irq(&ent->lock);
req_ent->miss++; req_ent->miss++;
return NULL; return NULL;
} }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment