    RDMA/core: Fix umem iterator when PAGE_SIZE is greater than HCA pgsz · 4fbc3a52
    Mike Marciniszyn authored
    64k pages introduce the situation shown in this diagram when the HCA 4k
    page size is being used:
    
     +-------------------------------------------+ <--- 64k aligned VA
     |                                           |
     |              HCA 4k page                  |
     |                                           |
     +-------------------------------------------+
     |                   o                       |
     |                                           |
     |                   o                       |
     |                                           |
     |                   o                       |
     +-------------------------------------------+
     |                                           |
     |              HCA 4k page                  |
     |                                           |
     +-------------------------------------------+ <--- Live HCA page
     |OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO| <--- offset
     |                                           | <--- VA
     |                MR data                    |
     +-------------------------------------------+
     |                                           |
     |              HCA 4k page                  |
     |                                           |
     +-------------------------------------------+
     |                   o                       |
     |                                           |
     |                   o                       |
     |                                           |
     |                   o                       |
     +-------------------------------------------+
     |                                           |
     |              HCA 4k page                  |
     |                                           |
     +-------------------------------------------+
    
    The VAs coming from rdma-core can be arbitrary. With 64k pages, as in
    this diagram, the MR data may be preceded by some number of HCA 4k pages
    and followed by some number of HCA 4k pages within the same 64k page.
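
    To make the layout concrete, here is a minimal userspace sketch, assuming
    64k CPU pages and a 4k HCA page size. It counts the leading, live, and
    trailing HCA pages inside the 64k pages that back an MR. The names
    CPU_PAGE_SIZE, HCA_PGSZ, struct hca_layout, and hca_layout_of() are
    hypothetical; only the live-page count mirrors the arithmetic of the
    kernel's ib_umem_num_dma_blocks().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for the diagram's case: 64k CPU pages, 4k HCA pages. */
#define CPU_PAGE_SIZE 0x10000ULL
#define HCA_PGSZ 0x1000ULL

/* Round x down/up to a multiple of a; a must be a power of two. */
#define ALIGN_DOWN_P2(x, a) ((x) & ~((a) - 1))
#define ALIGN_UP_P2(x, a) ALIGN_DOWN_P2((x) + (a) - 1, (a))

/* Hypothetical summary of the HCA pages spanned by the CPU pages of an MR. */
struct hca_layout {
	uint64_t leading;  /* 4k pages before the first live HCA page */
	uint64_t live;     /* 4k pages that hold MR data */
	uint64_t trailing; /* 4k pages after the last live HCA page */
};

/* Hypothetical helper: classify the HCA pages behind [va, va + len). */
static struct hca_layout hca_layout_of(uint64_t va, uint64_t len)
{
	struct hca_layout l;
	uint64_t cpu_start, cpu_end, hca_start, hca_end;

	assert(len > 0);
	cpu_start = ALIGN_DOWN_P2(va, CPU_PAGE_SIZE);
	cpu_end = ALIGN_UP_P2(va + len, CPU_PAGE_SIZE);
	hca_start = ALIGN_DOWN_P2(va, HCA_PGSZ);
	hca_end = ALIGN_UP_P2(va + len, HCA_PGSZ);

	l.leading = (hca_start - cpu_start) / HCA_PGSZ;
	/* Same arithmetic as ib_umem_num_dma_blocks() */
	l.live = (hca_end - hca_start) / HCA_PGSZ;
	l.trailing = (cpu_end - hca_end) / HCA_PGSZ;
	return l;
}
```

    For example, an MR at VA 0x53100 with length 0x1800 sits in a single 64k
    page with 3 leading, 2 live, and 11 trailing 4k pages. An iterator that
    walks the whole 64k page hands all 16 of them to the HCA.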
    
    The current iterator doesn't account for either the preceding 4k pages or
    the following 4k pages.
    
    Fix the issue by extending ib_block_iter to hold the number of DMA pages,
    as comment [1] says, and by using __sg_advance to start the iterator at
    the first live HCA page.
    
    The changes are contained in a parallel set of umem-aware iterator start
    and next functions. They are specific to umem because there is one user
    of rdma_for_each_block() that does not use umem.
    
    Together, these two fixes keep the iterator from returning the extra
    pages before and after the user MR data.
    
    Fix the preceding pages by using the __sg_advance field to start at the
    first 4k page containing MR data.
    
    Fix the following pages by saving the number of pgsz blocks in the
    iterator state and decrementing it on each next step.
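
    The two mechanisms above can be modelled in userspace as follows. This is
    a sketch under stated assumptions, not the kernel code: struct sg_ent,
    struct umem_model, struct block_iter, and every function and macro below
    are hypothetical stand-ins, loosely following __rdma_block_iter_start()
    and __rdma_block_iter_next(). The umem-aware start sets sg_advance to the
    page offset of the VA rounded down to pgsz, and stores a block count
    computed like ib_umem_num_dma_blocks() (assuming iova == address). The
    umem-aware next decrements that count on each step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct sg_ent {
	uint64_t dma_addr;
	uint32_t dma_len;
};

/* Hypothetical stand-in for struct ib_umem (assumes iova == address). */
struct umem_model {
	const struct sg_ent *sgl;
	unsigned int nents;
	uint64_t address;   /* user VA of the MR */
	uint64_t length;    /* MR length in bytes */
	uint64_t page_size; /* CPU PAGE_SIZE */
};

/* Model of struct ib_block_iter plus a block count like the new one. */
struct block_iter {
	const struct sg_ent *sg; /* SG entry holding the current block */
	unsigned int sg_nents;   /* SG entries left, including sg */
	unsigned int sg_advance; /* bytes to skip in sg on the next step */
	unsigned int pg_bit;     /* log2(pgsz) */
	size_t sg_numblocks;     /* blocks left to return (umem iterator) */
	uint64_t dma_addr;       /* unaligned address of the current block */
};

/* Generic start, in the spirit of __rdma_block_iter_start(). */
static void block_iter_start(struct block_iter *it, const struct sg_ent *sgl,
			     unsigned int nents, uint64_t pgsz)
{
	assert(pgsz != 0 && (pgsz & (pgsz - 1)) == 0);
	it->sg = sgl;
	it->sg_nents = nents;
	it->sg_advance = 0;
	it->sg_numblocks = 0;
	it->dma_addr = 0;
	for (it->pg_bit = 0; (1ULL << it->pg_bit) != pgsz; it->pg_bit++)
		;
}

/* Generic next, in the spirit of __rdma_block_iter_next(). */
static bool block_iter_next(struct block_iter *it)
{
	uint64_t blk = 1ULL << it->pg_bit;
	uint64_t delta;

	if (!it->sg_nents || !it->sg)
		return false;
	it->dma_addr = it->sg->dma_addr + it->sg_advance;
	delta = blk - (it->dma_addr & (blk - 1)); /* bytes to the next block */
	if (it->sg->dma_len - it->sg_advance > delta) {
		it->sg_advance += (unsigned int)delta;
	} else {
		it->sg_advance = 0; /* move on to the next SG entry */
		it->sg++;
		it->sg_nents--;
	}
	return true;
}

/* pgsz-aligned DMA address of the current block. */
static uint64_t block_iter_dma_address(const struct block_iter *it)
{
	return it->dma_addr & ~((1ULL << it->pg_bit) - 1);
}

/* umem-aware start: skip leading HCA pages, count the live ones. */
static void umem_block_iter_start(struct block_iter *it,
				  const struct umem_model *u, uint64_t pgsz)
{
	uint64_t offset = u->address & (u->page_size - 1); /* like ib_umem_offset() */
	uint64_t first = u->address & ~(pgsz - 1);
	uint64_t end = (u->address + u->length + pgsz - 1) & ~(pgsz - 1);

	block_iter_start(it, u->sgl, u->nents, pgsz);
	it->sg_advance = (unsigned int)(offset & ~(pgsz - 1));
	it->sg_numblocks = (size_t)((end - first) / pgsz);
}

/* umem-aware next: post-decrement tests the old count, so exactly
 * sg_numblocks blocks are returned before the loop stops. */
static bool umem_block_iter_next(struct block_iter *it)
{
	return block_iter_next(it) && it->sg_numblocks--;
}

#define umem_for_each_dma_block(u, it, pgsz) \
	for (umem_block_iter_start(it, u, pgsz); umem_block_iter_next(it);)
```

    With a single 64k SG entry and an MR at VA 0x53100 of length 0x1800, the
    umem-aware loop yields only the two live 4k blocks, at offsets 0x3000 and
    0x4000 into the entry, while the generic iterator walks all 16 blocks.
    Calling the generic next before the count check keeps the iterator state
    current for the block that is actually returned.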
    
    This fix also allows the small-page crutch added by the commit in the
    Fixes: tag to be removed.
    
    Fixes: 10c75ccb ("RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()")
    Link: https://lore.kernel.org/r/20231129202143.1434-2-shiraz.saleem@intel.com
    Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>