• Jason Gunthorpe's avatar
    RDMA/ucma: Rework ucma_migrate_id() to avoid races with destroy · f5449e74
    Jason Gunthorpe authored
    ucma_destroy_id() assumes that all things accessing the ctx will do so via
    the xarray. This assumption violated only in the case the FD is being
    closed, then the ctx is reached via the ctx_list. Normally this is OK
    since ucma_destroy_id() cannot run concurrenty with release(), however
    with ucma_migrate_id() is involved this can violated as the close of the
    2nd FD can run concurrently with destroy on the first:
    
                    CPU0                      CPU1
            ucma_destroy_id(fda)
                                      ucma_migrate_id(fda -> fdb)
                                           ucma_get_ctx()
            xa_lock()
             _ucma_find_context()
             xa_erase()
            xa_unlock()
                                           xa_lock()
                                            ctx->file = new_file
                                            list_move()
                                           xa_unlock()
                                          ucma_put_ctx()
    
                                       ucma_close(fdb)
                                          _destroy_id()
                                          kfree(ctx)
    
            _destroy_id()
              wait_for_completion()
              // boom, ctx was freed
    
    The ctx->file must be modified under the handler and xa_lock, and prior to
    modification the ID must be rechecked that it is still reachable from
    cur_file, ie there is no parallel destroy or migrate.
    
    To make this work remove the double locking and streamline the control
    flow. The double locking was obsoleted by the handler lock now directly
    preventing new uevents from being created, and the ctx_list cannot be read
    while holding fgets on both files. Removing the double locking also
    removes the need to check for the same file.
    
    Fixes: 88314e4d ("RDMA/cma: add support for rdma_migrate_id()")
    Link: https://lore.kernel.org/r/0-v1-05c5a4090305+3a872-ucma_syz_migrate_jgg@nvidia.com
    Reported-and-tested-by: syzbot+cc6fc752b3819e082d0c@syzkaller.appspotmail.com
    Signed-off-by: default avatarJason Gunthorpe <jgg@nvidia.com>
    f5449e74
ucma.c 46.3 KB