• Matan Barak's avatar
    IB/core: Add RoCE GID table management · 03db3a2d
    Matan Barak authored
    RoCE GIDs are based on IP addresses configured on Ethernet net-devices
    which relate to the RDMA (RoCE) device port.
    
    Currently, each of the low-level drivers that support RoCE (ocrdma,
    mlx4) manages its own RoCE port GID table. As there's nothing which is
    essentially vendor specific, we generalize that, and enhance the RDMA
    core GID cache to do this job.
    
    In order to populate the GID table, we listen for events:
    
    (a) netdev up/down/change_addr events - if a netdev is built onto
        our RoCE device, we need to add/delete its IPs. This involves
        adding all GIDs related to this ndev, add default GIDs, etc.
    
    (b) inet events - add new GIDs (according to the IP addresses)
        to the table.
    
    For programming the port RoCE GID table, providers must implement
    the add_gid and del_gid callbacks.
    
    RoCE GID management requires us to state the associated net_device
    alongside the GID. This information is necessary in order to manage
    the GID table. For example, when a net_device is removed, its
    associated GIDs need to be removed as well.
    
    RoCE mandates generating a default GID for each port, based on the
    related net-device's IPv6 link local. In contrast to the GID based on
    the regular IPv6 link-local (as we generate GID per IP address),
    the default GID is also available when the net device is down (in
    order to support loopback).
    
    Locking is done as follows:
    The patch modify the GID table code both for new RoCE drivers
    implementing the add_gid/del_gid callbacks and for current RoCE and
    IB drivers that do not. The flows for updating the table are
    different, so the locking requirements are too.
    
    While updating RoCE GID table, protection against multiple writers is
    achieved via mutex_lock(&table->lock). Since writing to a table
    requires us to find an entry (possible a free entry) in the table and
    then modify it, this mutex protects both the find_gid and write_gid
    ensuring the atomicity of the action.
    Each entry in the GID cache is protected by rwlock. In RoCE, writing
    (usually results from netdev notifier) involves invoking the vendor's
    add_gid and del_gid callbacks, which could sleep.
    Therefore, an invalid flag is added for each entry. Updates for RoCE are
    done via a workqueue, thus sleeping is permitted.
    
    In IB, updates are done in write_lock_irq(&device->cache.lock), thus
    write_gid isn't allowed to sleep and add_gid/del_gid are not called.
    
    When passing net-device into/out-of the GID cache, the device
    is always passed held (dev_hold).
    
    The code uses a single work item for updating all RDMA devices,
    following a netdev or inet notifier.
    
    The patch moves the cache from being a client (which was incorrect,
    as the cache is part of the IB infrastructure) to being explicitly
    initialized/freed when a device is registered/removed.
    Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
    Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
    03db3a2d
device.c 26 KB