    nvme: Atomic write support · 5f9bbea0
    Alan Adamson authored
    Add support to set block layer request_queue atomic write limits. The
    limits will be derived from either the namespace or controller atomic
    parameters.
    
    NVMe atomic-related parameters are grouped into "normal" and "power-fail"
    (or PF) classes. For atomic write support, only PF parameters
    are of interest. The "normal" parameters are concerned with racing reads
    and writes (which also applies to PF). See NVM Command Set Specification
    Revision 1.0d section 2.1.4 for reference.
    
    Whether to use per namespace or controller atomic parameters is decided by
    NSFEAT bit 1 - see Figure 97: Identify – Identify Namespace Data
    Structure, NVM Command Set.
    
    NVMe namespaces may define an atomic boundary, whereby no atomic guarantees
    are provided for a write which straddles this boundary in LBA space. The
    block layer merging policy is such that no merges may occur in which the
    resultant request would straddle such a boundary.
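
    As a rough illustration of the merging rule above, the sketch below
    checks whether two contiguous requests could be merged without the
    result straddling an atomic boundary. It is a userspace sketch, not the
    block layer code: merge_keeps_atomicity() and its byte-based parameters
    are hypothetical, and it assumes boundary windows start at offset 0
    (that is, NABO is zero).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function.
 *
 * start:    byte offset of the first request
 * len_a:    length in bytes of the first request
 * len_b:    length in bytes of the request that directly follows it
 * boundary: atomic boundary in bytes, 0 meaning "no boundary"
 *
 * Returns true if the merged request stays inside one boundary window.
 */
static bool merge_keeps_atomicity(uint64_t start, uint64_t len_a,
				  uint64_t len_b, uint64_t boundary)
{
	uint64_t total = len_a + len_b;

	if (boundary == 0 || total == 0)
		return true;

	/* First and last byte must fall in the same boundary window. */
	return (start / boundary) == ((start + total - 1) / boundary);
}
```

    With an 8 KiB boundary, two 4 KiB requests starting at offset 0 may be
    merged, while the same pair starting at offset 4096 may not.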
    
    Unlike SCSI, NVMe specifies no granularity or alignment rules, apart from
    the atomic boundary rule. In addition, again unlike SCSI, there is no
    dedicated atomic write command - a write which adheres to the atomic size
    limit and boundary is implicitly atomic.
    
    If NSFEAT bit 1 is set, the following parameters are of interest:
    - NAWUPF (Namespace Atomic Write Unit Power Fail)
    - NABSPF (Namespace Atomic Boundary Size Power Fail)
    - NABO (Namespace Atomic Boundary Offset)
    
    and we set request_queue limits as follows:
    - atomic_write_unit_max = rounddown_pow_of_two(NAWUPF)
    - atomic_write_max_bytes = NAWUPF
    - atomic_write_boundary = NABSPF
    
    In the unlikely scenario that NABO is non-zero, atomic writes will
    not be supported at all as dealing with this adds extra complexity. This
    policy may change in future.
    
    In all cases, atomic_write_unit_min is set to the logical block size.
    
    If NSFEAT bit 1 is unset, the following parameter is of interest:
    - AWUPF (Atomic Write Unit Power Fail)
    
    and we set request_queue limits as follows:
    - atomic_write_unit_max = rounddown_pow_of_two(AWUPF)
    - atomic_write_max_bytes = AWUPF
    - atomic_write_boundary = 0
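
    The two cases above can be sketched in userspace C as follows. This is
    an illustration under stated assumptions, not the driver code: struct
    atomic_limits, rounddown_pow2() and derive_limits() are hypothetical
    names, the caller is assumed to have already converted NAWUPF, NABSPF
    and AWUPF into bytes, and a non-zero NABO is treated as "unsupported"
    only in the per-namespace case, with all limits left at zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the request_queue atomic write limits. */
struct atomic_limits {
	uint32_t unit_min;   /* atomic_write_unit_min  */
	uint32_t unit_max;   /* atomic_write_unit_max  */
	uint32_t max_bytes;  /* atomic_write_max_bytes */
	uint32_t boundary;   /* atomic_write_boundary  */
};

/* Largest power of two that is <= v; returns 0 for v == 0. */
static uint32_t rounddown_pow2(uint32_t v)
{
	uint32_t p = 1;

	if (v == 0)
		return 0;
	while (p <= v / 2)
		p *= 2;
	return p;
}

/*
 * ns_atomics:        true if NSFEAT bit 1 is set
 * block_size:        logical block size in bytes
 * ns_unit_bytes:     NAWUPF, already converted to bytes (assumption)
 * ns_boundary_bytes: NABSPF, already converted to bytes (assumption)
 * nabo:              NABO as reported by the namespace
 * ctrl_unit_bytes:   AWUPF, already converted to bytes (assumption)
 *
 * Returns false, with *lim zeroed, when atomic writes are not supported.
 */
static bool derive_limits(bool ns_atomics, uint32_t block_size,
			  uint32_t ns_unit_bytes, uint32_t ns_boundary_bytes,
			  uint32_t nabo, uint32_t ctrl_unit_bytes,
			  struct atomic_limits *lim)
{
	uint32_t unit = ns_atomics ? ns_unit_bytes : ctrl_unit_bytes;

	lim->unit_min = lim->unit_max = lim->max_bytes = lim->boundary = 0;

	/* Non-zero NABO: give up on atomic writes entirely. */
	if (ns_atomics && nabo != 0)
		return false;

	lim->unit_min = block_size;
	lim->unit_max = rounddown_pow2(unit);
	lim->max_bytes = unit;
	lim->boundary = ns_atomics ? ns_boundary_bytes : 0;
	return true;
}
```

    For example, a per-namespace atomic unit of 12288 bytes with 512-byte
    logical blocks gives a unit_max of 8192 and a max_bytes of 12288.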
    
    A new function, nvme_valid_atomic_write(), is also called from the
    submission path to verify that a request which has been submitted to the
    driver will actually be executed atomically. This check is needed
    because, as mentioned, there is no dedicated NVMe atomic write command
    (which could return an error for a command which exceeds the controller
    atomic write limits).
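
    A minimal sketch of the kind of check described above, written as
    standalone C rather than against struct request. It is not the body of
    nvme_valid_atomic_write(): the function name valid_atomic_write(), the
    byte-based parameters and the choice of which size limit to compare
    against are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace check, not the driver function.
 *
 * start:      byte offset of the write on the namespace
 * len:        length of the write in bytes
 * size_limit: atomic size limit in bytes (which queue limit this maps to
 *             is an assumption of this sketch)
 * boundary:   atomic boundary in bytes, 0 meaning "no boundary"
 */
static bool valid_atomic_write(uint64_t start, uint32_t len,
			       uint32_t size_limit, uint32_t boundary)
{
	/* Too large (or empty): the device gives no atomic guarantee. */
	if (len == 0 || len > size_limit)
		return false;

	if (boundary) {
		uint64_t end = start + len - 1;

		/* Longer than a boundary window: must cross a boundary. */
		if (len > boundary)
			return false;

		/* First and last byte must share one boundary window. */
		if (start / boundary != end / boundary)
			return false;
	}

	return true;
}
```

    A request failing such a check would have to be rejected by the driver,
    since the device itself would simply execute it non-atomically.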
    
    Note on NABSPF:
    There seems to be some vagueness in the spec as to whether NABSPF applies
    when NSFEAT bit 1 is unset. Figure 97 does not explicitly mention NABSPF
    or how it is affected by bit 1. However, Figure 4 does say to check Figure
    97 for info about per-namespace parameters, which NABSPF is, so it is
    implied. That said, nvme_update_disk_info() currently checks the namespace
    parameter NABO regardless of this bit.
    
    Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
    Reviewed-by: Keith Busch <kbusch@kernel.org>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    jpg: total rewrite
    Signed-off-by: John Garry <john.g.garry@oracle.com>
    Link: https://lore.kernel.org/r/20240620125359.2684798-11-john.g.garry@oracle.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>