    rust: add pin-init API core · 90e53c5e
    Benno Lossin authored
    This API is used to facilitate safe pinned initialization of structs. It
    replaces cumbersome `unsafe` manual initialization with elegant safe macro
    invocations.
    
    Due to the size of this change it has been split into six commits:
    1. This commit introduces the basic public interface: traits and
       functions to represent and create initializers.
    2. Adds the `#[pin_data]`, `pin_init!`, `try_pin_init!`, `init!` and
       `try_init!` macros along with their internal types.
    3. Adds the `InPlaceInit` trait that allows using an initializer to create
       an object inside of a `Box<T>` and other smart pointers.
    4. Adds the `PinnedDrop` trait and adds macro support for it in
       the `#[pin_data]` macro.
    5. Adds the `stack_pin_init!` macro, which allows pin-initializing a
       struct on the stack.
    6. Adds the `Zeroable` trait and `init::zeroed` function to initialize
       types that have `0x00` in all bytes as a valid bit pattern.
    
    --
    
    This section outlines the problem that the new pin-init API solves.
    This message describes the entirety of the API, not just the parts
    introduced in this commit. For a more granular explanation and additional
    information on pinning and this issue, view [1].
    
    Pinning is Rust's way of enforcing the address stability of a value. When a
    value gets pinned it will be impossible for safe code to move it to another
    location. This is done by wrapping pointers to said object with `Pin<P>`.
    This wrapper prevents safe code from creating mutable references to the
    object, preventing mutable access, which is needed to move the value.
    `Pin<P>` provides `unsafe` functions to circumvent this and allow
    modifications regardless. It is then the programmer's responsibility to
    uphold the pinning guarantee.
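
    As an illustration of the paragraph above, here is a minimal sketch that
    uses only the standard library. The type `Tracked` and the function
    `pinned_address_is_stable` are hypothetical names made up for this
    example and are not part of the pin-init API.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical type that must not move once it has been pinned.
pub struct Tracked {
    pub value: u32,
    // Opts out of `Unpin`, so safe code cannot obtain a `&mut Tracked`
    // from a `Pin<&mut Tracked>`.
    _pin: PhantomPinned,
}

impl Tracked {
    pub fn new(value: u32) -> Self {
        Self { value, _pin: PhantomPinned }
    }

    /// Modifies a field through the pin without moving the value.
    pub fn set(self: Pin<&mut Self>, value: u32) {
        // SAFETY: only a field is overwritten; the `Tracked` itself is
        // never moved out of its memory location.
        let this = unsafe { self.get_unchecked_mut() };
        this.value = value;
    }
}

/// Pins a `Tracked` on the heap, updates it, and reports whether its
/// address stayed the same together with the final value.
pub fn pinned_address_is_stable() -> (bool, u32) {
    let mut pinned: Pin<Box<Tracked>> = Box::pin(Tracked::new(1));
    let before = &*pinned as *const Tracked;
    pinned.as_mut().set(2);
    let after = &*pinned as *const Tracked;
    (before == after, pinned.value)
}
```

    The only way to mutate `value` here goes through `get_unchecked_mut`,
    which is `unsafe`; upholding the pinning guarantee is then up to the
    author of `set`.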
    
    Many kernel data structures require a stable address, because there are
    foreign pointers to them which would get invalidated by moving the
    structure. Since these data structures are usually embedded in structs to
    use them, this pinning property propagates to the container struct,
    resulting in most structs in both Rust and C code needing to be pinned.
    
    So if we want to have a `mutex` field in a Rust struct, this struct also
    needs to be pinned, because a `mutex` contains a `list_head`. Additionally
    initializing a `list_head` requires already having the final memory
    location available, because it is initialized by pointing it to itself. But
    this presents another challenge in Rust: values have to be initialized at
    all times. There is the `MaybeUninit<T>` wrapper type, which allows
    handling uninitialized memory, but this requires using `unsafe` raw
    pointers and casting the type to the initialized variant.
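
    A rough sketch of what this manual approach looks like with the standard
    library. `ListHead` is a hypothetical stand-in for the C `list_head`, and
    `init_list_head` and `boxed_empty_list` are names invented for this
    example; none of them is the kernel's actual binding.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

/// Hypothetical stand-in for the C `struct list_head`.
pub struct ListHead {
    pub next: *mut ListHead,
    pub prev: *mut ListHead,
}

/// Initializes an empty list in place by pointing `next` and `prev` at the
/// slot itself, so the final address must already be known.
///
/// # Safety
///
/// `slot` must be valid for writes and the value must not move afterwards.
pub unsafe fn init_list_head(slot: *mut ListHead) {
    // SAFETY: the caller guarantees that `slot` is valid for writes.
    unsafe {
        addr_of_mut!((*slot).next).write(slot);
        addr_of_mut!((*slot).prev).write(slot);
    }
}

/// Allocates uninitialized memory, initializes it in place, and only then
/// converts it to the initialized type.
pub fn boxed_empty_list() -> Box<ListHead> {
    let mut slot: Box<MaybeUninit<ListHead>> = Box::new(MaybeUninit::uninit());
    // SAFETY: `slot` is a live heap allocation that stays where it is.
    unsafe { init_list_head(slot.as_mut_ptr()) };
    // SAFETY: both fields were written above, and `MaybeUninit<ListHead>`
    // has the same layout as `ListHead`.
    unsafe { Box::from_raw(Box::into_raw(slot).cast::<ListHead>()) }
}
```

    Every step needs its own `unsafe` block and safety argument, and nothing
    stops a caller from moving the `ListHead` out of the returned `Box`.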
    
    This problem gets exacerbated when considering encapsulation and the normal
    safety requirements of Rust code. The fields of the Rust `Mutex<T>` should
    not be accessible to normal driver code. After all, if anyone can modify
    the fields, there is no way to ensure the invariants of the `Mutex<T>` are
    upheld. But if the fields are inaccessible, then initialization of a
    `Mutex<T>` needs to be somehow achieved via a function or a macro. Because
    the `Mutex<T>` must be pinned in memory, the function cannot return it by
    value. It also cannot allocate a `Box` to put the `Mutex<T>` into, because
    that is an unnecessary allocation and indirection which would hurt
    performance.
    
    The solution in the rust tree (e.g. this commit: [2]) that is replaced by
    this API is to split this function into two parts:
    
    1. A `new` function that returns a partially initialized `Mutex<T>`,
    2. An `init` function that requires the `Mutex<T>` to be pinned and that
       fully initializes the `Mutex<T>`.
    
    Both of these functions have to be marked `unsafe`, since a call to `new`
    needs to be accompanied by a call to `init`, otherwise using the
    `Mutex<T>` could result in UB, and because calling `init` twice is also
    not safe. While `Mutex<T>` initialization cannot fail, other structs
    might also have to allocate memory, which would result in conditionally
    successful initialization requiring even more manual accommodation work.
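
    The shape of this two-step pattern can be sketched as follows. `RawLock`
    is a hypothetical, self-referential placeholder and not the `Mutex<T>`
    from the rust tree; it only mirrors the `new`/`init` split described
    above.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr;

/// Hypothetical type that stores its own address once it is initialized.
pub struct RawLock {
    self_ptr: *const RawLock,
    _pin: PhantomPinned,
}

impl RawLock {
    /// Returns a partially initialized value.
    ///
    /// # Safety
    ///
    /// The caller must call `init` before using the value in any other way.
    pub unsafe fn new() -> Self {
        Self { self_ptr: ptr::null(), _pin: PhantomPinned }
    }

    /// Finishes initialization once the final address is known.
    ///
    /// # Safety
    ///
    /// Must be called exactly once.
    pub unsafe fn init(self: Pin<&mut Self>) {
        // SAFETY: the value is not moved, only a field is written.
        let this = unsafe { self.get_unchecked_mut() };
        let addr: *const RawLock = &*this;
        this.self_ptr = addr;
    }

    pub fn is_initialized(&self) -> bool {
        ptr::eq(self.self_ptr, self)
    }
}

/// Correct usage needs two `unsafe` calls that must be kept together.
pub fn make_lock() -> Pin<Box<RawLock>> {
    // SAFETY: `init` is called right below.
    let mut lock = Box::pin(unsafe { RawLock::new() });
    // SAFETY: `init` is called exactly once.
    unsafe { lock.as_mut().init() };
    lock
}
```

    Dropping the second `unsafe` call in `make_lock` still compiles, which is
    exactly the failure mode this pattern invites.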
    
    Combine this with the problem of pin-projections -- the way of accessing
    fields of a pinned struct -- which also have an `unsafe` API, and pinned
    initialization is riddled with `unsafe`, resulting in very poor
    ergonomics. Not only that, but having to call two functions possibly
    multiple lines apart makes it very easy to forget the second call, either
    outright or during refactoring.
    
    Here is an example of the current way of initializing a struct with two
    synchronization primitives (see [3] for the full example):
    
        struct SharedState {
            state_changed: CondVar,
            inner: Mutex<SharedStateInner>,
        }
    
        impl SharedState {
            fn try_new() -> Result<Arc<Self>> {
                let mut state = Pin::from(UniqueArc::try_new(Self {
                    // SAFETY: `condvar_init!` is called below.
                    state_changed: unsafe { CondVar::new() },
                    // SAFETY: `mutex_init!` is called below.
                    inner: unsafe {
                        Mutex::new(SharedStateInner { token_count: 0 })
                    },
                })?);
    
                // SAFETY: `state_changed` is pinned when `state` is.
                let pinned = unsafe {
                    state.as_mut().map_unchecked_mut(|s| &mut s.state_changed)
                };
                kernel::condvar_init!(pinned, "SharedState::state_changed");
    
                // SAFETY: `inner` is pinned when `state` is.
                let pinned = unsafe {
                    state.as_mut().map_unchecked_mut(|s| &mut s.inner)
                };
                kernel::mutex_init!(pinned, "SharedState::inner");
    
                Ok(state.into())
            }
        }
    
    The pin-init API of this patch solves this issue by providing a
    comprehensive solution comprised of macros and traits. Here is the example
    from above using the pin-init API:
    
        #[pin_data]
        struct SharedState {
            #[pin]
            state_changed: CondVar,
            #[pin]
            inner: Mutex<SharedStateInner>,
        }
    
        impl SharedState {
            fn new() -> impl PinInit<Self> {
                pin_init!(Self {
                    state_changed <- new_condvar!("SharedState::state_changed"),
                    inner <- new_mutex!(
                        SharedStateInner { token_count: 0 },
                        "SharedState::inner",
                    ),
                })
            }
        }
    
    Notably the way the macro is used here requires no `unsafe` and thus comes
    with the usual Rust promise of safe code not introducing any memory
    safety violations. Additionally it is now up to the caller of `new()` to
    decide the memory location of the `SharedState`. At the moment they can
    choose between `Arc<T>`, `Box<T>` or the stack.
    
    --
    
    The API has the following architecture:
    1. Initializer traits `PinInit<T, E>` and `Init<T, E>` that act like
       closures.
    2. Macros to create these initializers safely.
    3. Functions to allow manually writing initializers.
    
    The initializers (an `impl PinInit<T, E>`) receive a raw pointer pointing
    to uninitialized memory and their job is to fully initialize a `T` at that
    location. If initialization fails, they return an error (`E`) by value.
    
    This way of initializing cannot be safely exposed to the user, since it
    relies upon these properties outside of the control of the trait:
    - the memory location (slot) needs to be valid memory,
    - if initialization fails, the slot should not be read from,
    - the value in the slot should be pinned, so it cannot move and the memory
      cannot be deallocated until the value is dropped.
    
    This is why using an initializer is facilitated by another trait that
    ensures these requirements.
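
    To make the division of responsibilities more concrete, here is a heavily
    simplified sketch built on the standard library. The trait below only
    mimics the idea of `PinInit<T, E>`; the method name `pinned_init`, the
    wrapper `ClosureInit`, and the functions `init_from_closure` and
    `box_pin_init` are assumptions made for this example and do not match the
    real signatures.

```rust
use std::mem::MaybeUninit;
use std::pin::Pin;

/// Simplified, hypothetical version of an initializer trait.
///
/// # Safety
///
/// On `Ok(())` the implementation must have fully initialized `slot`. On
/// `Err` it must leave nothing in `slot` that still needs to be dropped.
pub unsafe trait PinInit<T, E> {
    /// # Safety
    ///
    /// `slot` must be valid for writes, must not be read if this returns
    /// `Err`, and the value must be treated as pinned afterwards.
    unsafe fn pinned_init(self, slot: *mut T) -> Result<(), E>;
}

/// Wrapper that turns a closure into an initializer.
pub struct ClosureInit<F>(F);

unsafe impl<T, E, F> PinInit<T, E> for ClosureInit<F>
where
    F: FnOnce(*mut T) -> Result<(), E>,
{
    unsafe fn pinned_init(self, slot: *mut T) -> Result<(), E> {
        (self.0)(slot)
    }
}

/// Creates an initializer manually from a closure.
///
/// # Safety
///
/// The closure must uphold the contract documented on `PinInit`.
pub unsafe fn init_from_closure<T, E, F>(f: F) -> ClosureInit<F>
where
    F: FnOnce(*mut T) -> Result<(), E>,
{
    ClosureInit(f)
}

/// Safe entry point that provides the slot and upholds the requirements.
pub fn box_pin_init<T, E>(init: impl PinInit<T, E>) -> Result<Pin<Box<T>>, E> {
    let mut slot: Box<MaybeUninit<T>> = Box::new(MaybeUninit::uninit());
    // SAFETY: `slot` is valid heap memory that does not move. On error the
    // `Box<MaybeUninit<T>>` is freed without reading or dropping a `T`.
    unsafe { init.pinned_init(slot.as_mut_ptr())? };
    // SAFETY: the initializer returned `Ok`, so the slot holds a valid `T`.
    let boxed = unsafe { Box::from_raw(Box::into_raw(slot).cast::<T>()) };
    Ok(Box::into_pin(boxed))
}
```

    In this sketch all `unsafe` obligations sit either with whoever writes
    the closure or inside `box_pin_init`; code that merely passes an
    initializer along never touches the raw pointer.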
    
    These initializers can be created manually by just supplying a closure that
    fulfills the same safety requirements as `PinInit<T, E>`. But this is an
    `unsafe` operation. To allow safe initializer creation, the `pin_init!`
    macro is provided along with three other variants: `try_pin_init!`,
    `try_init!` and `init!`. These take a modified struct initializer as a
    parameter and generate a closure that initializes the fields in sequence.
    The macros take great care in upholding the safety requirements:
    - A shadowed struct type is used as the return type of the closure instead
      of `()`. This is to prevent early returns, as these would prevent full
      initialization.
    - To ensure every field is only initialized once, a normal struct
      initializer is placed in unreachable code. The type checker will emit
      errors if a field is missing or specified multiple times.
    - When initializing a field fails, the whole initializer will fail and
      automatically drop fields that have been initialized earlier.
    - Only the correct initializer type is allowed for unpinned fields. You
      cannot use an `impl PinInit<T, E>` to initialize a field that is not
      structurally pinned.
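
    The third point, dropping already initialized fields when a later field
    fails, can be sketched by hand as follows. This is not the code the
    macros generate; `Counted`, `Pair`, `DropGuard`, `init_pair` and
    `drops_after_init` are hypothetical names used to show one way such a
    rollback might work.

```rust
use std::cell::Cell;
use std::mem::{forget, MaybeUninit};
use std::ptr::{addr_of_mut, drop_in_place};
use std::rc::Rc;

/// Hypothetical field type that counts how often it is dropped.
pub struct Counted(pub Rc<Cell<u32>>);

impl Drop for Counted {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

pub struct Pair {
    pub first: Counted,
    pub second: u32,
}

/// Drops the pointed-to field unless the guard is defused with `forget`.
struct DropGuard<T>(*mut T);

impl<T> Drop for DropGuard<T> {
    fn drop(&mut self) {
        // SAFETY: a guard is only created for a field that was just written.
        unsafe { drop_in_place(self.0) }
    }
}

/// Initializes the fields of `Pair` in sequence.
///
/// # Safety
///
/// `slot` must be valid for writes.
pub unsafe fn init_pair(
    slot: *mut Pair,
    drops: Rc<Cell<u32>>,
    second: Result<u32, &'static str>,
) -> Result<(), &'static str> {
    // SAFETY: the caller guarantees that `slot` is valid for writes.
    let first = unsafe { addr_of_mut!((*slot).first) };
    unsafe { first.write(Counted(drops)) };
    let guard = DropGuard(first);
    // On failure `?` returns early and `guard` drops `first` again.
    let value = second?;
    // SAFETY: see above.
    unsafe { addr_of_mut!((*slot).second).write(value) };
    // All fields are initialized, so the rollback must not run.
    forget(guard);
    Ok(())
}

/// Returns whether initialization succeeded and how many drops the
/// initializer itself performed.
pub fn drops_after_init(second: Result<u32, &'static str>) -> (bool, u32) {
    let drops = Rc::new(Cell::new(0));
    let mut slot = MaybeUninit::<Pair>::uninit();
    // SAFETY: `slot` is valid for writes.
    let ok = unsafe { init_pair(slot.as_mut_ptr(), drops.clone(), second) }.is_ok();
    let dropped_by_init = drops.get();
    if ok {
        // SAFETY: `init_pair` returned `Ok`, so the slot is initialized.
        unsafe { slot.assume_init_drop() };
    }
    (ok, dropped_by_init)
}
```

    The sketch uses a stack slot for brevity because `Pair` has no pinning
    requirement; it is only meant to illustrate the rollback behavior.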
    
    To ensure the last point, an additional macro `#[pin_data]` is needed. This
    macro annotates the struct itself, and the user specifies which fields are
    structurally pinned and which are not.
    
    Because dropping a pinned struct is also not allowed to break the pinning
    invariants, another attribute macro `#[pinned_drop]` is needed. This
    macro is introduced in a following commit.
    
    These two macros also have mechanisms to ensure the overall safety of the
    API. Additionally, they utilize a combined proc-macro, declarative macro
    design: first a proc-macro enables the outer attribute syntax `#[...]` and
    does some important pre-parsing. Notably this prepares the generics such
    that the declarative macro can handle them using token trees. Then the
    actual parsing of the structure and the emission of code is handled by a
    declarative macro.
    
    For pin-projections the crates `pin-project` [4] and `pin-project-lite` [5]
    were considered, but were ultimately rejected:
    - `pin-project` depends on `syn` [6] which is a very big dependency, around
      50k lines of code.
    - `pin-project-lite` is a more reasonable 5k lines of code, but contains a
      very complex declarative macro to parse generics. On top of that it
      would require modification that would need to be maintained
      independently.
    
    Link: https://rust-for-linux.com/the-safe-pinned-initialization-problem [1]
    Link: https://github.com/Rust-for-Linux/linux/tree/0a04dc4ddd671efb87eef54dde0fb38e9074f4be [2]
    Link: https://github.com/Rust-for-Linux/linux/blob/f509ede33fc10a07eba3da14aa00302bd4b5dddd/samples/rust/rust_miscdev.rs [3]
    Link: https://crates.io/crates/pin-project [4]
    Link: https://crates.io/crates/pin-project-lite [5]
    Link: https://crates.io/crates/syn [6]
    Co-developed-by: Gary Guo <gary@garyguo.net>
    Signed-off-by: Gary Guo <gary@garyguo.net>
    Signed-off-by: Benno Lossin <benno.lossin@proton.me>
    Reviewed-by: Alice Ryhl <aliceryhl@google.com>
    Reviewed-by: Wedson Almeida Filho <wedsonaf@gmail.com>
    Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
    Link: https://lore.kernel.org/r/20230408122429.1103522-7-y86-dev@protonmail.com
    Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
lib.rs 2.94 KB