1. 16 May, 2023 5 commits
    • io_uring: Add io_uring_setup flag to pre-register ring fd and never install it · 6e76ac59
      Josh Triplett authored
      With IORING_REGISTER_USE_REGISTERED_RING, an application can register
      the ring fd and use it via registered index rather than installed fd.
      This allows using a registered ring for everything *except* the initial
      mmap.
      
      With IORING_SETUP_NO_MMAP, io_uring_setup uses buffers allocated by the
      user, rather than requiring a subsequent mmap.
      
      The combination of the two allows a user to operate *entirely* via a
      registered ring fd, making it unnecessary to ever install the fd in the
      first place. So, add a flag IORING_SETUP_REGISTERED_FD_ONLY to make
      io_uring_setup register the fd and return a registered index, without
      installing the fd.
      
      This allows an application to avoid touching the fd table at all, and
      allows a library to never even momentarily install a file descriptor.
      
      This splits out an io_ring_add_registered_file helper from
      io_ring_add_registered_fd, for use by io_uring_setup.
      Signed-off-by: Josh Triplett <josh@joshtriplett.org>
      Link: https://lore.kernel.org/r/bc8f431bada371c183b95a83399628b605e978a3.1682699803.git.josh@joshtriplett.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6e76ac59
    • io_uring: support for user allocated memory for rings/sqes · 03d89a2d
      Jens Axboe authored
      Currently io_uring applications must call mmap(2) twice to map the rings
      themselves, and the sqes array. This works fine, but it does not support
      using huge pages to back the rings/sqes.
      
      Provide a way for the application to pass in pre-allocated memory for
      the rings/sqes, which can then suitably be allocated from shmfs or
      via mmap to get huge page support.
      
      Particularly for larger rings, this reduces the number of TLB entries
      needed.
      
      If an application wishes to take advantage of that, it must pre-allocate
      the memory needed for the sq/cq ring, and the sqes. The former must
      be passed in via the io_uring_params->cq_off.user_data field, while the
      latter is passed in via the io_uring_params->sq_off.user_data field. Then
      it must set IORING_SETUP_NO_MMAP in the io_uring_params->flags field,
      and io_uring will then map the existing memory into the kernel for shared
      use. The application must not call mmap(2) to map the rings as it
      otherwise would have; that now fails with -EINVAL if this setup flag
      was used.
      
      The pages used for the rings and sqes must be contiguous. The intent here
      is clearly that huge pages should be used, otherwise the normal setup
      procedure works fine as-is. The application may use one huge page for
      both the rings and sqes.
      
      Outside of those initialization changes, everything works like it did
      before.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      03d89a2d
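The field assignments described in the commit above can be sketched as follows. This is a minimal illustration, not the kernel's or liburing's code: `sketch_ring_offsets`, `sketch_params`, and `sketch_prepare_no_mmap` are hypothetical stand-ins that mirror only the parts of `struct io_uring_params` the message mentions, so the block compiles even where the installed headers predate this change. Real code should use `struct io_uring_params` from `<linux/io_uring.h>`, allocate the two regions from contiguous (typically huge page) memory, and pass the struct to `io_uring_setup(2)`; none of that is shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same value as IORING_SETUP_NO_MMAP in the uapi header. */
#define SKETCH_IORING_SETUP_NO_MMAP (1U << 14)

/* A pointer must fit in the 64-bit user_data field. */
static_assert(sizeof(uintptr_t) <= sizeof(uint64_t),
              "pointer does not fit in user_data");

/*
 * Hypothetical, simplified mirror of the sq/cq offset structs:
 * eight 32-bit offset fields followed by the 64-bit member that
 * used to be reserved and now carries the user address.
 */
struct sketch_ring_offsets {
    uint32_t offsets[8];
    uint64_t user_data;
};

/* Hypothetical mirror holding only the fields this sketch touches. */
struct sketch_params {
    uint32_t flags;
    struct sketch_ring_offsets sq_off;
    struct sketch_ring_offsets cq_off;
};

/*
 * Fill in the setup parameters for user-allocated ring memory:
 * the sq/cq ring region goes in cq_off.user_data, the sqe array
 * goes in sq_off.user_data, and the NO_MMAP flag is set.
 */
void sketch_prepare_no_mmap(struct sketch_params *p, void *rings, void *sqes)
{
    memset(p, 0, sizeof(*p));
    p->flags |= SKETCH_IORING_SETUP_NO_MMAP;
    p->cq_off.user_data = (uint64_t)(uintptr_t)rings;
    p->sq_off.user_data = (uint64_t)(uintptr_t)sqes;
}
```

Note the slightly counterintuitive pairing taken from the commit message: the ring memory is passed through the cq offsets and the sqe array through the sq offsets.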
    • io_uring: add ring freeing helper · 9c189eee
      Jens Axboe authored
      We free the rings and sqes separately; move both into a helper that
      does the freeing and clearing of the memory.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9c189eee
    • io_uring: return error pointer from io_mem_alloc() · e27cef86
      Jens Axboe authored
      In preparation for having more than one type of ring allocator, make the
      existing one return valid/error-pointer rather than just NULL.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e27cef86
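The "valid/error-pointer" convention mentioned in the commit above can be illustrated in userspace. The sketch below re-creates the idea behind the kernel's `ERR_PTR()`, `PTR_ERR()`, and `IS_ERR()` helpers under assumed names (`sketch_err_ptr`, `sketch_ptr_err`, `sketch_is_err`); `sketch_mem_alloc` is a hypothetical allocator, not `io_mem_alloc()` itself, and its zero-size check is invented for illustration. The point is only that a failed call can report which error occurred instead of a bare NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* The top 4095 addresses are treated as encoded negative errno values. */
#define SKETCH_MAX_ERRNO 4095

static_assert(sizeof(void *) == sizeof(uintptr_t),
              "sketch assumes pointers and uintptr_t have the same size");

/* Encode a negative errno value as a pointer. */
static inline void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Decode the errno value from an error pointer. */
static inline long sketch_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True if the pointer lies in the reserved error range. */
static inline int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/*
 * Hypothetical allocator: returns zeroed memory on success, or an
 * error pointer that says why it failed (never NULL).
 */
void *sketch_mem_alloc(size_t size)
{
    void *p;

    if (size == 0)
        return sketch_err_ptr(-EINVAL);   /* illustrative check only */

    p = calloc(1, size);
    return p ? p : sketch_err_ptr(-ENOMEM);
}
```

A caller checks the result with `sketch_is_err()` rather than comparing against NULL, and forwards `sketch_ptr_err()` as the return code. That is what lets several allocators with different failure modes share one calling convention.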
    • io_uring: remove sq/cq_off memset · 9b1b58ca
      Jens Axboe authored
      We only have two reserved members that are not otherwise being set;
      clear them manually instead of using memset. This is in preparation
      for using one of these members for a new feature.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9b1b58ca
  2. 15 May, 2023 3 commits
  3. 14 May, 2023 13 commits
  4. 13 May, 2023 17 commits
  5. 12 May, 2023 2 commits
    • x86/retbleed: Fix return thunk alignment · 9a48d604
      Borislav Petkov (AMD) authored
      SYM_FUNC_START_LOCAL_NOALIGN() adds an endbr leading to this layout
      (leaving only the last 2 bytes of the address):
      
        3bff <zen_untrain_ret>:
        3bff:       f3 0f 1e fa             endbr64
        3c03:       f6                      test   $0xcc,%bl
      
        3c04 <__x86_return_thunk>:
        3c04:       c3                      ret
        3c05:       cc                      int3
        3c06:       0f ae e8                lfence
      
      However, "the RET at __x86_return_thunk must be on a 64 byte boundary,
      for alignment within the BTB."
      
      Use SYM_START instead.
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a48d604
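The alignment problem in the listing above can be checked with simple mask arithmetic. The following sketch uses hypothetical helper names (`sketch_is_aligned`, `sketch_align_up`) and only the truncated addresses shown in the commit message. The low 16 bits are enough here because a 64-byte boundary depends only on the low six bits of the address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Required alignment of the RET, per the commit message. */
#define SKETCH_RET_ALIGN 64u

static_assert((SKETCH_RET_ALIGN & (SKETCH_RET_ALIGN - 1)) == 0,
              "alignment must be a power of two");

/* True if addr sits exactly on a 64-byte boundary. */
bool sketch_is_aligned(uint64_t addr)
{
    return (addr & (SKETCH_RET_ALIGN - 1)) == 0;
}

/* Round addr up to the next 64-byte boundary (unchanged if already aligned). */
uint64_t sketch_align_up(uint64_t addr)
{
    return (addr + SKETCH_RET_ALIGN - 1) & ~(uint64_t)(SKETCH_RET_ALIGN - 1);
}
```

With the layout from the listing, `sketch_is_aligned(0x3c04)` is false: the four-byte `endbr64` pushes the `ret` four bytes past the boundary at `0x3c00`.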
    • Merge tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 76c7f887
      Linus Torvalds authored
      Pull more btrfs fixes from David Sterba:
      
       - fix incorrect number of bitmap entries for space cache if loading is
         interrupted by some error
      
       - fix backref walking, this breaks a mode of LOGICAL_INO_V2 ioctl that
         is used in deduplication tools
      
       - zoned mode fixes:
            - properly finish zone reserved for relocation
            - correctly calculate super block zone end on ZNS
            - properly initialize new extent buffer for redirty
      
       - make mount option clear_cache work with block-group-tree, to rebuild
         free-space-tree instead of temporarily disabling it, which would
         lead to a forced read-only mount
      
       - fix alignment check for offset when printing extent item
      
      * tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: make clear_cache mount option to rebuild FST without disabling it
        btrfs: zero the buffer before marking it dirty in btrfs_redirty_list_add
        btrfs: zoned: fix full zone super block reading on ZNS
        btrfs: zoned: zone finish data relocation BG with last IO
        btrfs: fix backref walking not returning all inode refs
        btrfs: fix space cache inconsistency after error loading it from disk
        btrfs: print-tree: parent bytenr must be aligned to sector size
      76c7f887