1. 22 Oct, 2023 25 commits
  2. 19 Oct, 2023 10 commits
  3. 12 Sep, 2023 5 commits
      lib: Export errname · 21db9314
      Kent Overstreet authored
      errname() returns the name of an errcode; this functionality is
      otherwise only available for error pointers via %pe - bcachefs uses this
      for better error messages.
      Signed-off-by: Christopher James Halse Rogers <raof@ubuntu.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
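      As a rough illustration of what an errcode-to-name lookup does, here is a
      minimal userspace sketch. It is not the kernel implementation:
      sketch_errname() and its small table are hypothetical stand-ins, and the
      choices to accept either sign and to return NULL for unknown codes are
      assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical table entry: one errno value and its symbolic name. */
struct sketch_err_entry {
	int code;
	const char *name;
};

/* Deliberately tiny table; a real lookup would cover every errno. */
static const struct sketch_err_entry sketch_err_table[] = {
	{ ENOENT, "ENOENT" },
	{ EIO,    "EIO"    },
	{ ENOMEM, "ENOMEM" },
	{ EINVAL, "EINVAL" },
	{ ENOSPC, "ENOSPC" },
};

/*
 * Return the symbolic name for an error code, or NULL if the code is not
 * in the table. Both EIO and -EIO map to "EIO" (assumption of this sketch).
 */
const char *sketch_errname(int err)
{
	size_t i;
	int code;

	if (err == INT_MIN)	/* negating INT_MIN would overflow */
		return NULL;
	code = err < 0 ? -err : err;

	for (i = 0; i < sizeof(sketch_err_table) / sizeof(sketch_err_table[0]); i++)
		if (sketch_err_table[i].code == code)
			return sketch_err_table[i].name;
	return NULL;
}
```

      A caller building an error message would typically fall back to printing
      the numeric code when the lookup returns NULL.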
      lib/string_helpers: string_get_size() now returns characters wrote · 83feeb19
      Kent Overstreet authored
      printbuf now needs to know the number of characters that would have been
      written if the buffer was too small, like snprintf(); this changes
      string_get_size() to return the return value of snprintf().
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
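      The snprintf() convention referred to above can be shown with a short
      userspace sketch. The helper names sketch_format_size() and
      sketch_was_truncated() are hypothetical and are not the kernel's
      string_get_size(); the point is only that the return value is the length
      the full output would have had, which lets a caller detect truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not kernel code): format "<value> <unit>" into buf
 * and pass through snprintf()'s return value, i.e. the number of characters
 * the complete string needs, excluding the terminating NUL.
 */
int sketch_format_size(char *buf, size_t len, unsigned long value,
		       const char *unit)
{
	assert(buf != NULL || len == 0);
	return snprintf(buf, len, "%lu %s", value, unit);
}

/* Output was cut short if the required length does not fit in len bytes. */
int sketch_was_truncated(int ret, size_t len)
{
	return ret < 0 || (size_t)ret >= len;
}
```

      A buffer-growing caller can use the returned length to size a retry
      instead of guessing.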
      stacktrace: Export stack_trace_save_tsk · 7d672f40
      Christopher James Halse Rogers authored
      The bcachefs module wants it, and there doesn't seem to be any
      reason it shouldn't be exported like the other functions.
      Signed-off-by: Christopher James Halse Rogers <raof@ubuntu.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      fs: factor out d_mark_tmpfile() · 771eb4fe
      Kent Overstreet authored
      New helper for bcachefs - bcachefs doesn't want the
      inode_dec_link_count() call that d_tmpfile does, it handles i_nlink on
      its own atomically with other btree updates.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      sched: Add task_struct->faults_disabled_mapping · 2b69987b
      Kent Overstreet authored
      There has been a long standing page cache coherence bug with direct IO.
      This provides part of a mechanism to fix it, currently just used by
      bcachefs but potentially worth promoting to the VFS.
      
      Direct IO evicts the range of the pagecache being read or written to.
      
      For reads, we need dirty pages to be written to disk, so that the read
      doesn't return stale data. For writes, we need to evict that range of
      the pagecache so that it's not stale after the write completes.
      
      However, without a locking mechanism to prevent those pages from being
      re-added to the pagecache - by a buffered read or page fault - page
      cache inconsistency is still possible.
      
      This isn't necessarily just an issue for userspace when they're playing
      games; filesystems may hang arbitrary state off the pagecache, and so
      page cache inconsistency may cause real filesystem bugs, depending on
      the filesystem. This is less of an issue for iomap based filesystems,
      but e.g. buffer heads cache disk block mappings (!) and attach them
      to the pagecache, and bcachefs attaches disk reservations to pagecache
      pages.
      
      This issue has been hard to fix, because
       - we need to add a lock (henceforth called pagecache_add_lock), which
         would be held for the duration of the direct IO
       - page faults add pages to the page cache, thus need to take the same
         lock
       - dio -> gup -> page fault thus can deadlock
      
      And we cannot enforce a lock ordering with this lock, since userspace
      will be controlling the lock ordering (via the fd and buffer arguments
      to direct IOs), so we need a different method of deadlock avoidance.
      
      We need to tell the page fault handler that we're already holding a
      pagecache_add_lock, and since plumbing it through the entire gup() path
      would be highly impractical this adds a field to task_struct.
      
      Then the full method is:
       - in the dio path, when we first take the pagecache_add_lock, note the
         mapping in the current task_struct
       - in the page fault handler, if faults_disabled_mapping is set, we
         check if it's the same mapping as the one we're taking a page fault
         for, and if so return an error.
      
         Then we check lock ordering: if there's a lock ordering violation and
         trylock fails, we'll have to cycle the locks and return an error that
         tells the DIO path to retry: faults_disabled_mapping is also used for
         signalling "locks were dropped, please retry".
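      The method above can be modelled in userspace roughly as follows. This is
      a sketch under stated assumptions, not the kernel or bcachefs code: the
      sketch_* types and functions are hypothetical, -EFAULT is an arbitrary
      choice of error, the lock-ordering check and trylock result are passed in
      as plain flags, and the "locks were dropped" signal is kept in a separate
      field for clarity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a file's page cache mapping. */
struct sketch_mapping {
	int id;
};

/* Hypothetical stand-in for task_struct. */
struct sketch_task {
	/* Field named in the commit: mapping whose add-lock this task holds. */
	struct sketch_mapping *faults_disabled_mapping;
	/* Assumption: separate flag meaning "locks were dropped, retry". */
	int locks_dropped;
};

/* DIO path: after taking pagecache_add_lock, note the mapping in the task. */
void sketch_dio_begin(struct sketch_task *t, struct sketch_mapping *m)
{
	assert(t->faults_disabled_mapping == NULL);
	t->faults_disabled_mapping = m;
	t->locks_dropped = 0;
}

/* DIO path: clear the note once the lock has been released. */
void sketch_dio_end(struct sketch_task *t)
{
	t->faults_disabled_mapping = NULL;
}

/*
 * Fault path. Returns 0 if the fault may proceed, -EFAULT if it is refused.
 * order_violation and trylock_ok model the lock-ordering check and the
 * outcome of a trylock on the faulting mapping's lock.
 */
int sketch_page_fault(struct sketch_task *t, struct sketch_mapping *fault_mapping,
		      int order_violation, int trylock_ok)
{
	struct sketch_mapping *held = t->faults_disabled_mapping;

	if (held == NULL)
		return 0;		/* not inside a direct IO */
	if (held == fault_mapping)
		return -EFAULT;		/* same mapping: would self-deadlock */
	if (order_violation && !trylock_ok) {
		t->locks_dropped = 1;	/* tell the DIO path to retry */
		return -EFAULT;
	}
	return 0;
}
```

      In this model the DIO path would check locks_dropped after a failed
      fault and restart the operation from the point where it takes its lock.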
      
      Also relevant to this patch: mapping->invalidate_lock.
      mapping->invalidate_lock provides most of the required semantics - it's
      used by truncate/fallocate to block pages being added to the pagecache.
      However, since it's a rwsem, direct IOs would need to take the write
      side in order to block page cache adds, and would then be exclusive with
      each other - we'll need a new type of lock to pair with this approach.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Darrick J. Wong <djwong@kernel.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Andreas Grünbacher <andreas.gruenbacher@gmail.com>