30 Jan, 2024 3 commits
    • kernfs: fix false-positive WARN(nr_mmapped) in kernfs_drain_open_files · 05d8f255
      Neel Natu authored
      Prior to this change, 'on->nr_mmapped' tracked the total number of
      mmaps across all of the node's associated open files, incremented
      via kernfs_fop_mmap(). Thus if the file descriptor associated with
      a kernfs_open_file was mmapped 10 times we would have
      'of->mmapped = true' and 'of_on(of)->nr_mmapped = 10'.
      
      The problem is that closing or draining an 'of->mmapped' file would
      decrement the 'of_on(of)->nr_mmapped' counter by only one.
      
      For example, from kernfs_unlink_open_file():
              if (of->mmapped)
                      on->nr_mmapped--;
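      
      The fix (a minimal sketch of the accounting, using the field and
      helper names above; not necessarily the verbatim patch) is to count
      each open file at most once in kernfs_fop_mmap(), no matter how
      many times its descriptor is mmapped:
      
              /* Only the false -> true transition bumps the counter, so
               * the single decrement on close/drain balances it. */
              if (!of->mmapped) {
                      of->mmapped = true;
                      of_on(of)->nr_mmapped++;
              }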
      
      The WARN_ON_ONCE(on->nr_mmapped) in kernfs_drain_open_files() is
      easy to reproduce by:
      1. opening a (mmap-able) kernfs file.
      2. mmap-ing that file more than once (mapping just once masks the issue).
      3. triggering a drain of that kernfs file.
      
      Modulo out-of-tree patches, I was able to trigger this reliably by
      identifying PCI device nodes in sysfs that have mmap-able resource
      regions and no driver attached to them (steps 1 and 2). For step 3,
      "echo 1 > remove" on the device triggers a kernfs_drain; a
      reproducer sketch follows below.
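      
      A hypothetical userspace reproducer for steps 1 and 2 (the sysfs
      path is an example and must point at an mmap-able resourceN file of
      a driverless PCI device on the test machine):
      
              #include <fcntl.h>
              #include <stdio.h>
              #include <sys/mman.h>
              #include <unistd.h>
      
              int main(void)
              {
                      /* Example path; substitute a suitable device. */
                      const char *path =
                              "/sys/bus/pci/devices/0000:00:00.0/resource0";
                      int fd = open(path, O_RDONLY);
                      if (fd < 0) { perror("open"); return 1; }
      
                      /* Step 2: map the same open file more than once. */
                      if (mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0) == MAP_FAILED ||
                          mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0) == MAP_FAILED) {
                              perror("mmap");
                              return 1;
                      }
      
                      /* Step 3: while this sleeps, "echo 1 > remove" in
                       * the device's sysfs directory drains the node and
                       * fires the WARN on a vulnerable kernel. */
                      pause();
                      return 0;
              }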
      Signed-off-by: Neel Natu <neelnatu@google.com>
      Link: https://lore.kernel.org/r/20240127234636.609265-1-neelnatu@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id() · 4207b556
      Tejun Heo authored
      The BPF helper bpf_cgroup_from_id() calls
      kernfs_find_and_get_node_by_id(), which acquires kernfs_idr_lock, a
      non-raw, non-IRQ-safe lock. This can lead to deadlocks because
      bpf_cgroup_from_id() can be called from any BPF program, including
      ones attached to functions that run with the scheduler rq lock held.
      
      Consider the following BPF program:
      
        SEC("fentry/__set_cpus_allowed_ptr_locked")
        int BPF_PROG(__set_cpus_allowed_ptr_locked, struct task_struct *p,
      	       struct affinity_context *affn_ctx, struct rq *rq, struct rq_flags *rf)
        {
      	  struct cgroup *cgrp = bpf_cgroup_from_id(p->cgroups->dfl_cgrp->kn->id);
      
      	  if (cgrp) {
      		  bpf_printk("%d[%s] in %s", p->pid, p->comm, cgrp->kn->name);
      		  bpf_cgroup_release(cgrp);
      	  }
      	  return 0;
        }
      
      __set_cpus_allowed_ptr_locked() is called with the rq lock held, and
      the above BPF program calls bpf_cgroup_from_id() from within it,
      leading to the following lockdep warning:
      
        =====================================================
        WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
        6.7.0-rc3-work-00053-g07124366a1d7-dirty #147 Not tainted
        -----------------------------------------------------
        repro/1620 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
        ffffffff833b3688 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1e/0x70
      
        and this task is already holding:
        ffff888237ced698 (&rq->__lock){-.-.}-{2:2}, at: task_rq_lock+0x4e/0xf0
        which would create a new lock dependency:
         (&rq->__lock){-.-.}-{2:2} -> (kernfs_idr_lock){+.+.}-{2:2}
        ...
         Possible interrupt unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(kernfs_idr_lock);
      				 local_irq_disable();
      				 lock(&rq->__lock);
      				 lock(kernfs_idr_lock);
          <Interrupt>
            lock(&rq->__lock);
      
      		 *** DEADLOCK ***
        ...
        Call Trace:
         dump_stack_lvl+0x55/0x70
         dump_stack+0x10/0x20
         __lock_acquire+0x781/0x2a40
         lock_acquire+0xbf/0x1f0
         _raw_spin_lock+0x2f/0x40
         kernfs_find_and_get_node_by_id+0x1e/0x70
         cgroup_get_from_id+0x21/0x240
         bpf_cgroup_from_id+0xe/0x20
         bpf_prog_98652316e9337a5a___set_cpus_allowed_ptr_locked+0x96/0x11a
         bpf_trampoline_6442545632+0x4f/0x1000
         __set_cpus_allowed_ptr_locked+0x5/0x5a0
         sched_setaffinity+0x1b3/0x290
         __x64_sys_sched_setaffinity+0x4f/0x60
         do_syscall_64+0x40/0xe0
         entry_SYSCALL_64_after_hwframe+0x46/0x4e
      
      Let's fix it by protecting kernfs_node and kernfs_root with RCU and making
      kernfs_find_and_get_node_by_id() acquire rcu_read_lock() instead of
      kernfs_idr_lock.
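      
      A simplified sketch of the reworked lookup (error paths condensed
      and the wide-ino_t case elided relative to the real function; it
      assumes nodes are freed via kfree_rcu() so a concurrently removed
      node stays valid inside the read-side critical section, and that
      idr_find() is RCU-safe, which it is by the IDR API contract):
      
              struct kernfs_node *
              kernfs_find_and_get_node_by_id(struct kernfs_root *root, u64 id)
              {
                      struct kernfs_node *kn;
      
                      rcu_read_lock();
      
                      kn = idr_find(&root->ino_idr, (u32)kernfs_id_ino(id));
                      if (!kn)
                              goto err_unlock;
      
                      /* generation 0 matches all generations */
                      if (kernfs_id_gen(id) &&
                          kernfs_gen(kn) != kernfs_id_gen(id))
                              goto err_unlock;
      
                      /*
                       * A node whose refcount already hit zero is on its
                       * way to kfree_rcu(); refuse to revive it.
                       */
                      if (!atomic_inc_not_zero(&kn->count))
                              goto err_unlock;
      
                      rcu_read_unlock();
                      return kn;
              err_unlock:
                      rcu_read_unlock();
                      return NULL;
              }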
      
      This adds an rcu_head to kernfs_node, making it larger by 16 bytes
      on 64bit. Combined with the preceding rearrange patch, the net
      increase is 8 bytes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Andrea Righi <andrea.righi@canonical.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Link: https://lore.kernel.org/r/20240109214828.252092-4-tj@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • kernfs: Rearrange kernfs_node fields to reduce its size on 64bit · 1c9f2c76
      Tejun Heo authored
      Moving .flags and .mode right below .hash makes kernfs_node smaller
      by 8 bytes on 64bit. To avoid creating a hole from 8-byte alignment
      on 32bit archs, .priv is moved down so that there are two 32bit
      pointers after the 64bit .id field.
      
      v2: Updated to avoid size increase on 32bit noticed by Geert.
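      
      The saving can be demonstrated with a toy layout (an illustrative
      field subset, not the real struct):
      
              #include <stdio.h>
      
              struct before {                /* flags/mode after the pointer */
                      unsigned int   hash;   /* 4 bytes + 4 bytes padding */
                      void           *priv;
                      unsigned short flags;
                      unsigned short mode;   /* + 4 bytes tail padding */
              };
      
              struct after {                 /* flags/mode right below hash */
                      unsigned int   hash;
                      unsigned short flags;  /* fills the hole after hash */
                      unsigned short mode;
                      void           *priv;
              };
      
              int main(void)
              {
                      /* On LP64: before=24 after=16, an 8 byte saving. */
                      printf("before=%zu after=%zu\n",
                             sizeof(struct before), sizeof(struct after));
                      return 0;
              }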
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Link: https://lore.kernel.org/r/ZZ7hwA18nfmFjYpj@slm.duckdns.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>