1. 05 May, 2024 29 commits
  2. 04 May, 2024 7 commits
    • Steven Rostedt (Google)'s avatar
      eventfs: Have "events" directory get permissions from its parent · d57cf30c
      Steven Rostedt (Google) authored
      The events directory gets its permissions from the root inode. But this
      can cause an inconsistency if the instances directory changes its
      permissions, as the permissions of the created directories under it should
      inherit the permissions of the instances directory when directories under
      it are created.
      
      Currently the behavior is:
      
       # cd /sys/kernel/tracing
       # chgrp 1002 instances
       # mkdir instances/foo
       # ls -l instances/foo
      [..]
       -r--r-----  1 root lkp  0 May  1 18:55 buffer_total_size_kb
       -rw-r-----  1 root lkp  0 May  1 18:55 current_tracer
       -rw-r-----  1 root lkp  0 May  1 18:55 error_log
       drwxr-xr-x  1 root root 0 May  1 18:55 events
       --w-------  1 root lkp  0 May  1 18:55 free_buffer
       drwxr-x---  2 root lkp  0 May  1 18:55 options
       drwxr-x--- 10 root lkp  0 May  1 18:55 per_cpu
       -rw-r-----  1 root lkp  0 May  1 18:55 set_event
      
      All the files and directories under "foo" has the "lkp" group except the
      "events" directory. That's because its getting its default value from the
      mount point instead of its parent.
      
      Have the "events" directory make its default value based on its parent's
      permissions. That now gives:
      
       # ls -l instances/foo
      [..]
       -rw-r-----  1 root lkp 0 May  1 21:16 buffer_subbuf_size_kb
       -r--r-----  1 root lkp 0 May  1 21:16 buffer_total_size_kb
       -rw-r-----  1 root lkp 0 May  1 21:16 current_tracer
       -rw-r-----  1 root lkp 0 May  1 21:16 error_log
       drwxr-xr-x  1 root lkp 0 May  1 21:16 events
       --w-------  1 root lkp 0 May  1 21:16 free_buffer
       drwxr-x---  2 root lkp 0 May  1 21:16 options
       drwxr-x--- 10 root lkp 0 May  1 21:16 per_cpu
       -rw-r-----  1 root lkp 0 May  1 21:16 set_event
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240502200906.161887248@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 8186fff7 ("tracefs/eventfs: Use root and instance inodes as default ownership")
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      d57cf30c
    • Steven Rostedt (Google)'s avatar
      eventfs: Do not treat events directory different than other directories · 22e61e15
      Steven Rostedt (Google) authored
      Treat the events directory the same as other directories when it comes to
      permissions. The events directory was considered different because it's
      dentry is persistent, whereas the other directory dentries are created
      when accessed. But the way tracefs now does its ownership by using the
      root dentry's permissions as the default permissions, the events directory
      can get out of sync when a remount is performed setting the group and user
      permissions.
      
      Remove the special case for the events directory on setting the
      attributes. This allows the updates caused by remount to work properly as
      well as simplifies the code.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240502200906.002923579@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 8186fff7 ("tracefs/eventfs: Use root and instance inodes as default ownership")
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      22e61e15
    • Steven Rostedt (Google)'s avatar
      eventfs: Do not differentiate the toplevel events directory · d53891d3
      Steven Rostedt (Google) authored
      The toplevel events directory is really no different than the events
      directory of instances. Having the two be different caused
      inconsistencies and made it harder to fix the permissions bugs.
      
      Make all events directories act the same.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.846448710@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 8186fff7 ("tracefs/eventfs: Use root and instance inodes as default ownership")
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      d53891d3
    • Steven Rostedt (Google)'s avatar
      tracefs: Still use mount point as default permissions for instances · 6599bd55
      Steven Rostedt (Google) authored
      If the instances directory's permissions were never change, then have it
      and its children use the mount point permissions as the default.
      
      Currently, the permissions of instance directories are determined by the
      instance directory's permissions itself. But if the tracefs file system is
      remounted and changes the permissions, the instance directory and its
      children should use the new permission.
      
      But because both the instance directory and its children use the instance
      directory's inode for permissions, it misses the update.
      
      To demonstrate this:
      
        # cd /sys/kernel/tracing/
        # mkdir instances/foo
        # ls -ld instances/foo
       drwxr-x--- 5 root root 0 May  1 19:07 instances/foo
        # ls -ld instances
       drwxr-x--- 3 root root 0 May  1 18:57 instances
        # ls -ld current_tracer
       -rw-r----- 1 root root 0 May  1 18:57 current_tracer
      
        # mount -o remount,gid=1002 .
        # ls -ld instances
       drwxr-x--- 3 root root 0 May  1 18:57 instances
        # ls -ld instances/foo/
       drwxr-x--- 5 root root 0 May  1 19:07 instances/foo/
        # ls -ld current_tracer
       -rw-r----- 1 root lkp 0 May  1 18:57 current_tracer
      
      Notice that changing the group id to that of "lkp" did not affect the
      instances directory nor its children. It should have been:
      
        # ls -ld current_tracer
       -rw-r----- 1 root root 0 May  1 19:19 current_tracer
        # ls -ld instances/foo/
       drwxr-x--- 5 root root 0 May  1 19:25 instances/foo/
        # ls -ld instances
       drwxr-x--- 3 root root 0 May  1 19:19 instances
      
        # mount -o remount,gid=1002 .
        # ls -ld current_tracer
       -rw-r----- 1 root lkp 0 May  1 19:19 current_tracer
        # ls -ld instances
       drwxr-x--- 3 root lkp 0 May  1 19:19 instances
        # ls -ld instances/foo/
       drwxr-x--- 5 root lkp 0 May  1 19:25 instances/foo/
      
      Where all files were updated by the remount gid update.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.686838327@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 8186fff7 ("tracefs/eventfs: Use root and instance inodes as default ownership")
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      6599bd55
    • Steven Rostedt (Google)'s avatar
      tracefs: Reset permissions on remount if permissions are options · baa23a8d
      Steven Rostedt (Google) authored
      There's an inconsistency with the way permissions are handled in tracefs.
      Because the permissions are generated when accessed, they default to the
      root inode's permission if they were never set by the user. If the user
      sets the permissions, then a flag is set and the permissions are saved via
      the inode (for tracefs files) or an internal attribute field (for
      eventfs).
      
      But if a remount happens that specify the permissions, all the files that
      were not changed by the user gets updated, but the ones that were are not.
      If the user were to remount the file system with a given permission, then
      all files and directories within that file system should be updated.
      
      This can cause security issues if a file's permission was updated but the
      admin forgot about it. They could incorrectly think that remounting with
      permissions set would update all files, but miss some.
      
      For example:
      
       # cd /sys/kernel/tracing
       # chgrp 1002 current_tracer
       # ls -l
      [..]
       -rw-r-----  1 root root 0 May  1 21:25 buffer_size_kb
       -rw-r-----  1 root root 0 May  1 21:25 buffer_subbuf_size_kb
       -r--r-----  1 root root 0 May  1 21:25 buffer_total_size_kb
       -rw-r-----  1 root lkp  0 May  1 21:25 current_tracer
       -rw-r-----  1 root root 0 May  1 21:25 dynamic_events
       -r--r-----  1 root root 0 May  1 21:25 dyn_ftrace_total_info
       -r--r-----  1 root root 0 May  1 21:25 enabled_functions
      
      Where current_tracer now has group "lkp".
      
       # mount -o remount,gid=1001 .
       # ls -l
       -rw-r-----  1 root tracing 0 May  1 21:25 buffer_size_kb
       -rw-r-----  1 root tracing 0 May  1 21:25 buffer_subbuf_size_kb
       -r--r-----  1 root tracing 0 May  1 21:25 buffer_total_size_kb
       -rw-r-----  1 root lkp     0 May  1 21:25 current_tracer
       -rw-r-----  1 root tracing 0 May  1 21:25 dynamic_events
       -r--r-----  1 root tracing 0 May  1 21:25 dyn_ftrace_total_info
       -r--r-----  1 root tracing 0 May  1 21:25 enabled_functions
      
      Everything changed but the "current_tracer".
      
      Add a new link list that keeps track of all the tracefs_inodes which has
      the permission flags that tell if the file/dir should use the root inode's
      permission or not. Then on remount, clear all the flags so that the
      default behavior of using the root inode's permission is done for all
      files and directories.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.529542160@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 8186fff7 ("tracefs/eventfs: Use root and instance inodes as default ownership")
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      baa23a8d
    • Steven Rostedt (Google)'s avatar
      eventfs: Free all of the eventfs_inode after RCU · ee4e0379
      Steven Rostedt (Google) authored
      The freeing of eventfs_inode via a kfree_rcu() callback. But the content
      of the eventfs_inode was being freed after the last kref. This is
      dangerous, as changes are being made that can access the content of an
      eventfs_inode from an RCU loop.
      
      Instead of using kfree_rcu() use call_rcu() that calls a function to do
      all the freeing of the eventfs_inode after a RCU grace period has expired.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.370261163@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Fixes: 43aa6f97 ("eventfs: Get rid of dentry pointers without refcounts")
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      ee4e0379
    • Steven Rostedt (Google)'s avatar
      eventfs/tracing: Add callback for release of an eventfs_inode · b63db58e
      Steven Rostedt (Google) authored
      Synthetic events create and destroy tracefs files when they are created
      and removed. The tracing subsystem has its own file descriptor
      representing the state of the events attached to the tracefs files.
      There's a race between the eventfs files and this file descriptor of the
      tracing system where the following can cause an issue:
      
      With two scripts 'A' and 'B' doing:
      
        Script 'A':
          echo "hello int aaa" > /sys/kernel/tracing/synthetic_events
          while :
          do
            echo 0 > /sys/kernel/tracing/events/synthetic/hello/enable
          done
      
        Script 'B':
          echo > /sys/kernel/tracing/synthetic_events
      
      Script 'A' creates a synthetic event "hello" and then just writes zero
      into its enable file.
      
      Script 'B' removes all synthetic events (including the newly created
      "hello" event).
      
      What happens is that the opening of the "enable" file has:
      
       {
      	struct trace_event_file *file = inode->i_private;
      	int ret;
      
      	ret = tracing_check_open_get_tr(file->tr);
       [..]
      
      But deleting the events frees the "file" descriptor, and a "use after
      free" happens with the dereference at "file->tr".
      
      The file descriptor does have a reference counter, but there needs to be a
      way to decrement it from the eventfs when the eventfs_inode is removed
      that represents this file descriptor.
      
      Add an optional "release" callback to the eventfs_entry array structure,
      that gets called when the eventfs file is about to be removed. This allows
      for the creating on the eventfs file to increment the tracing file
      descriptor ref counter. When the eventfs file is deleted, it can call the
      release function that will call the put function for the tracing file
      descriptor.
      
      This will protect the tracing file from being freed while a eventfs file
      that references it is being opened.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240426073410.17154-1-Tze-nan.Wu@mediatek.com/
      Link: https://lore.kernel.org/linux-trace-kernel/20240502090315.448cba46@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Fixes: 5790b1fb ("eventfs: Remove eventfs_file and just use eventfs_inode")
      Reported-by: default avatarTze-nan wu <Tze-nan.Wu@mediatek.com>
      Tested-by: default avatarTze-nan Wu (吳澤南) <Tze-nan.Wu@mediatek.com>
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      b63db58e
  3. 03 May, 2024 4 commits
    • Linus Torvalds's avatar
      Merge tag 'cxl-fixes-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · 7367539a
      Linus Torvalds authored
      Pull cxl fix from Dave Jiang:
       "Add missing RCH support for endpoint access_coordinate calculation.
      
        A late bug was reported by Robert Richter that the Restricted CXL Host
        (RCH) support was missing in the CXL endpoint access_coordinate
        calculation.
      
        The missing support causes the topology iterator to stumble over a
        NULL pointer and triggers a kernel OOPS on a platform with CXL 1.1
        support.
      
        The fix bypasses RCH topology as the access_coordinate calculation is
        not necessary since RCH does not support hotplug and the memory region
        exported should be covered by the HMAT table already.
      
        A unit test is also added to cxl_test to check against future
        regressions on the topology iterator"
      
      * tag 'cxl-fixes-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
        cxl: Fix cxl_endpoint_get_perf_coordinate() support for RCH
      7367539a
    • Linus Torvalds's avatar
      Merge tag 'for-linus-6.9a-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · ddb4c3f2
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
       "Two fixes when running as Xen PV guests for issues introduced in the
        6.9 merge window, both related to apic id handling"
      
      * tag 'for-linus-6.9a-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        x86/xen: return a sane initial apic id when running as PV guest
        x86/xen/smp_pv: Register the boot CPU APIC properly
      ddb4c3f2
    • Linus Torvalds's avatar
      Merge tag 'efi-urgent-for-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · f094ee78
      Linus Torvalds authored
      Pull EFI fix from Ard Biesheuvel:
       "This works around a shortcoming in the memory acceptation API, which
        may apparently hog the CPU for long enough to trigger the softlockup
        watchdog.
      
        Note that this only affects confidential VMs running under the Intel
        TDX hypervisor, which is why I accepted this for now, but this should
        obviously be fixed properly in the future"
      
      * tag 'efi-urgent-for-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        efi/unaccepted: touch soft lockup during memory accept
      f094ee78
    • Linus Torvalds's avatar
      Merge tag 'block-6.9-20240503' of git://git.kernel.dk/linux · 3d25a941
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Nothing major in here - an nvme pull request with mostly auth/tcp
        fixes, and a single fix for ublk not setting segment count and size
        limits"
      
      * tag 'block-6.9-20240503' of git://git.kernel.dk/linux:
        nvme-tcp: strict pdu pacing to avoid send stalls on TLS
        nvmet: fix nvme status code when namespace is disabled
        nvmet-tcp: fix possible memory leak when tearing down a controller
        nvme: cancel pending I/O if nvme controller is in terminal state
        nvmet-auth: replace pr_debug() with pr_err() to report an error.
        nvmet-auth: return the error code to the nvmet_auth_host_hash() callers
        nvme: find numa distance only if controller has valid numa id
        ublk: remove segment count and size limits
        nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH
      3d25a941