1. 02 Dec, 2022 2 commits
    • seccomp: Move copy_seccomp() to no failure path. · a1140cb2
      Kuniyuki Iwashima authored
      Our syzbot instance reported memory leaks in do_seccomp() [0], similar
      to the report [1].  It shows that we miss freeing struct seccomp_filter
      and some objects included in it.
      
      We can reproduce the issue with the program below [2] which calls one
      seccomp() and two clone() syscalls.
      
      The first clone()d child exits earlier than its parent and sends a
      signal to kill it during the second clone(), more precisely before the
      fatal_signal_pending() test in copy_process().  When the parent receives
      the signal, it has to destroy the embryonic process and return -EINTR to
      user space.  In the failure path, we have to call seccomp_filter_release()
      to decrement the filter's refcount.
      
      Initially, we called it in free_task() called from the failure path, but
      the commit 3a15fb6e ("seccomp: release filter after task is fully
      dead") moved it to release_task() to notify user space as early as possible
      that the filter is no longer used.
      
      To keep the change and current seccomp refcount semantics, let's move
      copy_seccomp() just after the signal check and add a WARN_ON_ONCE() in
      free_task() for future debugging.
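
The ordering argument above can be illustrated with a small userspace model. This is only a sketch: `model_filter`, `model_task`, and the `model_*` functions are hypothetical stand-ins invented for illustration, not the kernel's types or the actual patch. It assumes, as the message describes, that the failure path no longer drops the filter reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct model_filter { int refs; };
struct model_task { struct model_filter *filter; };

/* Models copy_seccomp(): the child inherits the filter and takes a reference. */
void model_copy_seccomp(struct model_task *child,
			const struct model_task *parent)
{
	child->filter = parent->filter;
	if (child->filter)
		child->filter->refs++;
}

/*
 * Models free_task() on the fork failure path. It does not drop the
 * filter reference, so any reference the child holds here is lost.
 */
void model_free_task(struct model_task *child)
{
	child->filter = NULL;
}

/* Old order: take the reference, then check for a fatal signal. */
int model_fork_old(struct model_task *parent, bool fatal_signal)
{
	struct model_task child = { NULL };

	assert(parent->filter != NULL);
	model_copy_seccomp(&child, parent);
	if (fatal_signal)
		model_free_task(&child);	/* one reference leaks */
	return parent->filter->refs;
}

/* New order: check for a fatal signal first, then take the reference. */
int model_fork_new(struct model_task *parent, bool fatal_signal)
{
	struct model_task child = { NULL };

	assert(parent->filter != NULL);
	if (fatal_signal) {
		model_free_task(&child);	/* nothing to release yet */
		return parent->filter->refs;
	}
	model_copy_seccomp(&child, parent);
	return parent->filter->refs;
}
```

Starting from a refcount of 1, `model_fork_old(&parent, true)` leaves the count at 2 with no owner left to drop it, while `model_fork_new(&parent, true)` leaves it at 1. Taking the reference only after the last point of failure means the failure path has nothing to undo.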
      
      [0]:
      unreferenced object 0xffff8880063add00 (size 256):
        comm "repro_seccomp", pid 230, jiffies 4294687090 (age 9.914s)
        hex dump (first 32 bytes):
          01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
          ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
        backtrace:
          do_seccomp (./include/linux/slab.h:600 ./include/linux/slab.h:733 kernel/seccomp.c:666 kernel/seccomp.c:708 kernel/seccomp.c:1871 kernel/seccomp.c:1991)
          do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
          entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      unreferenced object 0xffffc90000035000 (size 4096):
        comm "repro_seccomp", pid 230, jiffies 4294687090 (age 9.915s)
        hex dump (first 32 bytes):
          01 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          __vmalloc_node_range (mm/vmalloc.c:3226)
          __vmalloc_node (mm/vmalloc.c:3261 (discriminator 4))
          bpf_prog_alloc_no_stats (kernel/bpf/core.c:91)
          bpf_prog_alloc (kernel/bpf/core.c:129)
          bpf_prog_create_from_user (net/core/filter.c:1414)
          do_seccomp (kernel/seccomp.c:671 kernel/seccomp.c:708 kernel/seccomp.c:1871 kernel/seccomp.c:1991)
          do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
          entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      unreferenced object 0xffff888003fa1000 (size 1024):
        comm "repro_seccomp", pid 230, jiffies 4294687090 (age 9.915s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          bpf_prog_alloc_no_stats (./include/linux/slab.h:600 ./include/linux/slab.h:733 kernel/bpf/core.c:95)
          bpf_prog_alloc (kernel/bpf/core.c:129)
          bpf_prog_create_from_user (net/core/filter.c:1414)
          do_seccomp (kernel/seccomp.c:671 kernel/seccomp.c:708 kernel/seccomp.c:1871 kernel/seccomp.c:1991)
          do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
          entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      unreferenced object 0xffff888006360240 (size 16):
        comm "repro_seccomp", pid 230, jiffies 4294687090 (age 9.915s)
        hex dump (first 16 bytes):
          01 00 37 00 76 65 72 6c e0 83 01 06 80 88 ff ff  ..7.verl........
        backtrace:
          bpf_prog_store_orig_filter (net/core/filter.c:1137)
          bpf_prog_create_from_user (net/core/filter.c:1428)
          do_seccomp (kernel/seccomp.c:671 kernel/seccomp.c:708 kernel/seccomp.c:1871 kernel/seccomp.c:1991)
          do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
          entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      unreferenced object 0xffff8880060183e0 (size 8):
        comm "repro_seccomp", pid 230, jiffies 4294687090 (age 9.915s)
        hex dump (first 8 bytes):
          06 00 00 00 00 00 ff 7f                          ........
        backtrace:
          kmemdup (mm/util.c:129)
          bpf_prog_store_orig_filter (net/core/filter.c:1144)
          bpf_prog_create_from_user (net/core/filter.c:1428)
          do_seccomp (kernel/seccomp.c:671 kernel/seccomp.c:708 kernel/seccomp.c:1871 kernel/seccomp.c:1991)
          do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
          entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      
      [1]: https://syzkaller.appspot.com/bug?id=2809bb0ac77ad9aa3f4afe42d6a610aba594a987
      
      [2]:
      #define _GNU_SOURCE
      #include <sched.h>
      #include <signal.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <linux/filter.h>
      #include <linux/seccomp.h>
      
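      int main(void)
      {
      	/* A one-instruction filter that allows every syscall. */
      	struct sock_filter filter[] = {
      		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
      	};
      	struct sock_fprog fprog = {
      		.len = sizeof(filter) / sizeof(filter[0]),
      		.filter = filter,
      	};
      	long i, pid;
      
      	syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER, 0, &fprog);
      
      	/* SIGKILL in the low byte of the flags is the child's exit signal,
      	 * so the first child's exit kills the parent during the second clone(). */
      	for (i = 0; i < 2; i++) {
      		pid = syscall(__NR_clone, CLONE_NEWNET | SIGKILL, NULL, NULL, 0);
      		if (pid == 0)
      			return 0;
      	}
      	return 0;
      }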
      
      Fixes: 3a15fb6e ("seccomp: release filter after task is fully dead")
      Reported-by: syzbot+ab17848fe269b573eb71@syzkaller.appspotmail.com
      Reported-by: Ayushman Dutta <ayudutta@amazon.com>
      Suggested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220823154532.82913-1-kuniyu@amazon.com
    • selftests/seccomp: Check CAP_SYS_ADMIN capability in the test mode_filter_without_nnp · fc1e3980
      Gautam Menghani authored
      In the "mode_filter_without_nnp" test in seccomp_bpf, there is currently
      a TODO which asks to check the capability CAP_SYS_ADMIN instead of euid.
      This patch checks whether the calling process has the CAP_SYS_ADMIN
      capability in its effective set (CAP_EFFECTIVE).
      Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20220731092529.28760-1-gautammenghani201@gmail.com
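
A minimal sketch of such an effective-set check, assuming a Linux system and the raw capget() syscall. The selftest itself may well use a different API (for example libcap); `effective_cap_state()` is a hypothetical helper name introduced here for illustration only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if `cap` is in the calling thread's
 * effective set, 0 if it is not, and -1 if capget() fails.
 */
int effective_cap_state(int cap)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,	/* 0 selects the calling thread */
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3] = { { 0 } };

	/* The version 3 layout holds 64 capability bits in two 32-bit words. */
	assert(cap >= 0 && cap < 32 * _LINUX_CAPABILITY_U32S_3);

	if (syscall(SYS_capget, &hdr, data) != 0)
		return -1;

	return (data[CAP_TO_INDEX(cap)].effective & CAP_TO_MASK(cap)) ? 1 : 0;
}
```

Calling `effective_cap_state(CAP_SYS_ADMIN)` gives a more accurate answer than comparing the effective UID with 0, because a root process can have the capability dropped and a non-root process can hold it.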
  2. 23 Oct, 2022 9 commits
  3. 22 Oct, 2022 21 commits
  4. 21 Oct, 2022 8 commits
    • Merge tag '6.1-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · bd8e9634
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
      
       - memory leak fixes
      
       - fixes for directory leases, including an important one which fixes a
         problem noticed by git functional tests
      
       - fixes relating to missing free_xid calls (helpful for
         tracing/debugging of entry/exit into cifs.ko)
      
       - a multichannel fix
      
       - a small cleanup fix (use of list_move instead of list_del/list_add)
      
      * tag '6.1-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: update internal module number
        cifs: fix memory leaks in session setup
        cifs: drop the lease for cached directories on rmdir or rename
        smb3: interface count displayed incorrectly
        cifs: Fix memory leak when build ntlmssp negotiate blob failed
        cifs: set rc to -ENOENT if we can not get a dentry for the cached dir
        cifs: use LIST_HEAD() and list_move() to simplify code
        cifs: Fix xid leak in cifs_get_file_info_unix()
        cifs: Fix xid leak in cifs_ses_add_channel()
        cifs: Fix xid leak in cifs_flock()
        cifs: Fix xid leak in cifs_copy_file_range()
        cifs: Fix xid leak in cifs_create()
    • Merge tag 'nfsd-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · 022c028f
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
       "Fixes for patches merged in v6.1"
      
      * tag 'nfsd-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        nfsd: ensure we always call fh_verify_error tracepoint
        NFSD: unregister shrinker when nfsd_init_net() fails
    • x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly · 471f0aa7
      Chang S. Bae authored
      When an extended state component is absent from fpstate but present in
      the init state, the function copies it from init_fpstate via copy_feature().
      
      However, dynamic states are not stored in init_fpstate because their init
      states are all zeros. Retrieving them from init_fpstate then fails like this:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       ...
       RIP: 0010:memcpy_erms+0x6/0x10
        ? __copy_xstate_to_uabi_buf+0x381/0x870
        fpu_copy_guest_fpstate_to_uabi+0x28/0x80
        kvm_arch_vcpu_ioctl+0x14c/0x1460 [kvm]
        ? __this_cpu_preempt_check+0x13/0x20
        ? vmx_vcpu_put+0x2e/0x260 [kvm_intel]
        kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
        ? kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
        ? __fget_light+0xd4/0x130
        __x64_sys_ioctl+0xe3/0x910
        ? debug_smp_processor_id+0x17/0x20
        ? fpregs_assert_state_consistent+0x27/0x50
        do_syscall_64+0x3f/0x90
        entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Adjust the 'mask' to zero out the userspace buffer for the features that
      are available from neither fpstate nor init_fpstate.
      
      The dynamic features depend on the compacted XSAVE format. Ensure it is
      enabled before reading XCOMP_BV in init_fpstate.
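
The masking idea can be sketched with plain bit arithmetic. This is a userspace illustration under stated assumptions, not the kernel code: `model_copy_mask()`, `model_pick_source()`, and the three bitmap parameters are hypothetical names, and the bit positions used in the usage note are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Where the bytes for one feature in the user buffer come from. */
enum model_source { MODEL_SRC_FPSTATE, MODEL_SRC_INIT, MODEL_SRC_ZERO };

/*
 * Features that can actually be copied: requested by user space and
 * available from either the task's fpstate or the init state.
 */
uint64_t model_copy_mask(uint64_t user_features, uint64_t in_fpstate,
			 uint64_t in_init)
{
	return user_features & (in_fpstate | in_init);
}

/* Decide the source for a single feature bit. */
enum model_source model_pick_source(uint64_t user_features,
				    uint64_t in_fpstate, uint64_t in_init,
				    unsigned int bit)
{
	uint64_t feature;

	assert(bit < 64);
	feature = UINT64_C(1) << bit;

	/* Outside the mask: never dereference a source, just zero the buffer. */
	if (!(model_copy_mask(user_features, in_fpstate, in_init) & feature))
		return MODEL_SRC_ZERO;

	return (in_fpstate & feature) ? MODEL_SRC_FPSTATE : MODEL_SRC_INIT;
}
```

With `in_fpstate = 0x3`, `in_init = 0x7`, and a requested set that also includes bit 18 standing in for a dynamic feature, bit 0 is read from fpstate, bit 2 from the init state, and bit 18 is zeroed instead of being read from a location that does not exist.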
      
      Fixes: 2308ee57 ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
      Reported-by: Yuan Yao <yuan.yao@intel.com>
      Suggested-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Tested-by: Yuan Yao <yuan.yao@intel.com>
      Link: https://lore.kernel.org/lkml/BYAPR11MB3717EDEF2351C958F2C86EED95259@BYAPR11MB3717.namprd11.prod.outlook.com/
      Link: https://lkml.kernel.org/r/20221021185844.13472-1-chang.seok.bae@intel.com
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · ed537795
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two small changes, one in the lpfc driver and the other in the core.
      
        The core change is an additional footgun guard which prevents users
        from writing the wrong state to sysfs and causing a hang"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: lpfc: Fix memory leak in lpfc_create_port()
        scsi: core: Restrict legal sdev_state transitions via sysfs
    • Merge tag 'block-6.1-2022-10-20' of git://git.kernel.dk/linux · d4b7332e
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request via Christoph:
            - fix nvme-hwmon for DMA non-coherent architectures (Serge Semin)
            - add a nvme-hwmon maintainer (Christoph Hellwig)
            - fix error pointer dereference in error handling (Dan Carpenter)
            - fix invalid memory reference in nvmet_subsys_attr_qid_max_show
              (Daniel Wagner)
            - don't limit the DMA segment size in nvme-apple (Russell King)
            - fix workqueue MEM_RECLAIM flushing dependency (Sagi Grimberg)
            - disable write zeroes on various Kingston SSDs (Xander Li)
      
       - fix a memory leak with block device tracing (Ye)
      
       - flexible-array fix for ublk (Yushan)
      
       - document the ublk recovery feature from this merge window
         (ZiyangZhang)
      
       - remove dead bfq variable in struct (Yuwei)
      
       - error handling rq clearing fix (Yu)
      
       - add an IRQ safety check for the cached bio freeing (Pavel)
      
       - drbd bio cloning fix (Christoph)
      
      * tag 'block-6.1-2022-10-20' of git://git.kernel.dk/linux:
        blktrace: remove unnessary stop block trace in 'blk_trace_shutdown'
        blktrace: fix possible memleak in '__blk_trace_remove'
        blktrace: introduce 'blk_trace_{start,stop}' helper
        bio: safeguard REQ_ALLOC_CACHE bio put
        block, bfq: remove unused variable for bfq_queue
        drbd: only clone bio if we have a backing device
        ublk_drv: use flexible-array member instead of zero-length array
        nvmet: fix invalid memory reference in nvmet_subsys_attr_qid_max_show
        nvmet: fix workqueue MEM_RECLAIM flushing dependency
        nvme-hwmon: kmalloc the NVME SMART log buffer
        nvme-hwmon: consistently ignore errors from nvme_hwmon_init
        nvme: add Guenther as nvme-hwmon maintainer
        nvme-apple: don't limit DMA segement size
        nvme-pci: disable write zeroes on various Kingston SSD
        nvme: fix error pointer dereference in error handling
        Documentation: document ublk user recovery feature
        blk-mq: fix null pointer dereference in blk_mq_clear_rq_mapping()
    • Merge tag 'io_uring-6.1-2022-10-20' of git://git.kernel.dk/linux · 294e73ff
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Fix a potential memory leak in the error handling path of io-wq setup
         (Rafael)
      
       - Kill an errant debug statement that got added in this release (me)
      
       - Fix an oops with an invalid direct descriptor with IORING_OP_MSG_RING
         (Harshit)
      
       - Remove unneeded FFS_SCM flagging (Pavel)
      
       - Remove polling off the exit path (Pavel)
      
       - Move out direct descriptor debug check to the cleanup path (Pavel)
      
       - Use the proper helper rather than open-coding cached request get
         (Pavel)
      
      * tag 'io_uring-6.1-2022-10-20' of git://git.kernel.dk/linux:
        io-wq: Fix memory leak in worker creation
        io_uring/msg_ring: Fix NULL pointer dereference in io_msg_send_fd()
        io_uring/rw: remove leftover debug statement
        io_uring: don't iopoll from io_ring_ctx_wait_and_kill()
        io_uring: reuse io_alloc_req()
        io_uring: kill hot path fixed file bitmap debug checks
        io_uring: remove FFS_SCM
    • Merge tag 'for-linus-6.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 1d61754c
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
       "Just two fixes for the new 'virtio with grants' feature"
      
      * tag 'for-linus-6.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/virtio: Convert PAGE_SIZE/PAGE_SHIFT/PFN_UP to Xen counterparts
        xen/virtio: Handle cases when page offset > PAGE_SIZE properly
    • Merge tag 'selinux-pr-20221020' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 0de0b768
      Linus Torvalds authored
      Pull selinux fix from Paul Moore:
       "A small SELinux fix for a GFP_KERNEL allocation while a spinlock is
        held.
      
        The patch, while still fairly small, is a bit larger than one might
        expect from a simple s/GFP_KERNEL/GFP_ATOMIC/ conversion because we
        added support for the function to be called with different gfp flags
        depending on the context, preserving GFP_KERNEL for those cases that
        can safely sleep"
      
      * tag 'selinux-pr-20221020' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: enable use of both GFP_KERNEL and GFP_ATOMIC in convert_context()
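
The design described in the pull message, letting the caller choose the allocation mode, can be sketched in userspace as follows. Everything here is a hypothetical model: `model_gfp`, `model_lock_held`, `model_alloc()`, and `model_convert_context()` are invented names, and the real convert_context() signature is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the two allocation modes. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

/* Models "a spinlock is currently held", i.e. sleeping is not allowed. */
bool model_lock_held;

/*
 * Models an allocator: a sleeping allocation under a lock is a bug, which
 * this sketch reports by refusing the request.
 */
void *model_alloc(size_t size, enum model_gfp gfp)
{
	if (model_lock_held && gfp == MODEL_GFP_KERNEL)
		return NULL;
	return malloc(size);
}

/* The conversion helper takes the mode from its caller instead of fixing it. */
char *model_convert_context(const char *src, enum model_gfp gfp)
{
	size_t len;
	char *dst;

	assert(src != NULL);
	len = strlen(src) + 1;
	dst = model_alloc(len, gfp);
	if (dst)
		memcpy(dst, src, len);
	return dst;
}
```

A caller that holds the lock passes `MODEL_GFP_ATOMIC`, while callers that may sleep keep `MODEL_GFP_KERNEL`. That is why the change is larger than a one-word substitution: the flag has to be threaded through as a parameter.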