1. 12 Jun, 2021 1 commit
  2. 11 Jun, 2021 13 commits
  3. 10 Jun, 2021 13 commits
    • coredump: Limit what can interrupt coredumps · 06af8679
      Eric W. Biederman authored
      Olivier Langlois has been struggling with coredumps being incompletely written in
      processes using io_uring.
      
      Olivier Langlois <olivier@trillion01.com> writes:
      > io_uring is a big user of task_work and any event that io_uring made a
      > task waiting for that occurs during the core dump generation will
      > generate a TIF_NOTIFY_SIGNAL.
      >
      > Here are the detailed steps of the problem:
      > 1. io_uring calls vfs_poll() to install a task to a file wait queue
      >    with io_async_wake() as the wakeup function cb from io_arm_poll_handler()
      > 2. wakeup function ends up calling task_work_add() with TWA_SIGNAL
      > 3. task_work_add() sets the TIF_NOTIFY_SIGNAL bit by calling
      >    set_notify_signal()
      
      The coredump code deliberately supports being interrupted by SIGKILL,
      and depends upon prepare_signal to filter out all other signals.  Now
      that signal_pending includes wakeups for TIF_NOTIFY_SIGNAL, this hack
      in dump_emitted by the coredump code no longer works.
      
      Make the coredump code more robust by explicitly testing for all of
      the wakeup conditions the coredump code supports.  This prevents
      new wakeup conditions from breaking the coredump code, as well
      as fixing the current issue.
      
      The filesystem code that the coredump code uses already limits
      itself to only aborting on fatal_signal_pending.  So it should
      not develop surprising wake-up reasons either.
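The change of predicate can be modelled in userspace. The sketch below is only an illustration, and several things in it are assumptions:

- struct fake_task and its three flags are hypothetical stand-ins for a pending SIGKILL, a TIF_NOTIFY_SIGNAL wakeup and a freezer request.
- Treating a freeze request as one of the "supported" wakeup conditions is an assumption of this sketch.
- The real kernel helpers are signal_pending(), fatal_signal_pending() and freezing(). This code only imitates them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task flags (assumption: the kernel keeps the real
 * equivalents in task state and thread-info bits). */
struct fake_task {
	bool fatal_signal;  /* SIGKILL queued, cf. fatal_signal_pending() */
	bool notify_signal; /* TIF_NOTIFY_SIGNAL, e.g. from task_work_add(..., TWA_SIGNAL) */
	bool freezing;      /* freezer asked the task to stop, cf. freezing() */
};

/* Before: anything that makes signal_pending() true aborts the dump,
 * including an io_uring task_work wakeup. */
bool dump_interrupted_old(const struct fake_task *t)
{
	return t->fatal_signal || t->notify_signal || t->freezing;
}

/* After: test explicitly for the wakeup conditions the coredump code
 * supports, so new wakeup reasons cannot cut a dump short. */
bool dump_interrupted_new(const struct fake_task *t)
{
	return t->fatal_signal || t->freezing;
}
```

With only notify_signal set, the old predicate reports an interruption and the new one does not. A task with fatal_signal set is still interrupted under both.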
      
      v2: Don't remove the now unnecessary code in prepare_signal.
      
      Cc: stable@vger.kernel.org
      Fixes: 12db8b69 ("entry: Add support for TIF_NOTIFY_SIGNAL")
      Reported-by: Olivier Langlois <olivier@trillion01.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · f09eacca
      Linus Torvalds authored
      Pull cgroup fix from Tejun Heo:
       "This is a high priority but low risk fix for a cgroup1 bug where
        rename(2) can change a cgroup's name to something which can break
        parsing of /proc/PID/cgroup"
      
      * 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup1: don't allow '\n' in renaming
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 29a877d5
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "A mixture of small bug fixes and a small security issue:
      
         - WARN_ON when IPoIB is automatically moved between namespaces
      
         - Long standing bug where mlx5 would use the wrong page for the
           doorbell recovery memory if fork is used
      
         - Security fix for mlx4 that disables the timestamp feature
      
         - Several crashers for mlx5
      
         - Plug a recent mlx5 memory leak for the sig_mr"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        IB/mlx5: Fix initializing CQ fragments buffer
        RDMA/mlx5: Delete right entry from MR signature database
        RDMA: Verify port when creating flow rule
        RDMA/mlx5: Block FDB rules when not in switchdev mode
        RDMA/mlx4: Do not map the core_clock page to user space unless enabled
        RDMA/mlx5: Use different doorbell memory for different processes
        RDMA/ipoib: Fix warning caused by destroying non-initial netns
    • hwmon: (tps23861) correct shunt LSB values · e13d1127
      Robert Marko authored
      The current shunt LSB values were swapped in the
      original driver commit.
      
      So, correct the current shunt LSB values according to
      the datasheet.
      
      The swapped values caused slightly skewed current readings.
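The sketch below shows why swapping the two LSB constants skews readings by about 2%. It rests on several assumptions:

- The ADC measures the voltage across the shunt, so the current represented by one LSB scales as 1/R.
- lsb_255_na is a hypothetical calibration value. It is not the datasheet number, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: current per LSB is inversely proportional to the shunt
 * resistance.  lsb_255_na is a hypothetical LSB (in nA) for the
 * default 255 mOhm shunt, not a value from the datasheet. */
uint32_t current_lsb_na(uint32_t shunt_mohm, uint32_t lsb_255_na)
{
	return (uint32_t)((uint64_t)lsb_255_na * 255u / shunt_mohm);
}

/* Convert a raw ADC count to nanoamps using the LSB for the fitted shunt. */
uint64_t raw_to_current_na(uint32_t raw, uint32_t shunt_mohm, uint32_t lsb_255_na)
{
	return (uint64_t)raw * current_lsb_na(shunt_mohm, lsb_255_na);
}
```

The smaller 250 mOhm shunt yields a larger LSB. If the constants are exchanged, every reading is scaled by 250/255 or 255/250.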
      
      Fixes: fff7b8ab ("hwmon: add Texas Instruments TPS23861 driver")
      Signed-off-by: Robert Marko <robert.marko@sartura.hr>
      Link: https://lore.kernel.org/r/20210609220728.499879-3-robert.marko@sartura.hr
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    • hwmon: (tps23861) set current shunt value · b325d352
      Robert Marko authored
      TPS23861 has a configuration bit for setting the
      current shunt value used on the board.
      It is bit 0 of the General Mask 1 register.
      
      According to the datasheet bit values are:
      0 for 255 mOhm (Default)
      1 for 250 mOhm
      
      So, configure the bit before registering the hwmon
      device according to the value passed in the DTS or
      default one if none is passed.
      
      Without this, current readings could be slightly skewed,
      because the maximum current is 1.02A with a 250mOhm shunt
      instead of 1.0A with a 255mOhm shunt.
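The bit handling described above can be sketched as a read-modify-write helper. Three details are assumptions:

- The macro and function names are hypothetical.
- The register address is not given in the text, so it is omitted.
- Rejecting shunt values other than 250/255 mOhm is an assumed policy.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 of General Mask 1, per the text: 0 = 255 mOhm (default),
 * 1 = 250 mOhm.  The macro name is hypothetical. */
#define GEN_MASK1_SHUNT_250MOHM (1u << 0)

/* Update a cached copy of General Mask 1 for the given shunt (in
 * micro-ohms), leaving the other bits untouched.  Returns 0 on success
 * or -1 for an unsupported value (assumed policy). */
int gen_mask1_set_shunt(uint8_t *reg, uint32_t shunt_uohm)
{
	switch (shunt_uohm) {
	case 250000u:
		*reg |= GEN_MASK1_SHUNT_250MOHM;
		return 0;
	case 255000u:
		*reg &= (uint8_t)~GEN_MASK1_SHUNT_250MOHM;
		return 0;
	default:
		return -1;
	}
}
```

Because the sense-voltage range is fixed, full-scale current scales as 1/R. This gives 1.0A x 255/250 = 1.02A, which matches the figures in the commit message.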
      
      Fixes: fff7b8ab ("hwmon: add Texas Instruments TPS23861 driver")
      Signed-off-by: Robert Marko <robert.marko@sartura.hr>
      Link: https://lore.kernel.org/r/20210609220728.499879-2-robert.marko@sartura.hr
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    • hwmon: (tps23861) define regmap max register · fb8543fb
      Robert Marko authored
      Define the max register address the device supports.
      This allows reading the whole register space via
      regmap debugfs; without it, only register 0x0 is visible.
      
      This was forgotten in the original driver commit.
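The symptom can be modelled in a few lines. The struct below is a hypothetical stand-in for the one field of the kernel's struct regmap_config (max_register) that matters here. The sketch assumes that the debugfs dump walks every address from 0 up to and including that value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the single regmap_config field involved.
 * Leaving it at 0 is what the original driver effectively did. */
struct fake_regmap_config {
	unsigned int max_register;
};

/* Number of register addresses a debugfs-style dump would show,
 * assuming it covers 0 through max_register inclusive. */
size_t dumped_register_count(const struct fake_regmap_config *cfg)
{
	return (size_t)cfg->max_register + 1;
}
```

With the field unset, the count is 1, which matches the "only register 0x0" symptom. The correct maximum address comes from the TPS23861 datasheet and is not reproduced here.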
      
      Fixes: fff7b8ab ("hwmon: add Texas Instruments TPS23861 driver")
      Signed-off-by: Robert Marko <robert.marko@sartura.hr>
      Link: https://lore.kernel.org/r/20210609220728.499879-1-robert.marko@sartura.hr
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    • ALSA: seq: Fix race of snd_seq_timer_open() · 83e197a8
      Takashi Iwai authored
      The timer instance per queue is exclusive, and snd_seq_timer_open()
      should have managed concurrent accesses.  It appears to check for an
      already existing timer instance at the beginning, but that check is
      not protected, so any later concurrent call of snd_seq_timer_open()
      can easily override the timer instance.  This may result in a
      use-after-free (UAF), as the leftover timer instance can keep running
      while the queue itself gets closed, as spotted by syzkaller recently.
      
      To avoid the race, add a proper check at the assignment of
      tmr->timeri again, and return -EBUSY if it has already been registered.
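The fix pattern is "check and publish under the same lock, and fail with -EBUSY if someone got there first". It can be sketched in userspace as follows:

- A pthread mutex stands in for the kernel lock protecting the queue's timer.
- struct seq_timer and seq_timer_install() are hypothetical simplifications, not the ALSA code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified per-queue timer slot. */
struct seq_timer {
	pthread_mutex_t lock;
	void *timeri; /* registered timer instance, or NULL */
};

/* Publish new_instance only if the slot is still empty.  Doing the test
 * and the assignment under one lock means a concurrent caller cannot
 * overwrite an instance that another caller already registered. */
int seq_timer_install(struct seq_timer *tmr, void *new_instance)
{
	int ret = 0;

	pthread_mutex_lock(&tmr->lock);
	if (tmr->timeri)
		ret = -EBUSY; /* lost the race: caller must discard new_instance */
	else
		tmr->timeri = new_instance;
	pthread_mutex_unlock(&tmr->lock);
	return ret;
}
```

The loser of the race gets -EBUSY. It is then responsible for closing its own instance, so no orphaned timer keeps running.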
      
      Reported-by: syzbot+ddc1260a83ed1cbf6fb5@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/000000000000dce34f05c42f110c@google.com
      Link: https://lore.kernel.org/r/20210610152059.24633-1-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
    • drm/msm/dsi: Stash away calculated vco frequency on recalc · 170b7635
      Stephen Boyd authored
      A problem was reported on CoachZ devices where the display wouldn't come
      up, or it would be distorted. It turns out that the PLL code here wasn't
      getting called once dsi_pll_10nm_vco_recalc_rate() started returning the
      same exact frequency, down to the Hz, that the bootloader was setting
      instead of 0 when the clk was registered with the clk framework.
      
      After commit 001d8dc3 ("drm/msm/dsi: remove temp data from global
      pll structure") we use a hardcoded value for the parent clk frequency,
      i.e.  VCO_REF_CLK_RATE, and we also hardcode the value for FRAC_BITS,
      instead of getting it from the config structure. This combination of
      changes to the recalc function allows us to properly calculate the
      frequency of the PLL regardless of whether or not the PLL has been
      clk_prepare()d or clk_set_rate()d. That's a good improvement.
      
      Unfortunately, this means that now we won't call down into the PLL clk
      driver when we call clk_set_rate() because the frequency calculated in
      the framework matches the frequency that is set in hardware. If the rate
      is the same as what we want it should be OK to not call the set_rate PLL
      op. The real problem is that the prepare op in this driver uses a
      private struct member to stash away the vco frequency so that it can
      call the set_rate op directly during prepare. Since the set_rate op is
      never called when recalc_rate reports that the rate is unchanged, this
      private struct member is not set before the prepare op runs, so we
      call the set_rate function directly with a frequency of 0. This
      effectively kills the PLL and configures it for a rate that won't work.
      Calling set_rate from prepare is really quite bad and will confuse any
      downstream clks about what their parent's rate actually is. Fixing
      that will be a rather large change though, so we leave that for later.
      
      For now, let's stash away the rate we calculate during recalc so that
      the prepare op knows what frequency to set, instead of 0. This way
      things keep working and the display can enable the PLL properly. In the
      future, we should remove that code from the prepare op so that it
      doesn't even try to call the set rate function.
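The interaction between recalc_rate, a skipped set_rate and prepare can be modelled with plain functions. In this sketch:

- struct fake_pll, its fields and the pll_* functions are hypothetical simplifications of the clk ops described above.
- "Programming the hardware" is just a store.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified PLL state. */
struct fake_pll {
	uint64_t hw_rate;          /* rate the hardware is running at */
	uint64_t vco_current_rate; /* private copy consumed by prepare() */
};

/* set_rate op: "programs" the hardware and records the rate. */
void pll_set_rate(struct fake_pll *pll, uint64_t rate)
{
	pll->hw_rate = rate;
	pll->vco_current_rate = rate;
}

/* recalc_rate op: read back the hardware rate and, as the fix does,
 * stash it so prepare() has something other than 0 to program. */
uint64_t pll_recalc_rate(struct fake_pll *pll)
{
	pll->vco_current_rate = pll->hw_rate;
	return pll->hw_rate;
}

/* prepare op: re-applies the stashed rate via set_rate, the pattern the
 * commit calls bad but leaves in place for now. */
void pll_prepare(struct fake_pll *pll)
{
	pll_set_rate(pll, pll->vco_current_rate);
}
```

Suppose the bootloader left the PLL at some rate and the requested rate matches what recalc reports. The framework then skips set_rate, and prepare re-applies the stashed value instead of 0.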
      
      Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
      Cc: Abhinav Kumar <abhinavk@codeaurora.org>
      Fixes: 001d8dc3 ("drm/msm/dsi: remove temp data from global pll structure")
      Signed-off-by: Stephen Boyd <swboyd@chromium.org>
      Link: https://lore.kernel.org/r/20210608195519.125561-1-swboyd@chromium.org
      Signed-off-by: Rob Clark <robdclark@chromium.org>
    • cgroup1: don't allow '\n' in renaming · b7e24eb1
      Alexander Kuznetsov authored
      cgroup_mkdir() restricts the use of newlines in names:
      $ mkdir $'/sys/fs/cgroup/cpu/test\ntest2'
      mkdir: cannot create directory
      '/sys/fs/cgroup/cpu/test\ntest2': Invalid argument
      
      But cgroup1_rename() is missing that check.
      This allows us to make /proc/<pid>/cgroup unparsable:
      $ mkdir /sys/fs/cgroup/cpu/test
      $ mv /sys/fs/cgroup/cpu/test $'/sys/fs/cgroup/cpu/test\ntest2'
      $ echo $$ > $'/sys/fs/cgroup/cpu/test\ntest2'
      $ cat /proc/self/cgroup
      11:pids:/
      10:freezer:/
      9:hugetlb:/
      8:cpuset:/
      7:blkio:/user.slice
      6:memory:/user.slice
      5:net_cls,net_prio:/
      4:perf_event:/
      3:devices:/user.slice
      2:cpu,cpuacct:/test
      test2
      1:name=systemd:/
      0::/
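The missing validation is a one-line test. In the sketch below, the helper name check_cgroup_name() is hypothetical. The rule it applies (reject any name containing '\n' with -EINVAL) is the one cgroup_mkdir() already enforces, per the text above.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: reject names that would split a
 * /proc/<pid>/cgroup line in two. */
int check_cgroup_name(const char *name)
{
	if (strchr(name, '\n'))
		return -EINVAL;
	return 0;
}
```

Applying the same check on rename keeps every /proc/<pid>/cgroup entry on a single line.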

      Signed-off-by: Alexander Kuznetsov <wwfq@yandex-team.ru>
      Reported-by: Andrey Krasichkov <buglloc@yandex-team.ru>
      Acked-by: Dmitry Yakunin <zeil@yandex-team.ru>
      Cc: stable@vger.kernel.org
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • IB/mlx5: Fix initializing CQ fragments buffer · 2ba0aa2f
      Alaa Hleihel authored
      The function init_cq_frag_buf() can be called to initialize the current CQ
      fragments buffer cq->buf, or the temporary cq->resize_buf that is filled
      during CQ resize operation.
      
      However, the offending commit started to use get_cqe() to fetch the
      CQEs. The problem with this change is that get_cqe() always returns
      CQEs from cq->buf, so the wrong buffer gets initialized, and when
      enlarging the CQ we try to access elements beyond the size of the
      current cq->buf and eventually hit a kernel panic.
      
       [exception RIP: init_cq_frag_buf+103]
        [ffff9f799ddcbcd8] mlx5_ib_resize_cq at ffffffffc0835d60 [mlx5_ib]
        [ffff9f799ddcbdb0] ib_resize_cq at ffffffffc05270df [ib_core]
        [ffff9f799ddcbdc0] llt_rdma_setup_qp at ffffffffc0a6a712 [llt]
        [ffff9f799ddcbe10] llt_rdma_cc_event_action at ffffffffc0a6b411 [llt]
        [ffff9f799ddcbe98] llt_rdma_client_conn_thread at ffffffffc0a6bb75 [llt]
        [ffff9f799ddcbec8] kthread at ffffffffa66c5da1
        [ffff9f799ddcbf50] ret_from_fork_nospec_begin at ffffffffa6d95ddd
      
      Fix it by getting the needed CQE by calling mlx5_frag_buf_get_wqe() that
      takes the correct source buffer as a parameter.
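The essence of the fix is that the initializer walks the buffer it was handed, not whatever cq->buf happens to be. The sketch below illustrates this with these assumptions:

- It uses a flat, hypothetical model. Real mlx5 CQ buffers are fragmented and accessed through mlx5_frag_buf_get_wqe().
- The fill value stands in for whatever initial ownership marking the driver writes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat model of a CQ with its current and resize buffers. */
struct fake_cq_buf { int *cqes; size_t ncqe; };
struct fake_cq { struct fake_cq_buf buf, resize_buf; };

/* Fixed shape: initialize entries of the buffer passed in.  The buggy
 * shape iterated target->ncqe entries but indexed cq->buf, which runs
 * past the end of cq->buf when target is a larger resize_buf. */
void init_cq_buf_sketch(struct fake_cq_buf *target, int init_value)
{
	for (size_t i = 0; i < target->ncqe; i++)
		target->cqes[i] = init_value; /* hypothetical initial marking */
}
```

Resizing from 2 to 4 entries with this helper fills all four resize entries and leaves the live buffer untouched.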
      
      Fixes: 388ca8be ("IB/mlx5: Implement fragmented completion queue (CQ)")
      Link: https://lore.kernel.org/r/90a0e8c924093cfa50a482880ad7e7edb73dc19a.1623309971.git.leonro@nvidia.com
      Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • RDMA/mlx5: Delete right entry from MR signature database · 6466f03f
      Aharon Landau authored
      The value mr->sig is stored in the entry upon MR allocation. However,
      ibmr is wrongly passed here as "old", so xa_cmpxchg() does not replace
      the entry with NULL, which leads to the following trace:
      
       WARNING: CPU: 28 PID: 2078 at drivers/infiniband/hw/mlx5/main.c:3643 mlx5_ib_stage_init_cleanup+0x4d/0x60 [mlx5_ib]
       Modules linked in: nvme_rdma nvme_fabrics nvme_core 8021q garp mrp bonding bridge stp llc rfkill rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_tad
       CPU: 28 PID: 2078 Comm: reboot Tainted: G               X --------- ---  5.13.0-0.rc2.19.el9.x86_64 #1
       Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 2.9.1 12/07/2018
       RIP: 0010:mlx5_ib_stage_init_cleanup+0x4d/0x60 [mlx5_ib]
       Code: 8d bb 70 1f 00 00 be 00 01 00 00 e8 9d 94 ce da 48 3d 00 01 00 00 75 02 5b c3 0f 0b 5b c3 0f 0b 48 83 bb b0 20 00 00 00 74 d5 <0f> 0b eb d1 4
       RSP: 0018:ffffa8db06d33c90 EFLAGS: 00010282
       RAX: 0000000000000000 RBX: ffff97f890a44000 RCX: ffff97f900ec0160
       RDX: 0000000000000000 RSI: 0000000080080001 RDI: ffff97f890a44000
       RBP: ffffffffc0c189b8 R08: 0000000000000001 R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000300 R12: ffff97f890a44000
       R13: ffffffffc0c36030 R14: 00000000fee1dead R15: 0000000000000000
       FS:  00007f0d5a8a3b40(0000) GS:ffff98077fb80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000555acbf4f450 CR3: 00000002a6f56002 CR4: 00000000001706e0
       Call Trace:
        mlx5r_remove+0x39/0x60 [mlx5_ib]
        auxiliary_bus_remove+0x1b/0x30
        __device_release_driver+0x17a/0x230
        device_release_driver+0x24/0x30
        bus_remove_device+0xdb/0x140
        device_del+0x18b/0x3e0
        mlx5_detach_device+0x59/0x90 [mlx5_core]
        mlx5_unload_one+0x22/0x60 [mlx5_core]
        shutdown+0x31/0x3a [mlx5_core]
        pci_device_shutdown+0x34/0x60
        device_shutdown+0x15b/0x1c0
        __do_sys_reboot.cold+0x2f/0x5b
        ? vfs_writev+0xc7/0x140
        ? handle_mm_fault+0xc5/0x290
        ? do_writev+0x6b/0x110
        do_syscall_64+0x40/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
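Why a wrong "old" value silently leaves the entry in place is the compare-and-exchange contract, here modelled with C11 atomics. This is only a model:

- A single hypothetical slot stands in for one xarray entry.
- xa_cmpxchg() itself is not used; the C11 call only mimics its contract.
- The helper returns the previous contents, as xa_cmpxchg() does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* One hypothetical table slot. */
static _Atomic(void *) slot;

/* Replace the slot with NULL only if it still holds `old`.
 * Returns the value that was in the slot before the call. */
void *slot_cmpxchg_to_null(void *old)
{
	void *expected = old;

	/* On failure, expected is overwritten with the current value. */
	atomic_compare_exchange_strong(&slot, &expected, NULL);
	return expected;
}
```

If the caller passes the wrong pointer (ibmr instead of mr->sig), the exchange fails and the stale entry survives. That leftover entry is what the cleanup WARN_ON above catches.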
      
      Fixes: e6fb246c ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()")
      Link: https://lore.kernel.org/r/f3f585ea0db59c2a78f94f65eedeafc5a2374993.1623309971.git.leonro@nvidia.com
      Signed-off-by: Aharon Landau <aharonl@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • RDMA: Verify port when creating flow rule · 2adcb4c5
      Maor Gottlieb authored
      Validate the port value provided by the user, and with that remove the
      no longer needed validation in the driver.  The missing check in the
      mlx5_ib driver could cause the oops below.
      
      Call trace:
        _create_flow_rule+0x2d4/0xf28 [mlx5_ib]
        mlx5_ib_create_flow+0x2d0/0x5b0 [mlx5_ib]
        ib_uverbs_ex_create_flow+0x4cc/0x624 [ib_uverbs]
        ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xd4/0x150 [ib_uverbs]
        ib_uverbs_cmd_verbs.isra.7+0xb28/0xc50 [ib_uverbs]
        ib_uverbs_ioctl+0x158/0x1d0 [ib_uverbs]
        do_vfs_ioctl+0xd0/0xaf0
        ksys_ioctl+0x84/0xb4
        __arm64_sys_ioctl+0x28/0xc4
        el0_svc_common.constprop.3+0xa4/0x254
        el0_svc_handler+0x84/0xa0
        el0_svc+0x10/0x26c
       Code: b9401260 f9615681 51000400 8b001c20 (f9403c1a)
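The core-side check is a simple range test on the user-supplied port. The kernel has a helper of this shape (rdma_is_port_valid()), but the struct and function names below are hypothetical stand-ins for the device's first and last port numbers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device description: valid ports are start..end inclusive. */
struct fake_ib_device {
	unsigned int start_port;
	unsigned int end_port;
};

bool port_is_valid(const struct fake_ib_device *dev, unsigned int port)
{
	return port >= dev->start_port && port <= dev->end_port;
}

/* Reject a bad port in the core, before any driver sees the request. */
int create_flow_check(const struct fake_ib_device *dev, unsigned int port)
{
	return port_is_valid(dev, port) ? 0 : -EINVAL;
}
```

Doing the test once in the core is what lets per-driver copies of the check be removed.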
      
      Fixes: 436f2ad0 ("IB/core: Export ib_create/destroy_flow through uverbs")
      Link: https://lore.kernel.org/r/faad30dc5219a01727f47db3dc2f029d07c82c00.1623309971.git.leonro@nvidia.com
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • drm: Lock pointer access in drm_master_release() · c336a5ee
      Desmond Cheong Zhi Xi authored
      This patch eliminates the following smatch warning:
      drivers/gpu/drm/drm_auth.c:320 drm_master_release() warn: unlocked access 'master' (line 318) expected lock '&dev->master_mutex'
      
      The 'file_priv->master' field should be protected by the mutex
      '&dev->master_mutex', because other processes can concurrently
      modify this field and free the current 'file_priv->master'
      pointer. This could result in a use-after-free error when 'master' is
      dereferenced in subsequent function calls to
      'drm_legacy_lock_master_cleanup()' or to 'drm_lease_revoke()'.
      
      An example of a scenario that would produce this error can be seen
      from a similar bug in 'drm_getunique()' that was reported by Syzbot:
      https://syzkaller.appspot.com/bug?id=148d2f1dfac64af52ffd27b661981a540724f803
      
      In the Syzbot report, another process concurrently acquired the
      device's master mutex in 'drm_setmaster_ioctl()', then overwrote
      'fpriv->master' in 'drm_new_set_master()'. The old value of
      'fpriv->master' was subsequently freed before the mutex was unlocked.
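The locking rule can be sketched in userspace as follows:

- A pthread mutex stands in for dev->master_mutex.
- The fake_* structs and master_release_sketch() are hypothetical, minimal stand-ins for the DRM objects.
- Returning the master's id is only there to make the read observable.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the DRM structures involved. */
struct fake_master { int id; };
struct fake_device { pthread_mutex_t master_mutex; };
struct fake_file_priv { struct fake_master *master; };

/* Read fpriv->master and use it only while holding master_mutex, the
 * same lock a concurrent set-master path takes before replacing (and
 * freeing) the old master.  Returns the id seen, or -1 if none. */
int master_release_sketch(struct fake_device *dev, struct fake_file_priv *fpriv)
{
	int id = -1;

	pthread_mutex_lock(&dev->master_mutex);
	struct fake_master *master = fpriv->master;
	if (master)
		id = master->id; /* cleanup that dereferences master goes here */
	pthread_mutex_unlock(&dev->master_mutex);
	return id;
}
```

The important part is that the load of the pointer and every dereference of it sit inside the same critical section.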

      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210609092119.173590-1-desmondcheongzx@gmail.com
  4. 09 Jun, 2021 12 commits
  5. 08 Jun, 2021 1 commit
    • kvm: avoid speculation-based attacks from out-of-range memslot accesses · da27a83f
      Paolo Bonzini authored
      KVM's mechanism for accessing guest memory translates a guest physical
      address (gpa) to a host virtual address using the right-shifted gpa
      (also known as gfn) and a struct kvm_memory_slot.  The translation is
      performed in __gfn_to_hva_memslot using the following formula:
      
            hva = slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE
      
      It is expected that gfn falls within the boundaries of the guest's
      physical memory.  However, a guest can access invalid physical addresses
      in such a way that the gfn is invalid.
      
      __gfn_to_hva_memslot is called from kvm_vcpu_gfn_to_hva_prot, which first
      retrieves a memslot through __gfn_to_memslot.  While __gfn_to_memslot
      does check whether the gfn falls within the boundaries of the guest's
      physical memory, a CPU can speculate the result of the check and
      continue execution speculatively using an illegal gfn. The speculation
      can result in calculating an out-of-bounds hva.  If the resulting host
      virtual address is used to load another guest physical address, this
      is effectively a Spectre gadget consisting of two consecutive reads,
      the second of which is data dependent on the first.
      
      Right now it's not clear if there are any cases in which this is
      exploitable.  One interesting case was reported by the original author
      of this patch, and involves visiting guest page tables on x86.  Right
      now these are not vulnerable because the hva read goes through get_user(),
      which contains an LFENCE speculation barrier.  However, there are
      patches in progress for x86 uaccess.h to mask kernel addresses instead of
      using LFENCE; once these land, a guest could use speculation to read
      from the VMM's ring 3 address space.  Other architectures such as ARM
      already use the address masking method, and would be susceptible to
      this same kind of data-dependent access gadgets.  Therefore, this patch
      proactively protects from these attacks by masking out-of-bounds gfns
      in __gfn_to_hva_memslot, which blocks speculation of invalid hvas.
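The masking idea can be shown with the branch-free mask used by the kernel's generic array_index_nospec(). The sketch assumes:

- struct fake_memslot is a hypothetical stand-in for struct kvm_memory_slot, and PAGE_SIZE is fixed at 4096.
- index and size are below LONG_MAX, and right-shifting a negative long is arithmetic (true on GCC and Clang).

In userspace the compiler is free to turn this back into a branch. The kernel adds barriers, and some architectures override the mask helper.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* ~0UL when index < size, 0 otherwise, computed without a branch.
 * If index >= size, (size - 1 - index) wraps and sets the top bit. */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1));
}

/* Hypothetical stand-in for struct kvm_memory_slot. */
struct fake_memslot {
	uint64_t base_gfn;
	unsigned long npages;
	unsigned long userspace_addr;
};

/* hva = userspace_addr + (gfn - base_gfn) * PAGE_SIZE, with an
 * out-of-range offset clamped to 0 instead of being trusted. */
unsigned long gfn_to_hva_masked(const struct fake_memslot *slot, uint64_t gfn)
{
	unsigned long offset = (unsigned long)(gfn - slot->base_gfn);

	offset &= index_mask_nospec(offset, slot->npages);
	return slot->userspace_addr + offset * PAGE_SIZE;
}
```

An out-of-range gfn, above or below the slot, maps to the slot's first page rather than to an arbitrary host address. A mispredicted bounds check therefore cannot steer the speculative load anywhere interesting.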
      
      Sean Christopherson noted that this patch does not cover
      kvm_read_guest_offset_cached.  This however is limited to a few bytes
      past the end of the cache, and therefore it is unlikely to be useful in
      the context of building a chain of data dependent accesses.

      Reported-by: Artemiy Margaritov <artemiy.margaritov@gmail.com>
      Co-developed-by: Artemiy Margaritov <artemiy.margaritov@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>