1. 26 Aug, 2021 28 commits
  2. 25 Aug, 2021 12 commits
    • net: usb: asix: ax88772: fix boolconv.cocci warnings · ec92e524
      kernel test robot authored
      drivers/net/usb/asix_devices.c:757:60-65: WARNING: conversion to bool not needed here
      
       Remove unneeded conversion to bool
      
      Semantic patch information:
       Relational and logical operators evaluate to bool,
       explicit conversion is overly verbose and unneeded.
      
      Generated by: scripts/coccinelle/misc/boolconv.cocci
      
      Fixes: 7a141e64 ("net: usb: asix: ax88772: move embedded PHY detection as early as possible")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: kernel test robot <lkp@intel.com>
      Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/r/20210825183538.13070-1-o.rempel@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
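The warning above concerns a pattern like the one sketched here. This is a minimal illustration, not the driver's code: the function names and the `EMBEDDED_PHY_ADDR` value are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical PHY address used only for this illustration. */
#define EMBEDDED_PHY_ADDR 0x10

/* Verbose form of the kind boolconv.cocci flags: the comparison already
 * evaluates to a boolean value, so the ternary adds nothing. */
static bool is_embedded_verbose(int phy_addr)
{
	return (phy_addr & 0x1f) == EMBEDDED_PHY_ADDR ? true : false;
}

/* Equivalent form without the redundant conversion. */
static bool is_embedded(int phy_addr)
{
	return (phy_addr & 0x1f) == EMBEDDED_PHY_ADDR;
}
```

Both functions return the same result for every input; only the second avoids the explicit conversion.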
    • SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()... · 062b829c
      Trond Myklebust authored
      If the attempt to reserve a slot fails, we currently leak the XPT_BUSY
      flag on the socket. Among other things, this makes it impossible to close
      the socket.
      
      Fixes: 82011c80 ("SUNRPC: Move svc_xprt_received() call sites")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • Merge branch 'akpm' (patches from Andrew) · 73f3af7b
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "2 patches.
      
        Subsystems affected by this patch series: mm/memory-hotplug and
        MAINTAINERS"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        MAINTAINERS: exfat: update my email address
        mm/memory_hotplug: fix potential permanent lru cache disable
    • MAINTAINERS: exfat: update my email address · a34cc13a
      Namjae Jeon authored
      My email address in the exfat entry will not be available in a few days.
      Update it to my own kernel.org address.
      
      Link: https://lkml.kernel.org/r/20210825044833.16806-1-namjae.jeon@samsung.com
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: fix potential permanent lru cache disable · 946746d1
      Miaohe Lin authored
      If offline_pages() fails after lru_cache_disable(), the error path does
      not call lru_cache_enable(), so the LRU cache stays disabled permanently
      in this case.
      
      Link: https://lkml.kernel.org/r/20210821094246.10149-3-linmiaohe@huawei.com
      Fixes: d479960e ("mm: disable LRU pagevec during the migration temporarily")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
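The error-path problem described in this commit can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel implementation: `lru_disable_depth`, `isolate_range()` and `offline_pages_sketch()` are hypothetical, and the two `lru_cache_*` functions are stand-ins that only count calls.

```c
#include <errno.h>

/* Stand-in for the kernel's "LRU cache disabled" state: counts nesting. */
static int lru_disable_depth;

static void lru_cache_disable(void) { lru_disable_depth++; }
static void lru_cache_enable(void)  { lru_disable_depth--; }

/* Hypothetical step that may fail after the cache has been disabled. */
static int isolate_range(int should_fail)
{
	return should_fail ? -EBUSY : 0;
}

static int offline_pages_sketch(int should_fail)
{
	int ret;

	lru_cache_disable();

	ret = isolate_range(should_fail);
	if (ret)
		goto failed;

	/* ... migration and offlining would happen here ... */
	lru_cache_enable();
	return 0;

failed:
	/* Without this call the cache would stay disabled after a failure. */
	lru_cache_enable();
	return ret;
}
```

After either outcome the disable depth returns to zero, which is the property the fix restores.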
    • pipe: do FASYNC notifications for every pipe IO, not just state changes · fe67f4dd
      Linus Torvalds authored
      It turns out that the SIGIO/FASYNC situation is almost exactly the same
      as the EPOLLET case was: user space really wants to be notified after
      every operation.
      
      Now, in a perfect world it should be sufficient to only notify user
      space on "state transitions" when the IO state changes (ie when a pipe
      goes from unreadable to readable, or from unwritable to writable).  User
      space should then do as much as possible - fully emptying the buffer or
      what not - and we'll notify it again the next time the state changes.
      
      But as with EPOLLET, we have at least one case (stress-ng) where the
      kernel sent SIGIO due to the pipe being marked for asynchronous
      notification, but the user space signal handler then didn't actually
      necessarily read it all before returning (it read more than what was
      written, but since there could be multiple writes, it could leave data
      pending).
      
      The user space code then expected to get another SIGIO for subsequent
      writes - even though the pipe had been readable the whole time - and
      would only then read more.
      
      This is arguably a user space bug - and Colin King already fixed the
      stress-ng code in question - but the kernel regression rules are clear:
      it doesn't matter if kernel people think that user space did something
      silly and wrong.  What matters is that it used to work.
      
      So if user space depends on specific historical kernel behavior, it's a
      regression when that behavior changes.  It's on us: we were silly to
      have that non-optimal historical behavior, and our old kernel behavior
      was what user space was tested against.
      
      Because of how the FASYNC notification was tied to wakeup behavior, this
      was first broken by commits f467a6a6 and 1b6b26ae ("pipe: fix
      and clarify pipe read/write wakeup logic"), but at the time it seems
      nobody noticed.  Probably because the stress-ng problem case ends up
      being timing-dependent too.
      
      It was then unwittingly fixed by commit 3a34b13a ("pipe: make pipe
      writes always wake up readers") only to be broken again by commit
      3b844826 ("pipe: avoid unnecessary EPOLLET wakeups under normal
      loads").
      
      And at that point the kernel test robot noticed the performance
      regression in the stress-ng.sigio.ops_per_sec case.  So the "Fixes" tag
      below is somewhat ad hoc, but it matches when the issue was noticed.
      
      Fix it for good (knock wood) by simply making the kill_fasync() case
      separate from the wakeup case.  FASYNC is quite rare, and we clearly
      shouldn't even try to use the "avoid unnecessary wakeups" logic for it.
      
      Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/
      Fixes: 3b844826 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Tested-by: Oliver Sang <oliver.sang@intel.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
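The separation of FASYNC notification from wakeups can be modelled roughly as below. This is a user-space sketch with counters, assuming a hypothetical `struct pipe_model`; it does not call `kill_fasync()` or any kernel API and only mirrors the behaviour the message describes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a pipe's reader-side notification state. */
struct pipe_model {
	size_t buffered;          /* bytes currently in the pipe */
	bool fasync;              /* reader asked for SIGIO notification */
	unsigned int wakeups;     /* reader wakeups issued */
	unsigned int sigio_sent;  /* SIGIO notifications issued */
};

static void pipe_model_write(struct pipe_model *p, size_t len)
{
	bool was_empty = (p->buffered == 0);

	p->buffered += len;

	/* Wakeups stay tied to the state transition empty -> non-empty. */
	if (was_empty)
		p->wakeups++;

	/* FASYNC is handled separately: notify after every write. */
	if (p->fasync)
		p->sigio_sent++;
}
```

Two writes to an initially empty model produce one wakeup but two SIGIO notifications, which is what a reader like the stress-ng handler relied on.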
    • Merge branch 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · 62add982
      Linus Torvalds authored
      Pull ucount fixes from Eric Biederman:
       "This branch fixes a regression that made it impossible to increase
        rlimits that had been converted to the ucount infrastructure, and also
        fixes a reference counting bug where the reference was not incremented
        soon enough.
      
        The fixes are trivial and the bugs have been encountered in the wild,
        and the fixes have been tested"
      
      * 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        ucounts: Increase ucounts reference counter before the security hook
        ucounts: Fix regression preventing increasing of rlimits in init_user_ns
    • ceph: fix possible null-pointer dereference in ceph_mdsmap_decode() · a9e6ffbc
      Tuo Li authored
      kcalloc() is called to allocate memory for m->m_info, and if it fails,
      ceph_mdsmap_destroy() behind the label out_err will be called:
        ceph_mdsmap_destroy(m);
      
      In ceph_mdsmap_destroy(), m->m_info is dereferenced through:
        kfree(m->m_info[i].export_targets);
      
      To fix this possible null-pointer dereference, check m->m_info before the
      for loop to free m->m_info[i].export_targets.
      
      [ jlayton: fix up whitespace damage
      	   only kfree(m->m_info) if it's non-NULL ]
      Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
      Signed-off-by: Tuo Li <islituo@gmail.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
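The guarded cleanup described in this commit can be sketched in plain C as follows. The structures are simplified stand-ins with illustrative field names (`num_ranks` in particular is an assumption), and `free()` replaces `kfree()`.

```c
#include <stdlib.h>

/* Simplified stand-ins for the ceph structures. */
struct mds_info {
	int *export_targets;
};

struct mdsmap {
	struct mds_info *m_info;  /* NULL if the array allocation failed */
	int num_ranks;
};

/* Frees the map and returns how many per-rank entries were visited. */
static int mdsmap_destroy(struct mdsmap *m)
{
	int i, visited = 0;

	/* Only walk and free the array if it was actually allocated. */
	if (m->m_info) {
		for (i = 0; i < m->num_ranks; i++) {
			free(m->m_info[i].export_targets);
			visited++;
		}
		free(m->m_info);
	}
	free(m);
	return visited;
}
```

With the check in place, destroying a map whose `m_info` allocation failed never indexes through a NULL pointer.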
    • ceph: correctly handle releasing an embedded cap flush · b2f9fa1f
      Xiubo Li authored
      The ceph_cap_flush structures are usually dynamically allocated, but
      the ceph_cap_snap has an embedded one.
      
      When force umounting, the client will try to remove all the session
      caps. During this, it will free them, but that should not be done
      with the ones embedded in a capsnap.
      
      Fix this by adding a new boolean that indicates that the cap flush is
      embedded in a capsnap, and skip freeing it if that's set.
      
      At the same time, switch to using list_del_init() when detaching the
      i_list and g_list heads.  It's possible for a forced umount to remove
      these objects but then handle_cap_flushsnap_ack() races in and does the
      list_del_init() again, corrupting memory.
      
      Cc: stable@vger.kernel.org
      URL: https://tracker.ceph.com/issues/52283
      Signed-off-by: Xiubo Li <xiubli@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
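The "skip freeing when embedded" idea can be illustrated with a small sketch. The structure layouts, the `is_capsnap` field name, and both helper functions are assumptions made for this example; they are not taken from the ceph sources.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified cap flush record. */
struct cap_flush {
	bool is_capsnap;  /* true when embedded in a cap_snap */
	int tid;
};

/* Hypothetical cap snap that embeds its cap flush by value. */
struct cap_snap {
	int refs;
	struct cap_flush cap_flush;
};

/* Dynamically allocated flushes start with is_capsnap == false. */
static struct cap_flush *cap_flush_alloc(void)
{
	return calloc(1, sizeof(struct cap_flush));
}

/* Returns true if the memory was freed, false if it was left alone. */
static bool cap_flush_release(struct cap_flush *cf)
{
	if (cf->is_capsnap)
		return false;  /* storage belongs to the enclosing cap_snap */
	free(cf);
	return true;
}
```

The flag lets a single release path handle both kinds of record without calling `free()` on memory that was never separately allocated.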
    • Revert "btrfs: compression: don't try to compress if we don't have enough pages" · 4e965576
      Qu Wenruo authored
      This reverts commit f2165627.
      
      [BUG]
      It's no longer possible to create a compressed inline extent after commit
      f2165627 ("btrfs: compression: don't try to compress if we don't
      have enough pages").
      
      [CAUSE]
      For compression code, there are several possible reasons we have a range
      that needs to be compressed while it's no more than one page.
      
      - Compressed inline write
        The data is always smaller than one sector and the test lacks the
        condition to properly recognize a non-inline extent.
      
      - Compressed subpage write
        For the incoming subpage compressed write support, we require page
        alignment of the delalloc range.
        And for 64K page size, we can compress just one page into smaller
        sectors.
      
      For those reasons, the requirement for the data to be more than one page
      is not correct, and is already causing regression for compressed inline
      data writeback.  The idea of skipping one page to avoid wasting CPU time
      could be revisited in the future.
      
      [FIX]
      Fix it by reverting the offending commit.
      Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Link: https://lore.kernel.org/linux-btrfs/afa2742.c084f5d6.17b6b08dffc@tnonline.net
      Fixes: f2165627 ("btrfs: compression: don't try to compress if we don't have enough pages")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID" · 3eb9cdff
      Will Deacon authored
      This partially reverts commit 16c9afc7.
      
      Alex Bee reports a regression in 5.14 on their RK3328 SoC when
      configuring the PL330 DMA controller:
      
       | ------------[ cut here ]------------
       | WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
       | Modules linked in: spi_rockchip(+) fuse
       | CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
       | Hardware name: Pine64 Rock64 (DT)
       | pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
       | pc : dma_map_resource+0x68/0xc0
       | lr : pl330_prep_slave_fifo+0x78/0xd0
      
      This appears to be because dma_map_resource() is being called for a
      physical address which does not correspond to a memory address yet does
      have a valid 'struct page' due to the way in which the vmemmap is
      constructed.
      
      Prior to 16c9afc7 ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), the arm64
      implementation of pfn_valid() called memblock_is_memory() to return
      'false' for such regions and the DMA mapping request would proceed.
      However, now that we are using the generic implementation where only the
      presence of the memory map entry is considered, we return 'true' and
      erroneously fail with DMA_MAPPING_ERROR because we identify the region
      as DRAM.
      
      Although fixing this in the DMA mapping code is arguably the right fix,
      it is a risky, cross-architecture change at this stage in the cycle. So
      just revert arm64 back to its old pfn_valid() implementation for v5.14.
      The change to the generic pfn_valid() code is preserved from the original
      patch, so as to avoid impacting other architectures.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Reported-by: Alex Bee <knaerzche@gmail.com>
      Link: https://lore.kernel.org/r/d3a3c828-b777-faf8-e901-904995688437@gmail.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • net/sched: ets: fix crash when flipping from 'strict' to 'quantum' · cd9b50ad
      Davide Caratti authored
      While running kselftests, Hangbin observed that sch_ets.sh often crashes,
      and splats like the following one are seen in the output of 'dmesg':
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 159f12067 P4D 159f12067 PUD 159f13067 PMD 0
       Oops: 0000 [#1] SMP NOPTI
       CPU: 2 PID: 921 Comm: tc Not tainted 5.14.0-rc6+ #458
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       RIP: 0010:__list_del_entry_valid+0x2d/0x50
       Code: 48 8b 57 08 48 b9 00 01 00 00 00 00 ad de 48 39 c8 0f 84 ac 6e 5b 00 48 b9 22 01 00 00 00 00 ad de 48 39 ca 0f 84 cf 6e 5b 00 <48> 8b 32 48 39 fe 0f 85 af 6e 5b 00 48 8b 50 08 48 39 f2 0f 85 94
       RSP: 0018:ffffb2da005c3890 EFLAGS: 00010217
       RAX: 0000000000000000 RBX: ffff9073ba23f800 RCX: dead000000000122
       RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9073ba23fbc8
       RBP: ffff9073ba23f890 R08: 0000000000000001 R09: 0000000000000001
       R10: 0000000000000001 R11: 0000000000000001 R12: dead000000000100
       R13: ffff9073ba23fb00 R14: 0000000000000002 R15: 0000000000000002
       FS:  00007f93e5564e40(0000) GS:ffff9073bba00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 000000014ad34000 CR4: 0000000000350ee0
       Call Trace:
        ets_qdisc_reset+0x6e/0x100 [sch_ets]
        qdisc_reset+0x49/0x1d0
        tbf_reset+0x15/0x60 [sch_tbf]
        qdisc_reset+0x49/0x1d0
        dev_reset_queue.constprop.42+0x2f/0x90
        dev_deactivate_many+0x1d3/0x3d0
        dev_deactivate+0x56/0x90
        qdisc_graft+0x47e/0x5a0
        tc_get_qdisc+0x1db/0x3e0
        rtnetlink_rcv_msg+0x164/0x4c0
        netlink_rcv_skb+0x50/0x100
        netlink_unicast+0x1a5/0x280
        netlink_sendmsg+0x242/0x480
        sock_sendmsg+0x5b/0x60
        ____sys_sendmsg+0x1f2/0x260
        ___sys_sendmsg+0x7c/0xc0
        __sys_sendmsg+0x57/0xa0
        do_syscall_64+0x3a/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7f93e44b8338
       Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
       RSP: 002b:00007ffc0db737a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 0000000061255c06 RCX: 00007f93e44b8338
       RDX: 0000000000000000 RSI: 00007ffc0db73810 RDI: 0000000000000003
       RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
       R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001
       R13: 0000000000687880 R14: 0000000000000000 R15: 0000000000000000
       Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt iTCO_vendor_support intel_rapl_msr intel_rapl_common joydev i2c_i801 pcspkr i2c_smbus lpc_ich virtio_balloon ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel libata serio_raw virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod
       CR2: 0000000000000000
      
      When the change() function decreases the value of 'nstrict', we must take
      into account that packets might be already enqueued on a class that flips
      from 'strict' to 'quantum': otherwise that class will not be added to the
      bandwidth-sharing list. Then, a call to ets_qdisc_reset() will attempt to
      do list_del(&alist) with 'alist' filled with zero, hence the NULL pointer
      dereference.
      For classes flipping from 'strict' to 'quantum', initialize an empty list
      and eventually add it to the bandwidth-sharing list, if there are packets
      already enqueued. In this way, the kernel will:
       a) prevent crashing as described above.
       b) avoid retaining the backlog packets (for an arbitrarily long time) in
          case no packet is enqueued after a change from 'strict' to 'quantum'.
      Reported-by: Hangbin Liu <liuhangbin@gmail.com>
      Fixes: dcc68b4d ("net: sch_ets: Add a new Qdisc")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
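Why an initialized list head matters here can be shown with a user-space sketch. The list helpers below are minimal re-implementations written for this example (they mimic the kernel's names but are not the kernel's code), and `struct ets_class_model` and `flip_to_quantum()` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An initialized head points at itself, so unlinking it is harmless. */
static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink an entry and leave it self-linked, so a repeat is safe. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical model of a class with a backlog and an active-list link. */
struct ets_class_model {
	struct list_head alist;
	unsigned int qlen;
};

/* On a 'strict' -> 'quantum' flip: always initialize the link, and join
 * the active list only if packets are already queued. */
static void flip_to_quantum(struct ets_class_model *cl,
			    struct list_head *active)
{
	INIT_LIST_HEAD(&cl->alist);
	if (cl->qlen)
		list_add_tail(&cl->alist, active);
}
```

A zero-filled `alist` has NULL `next` and `prev` pointers, so unlinking it dereferences NULL; after `INIT_LIST_HEAD()` the same unlink only rewrites the entry's own pointers.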