1. 15 Jul, 2021 2 commits
  2. 14 Jul, 2021 7 commits
    • Linus Torvalds's avatar
      Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8096acd7
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski.
       "Including fixes from bpf and netfilter.
      
        Current release - regressions:
      
         - sock: fix parameter order in sock_setsockopt()
      
        Current release - new code bugs:
      
         - netfilter: nft_last:
             - fix incorrect arithmetic when restoring last used
             - honor NFTA_LAST_SET on restoration
      
        Previous releases - regressions:
      
         - udp: properly flush normal packet at GRO time
      
         - sfc: ensure correct number of XDP queues; don't allow enabling the
           feature if there isn't sufficient resources to Tx from any CPU
      
         - dsa: sja1105: fix address learning getting disabled on the CPU port
      
         - mptcp: addresses a rmem accounting issue that could keep packets in
           subflow receive buffers longer than necessary, delaying MPTCP-level
           ACKs
      
         - ip_tunnel: fix mtu calculation for ETHER tunnel devices
      
         - do not reuse skbs allocated from skbuff_fclone_cache in the napi
           skb cache, we'd try to return them to the wrong slab cache
      
         - tcp: consistently disable header prediction for mptcp
      
        Previous releases - always broken:
      
         - bpf: fix subprog poke descriptor tracking use-after-free
      
         - ipv6:
             - allocate enough headroom in ip6_finish_output2() in case
               iptables TEE is used
             - tcp: drop silly ICMPv6 packet too big messages to avoid
               expensive and pointless lookups (which may serve as a DDOS
               vector)
             - make sure fwmark is copied in SYNACK packets
             - fix 'disable_policy' for forwarded packets (align with IPv4)
      
         - netfilter: conntrack:
             - do not renew entry stuck in tcp SYN_SENT state
             - do not mark RST in the reply direction coming after SYN packet
               for an out-of-sync entry
      
         - mptcp: cleanly handle error conditions with MP_JOIN and syncookies
      
         - mptcp: fix double free when rejecting a join due to port mismatch
      
         - validate lwtstate->data before returning from skb_tunnel_info()
      
         - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
      
         - mt76: mt7921: continue to probe driver when fw already downloaded
      
         - bonding: fix multiple issues with offloading IPsec to (thru?) bond
      
         - stmmac: ptp: fix issues around Qbv support and setting time back
      
         - bcmgenet: always clear wake-up based on energy detection
      
        Misc:
      
         - sctp: move 198 addresses from unusable to private scope
      
         - ptp: support virtual clocks and timestamping
      
         - openvswitch: optimize operation for key comparison"
      
      * tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
        net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
        sfc: add logs explaining XDP_TX/REDIRECT is not available
        sfc: ensure correct number of XDP queues
        sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
        net: fddi: fix UAF in fza_probe
        net: dsa: sja1105: fix address learning getting disabled on the CPU port
        net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
        net: Use nlmsg_unicast() instead of netlink_unicast()
        octeontx2-pf: Fix uninitialized boolean variable pps
        ipv6: allocate enough headroom in ip6_finish_output2()
        net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
        net: bridge: multicast: fix MRD advertisement router port marking race
        net: bridge: multicast: fix PIM hello router port marking race
        net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
        dsa: fix for_each_child.cocci warnings
        virtio_net: check virtqueue_add_sgs() return value
        mptcp: properly account bulk freed memory
        selftests: mptcp: fix case multiple subflows limited by server
        mptcp: avoid processing packet if a subflow reset
        mptcp: fix syncookie process if mptcp can not_accept new subflow
        ...
      8096acd7
    • Christian Brauner's avatar
      fs: add vfs_parse_fs_param_source() helper · d1d488d8
      Christian Brauner authored
      Add a simple helper that filesystems can use in their parameter parser
      to parse the "source" parameter. A few places open-coded this function
      and that already caused a bug in the cgroup v1 parser that we fixed.
      Let's make it harder to get this wrong by introducing a helper which
      performs all necessary checks.
      
      Link: https://syzkaller.appspot.com/bug?id=6312526aba5beae046fdae8f00399f87aab48b12
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d1d488d8
    • Christian Brauner's avatar
      cgroup: verify that source is a string · 3b046272
      Christian Brauner authored
      The following sequence can be used to trigger a UAF:
      
          int fscontext_fd = fsopen("cgroup");
          int fd_null = open("/dev/null, O_RDONLY);
          int fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", fd_null);
          close_range(3, ~0U, 0);
      
      The cgroup v1 specific fs parser expects a string for the "source"
      parameter.  However, it is perfectly legitimate to e.g.  specify a file
      descriptor for the "source" parameter.  The fs parser doesn't know what
      a filesystem allows there.  So it's a bug to assume that "source" is
      always of type fs_value_is_string when it can reasonably also be
      fs_value_is_file.
      
      This assumption in the cgroup code causes a UAF because struct
      fs_parameter uses a union for the actual value.  Access to that union is
      guarded by the param->type member.  Since the cgroup paramter parser
      didn't check param->type but unconditionally moved param->string into
      fc->source a close on the fscontext_fd would trigger a UAF during
      put_fs_context() which frees fc->source thereby freeing the file stashed
      in param->file causing a UAF during a close of the fd_null.
      
      Fix this by verifying that param->type is actually a string and report
      an error if not.
      
      In follow up patches I'll add a new generic helper that can be used here
      and by other filesystems instead of this error-prone copy-pasta fix.
      But fixing it in here first makes backporting a it to stable a lot
      easier.
      
      Fixes: 8d2451f4 ("cgroup1: switch to option-by-option parsing")
      Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <stable@kernel.org>
      Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3b046272
    • Benjamin Gaignard's avatar
      iommu/rockchip: Fix physical address decoding · c987b65a
      Benjamin Gaignard authored
      Restore bits 39 to 32 at correct position.
      It reverses the operation done in rk_dma_addr_dte_v2().
      
      Fixes: c55356c5 ("iommu: rockchip: Add support for iommu v2")
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarBenjamin Gaignard <benjamin.gaignard@collabora.com>
      Link: https://lore.kernel.org/r/20210712101232.318589-1-benjamin.gaignard@collabora.comSigned-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      c987b65a
    • Lu Baolu's avatar
      iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries · 474dd1c6
      Lu Baolu authored
      The commit 2b0140c6 ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
      fixes an issue of "sub-device is removed where the context entry is cleared
      for all aliases". But this commit didn't consider the PASID entry and PASID
      table in VT-d scalable mode. This fix increases the coverage of scalable
      mode.
      Suggested-by: default avatarSanjay Kumar <sanjay.k.kumar@intel.com>
      Fixes: 8038bdb8 ("iommu/vt-d: Only clear real DMA device's context entries")
      Fixes: 2b0140c6 ("iommu/vt-d: Use pci_real_dma_dev() for mapping")
      Cc: stable@vger.kernel.org # v5.6+
      Cc: Jon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: default avatarLu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20210712071712.3416949-1-baolu.lu@linux.intel.comSigned-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      474dd1c6
    • Sanjay Kumar's avatar
      iommu/vt-d: Global devTLB flush when present context entry changed · 37764b95
      Sanjay Kumar authored
      This fixes a bug in context cache clear operation. The code was not
      following the correct invalidation flow. A global device TLB invalidation
      should be added after the IOTLB invalidation. At the same time, it
      uses the domain ID from the context entry. But in scalable mode, the
      domain ID is in PASID table entry, not context entry.
      
      Fixes: 7373a8cc ("iommu/vt-d: Setup context and enable RID2PASID support")
      Cc: stable@vger.kernel.org # v5.0+
      Signed-off-by: default avatarSanjay Kumar <sanjay.k.kumar@intel.com>
      Signed-off-by: default avatarLu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20210712071315.3416543-1-baolu.lu@linux.intel.comSigned-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      37764b95
    • Marek Szyprowski's avatar
      iommu/qcom: Revert "iommu/arm: Cleanup resources in case of probe error path" · ce36c942
      Marek Szyprowski authored
      QCOM IOMMU driver calls bus_set_iommu() for every IOMMU device controller,
      what fails for the second and latter IOMMU devices. This is intended and
      must be not fatal to the driver registration process. Also the cleanup
      path should take care of the runtime PM state, what is missing in the
      current patch. Revert relevant changes to the QCOM IOMMU driver until
      a proper fix is prepared.
      
      This partially reverts commit 249c9dc6.
      
      Fixes: 249c9dc6 ("iommu/arm: Cleanup resources in case of probe error path")
      Suggested-by: default avatarWill Deacon <will@kernel.org>
      Signed-off-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Acked-by: default avatarWill Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20210705065657.30356-1-m.szyprowski@samsung.comSigned-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      ce36c942
  3. 13 Jul, 2021 11 commits
    • Vladimir Oltean's avatar
      net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() · bcb9928a
      Vladimir Oltean authored
      This was not caught because there is no switch driver which implements
      the .port_bridge_join but not .port_bridge_leave method, but it should
      nonetheless be fixed, as in certain conditions (driver development) it
      might lead to NULL pointer dereference.
      
      Fixes: f66a6a69 ("net: dsa: permit cross-chip bridging between all trees in the system")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bcb9928a
    • Linus Torvalds's avatar
      Merge tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux · 40226a3d
      Linus Torvalds authored
      Pull vboxsf fixes from Hans de Goede:
       "This adds support for the atomic_open directory-inode op to vboxsf.
      
        Note this is not just an enhancement this also fixes an actual issue
        which users are hitting, see the commit message of the "boxsf: Add
        support for the atomic_open directory-inode" patch"
      
      * tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux:
        vboxsf: Add support for the atomic_open directory-inode op
        vboxsf: Add vboxsf_[create|release]_sf_handle() helpers
        vboxsf: Make vboxsf_dir_create() return the handle for the created file
        vboxsf: Honor excl flag to the dir-inode create op
      40226a3d
    • Linus Torvalds's avatar
      Merge tag 'for-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · f02bf857
      Linus Torvalds authored
      Pull btrfs zoned mode fixes from David Sterba:
      
       - fix deadlock when allocating system chunk
      
       - fix wrong mutex unlock on an error path
      
       - fix extent map splitting for append operation
      
       - update and fix message reporting unusable chunk space
      
       - don't block when background zone reclaim runs with balance in
         parallel
      
      * tag 'for-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: zoned: fix wrong mutex unlock on failure to allocate log root tree
        btrfs: don't block if we can't acquire the reclaim lock
        btrfs: properly split extent_map for REQ_OP_ZONE_APPEND
        btrfs: rework chunk allocation to avoid exhaustion of the system chunk array
        btrfs: fix deadlock with concurrent chunk allocations involving system chunks
        btrfs: zoned: print unusable percentage when reclaiming block groups
        btrfs: zoned: fix types for u64 division in btrfs_reclaim_bgs_work
      f02bf857
    • David S. Miller's avatar
      Merge branch 'sfc-tx-queues' · 28efd208
      David S. Miller authored
      Íñigo Huguet says:
      
      ====================
      sfc: Fix lack of XDP TX queues
      
      A change introduced in commit e26ca4b5 ("sfc: reduce the number of
      requested xdp ev queues") created a bug in XDP_TX and XDP_REDIRECT
      because it unintentionally reduced the number of XDP TX queues, letting
      not enough queues to have one per CPU, which leaded to errors if XDP
      TX/REDIRECT was done from a high numbered CPU.
      
      This patchs make the following changes:
      - Fix the bug mentioned above
      - Revert commit 99ba0ea6 ("sfc: adjust efx->xdp_tx_queue_count with
        the real number of initialized queues") which intended to fix a related
        problem, created by mentioned bug, but it's no longer necessary
      - Add a new error log message if there are not enough resources to make
        XDP_TX/REDIRECT work
      
      V1 -> V2: keep the calculation of how many tx queues can handle a single
      event queue, but apply the "max. tx queues per channel" upper limit.
      V2 -> V3: WARN_ON if the number of initialized XDP TXQs differs from the
      expected.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28efd208
    • Íñigo Huguet's avatar
      sfc: add logs explaining XDP_TX/REDIRECT is not available · d2a16bde
      Íñigo Huguet authored
      If it's not possible to allocate enough channels for XDP, XDP_TX and
      XDP_REDIRECT don't work. However, only a message saying that not enough
      channels were available was shown, but not saying what are the
      consequences in that case. The user didn't know if he/she can use XDP
      or not, if the performance is reduced, or what.
      Signed-off-by: default avatarÍñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d2a16bde
    • Íñigo Huguet's avatar
      sfc: ensure correct number of XDP queues · 788bc000
      Íñigo Huguet authored
      Commit 99ba0ea6 ("sfc: adjust efx->xdp_tx_queue_count with the real
      number of initialized queues") intended to fix a problem caused by a
      round up when calculating the number of XDP channels and queues.
      However, this was not the real problem. The real problem was that the
      number of XDP TX queues had been reduced to half in
      commit e26ca4b5 ("sfc: reduce the number of requested xdp ev queues"),
      but the variable xdp_tx_queue_count had remained the same.
      
      Once the correct number of XDP TX queues is created again in the
      previous patch of this series, this also can be reverted since the error
      doesn't actually exist.
      
      Only in the case that there is a bug in the code we can have different
      values in xdp_queue_number and efx->xdp_tx_queue_count. Because of this,
      and per Edward Cree's suggestion, I add instead a WARN_ON to catch if it
      happens again in the future.
      
      Note that the number of allocated queues can be higher than the number
      of used ones due to the round up, as explained in the existing comment
      in the code. That's why we also have to stop increasing xdp_queue_number
      beyond efx->xdp_tx_queue_count.
      Signed-off-by: default avatarÍñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      788bc000
    • Íñigo Huguet's avatar
      sfc: fix lack of XDP TX queues - error XDP TX failed (-22) · f28100cb
      Íñigo Huguet authored
      Fixes: e26ca4b5 sfc: reduce the number of requested xdp ev queues
      
      The buggy commit intended to allocate less channels for XDP in order to
      be more unlikely to reach the limit of 32 channels of the driver.
      
      The idea was to use each IRQ/eventqeue for more XDP TX queues than
      before, calculating which is the maximum number of TX queues that one
      event queue can handle. For example, in EF10 each event queue could
      handle up to 8 queues, better than the 4 they were handling before the
      change. This way, it would have to allocate half of channels than before
      for XDP TX.
      
      The problem is that the TX queues are also contained inside the channel
      structs, and there are only 4 queues per channel. Reducing the number of
      channels means also reducing the number of queues, resulting in not
      having the desired number of 1 queue per CPU.
      
      This leads to getting errors on XDP_TX and XDP_REDIRECT if they're
      executed from a high numbered CPU, because there only exist queues for
      the low half of CPUs, actually. If XDP_TX/REDIRECT is executed in a low
      numbered CPU, the error doesn't happen. This is the error in the logs
      (repeated many times, even rate limited):
      sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22)
      
      This errors happens in function efx_xdp_tx_buffers, where it expects to
      have a dedicated XDP TX queue per CPU.
      
      Reverting the change makes again more likely to reach the limit of 32
      channels in machines with many CPUs. If this happen, no XDP_TX/REDIRECT
      will be possible at all, and we will have this log error messages:
      
      At interface probe:
      sfc 0000:5e:00.0: Insufficient resources for 12 XDP event queues (24 other channels, max 32)
      
      At every subsequent XDP_TX/REDIRECT failure, rate limited:
      sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22)
      
      However, without reverting the change, it makes the user to think that
      everything is OK at probe time, but later it fails in an unpredictable
      way, depending on the CPU that handles the packet.
      
      It is better to restore the predictable behaviour. If the user sees the
      error message at probe time, he/she can try to configure the best way it
      fits his/her needs. At least, he/she will have 2 options:
      - Accept that XDP_TX/REDIRECT is not available (he/she may not need it)
      - Load sfc module with modparam 'rss_cpus' with a lower number, thus
        creating less normal RX queues/channels, letting more free resources
        for XDP, with some performance penalty.
      
      Anyway, let the calculation of maximum TX queues that can be handled by
      a single event queue, and use it only if it's less than the number of TX
      queues per channel. This doesn't happen in practice, but could happen if
      some constant values are tweaked in the future, such us
      EFX_MAX_TXQ_PER_CHANNEL, EFX_MAX_EVQ_SIZE or EFX_MAX_DMAQ_SIZE.
      
      Related mailing list thread:
      https://lore.kernel.org/bpf/20201215104327.2be76156@carbon/Signed-off-by: default avatarÍñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f28100cb
    • Pavel Skripkin's avatar
      net: fddi: fix UAF in fza_probe · deb7178e
      Pavel Skripkin authored
      fp is netdev private data and it cannot be
      used after free_netdev() call. Using fp after free_netdev()
      can cause UAF bug. Fix it by moving free_netdev() after error message.
      
      Fixes: 61414f5e ("FDDI: defza: Add support for DEC FDDIcontroller 700
      TURBOchannel adapter")
      Signed-off-by: default avatarPavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      deb7178e
    • Vladimir Oltean's avatar
      net: dsa: sja1105: fix address learning getting disabled on the CPU port · b0b33b04
      Vladimir Oltean authored
      In May 2019 when commit 640f763f ("net: dsa: sja1105: Add support
      for Spanning Tree Protocol") was introduced, the comment that "STP does
      not get called for the CPU port" was true. This changed after commit
      0394a63a ("net: dsa: enable and disable all ports") in August 2019
      and went largely unnoticed, because the sja1105_bridge_stp_state_set()
      method did nothing different compared to the static setup done by
      sja1105_init_mac_settings().
      
      With the ability to turn address learning off introduced by the blamed
      commit, there is a new priv->learn_ena port mask in the driver. When
      sja1105_bridge_stp_state_set() gets called and we are in
      BR_STATE_LEARNING or later, address learning is enabled or not depending
      on priv->learn_ena & BIT(port).
      
      So what happens is that priv->learn_ena is not being set from anywhere
      for the CPU port, and the static configuration done by
      sja1105_init_mac_settings() is being overwritten.
      
      To solve this, acknowledge that the static configuration of STP state is
      no longer necessary because the STP state is being set by the DSA core
      now, but what is necessary is to set priv->learn_ena for the CPU port.
      
      Fixes: 4d942354 ("net: dsa: sja1105: offload bridge port flags to device")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0b33b04
    • Vladimir Oltean's avatar
      net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload · e56c6bbd
      Vladimir Oltean authored
      The point with a *dev and a *brport_dev is that when we have a LAG net
      device that is a bridge port, *dev is an ocelot net device and
      *brport_dev is the bonding/team net device. The ocelot net device
      beneath the LAG does not exist from the bridge's perspective, so we need
      to sync the switchdev objects belonging to the brport_dev and not to the
      dev.
      
      Fixes: e4bd44e8 ("net: ocelot: replay switchdev events when joining bridge")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e56c6bbd
    • Yajun Deng's avatar
      net: Use nlmsg_unicast() instead of netlink_unicast() · 01757f53
      Yajun Deng authored
      It has 'if (err >0 )' statement in nlmsg_unicast(), so use nlmsg_unicast()
      instead of netlink_unicast(), this looks more concise.
      
      v2: remove the change in netfilter.
      Signed-off-by: default avatarYajun Deng <yajun.deng@linux.dev>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      01757f53
  4. 12 Jul, 2021 6 commits
  5. 11 Jul, 2021 14 commits
    • Linus Torvalds's avatar
      Linux 5.14-rc1 · e73f0f0e
      Linus Torvalds authored
      e73f0f0e
    • Hugh Dickins's avatar
      mm/rmap: try_to_migrate() skip zone_device !device_private · 6c855fce
      Hugh Dickins authored
      I know nothing about zone_device pages and !device_private pages; but if
      try_to_migrate_one() will do nothing for them, then it's better that
      try_to_migrate() filter them first, than trawl through all their vmas.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
      Reviewed-by: default avatarAlistair Popple <apopple@nvidia.com>
      Link: https://lore.kernel.org/lkml/1241d356-8ec9-f47b-a5ec-9b2bf66d242@google.com/
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6c855fce
    • Hugh Dickins's avatar
      mm/rmap: fix new bug: premature return from page_mlock_one() · 023e1a8d
      Hugh Dickins authored
      In the unlikely race case that page_mlock_one() finds VM_LOCKED has been
      cleared by the time it got page table lock, page_vma_mapped_walk_done()
      must be called before returning, either explicitly, or by a final call
      to page_vma_mapped_walk() - otherwise the page table remains locked.
      
      Fixes: cd62734c ("mm/rmap: split try_to_munlock from try_to_unmap")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarAlistair Popple <apopple@nvidia.com>
      Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
      Reported-by: default avatarkernel test robot <oliver.sang@intel.com>
      Link: https://lore.kernel.org/lkml/20210711151446.GB4070@xsang-OptiPlex-9020/
      Link: https://lore.kernel.org/lkml/f71f8523-cba7-3342-40a7-114abc5d1f51@google.com/
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      023e1a8d
    • Hugh Dickins's avatar
      mm/rmap: fix old bug: munlocking THP missed other mlocks · d9770fcc
      Hugh Dickins authored
      The kernel recovers in due course from missing Mlocked pages: but there
      was no point in calling page_mlock() (formerly known as
      try_to_munlock()) on a THP, because nothing got done even when it was
      found to be mapped in another VM_LOCKED vma.
      
      It's true that we need to be careful: Mlocked accounting of pte-mapped
      THPs is too difficult (so consistently avoided); but Mlocked accounting
      of only-pmd-mapped THPs is supposed to work, even when multiple mappings
      are mlocked and munlocked or munmapped.  Refine the tests.
      
      There is already a VM_BUG_ON_PAGE(PageDoubleMap) in page_mlock(), so
      page_mlock_one() does not even have to worry about that complication.
      
      (I said the kernel recovers: but would page reclaim be likely to split
      THP before rediscovering that it's VM_LOCKED? I've not followed that up)
      
      Fixes: 9a73f61b ("thp, mlock: do not mlock PTE-mapped file huge pages")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Link: https://lore.kernel.org/lkml/cfa154c-d595-406-eb7d-eb9df730f944@google.com/
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d9770fcc
    • Hugh Dickins's avatar
      mm/rmap: fix comments left over from recent changes · 64b586d1
      Hugh Dickins authored
      Parallel developments in mm/rmap.c have left behind some out-of-date
      comments: try_to_migrate_one() also accepts TTU_SYNC (already commented
      in try_to_migrate() itself), and try_to_migrate() returns nothing at
      all.
      
      TTU_SPLIT_FREEZE has just been deleted, so reword the comment about it
      in mm/huge_memory.c; and TTU_IGNORE_ACCESS was removed in 5.11, so
      delete the "recently referenced" comment from try_to_unmap_one() (once
      upon a time the comment was near the removed codeblock, but they drifted
      apart).
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
      Reviewed-by: default avatarAlistair Popple <apopple@nvidia.com>
      Link: https://lore.kernel.org/lkml/563ce5b2-7a44-5b4d-1dfd-59a0e65932a9@google.com/
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      64b586d1
    • David S. Miller's avatar
      Merge branch 'bridge-mc-fixes' · d2eecc59
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: multicast: fix automatic router port marking races
      
      While working on per-vlan multicast snooping I found two race conditions
      when multicast snooping is enabled. They're identical and happen when
      the router port list is modified without the multicast lock. One requires
      a PIM hello message to be received on a port and the other an MRD
      advertisement. To fix them we just need to take the multicast_lock when
      adding the ports to the router port list (marking them as router ports).
      Tested on an affected setup by generating the required packets while
      modifying the port list in parallel.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d2eecc59
    • Nikolay Aleksandrov's avatar
      net: bridge: multicast: fix MRD advertisement router port marking race · 000b7287
      Nikolay Aleksandrov authored
      When an MRD advertisement is received on a bridge port with multicast
      snooping enabled, we mark it as a router port automatically, that
      includes adding that port to the router port list. The multicast lock
      protects that list, but it is not acquired in the MRD advertisement case
      leading to a race condition, we need to take it to fix the race.
      
      Cc: stable@vger.kernel.org
      Cc: linus.luessing@c0d3.blue
      Fixes: 4b3087c7 ("bridge: Snoop Multicast Router Advertisements")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      000b7287
    • Nikolay Aleksandrov's avatar
      net: bridge: multicast: fix PIM hello router port marking race · 04bef83a
      Nikolay Aleksandrov authored
      When a PIM hello packet is received on a bridge port with multicast
      snooping enabled, we mark it as a router port automatically, that
      includes adding that port the router port list. The multicast lock
      protects that list, but it is not acquired in the PIM message case
      leading to a race condition, we need to take it to fix the race.
      
      Cc: stable@vger.kernel.org
      Fixes: 91b02d3d ("bridge: mcast: add router port on PIM hello message")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      04bef83a
    • Linus Torvalds's avatar
      Merge tag 'irq-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 98f7fdce
      Linus Torvalds authored
      Pull irq fixes from Ingo Molnar:
       "Two fixes:
      
         - Fix a MIPS IRQ handling RCU bug
      
         - Remove a DocBook annotation for a parameter that doesn't exist
           anymore"
      
      * tag 'irq-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/mips: Fix RCU violation when using irqdomain lookup on interrupt entry
        genirq/irqdesc: Drop excess kernel-doc entry @lookup
      98f7fdce
    • Linus Torvalds's avatar
      Merge tag 'sched-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 877029d9
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Three fixes:
      
         - Fix load tracking bug/inconsistency
      
         - Fix a sporadic CFS bandwidth constraints enforcement bug
      
         - Fix a uclamp utilization tracking bug for newly woken tasks"
      
      * tag 'sched-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/uclamp: Ignore max aggregation if rq is idle
        sched/fair: Fix CFS bandwidth hrtimer expiry type
        sched/fair: Sync load_sum with load_avg after dequeue
      877029d9
    • Linus Torvalds's avatar
      Merge tag 'perf-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 936b664f
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "A fix and a hardware-enablement addition:
      
         - Robustify uncore_snbep's skx_iio_set_mapping()'s error cleanup
      
         - Add cstate event support for Intel ICELAKE_X and ICELAKE_D"
      
      * tag 'perf-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Clean up error handling path of iio mapping
        perf/x86/cstate: Add ICELAKE_X and ICELAKE_D support
      936b664f
    • Linus Torvalds's avatar
      Merge tag 'locking-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 301c8b1d
      Linus Torvalds authored
      Pull locking fixes from Ingo Molnar:
      
       - Fix a Sparc crash
      
       - Fix a number of objtool warnings
      
       - Fix /proc/lockdep output on certain configs
      
       - Restore a kprobes fail-safe
      
      * tag 'locking-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/atomic: sparc: Fix arch_cmpxchg64_local()
        kprobe/static_call: Restore missing static_call_text_reserved()
        static_call: Fix static_call_text_reserved() vs __init
        jump_label: Fix jump_label_text_reserved() vs __init
        locking/lockdep: Fix meaningless /proc/lockdep output of lock classes on !CONFIG_PROVE_LOCKING
      301c8b1d
    • Linus Torvalds's avatar
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 8b9cc17a
      Linus Torvalds authored
      Pull more SCSI updates from James Bottomley:
       "This is a set of minor fixes and clean ups in the core and various
        drivers.
      
        The only core change in behaviour is the I/O retry for spinup notify,
        but that shouldn't impact anything other than the failing case"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (23 commits)
        scsi: virtio_scsi: Add validation for residual bytes from response
        scsi: ipr: System crashes when seeing type 20 error
        scsi: core: Retry I/O for Notify (Enable Spinup) Required error
        scsi: mpi3mr: Fix warnings reported by smatch
        scsi: qedf: Add check to synchronize abort and flush
        scsi: MAINTAINERS: Add mpi3mr driver maintainers
        scsi: libfc: Fix array index out of bound exception
        scsi: mvsas: Use DEVICE_ATTR_RO()/RW() macro
        scsi: megaraid_mbox: Use DEVICE_ATTR_ADMIN_RO() macro
        scsi: qedf: Use DEVICE_ATTR_RO() macro
        scsi: qedi: Use DEVICE_ATTR_RO() macro
        scsi: message: mptfc: Switch from pci_ to dma_ API
        scsi: be2iscsi: Fix some missing space in some messages
        scsi: be2iscsi: Fix an error handling path in beiscsi_dev_probe()
        scsi: ufs: Fix build warning without CONFIG_PM
        scsi: bnx2fc: Remove meaningless bnx2fc_abts_cleanup() return value assignment
        scsi: qla2xxx: Add heartbeat check
        scsi: virtio_scsi: Do not overwrite SCSI status
        scsi: libsas: Add LUN number check in .slave_alloc callback
        scsi: core: Inline scsi_mq_alloc_queue()
        ...
      8b9cc17a
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-for-v5.14-2021-07-10' of... · b1412bd7
      Linus Torvalds authored
      Merge tag 'perf-tools-for-v5.14-2021-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull more perf tool updates from Arnaldo Carvalho de Melo:
       "New features:
      
         - Enable use of BPF counters with 'perf stat --for-each-cgroup',
           using per-CPU 'cgroup-switch' events with an attached BPF program
           that does aggregation per-cgroup in the kernel instead of using
           per-cgroup perf events.
      
         - Add Topdown metrics L2 events as default events in 'perf stat' for
           systems having those events.
      
        Hardware tracing:
      
         - Add a config for max loops without consuming a packet in the Intel
           PT packet decoder, set via 'perf config intel-pt.max-loops=N'
      
        Hardware enablement:
      
         - Disable misleading NMI watchdog message in 'perf stat' on hybrid
           systems such as Intel Alder Lake.
      
         - Add a dummy event on hybrid systems to collect metadata records.
      
         - Add 24x7 nest metric events for the Power10 platform.
      
        Fixes:
      
         - Fix event parsing for PMUs starting with the same prefix.
      
         - Fix the 'perf trace' 'trace' alias installation dir.
      
         - Fix buffer size to report iregs in perf script python scripts,
           supporting the extended registers in PowerPC.
      
         - Fix overflow in elf_sec__is_text().
      
         - Fix 's' on source line when disasm is empty in the annotation TUI,
           accessible via 'perf annotate', 'perf report' and 'perf top'.
      
         - Plug leaks in scandir() returned dirent entries in 'perf test' when
           sorting the shell tests.
      
         - Fix --task and --stat with pipe input in 'perf report'.
      
         - Fix 'perf probe' use of debuginfo files by build id.
      
         - If a DSO has both dynsym and symtab ELF sections, read from both
           when loading the symbol table, fixing a problem processing Fedora
           32 glibc DSOs.
      
        Libraries:
      
         - Add grouping of events to libperf, from code in tools/perf,
           allowing libperf users to use that mode.
      
        Misc:
      
         - Filter plt stubs from the 'perf probe --functions' output.
      
         - Update UAPI header copies for asound, DRM, mman-common.h and the
           ones affected by the quotactl_fd syscall"
      
      * tag 'perf-tools-for-v5.14-2021-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (29 commits)
        perf test: Add free() calls for scandir() returned dirent entries
        libperf: Add tests for perf_evlist__set_leader()
        libperf: Remove BUG_ON() from library code in get_group_fd()
        libperf: Add group support to perf_evsel__open()
        perf tools: Fix pattern matching for same substring in different PMU type
        perf record: Add a dummy event on hybrid systems to collect metadata records
        perf stat: Add Topdown metrics L2 events as default events
        libperf: Adopt evlist__set_leader() from tools/perf as perf_evlist__set_leader()
        libperf: Move 'nr_groups' from tools/perf to evlist::nr_groups
        libperf: Move 'leader' from tools/perf to perf_evsel::leader
        libperf: Move 'idx' from tools/perf to perf_evsel::idx
        libperf: Change tests to single static and shared binaries
        perf intel-pt: Add a config for max loops without consuming a packet
        perf stat: Disable the NMI watchdog message on hybrid
        perf vendor events power10: Adds 24x7 nest metric events for power10 platform
        perf script python: Fix buffer size to report iregs in perf script
        perf trace: Fix the perf trace link location
        perf top: Fix overflow in elf_sec__is_text()
        perf annotate: Fix 's' on source line when disasm is empty
        perf probe: Do not show @plt function by default
        ...
      b1412bd7