1. 16 Jun, 2022 4 commits
    • cgroup: Use separate src/dst nodes when preloading css_sets for migration · 07fd5b6c
      Tejun Heo authored
      Each cset (css_set) is pinned by its tasks. When we're moving tasks around
      across csets for a migration, we need to hold the source and destination
      csets to ensure that they don't go away while we're moving tasks about. This
      is done by linking cset->mg_preload_node on either the
      mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the
      same cset->mg_preload_node for both the src and dst lists was deemed okay as
      a cset can't be both the source and destination at the same time.
      
      Unfortunately, this overloading becomes problematic when multiple tasks are
      involved in a migration and some of them are identity noop migrations while
      others are actually moving across cgroups. For example, this can happen with
      the following sequence on cgroup1:
      
       #1> mkdir -p /sys/fs/cgroup/misc/a/b
       #2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs
       #3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS &
       #4> PID=$!
       #5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks
       #6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs
      
      #5 moves the group leader into b, and #6 moves the process including the
      group leader back into a. In this final migration, non-leader threads
      would be doing identity migrations while the group leader is doing an
      actual one.
      
      After #3, let's say the whole process was in cset A, and that after #5, the
      leader moves to cset B. Then, during #6, the following happens:
      
       1. cgroup_migrate_add_src() is called on B for the leader.
      
       2. cgroup_migrate_add_src() is called on A for the other threads.
      
       3. cgroup_migrate_prepare_dst() is called. It scans the src list.
      
       4. It notices that B wants to migrate to A, so it tries to add A to the
          dst list but realizes that A's ->mg_preload_node is already busy.
      
       5. It then notices that A wants to migrate to A; as that is an identity
          migration, it culls A by list_del_init()'ing its ->mg_preload_node
          and putting references accordingly.
      
       6. The rest of migration takes place with B on the src list but nothing on
          the dst list.
      
      This means that A isn't held while migration is in progress. If all tasks
      leave A before the migration finishes and before the incoming task pins
      it, the cset will be destroyed, leading to a use-after-free.
      
      This is caused by overloading cset->mg_preload_node for both src and dst
      preload lists. We wanted to exclude the cset from the src list but ended up
      inadvertently excluding it from the dst list too.
      
      This patch fixes the issue by separating out cset->mg_preload_node into
      ->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst
      preloadings don't interfere with each other.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
      Reported-by: shisiyuan <shisiyuan19870131@gmail.com>
      Link: http://lkml.kernel.org/r/1654187688-27411-1-git-send-email-shisiyuan@xiaomi.com
      Link: https://www.spinics.net/lists/cgroups/msg33313.html
      Fixes: f817de98 ("cgroup: prepare migration path for unified hierarchy")
      Cc: stable@vger.kernel.org # v3.16+
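To make the failure concrete, here is a small userspace model of steps 1 to 6 above. It is a sketch under stated assumptions, not kernel code: the list helpers are a minimal reimplementation in the style of the kernel's list_head, and `struct cset_shared`, `struct cset_split`, `replay_shared()` and `replay_split()` are hypothetical names that keep only the preload list nodes. Reference counting and the real cgroup_migrate_*() functions are left out.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	INIT_LIST_HEAD(n);
}

/* Hypothetical cset with one node shared by the src and dst lists (the bug). */
struct cset_shared {
	struct list_head mg_preload_node;
};

/* Hypothetical cset with separate src and dst nodes (the fix). */
struct cset_split {
	struct list_head mg_src_preload_node;
	struct list_head mg_dst_preload_node;
};

/* Replays steps 1-6 with a shared node; returns true if A ends up on dst. */
static bool replay_shared(void)
{
	struct list_head src, dst;
	struct cset_shared A, B;

	INIT_LIST_HEAD(&src);
	INIT_LIST_HEAD(&dst);
	INIT_LIST_HEAD(&A.mg_preload_node);
	INIT_LIST_HEAD(&B.mg_preload_node);

	list_add_tail(&B.mg_preload_node, &src);	/* 1: leader, B -> A */
	list_add_tail(&A.mg_preload_node, &src);	/* 2: other threads, A -> A */

	/* 4: B wants A on dst, but A's only node is busy on src, so skip. */
	if (list_empty(&A.mg_preload_node))
		list_add_tail(&A.mg_preload_node, &dst);

	/* 5: A -> A is an identity migration, so cull A from src. */
	list_del_init(&A.mg_preload_node);

	/* 6: dst is left empty, so nothing holds A. */
	return !list_empty(&dst);
}

/* Same replay with separate nodes; returns true if A ends up on dst. */
static bool replay_split(void)
{
	struct list_head src, dst;
	struct cset_split A, B;

	INIT_LIST_HEAD(&src);
	INIT_LIST_HEAD(&dst);
	INIT_LIST_HEAD(&A.mg_src_preload_node);
	INIT_LIST_HEAD(&A.mg_dst_preload_node);
	INIT_LIST_HEAD(&B.mg_src_preload_node);
	INIT_LIST_HEAD(&B.mg_dst_preload_node);

	list_add_tail(&B.mg_src_preload_node, &src);
	list_add_tail(&A.mg_src_preload_node, &src);

	/* 4: A's dst node is free, so A goes on the dst list. */
	if (list_empty(&A.mg_dst_preload_node))
		list_add_tail(&A.mg_dst_preload_node, &dst);

	/* 5: culling A from src no longer touches its dst membership. */
	list_del_init(&A.mg_src_preload_node);

	return dst.next == &A.mg_dst_preload_node;
}
```

In this model replay_shared() returns false (A is on neither list while the migration proceeds, which is the unpinned window described above), while replay_split() returns true.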
    • Merge tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 48a23ec6
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Mostly driver fixes.
      
        Current release - regressions:
      
         - Revert "net: Add a second bind table hashed by port and address",
           needs more work
      
         - amd-xgbe: use platform_irq_count(), static setup of IRQ resources
           had been removed from DT core
      
         - dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation
      
        Current release - new code bugs:
      
         - hns3: modify the ring param print info
      
        Previous releases - always broken:
      
         - axienet: make the 64b addressable DMA depend on 64b architectures
      
         - iavf: fix issue with MAC address of VF shown as zero
      
         - ice: fix PTP TX timestamp offset calculation
      
         - usb: ax88179_178a needs FLAG_SEND_ZLP
      
        Misc:
      
         - document some net.sctp.* sysctls"
      
      * tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
        net: axienet: add missing error return code in axienet_probe()
        Revert "net: Add a second bind table hashed by port and address"
        net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
        net: usb: ax88179_178a needs FLAG_SEND_ZLP
        MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
        ARM: dts: at91: ksz9477_evb: fix port/phy validation
        net: bgmac: Fix an erroneous kfree() in bgmac_remove()
        ice: Fix memory corruption in VF driver
        ice: Fix queue config fail handling
        ice: Sync VLAN filtering features for DVM
        ice: Fix PTP TX timestamp offset calculation
        mlxsw: spectrum_cnt: Reorder counter pools
        docs: networking: phy: Fix a typo
        amd-xgbe: Use platform_irq_count()
        octeontx2-vf: Add support for adaptive interrupt coalescing
        xilinx:  Fix build on x86.
        net: axienet: Use iowrite64 to write all 64b descriptor pointers
        net: axienet: make the 64b addresable DMA depends on 64b archectures
        net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
        net: hns3: fix PF rss size initialization bug
        ...
    • net: axienet: add missing error return code in axienet_probe() · 2e7bf4a6
      Yang Yingliang authored
      axienet_probe() should return an error code on its error path.
      
      Fixes: 00be43a7 ("net: axienet: make the 64b addresable DMA depends on 64b archectures")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Link: https://lore.kernel.org/r/20220616062917.3601-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
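The bug class here is a common one in probe functions: an error branch jumps to the cleanup label without setting the return code first, so a stale `ret == 0` is returned and the caller treats a failed probe as a success. The sketch below is generic and hypothetical (`probe_buggy()`, `probe_fixed()` and the `dma_ok` flag are illustrative names, not the driver's code), and it assumes `-EINVAL` as the error value.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical probe: 'dma_ok' models a capability check that can fail. */
static int probe_buggy(bool dma_ok)
{
	int ret = 0;	/* left at 0 by earlier, successful steps */

	if (!dma_ok)
		goto cleanup;	/* bug: ret is still 0 here */

	return 0;

cleanup:
	/* undo earlier steps (clocks, mappings, ...) here */
	return ret;
}

static int probe_fixed(bool dma_ok)
{
	int ret = 0;

	if (!dma_ok) {
		ret = -EINVAL;	/* report the failure to the caller */
		goto cleanup;
	}

	return 0;

cleanup:
	return ret;
}
```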
    • Revert "net: Add a second bind table hashed by port and address" · 593d1ebe
      Joanne Koong authored
      This reverts:
      
      commit d5a42de8 ("net: Add a second bind table hashed by port and address")
      commit 538aaf9b ("selftests: Add test for timing a bind request to a port with a populated bhash entry")
      Link: https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/
      
      There are a few things that need to be fixed here:
      * Updating bhash2 in cases where the socket's rcv saddr changes
      * Adding bhash2 hashbucket locks
      
      Links to syzbot reports:
      https://lore.kernel.org/netdev/00000000000022208805e0df247a@google.com/
      https://lore.kernel.org/netdev/0000000000003f33bc05dfaf44fe@google.com/
      
      Fixes: d5a42de8 ("net: Add a second bind table hashed by port and address")
      Reported-by: syzbot+015d756bbd1f8b5c8f09@syzkaller.appspotmail.com
      Reported-by: syzbot+98fd2d1422063b0f8c44@syzkaller.appspotmail.com
      Reported-by: syzbot+0a847a982613c6438fba@syzkaller.appspotmail.com
      Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
      Link: https://lore.kernel.org/r/20220615193213.2419568-1-joannelkoong@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 15 Jun, 2022 11 commits
  3. 14 Jun, 2022 11 commits
    • netfs: fix up netfs_inode_init() docbook comment · 018ab4fa
      Linus Torvalds authored
      Commit e81fb419 ("netfs: Further cleanups after struct netfs_inode
      wrapper introduced") changed the argument types and names, and actually
      updated the comment too (although that was thanks to David Howells, not
      me: my original patch only changed the code).
      
      But the comment fixup didn't go quite far enough, and didn't change the
      argument name in the comment, resulting in
      
        include/linux/netfs.h:314: warning: Function parameter or member 'ctx' not described in 'netfs_inode_init'
        include/linux/netfs.h:314: warning: Excess function parameter 'inode' description in 'netfs_inode_init'
      
      during htmldoc generation.
      
      Fixes: e81fb419 ("netfs: Further cleanups after struct netfs_inode wrapper introduced")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
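kernel-doc expects every `@name:` line to match a parameter of the function it documents; a stale name yields both warnings quoted above (one for the undocumented real parameter, one for the excess documented one). Below is a hypothetical stand-alone sketch of a comment whose names match a `ctx` parameter. The stub structs, the function body, the comment wording and the `ops` parameter are assumptions for illustration, not the contents of include/linux/netfs.h.

```c
/* Placeholder types so the sketch compiles outside the kernel tree. */
struct netfs_request_ops {
	int placeholder;
};

struct netfs_inode {
	const struct netfs_request_ops *ops;
};

/**
 * netfs_inode_init - Initialise a netfslib inode context
 * @ctx: The netfs inode to initialise
 * @ops: The request operations to use
 *
 * Every @name above matches a parameter below, so kernel-doc reports neither
 * "not described" nor "Excess function parameter" for this function.
 */
static inline void netfs_inode_init(struct netfs_inode *ctx,
				    const struct netfs_request_ops *ops)
{
	ctx->ops = ops;
}
```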
    • ice: Fix memory corruption in VF driver · efe41860
      Przemyslaw Patynowski authored
      Disable the VF's RX/TX queues when the VF is disabled. A VF can have
      queues enabled when it requests a reset. If the PF driver assumes that
      the VF is disabled while the VF still has queues configured, the VF may
      unmap DMA resources. In such a scenario the device can still map packets
      to memory, which ends up silently corrupting it.
      Previously, the VF driver could experience memory corruption, which led
      to a crash:
      [ 5119.170157] BUG: unable to handle kernel paging request at 00001b9780003237
      [ 5119.170166] PGD 0 P4D 0
      [ 5119.170173] Oops: 0002 [#1] PREEMPT_RT SMP PTI
      [ 5119.170181] CPU: 30 PID: 427592 Comm: kworker/u96:2 Kdump: loaded Tainted: G        W I      --------- -  - 4.18.0-372.9.1.rt7.166.el8.x86_64 #1
      [ 5119.170189] Hardware name: Dell Inc. PowerEdge R740/014X06, BIOS 2.3.10 08/15/2019
      [ 5119.170193] Workqueue: iavf iavf_adminq_task [iavf]
      [ 5119.170219] RIP: 0010:__page_frag_cache_drain+0x5/0x30
      [ 5119.170238] Code: 0f 0f b6 77 51 85 f6 74 07 31 d2 e9 05 df ff ff e9 90 fe ff ff 48 8b 05 49 db 33 01 eb b4 0f 1f 80 00 00 00 00 0f 1f 44 00 00 <f0> 29 77 34 74 01 c3 48 8b 07 f6 c4 80 74 0f 0f b6 77 51 85 f6 74
      [ 5119.170244] RSP: 0018:ffffa43b0bdcfd78 EFLAGS: 00010282
      [ 5119.170250] RAX: ffffffff896b3e40 RBX: ffff8fb282524000 RCX: 0000000000000002
      [ 5119.170254] RDX: 0000000049000000 RSI: 0000000000000000 RDI: 00001b9780003203
      [ 5119.170259] RBP: ffff8fb248217b00 R08: 0000000000000022 R09: 0000000000000009
      [ 5119.170262] R10: 2b849d6300000000 R11: 0000000000000020 R12: 0000000000000000
      [ 5119.170265] R13: 0000000000001000 R14: 0000000000000009 R15: 0000000000000000
      [ 5119.170269] FS:  0000000000000000(0000) GS:ffff8fb1201c0000(0000) knlGS:0000000000000000
      [ 5119.170274] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 5119.170279] CR2: 00001b9780003237 CR3: 00000008f3e1a003 CR4: 00000000007726e0
      [ 5119.170283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 5119.170286] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 5119.170290] PKRU: 55555554
      [ 5119.170292] Call Trace:
      [ 5119.170298]  iavf_clean_rx_ring+0xad/0x110 [iavf]
      [ 5119.170324]  iavf_free_rx_resources+0xe/0x50 [iavf]
      [ 5119.170342]  iavf_free_all_rx_resources.part.51+0x30/0x40 [iavf]
      [ 5119.170358]  iavf_virtchnl_completion+0xd8a/0x15b0 [iavf]
      [ 5119.170377]  ? iavf_clean_arq_element+0x210/0x280 [iavf]
      [ 5119.170397]  iavf_adminq_task+0x126/0x2e0 [iavf]
      [ 5119.170416]  process_one_work+0x18f/0x420
      [ 5119.170429]  worker_thread+0x30/0x370
      [ 5119.170437]  ? process_one_work+0x420/0x420
      [ 5119.170445]  kthread+0x151/0x170
      [ 5119.170452]  ? set_kthread_struct+0x40/0x40
      [ 5119.170460]  ret_from_fork+0x35/0x40
      [ 5119.170477] Modules linked in: iavf sctp ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core nfp tls vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr iTCO_wdt iTCO_vendor_support dell_smbios wmi_bmof dell_wmi_descriptor dcdbas kvm_intel kvm irqbypass intel_rapl_common isst_if_common skx_edac irdma nfit libnvdimm x86_pkg_temp_thermal i40e intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ib_uverbs rapl ipmi_ssif intel_cstate intel_uncore mei_me pcspkr acpi_ipmi ib_core mei lpc_ich i2c_i801 ipmi_si ipmi_devintf wmi ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ice ahci drm libahci crc32c_intel libata tg3 megaraid_sas
      [ 5119.170613]  i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: iavf]
      [ 5119.170627] CR2: 00001b9780003237
      
      Fixes: ec4f5a43 ("ice: Check if VF is disabled for Opcode and other operations")
      Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
      Co-developed-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix queue config fail handling · be2af714
      Przemyslaw Patynowski authored
      Disable the VF's RX/TX queues when VIRTCHNL_OP_CONFIG_VSI_QUEUES fails.
      Not disabling them might lead to a scenario where the PF driver leaves
      the VF's queues enabled even though the VF's VSI failed queue
      configuration. In this scenario the VF should not have RX/TX queues
      enabled. If the PF failed to set up the VF's queues, the VF will reset
      due to TX timeouts in the VF driver.
      Initialize the iterator 'i' to -1, so that if an error happens before
      any queue is configured, the error path will not disable queue 0. The
      loop that configures the queues uses the same iterator, so the error
      path will only disable queues that were configured.
      
      Fixes: 77ca27c4 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
      Suggested-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
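The iterator trick can be shown in isolation. This is a hypothetical sketch, not the ice driver's code: `queue_enabled[]`, `configure_queue()`, `config_vsi_queues()` and the `params_ok`/`fail_at` knobs are illustrative stand-ins. Starting `i` at -1 makes the unwind loop a no-op when the failure happens before the configuration loop; otherwise it walks back from the queue that failed down to queue 0.

```c
#include <stdbool.h>

#define NUM_QUEUES 4

/* Hypothetical per-queue state standing in for the VF's RX/TX rings. */
static bool queue_enabled[NUM_QUEUES];

/* Hypothetical: enable queue q, failing (returning -1) when q == fail_at. */
static int configure_queue(int q, int fail_at)
{
	if (q == fail_at)
		return -1;
	queue_enabled[q] = true;
	return 0;
}

/*
 * Configure every queue. 'params_ok' models validation done before the
 * loop; on any failure, disable only the queues the loop got to.
 */
static int config_vsi_queues(bool params_ok, int fail_at)
{
	int i = -1;	/* -1: the loop has not touched any queue yet */

	if (!params_ok)
		goto error;

	for (i = 0; i < NUM_QUEUES; i++) {
		if (configure_queue(i, fail_at))
			goto error;
	}
	return 0;

error:
	/* Walks from the queue that failed down to 0; skipped when i == -1. */
	for (; i >= 0; i--)
		queue_enabled[i] = false;
	return -1;
}
```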
    • ice: Sync VLAN filtering features for DVM · 9542ef4f
      Roman Storozhenko authored
      In DVM (double VLAN mode), the VLAN filtering features, i.e. C-Tag and
      S-Tag filtering, must be either both enabled or both disabled.
      If only one of the features is turned on or off, the other must be
      turned on or off automatically, and an appropriate message issued to
      the kernel log.
      
      Fixes: 1babaf77 ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
      Signed-off-by: Roman Storozhenko <roman.storozhenko@intel.com>
      Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
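A minimal sketch of the "keep both bits in step" rule, assuming the current feature set already has C-Tag and S-Tag filtering equal. `FEAT_CTAG_FILTER`, `FEAT_STAG_FILTER` and `sync_vlan_filtering()` are hypothetical names rather than the netdev feature flags or the ice driver's function, and writing to stderr stands in for the kernel log message.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the C-Tag and S-Tag VLAN filtering feature bits. */
#define FEAT_CTAG_FILTER (1u << 0)
#define FEAT_STAG_FILTER (1u << 1)
#define FEAT_VLAN_FILTERS (FEAT_CTAG_FILTER | FEAT_STAG_FILTER)

/*
 * Given the current features (both filter bits equal) and the requested ones,
 * return a set in which both bits are equal again: a bit toggled on its own
 * drags the other one along. Other feature bits pass through unchanged.
 */
static unsigned int sync_vlan_filtering(unsigned int cur, unsigned int req)
{
	unsigned int changed = (cur ^ req) & FEAT_VLAN_FILTERS;
	unsigned int leader;

	if (changed == FEAT_CTAG_FILTER)
		leader = FEAT_CTAG_FILTER;
	else if (changed == FEAT_STAG_FILTER)
		leader = FEAT_STAG_FILTER;
	else
		return req;	/* nothing toggled, or both toggled together */

	if (req & leader) {
		req |= FEAT_VLAN_FILTERS;
		fprintf(stderr, "DVM: enabling C-Tag and S-Tag filtering together\n");
	} else {
		req &= ~FEAT_VLAN_FILTERS;
		fprintf(stderr, "DVM: disabling C-Tag and S-Tag filtering together\n");
	}
	return req;
}
```

For example, turning off only S-Tag filtering when both are on (`cur = 3`, `req = 1`) yields 0, so C-Tag filtering is switched off too.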
    • ice: Fix PTP TX timestamp offset calculation · 71a579f0
      Michal Michalik authored
      The offset was being calculated incorrectly for E822, which led to
      collisions when choosing the TX timestamp register location if more
      than one port was trying to use the timestamping mechanism.
      
      In E822, one quad is logically split between ports, so quad 0 has
      trackers for ports 0-3, quad 1 for ports 4-7, etc. Each port should
      have a separate memory location for tracking timestamps. Due to the
      error, for example, ports 0 and 1 were both assigned to quad 0 with the
      same offset (0), while port 0 should have offset 0 and port 1 offset 16.
      
      Fix it by correctly calculating the quad offset.
      
      Fixes: 3a749623 ("ice: implement basic E822 PTP support")
      Signed-off-by: Michal Michalik <michal.michalik@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
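The offset arithmetic can be checked with a few lines of C. This sketch assumes 4 ports per quad (from the text) and 16 tracker indices per port (from the offset-16 example); `buggy_quad_offset()` is one plausible form of the bug that reproduces the described symptom, not necessarily the driver's original expression, and all names are hypothetical.

```c
/* Assumed layout: 4 ports share a quad, 16 tracker indices per port. */
#define PORTS_PER_QUAD 4u
#define INDEX_PER_PORT 16u

static unsigned int port_quad(unsigned int port)
{
	return port / PORTS_PER_QUAD;
}

/* Plausible buggy form: keyed on the quad, so ports 0-3 all get offset 0. */
static unsigned int buggy_quad_offset(unsigned int port)
{
	return port_quad(port) * INDEX_PER_PORT;
}

/* Fixed form: keyed on the port's position within its quad. */
static unsigned int quad_offset(unsigned int port)
{
	return (port % PORTS_PER_QUAD) * INDEX_PER_PORT;
}
```

Under these assumptions port 5 lives in quad 1 at offset 16, and no two ports of the same quad share an offset.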
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 24625f7d
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "While last week's pull request contained miscellaneous fixes for x86,
        this one covers other architectures, selftests changes, and a bigger
        series for APIC virtualization bugs that were discovered during 5.20
        development. The idea is to base 5.20 development for KVM on top of
        this tag.
      
        ARM64:
      
         - Properly reset the SVE/SME flags on vcpu load
      
         - Fix a vgic-v2 regression regarding accessing the pending state of a
           HW interrupt from userspace (and make the code common with vgic-v3)
      
         - Fix access to the idreg range for protected guests
      
         - Ignore 'kvm-arm.mode=protected' when using VHE
      
         - Return an error from kvm_arch_init_vm() on allocation failure
      
         - A bunch of small cleanups (comments, annotations, indentation)
      
        RISC-V:
      
         - Typo fix in arch/riscv/kvm/vmid.c
      
         - Remove broken reference pattern from MAINTAINERS entry
      
        x86-64:
      
         - Fix error in page tables with MKTME enabled
      
         - Dirty page tracking performance test extended to running a nested
           guest
      
         - Disable APICv/AVIC in cases that it cannot implement correctly"
      
      [ This merge also fixes a misplaced end parenthesis bug introduced in
        commit 3743c2f0 ("KVM: x86: inhibit APICv/AVIC on changes to APIC
        ID or APIC base") pointed out by Sean Christopherson ]
      
      Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
        KVM: selftests: Restrict test region to 48-bit physical addresses when using nested
        KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
        KVM: selftests: Clean up LIBKVM files in Makefile
        KVM: selftests: Link selftests directly with lib object files
        KVM: selftests: Drop unnecessary rule for STATIC_LIBS
        KVM: selftests: Add a helper to check EPT/VPID capabilities
        KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
        KVM: selftests: Refactor nested_map() to specify target level
        KVM: selftests: Drop stale function parameter comment for nested_map()
        KVM: selftests: Add option to create 2M and 1G EPT mappings
        KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
        KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE
        KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put
        KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking
        KVM: x86: disable preemption while updating apicv inhibition
        KVM: x86: SVM: fix avic_kick_target_vcpus_fast
        KVM: x86: SVM: remove avic's broken code that updated APIC ID
        KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
        KVM: x86: document AVIC/APICv inhibit reasons
        KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
        ...
    • Merge tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8e8afafb
      Linus Torvalds authored
      Pull x86 MMIO stale data fixes from Thomas Gleixner:
       "Yet another hw vulnerability with a software mitigation: Processor
        MMIO Stale Data.
      
        They are a class of MMIO-related weaknesses which can expose stale
        data by propagating it into core fill buffers, from which it can then
        be leaked using the usual speculative execution methods.
      
        The mitigations consist of this set along with microcode updates and
        are similar to those for the MDS and TAA vulnerabilities: VERW now
        clears those buffers too"
      
      * tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/speculation/mmio: Print SMT warning
        KVM: x86/speculation: Disable Fill buffer clear within guests
        x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
        x86/speculation/srbds: Update SRBDS mitigation selection
        x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
        x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
        x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
        x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
        x86/speculation: Add a common function for MD_CLEAR mitigation update
        x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
        Documentation: Add documentation for Processor MMIO Stale Data
    • mlxsw: spectrum_cnt: Reorder counter pools · 4b7a632a
      Petr Machata authored
      Both RIF and ACL flow counters use a 24-bit SW-managed counter address to
      communicate which counter they want to bind.
      
      In a number of Spectrum FW releases, binding a RIF counter is broken and
      slices the counter index to 16 bits. As a result, on Spectrum-2 and above,
      no more than about 410 RIF counters can be effectively used. This
      translates to 205 netdevices for which L3 HW stats can be enabled. (This
      does not happen on Spectrum-1, because there are fewer counters available
      overall and the counter index never exceeds 16 bits.)
      
      Binding counters to ACLs does not have this issue. Therefore, reorder the
      counter allocation scheme so that RIF counters come first and thus get
      lower indices that are below the 16-bit barrier.
      
      Fixes: 98e60dce ("Merge branch 'mlxsw-Introduce-initial-Spectrum-2-support'")
      Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20220613125017.2018162-1-idosch@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
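A short model of the firmware defect and of why reordering helps. It assumes the defect behaves like masking the 24-bit index down to its low 16 bits, as the text describes; `rif_counter_index()`, the two-pool layout and the `acl_pool_size` value used below are hypothetical, not the driver's real pool sizes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the FW defect: a RIF counter index keeps only its low 16 bits. */
static uint32_t fw_sliced_index(uint32_t index)
{
	return index & 0xFFFFu;
}

/* A RIF counter binds correctly only if slicing leaves its index intact. */
static bool rif_index_ok(uint32_t index)
{
	return fw_sliced_index(index) == index;
}

/*
 * Index of the k-th RIF counter in a hypothetical two-pool layout.
 * With ACL counters first, RIF indices start after the whole ACL pool.
 */
static uint32_t rif_counter_index(bool acl_pool_first, uint32_t acl_pool_size,
				  uint32_t k)
{
	return acl_pool_first ? acl_pool_size + k : k;
}
```

With a hypothetical ACL pool of 0x10000 counters placed first, even the first RIF counter lands above the 16-bit barrier; placing the RIF pool first keeps its indices low.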
    • fs: account for group membership · 168f9128
      Christian Brauner authored
      When calling setattr_prepare() to determine the validity of the
      attributes, the ia_{g,u}id fields contain the value that will be written
      to inode->i_{g,u}id. This is exactly the same for idmapped and
      non-idmapped mounts and allows callers to pass in the values they want
      to see written to inode->i_{g,u}id.
      
      When group ownership is changed, a caller whose fsuid owns the inode can
      change the group of the inode to any group they are a member of. When
      searching through the caller's groups, we need to use the gid mapped
      according to the idmapped mount; otherwise we will fail to change
      ownership for unprivileged users.
      
      Consider a caller running with fsuid and fsgid 1000 using an idmapped
      mount that maps id 65534 to 1000 and 65535 to 1001. Consequently, a file
      owned by 65534:65535 in the filesystem will be owned by 1000:1001 in the
      idmapped mount.
      
      The caller now requests the gid of the file to be changed to 1000 going
      through the idmapped mount. In the vfs we will immediately map the
      requested gid to the value that will need to be written to inode->i_gid
      and place it in attr->ia_gid. Since this idmapped mount maps 65534 to
      1000 we place 65534 in attr->ia_gid.
      
      When we check whether the caller is allowed to change group ownership we
      first validate that their fsuid matches the inode's uid. The
      inode->i_uid is 65534 which is mapped to uid 1000 in the idmapped mount.
      Since the caller's fsuid is 1000 we pass the check.
      
      We now check whether the caller is allowed to change inode->i_gid to the
      requested gid by calling in_group_p(). This will compare the passed in
      gid to the caller's fsgid and search the caller's additional groups.
      
      Since we're dealing with an idmapped mount we need to pass in the gid
      mapped according to the idmapped mount. This is akin to checking whether
      a caller is privileged over the future group the inode is owned by. And
      that needs to take the idmapped mount into account. Note that all helpers
      are no-ops without idmapped mounts.
      
      New regression test sent to xfstests.
      
      Link: https://github.com/lxc/lxd/issues/10537
      Link: https://lore.kernel.org/r/20220613111517.2186646-1-brauner@kernel.org
      Fixes: 2f221d6f ("attr: handle idmapped mounts")
      Cc: Seth Forshee <sforshee@digitalocean.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org # 5.15+
      CC: linux-fsdevel@vger.kernel.org
      Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
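The walk-through above can be replayed in a small model. Everything here is hypothetical: a single-range idmap (`struct idmap_range`) stands in for the mount's idmapping, `in_group()` checks only the fsgid (the real in_group_p() also searches supplementary groups), and the `may_chgrp_*()` helpers reproduce only the group-membership step of the ownership check, not the VFS code.

```c
#include <stdbool.h>
#include <stdint.h>

#define INVALID_ID UINT32_MAX

/* Hypothetical one-range idmap: fs ids [fs_first, fs_first + count) are
 * seen through the mount as [mnt_first, mnt_first + count). */
struct idmap_range {
	uint32_t fs_first;
	uint32_t mnt_first;
	uint32_t count;
};

/* On-disk id -> id seen through the mount (unsigned wrap rejects id < fs_first). */
static uint32_t fs_to_mnt(const struct idmap_range *m, uint32_t id)
{
	return id - m->fs_first < m->count ? m->mnt_first + (id - m->fs_first) : INVALID_ID;
}

/* Id seen through the mount -> on-disk id, i.e. what goes into attr->ia_gid. */
static uint32_t mnt_to_fs(const struct idmap_range *m, uint32_t id)
{
	return id - m->mnt_first < m->count ? m->fs_first + (id - m->mnt_first) : INVALID_ID;
}

/* Simplified in_group_p(): checks only the caller's fsgid. */
static bool in_group(uint32_t caller_fsgid, uint32_t gid)
{
	return gid == caller_fsgid;
}

/* Before the fix: compares the on-disk gid with the caller's groups. */
static bool may_chgrp_before(const struct idmap_range *m, uint32_t fsgid, uint32_t ia_gid)
{
	(void)m;
	return in_group(fsgid, ia_gid);
}

/* After the fix: maps ia_gid through the mount before the comparison. */
static bool may_chgrp_after(const struct idmap_range *m, uint32_t fsgid, uint32_t ia_gid)
{
	return in_group(fsgid, fs_to_mnt(m, ia_gid));
}
```

With the example's mapping `{65534, 1000, 2}`, requesting gid 1000 stores 65534 in `ia_gid`; the "before" check compares 65534 with fsgid 1000 and refuses, while the "after" check maps it back to 1000 and allows the change.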
    • docs: networking: phy: Fix a typo · 9cc8ea99
      Jonathan Neuschäfer authored
      Write "to be operated" instead of "to be operate".
      Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220610072809.352962-1-j.neuschaefer@gmx.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • amd-xgbe: Use platform_irq_count() · 884c65e4
      Jean-Philippe Brucker authored
      The AMD XGbE driver currently counts the number of interrupts assigned
      to the device by inspecting the pdev->resource array. Since commit
      a1a2b712 ("of/platform: Drop static setup of IRQ resource from DT
      core") removed IRQs from this array, the driver now attempts to get all
      interrupts from 1 to -1U and gives up probing once it reaches an invalid
      interrupt index.
      
      Obtain the number of IRQs with platform_irq_count() instead.
      
      Fixes: a1a2b712 ("of/platform: Drop static setup of IRQ resource from DT core")
      Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Acked-by: Rob Herring <robh@kernel.org>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Link: https://lore.kernel.org/r/20220609161457.69614-1-jean-philippe@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
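platform_irq_count() counts interrupts by querying the IRQ lookup itself rather than the resource array, so it keeps working when IRQs are no longer pre-populated there. As we understand it, it asks platform_get_irq_optional() for increasing indices until a lookup fails and passes -EPROBE_DEFER through. The userspace model below mirrors that loop; `fake_get_irq_optional()`, `count_irqs()` and the IRQ numbers returned are hypothetical, and the kernel source remains the reference for the exact behaviour.

```c
#include <errno.h>
#include <stdbool.h>

/* EPROBE_DEFER is kernel-internal (517 in include/linux/errno.h). */
#define EPROBE_DEFER 517

/*
 * Hypothetical stand-in for platform_get_irq_optional(): lookups succeed for
 * indices below nr_irqs, fail with -ENXIO past the end, and fail with
 * -EPROBE_DEFER when the interrupt provider is not ready yet.
 */
static int fake_get_irq_optional(int nr_irqs, bool defer, int index)
{
	if (defer)
		return -EPROBE_DEFER;
	if (index < nr_irqs)
		return 32 + index;	/* arbitrary IRQ numbers */
	return -ENXIO;
}

/* Count IRQs by querying increasing indices until the lookup fails. */
static int count_irqs(int nr_irqs, bool defer)
{
	int ret, nr = 0;

	while ((ret = fake_get_irq_optional(nr_irqs, defer, nr)) >= 0)
		nr++;

	/* Propagate deferral instead of reporting "no IRQs". */
	if (ret == -EPROBE_DEFER)
		return ret;

	return nr;
}
```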
  4. 13 Jun, 2022 14 commits