1. 21 Dec, 2022 6 commits
    • Dave Marchevsky's avatar
      bpf, x86: Improve PROBE_MEM runtime load check · 90156f4b
      Dave Marchevsky authored
      This patch rewrites the runtime PROBE_MEM check insns emitted by the BPF
      JIT in order to ensure load safety. The changes in the patch fix two
      issues with the previous logic and more generally improve size of
      emitted code. Paragraphs between this one and "FIX 1" below explain the
      purpose of the runtime check and examine the current implementation.
      
      When a load is marked PROBE_MEM - e.g. due to PTR_UNTRUSTED access - the
      address being loaded from is not necessarily valid. The BPF jit sets up
      exception handlers for each such load which catch page faults and 0 out
      the destination register.
      
      Arbitrary register-relative loads can escape this exception handling
      mechanism. Specifically, a load like dst_reg = *(src_reg + off) will not
      trigger BPF exception handling if (src_reg + off) is outside of kernel
      address space, resulting in an uncaught page fault. A concrete example
      of such behavior is a program like:
      
        struct result {
          char space[40];
          long a;
        };
      
        /* if err, returns ERR_PTR(-EINVAL) */
        struct result *ptr = get_ptr_maybe_err();
        long x = ptr->a;
      
      If get_ptr_maybe_err returns ERR_PTR(-EINVAL) and the result isn't
      checked for err, 'result' will be (u64)-EINVAL, a number close to
      U64_MAX. The ptr->a load will be > U64_MAX and will wrap over to a small
      positive u64, which will be in userspace and thus not covered by BPF
      exception handling mechanism.
      
      In order to prevent such loads from occurring, the BPF jit emits some
      instructions which do runtime checking of (src_reg + off) and skip the
      actual load if it's out of range. As an example, here are instructions
      emitted for a %rdi = *(%rdi + 0x10) PROBE_MEM load:
      
        72:   movabs $0x800000000010,%r11 --|
        7c:   cmp    %r11,%rdi              |- 72 - 7f: Check 1
        7f:    jb    0x000000000000008d   --|
        81:   mov    %rdi,%r11             -----|
        84:   add    $0x0000000000000010,%r11   |- 81-8b: Check 2
        8b:   jnc    0x0000000000000091    -----|
        8d:   xor    %edi,%edi             ---- 0 out dest
        8f:   jmp    0x0000000000000095
        91:   mov    0x10(%rdi),%rdi       ---- Actual load
        95:
      
      The JIT considers kernel address space to start at MAX_TASK_SIZE +
      PAGE_SIZE. Determining whether a load will be outside of kernel address
      space should be a simple check:
      
        (src_reg + off) >= MAX_TASK_SIZE + PAGE_SIZE
      
      But because there is only one spare register when the checking logic is
      emitted, this logic is split into two checks:
      
        Check 1: src_reg >= (MAX_TASK_SIZE + PAGE_SIZE - off)
        Check 2: src_reg + off doesn't wrap over U64_MAX and result in small pos u64
      
      Emitted insns implementing Checks 1 and 2 are annotated in the above
      example. Check 1 can be done with a single spare register since the
      source reg by definition is the left-hand-side of the inequality.
      Since adding 'off' to both sides of Check 1's inequality results in the
      original inequality we want, it's equivalent to testing that inequality.
      Except in the case where src_reg + off wraps past U64_MAX, which is why
      Check 2 needs to actually add src_reg + off if Check 1 passes - again
      using the single spare reg.
      
      FIX 1: The Check 1 inequality listed above is not what current code is
      doing. Current code is a bit more pessimistic, instead checking:
      
        src_reg >= (MAX_TASK_SIZE + PAGE_SIZE + abs(off))
      
      The 0x800000000010 in above example is from this current check. If Check
      1 was corrected to use the correct right-hand-side, the value would be
      0x7ffffffffff0. This patch changes the checking logic more broadly (FIX
      2 below will elaborate), fixing this issue as a side-effect of the
      rewrite. Regardless, it's important to understand why Check 1 should've
      been doing MAX_TASK_SIZE + PAGE_SIZE - off before proceeding.
      
      FIX 2: Current code relies on a 'jnc' to determine whether src_reg + off
      addition wrapped over. For negative offsets this logic is incorrect.
      Consider Check 2 insns emitted when off = -0x10:
      
        81:   mov    %rdi,%r11
        84:   add    0xfffffffffffffff0,%r11
        8b:   jnc    0x0000000000000091
      
      2's complement representation of -0x10 is a large positive u64. Any
      value of src_reg that passes Check 1 will result in carry flag being set
      after (src_reg + off) addition. So a load with any negative offset will
      always fail Check 2 at runtime and never do the actual load. This patch
      fixes the negative offset issue by rewriting both checks in order to not
      rely on carry flag.
      
      The rewrite takes advantage of the fact that, while we only have one
      scratch reg to hold arbitrary values, we know the offset at JIT time.
      This we can use src_reg as a temporary scratch reg to hold src_reg +
      offset since we can return it to its original value by later subtracting
      offset. As a result we can directly check the original inequality we
      care about:
      
        (src_reg + off) >= MAX_TASK_SIZE + PAGE_SIZE
      
      For a load like %rdi = *(%rsi + -0x10), this results in emitted code:
      
        43:   movabs $0x800000000000,%r11
        4d:   add    $0xfffffffffffffff0,%rsi --- src_reg += off
        54:   cmp    %r11,%rsi                --- Check original inequality
        57:   jae    0x000000000000005d
        59:   xor    %edi,%edi
        5b:   jmp    0x0000000000000061
        5d:   mov    0x0(%rdi),%rsi           --- Actual Load
        61:   sub    $0xfffffffffffffff0,%rsi --- src_reg -= off
      
      Note that the actual load is always done with offset 0, since previous
      insns have already done src_reg += off. Regardless of whether the new
      check succeeds or fails, insn 61 is always executed, returning src_reg
      to its original value.
      
      Because the goal of these checks is to ensure that loaded-from address
      will be protected by BPF exception handler, the new check can safely
      ignore any wrapover from insn 4d. If such wrapped-over address passes
      insn 54 + 57's cmp-and-jmp it will have such protection so the load can
      proceed.
      
      IMPROVEMENTS: The above improved logic is 8 insns vs original logic's 9,
      and has 1 fewer jmp. The number of checking insns can be further
      improved in common scenarios:
      
      If src_reg == dst_reg, the actual load insn will clobber src_reg, so
      there's no original src_reg state for the sub insn immediately following
      the load to restore, so it can be omitted. In fact, it must be omitted
      since it would incorrectly subtract from the result of the load if it
      wasn't. So for src_reg == dst_reg, JIT emits these insns:
      
        3c:   movabs $0x800000000000,%r11
        46:   add    $0xfffffffffffffff0,%rdi
        4d:   cmp    %r11,%rdi
        50:   jae    0x0000000000000056
        52:   xor    %edi,%edi
        54:   jmp    0x000000000000005a
        56:   mov    0x0(%rdi),%rdi
        5a:
      
      The only difference from larger example being the omitted sub, which
      would've been insn 5a in this example.
      
      If offset == 0, we can similarly omit the sub as in previous case, since
      there's nothing added to subtract. For the same reason we can omit the
      addition as well, resulting in JIT emitting these insns:
      
        46:   movabs $0x800000000000,%r11
        4d:   cmp    %r11,%rdi
        50:   jae    0x0000000000000056
        52:   xor    %edi,%edi
        54:   jmp    0x000000000000005a
        56:   mov    0x0(%rdi),%rdi
        5a:
      
      Although the above example also has src_reg == dst_reg, the same
      offset == 0 optimization is valid to apply if src_reg != dst_reg.
      
      To summarize the improvements in emitted insn count for the
      check-and-load:
      
      BEFORE:                8 check insns, 3 jmps
      AFTER (general case):  7 check insns, 2 jmps (12.5% fewer insn, 33% jmp)
      AFTER (src == dst):    6 check insns, 2 jmps (25% fewer insn)
      AFTER (offset == 0):   5 check insns, 2 jmps (37.5% fewer insn)
      
      (Above counts don't include the 1 load insn, just checking around it)
      
      Based on BPF bytecode + JITted x86 insn I saw while experimenting with
      these improvements, I expect the src_reg == dst_reg case to occur most
      often, followed by offset == 0, then the general case.
      Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20221216214319.3408356-1-davemarchevsky@fb.com
      90156f4b
    • Andrii Nakryiko's avatar
      libbpf: start v1.2 development cycle · 4ec38eda
      Andrii Nakryiko authored
      Bump current version for new development cycle to v1.2.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Acked-by: default avatarStanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20221221180049.853365-1-andrii@kernel.orgSigned-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      4ec38eda
    • Martin KaFai Lau's avatar
      bpf: Reduce smap->elem_size · 552d42a3
      Martin KaFai Lau authored
      'struct bpf_local_storage_elem' has an unused 56 byte padding at the
      end due to struct's cache-line alignment requirement. This padding
      space is overlapped by storage value contents, so if we use sizeof()
      to calculate the total size, we overinflate it by 56 bytes. Use
      offsetof() instead to calculate more exact memory use.
      Signed-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Acked-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20221221013036.3427431-1-martin.lau@linux.dev
      552d42a3
    • Andrii Nakryiko's avatar
      Merge branch 'bpftool: improve error handing for missing .BTF section' · 7b43df6c
      Andrii Nakryiko authored
      Changbin Du says:
      
      ====================
      Display error message for missing ".BTF" section and clean up empty
      vmlinux.h file.
      
      v3:
       - fix typo and make error message consistent. (Andrii Nakryiko)
       - split out perf change.
      v2:
       - remove vmlinux specific error info.
       - use builtin target .DELETE_ON_ERROR: to delete empty vmlinux.h
      ====================
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      7b43df6c
    • Changbin Du's avatar
      bpf: makefiles: Do not generate empty vmlinux.h · e7f0d5cd
      Changbin Du authored
      Remove the empty vmlinux.h if bpftool failed to dump btf info.
      The empty vmlinux.h can hide real error when reading output
      of make.
      
      This is done by adding .DELETE_ON_ERROR special target in related
      makefiles.
      Signed-off-by: default avatarChangbin Du <changbin.du@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Acked-by: default avatarQuentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/20221217223509.88254-3-changbin.du@gmail.com
      e7f0d5cd
    • Changbin Du's avatar
      libbpf: Show error info about missing ".BTF" section · e6b4e1d7
      Changbin Du authored
      Show the real problem instead of just saying "No such file or directory".
      
      Now will print below info:
      libbpf: failed to find '.BTF' ELF section in /home/changbin/work/linux/vmlinux
      Error: failed to load BTF from /home/changbin/work/linux/vmlinux: No such file or directory
      Signed-off-by: default avatarChangbin Du <changbin.du@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20221217223509.88254-2-changbin.du@gmail.com
      e6b4e1d7
  2. 20 Dec, 2022 2 commits
  3. 19 Dec, 2022 9 commits
  4. 15 Dec, 2022 2 commits
  5. 14 Dec, 2022 7 commits
  6. 13 Dec, 2022 14 commits
    • Linus Torvalds's avatar
      Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · 7e68dd7d
      Linus Torvalds authored
      Pull networking updates from Paolo Abeni:
       "Core:
      
         - Allow live renaming when an interface is up
      
         - Add retpoline wrappers for tc, improving considerably the
           performances of complex queue discipline configurations
      
         - Add inet drop monitor support
      
         - A few GRO performance improvements
      
         - Add infrastructure for atomic dev stats, addressing long standing
           data races
      
         - De-duplicate common code between OVS and conntrack offloading
           infrastructure
      
         - A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements
      
         - Netfilter: introduce packet parser for tunneled packets
      
         - Replace IPVS timer-based estimators with kthreads to scale up the
           workload with the number of available CPUs
      
         - Add the helper support for connection-tracking OVS offload
      
        BPF:
      
         - Support for user defined BPF objects: the use case is to allocate
           own objects, build own object hierarchies and use the building
           blocks to build own data structures flexibly, for example, linked
           lists in BPF
      
         - Make cgroup local storage available to non-cgroup attached BPF
           programs
      
         - Avoid unnecessary deadlock detection and failures wrt BPF task
           storage helpers
      
         - A relevant bunch of BPF verifier fixes and improvements
      
         - Veristat tool improvements to support custom filtering, sorting,
           and replay of results
      
         - Add LLVM disassembler as default library for dumping JITed code
      
         - Lots of new BPF documentation for various BPF maps
      
         - Add bpf_rcu_read_{,un}lock() support for sleepable programs
      
         - Add RCU grace period chaining to BPF to wait for the completion of
           access from both sleepable and non-sleepable BPF programs
      
         - Add support storing struct task_struct objects as kptrs in maps
      
         - Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
           values
      
         - Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions
      
        Protocols:
      
         - TCP: implement Protective Load Balancing across switch links
      
         - TCP: allow dynamically disabling TCP-MD5 static key, reverting back
           to fast[er]-path
      
         - UDP: Introduce optional per-netns hash lookup table
      
         - IPv6: simplify and cleanup sockets disposal
      
         - Netlink: support different type policies for each generic netlink
           operation
      
         - MPTCP: add MSG_FASTOPEN and FastOpen listener side support
      
         - MPTCP: add netlink notification support for listener sockets events
      
         - SCTP: add VRF support, allowing sctp sockets binding to VRF devices
      
         - Add bridging MAC Authentication Bypass (MAB) support
      
         - Extensions for Ethernet VPN bridging implementation to better
           support multicast scenarios
      
         - More work for Wi-Fi 7 support, comprising conversion of all the
           existing drivers to internal TX queue usage
      
         - IPSec: introduce a new offload type (packet offload) allowing
           complete header processing and crypto offloading
      
         - IPSec: extended ack support for more descriptive XFRM error
           reporting
      
         - RXRPC: increase SACK table size and move processing into a
           per-local endpoint kernel thread, reducing considerably the
           required locking
      
         - IEEE 802154: synchronous send frame and extended filtering support,
           initial support for scanning available 15.4 networks
      
         - Tun: bump the link speed from 10Mbps to 10Gbps
      
         - Tun/VirtioNet: implement UDP segmentation offload support
      
        Driver API:
      
         - PHY/SFP: improve power level switching between standard level 1 and
           the higher power levels
      
         - New API for netdev <-> devlink_port linkage
      
         - PTP: convert existing drivers to new frequency adjustment
           implementation
      
         - DSA: add support for rx offloading
      
         - Autoload DSA tagging driver when dynamically changing protocol
      
         - Add new PCP and APPTRUST attributes to Data Center Bridging
      
         - Add configuration support for 800Gbps link speed
      
         - Add devlink port function attribute to enable/disable RoCE and
           migratable
      
         - Extend devlink-rate to support strict prioriry and weighted fair
           queuing
      
         - Add devlink support to directly reading from region memory
      
         - New device tree helper to fetch MAC address from nvmem
      
         - New big TCP helper to simplify temporary header stripping
      
        New hardware / drivers:
      
         - Ethernet:
            - Marvel Octeon CNF95N and CN10KB Ethernet Switches
            - Marvel Prestera AC5X Ethernet Switch
            - WangXun 10 Gigabit NIC
            - Motorcomm yt8521 Gigabit Ethernet
            - Microchip ksz9563 Gigabit Ethernet Switch
            - Microsoft Azure Network Adapter
            - Linux Automation 10Base-T1L adapter
      
         - PHY:
            - Aquantia AQR112 and AQR412
            - Motorcomm YT8531S
      
         - PTP:
            - Orolia ART-CARD
      
         - WiFi:
            - MediaTek Wi-Fi 7 (802.11be) devices
            - RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
              devices
      
         - Bluetooth:
            - Broadcom BCM4377/4378/4387 Bluetooth chipsets
            - Realtek RTL8852BE and RTL8723DS
            - Cypress.CYW4373A0 WiFi + Bluetooth combo device
      
        Drivers:
      
         - CAN:
            - gs_usb: bus error reporting support
            - kvaser_usb: listen only and bus error reporting support
      
         - Ethernet NICs:
            - Intel (100G):
               - extend action skbedit to RX queue mapping
               - implement devlink-rate support
               - support direct read from memory
            - nVidia/Mellanox (mlx5):
               - SW steering improvements, increasing rules update rate
               - Support for enhanced events compression
               - extend H/W offload packet manipulation capabilities
               - implement IPSec packet offload mode
            - nVidia/Mellanox (mlx4):
               - better big TCP support
            - Netronome Ethernet NICs (nfp):
               - IPsec offload support
               - add support for multicast filter
            - Broadcom:
               - RSS and PTP support improvements
            - AMD/SolarFlare:
               - netlink extened ack improvements
               - add basic flower matches to offload, and related stats
            - Virtual NICs:
               - ibmvnic: introduce affinity hint support
            - small / embedded:
               - FreeScale fec: add initial XDP support
               - Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood
               - TI am65-cpsw: add suspend/resume support
               - Mediatek MT7986: add RX wireless wthernet dispatch support
               - Realtek 8169: enable GRO software interrupt coalescing per
                 default
      
         - Ethernet high-speed switches:
            - Microchip (sparx5):
               - add support for Sparx5 TC/flower H/W offload via VCAP
            - Mellanox mlxsw:
               - add 802.1X and MAC Authentication Bypass offload support
               - add ip6gre support
      
         - Embedded Ethernet switches:
            - Mediatek (mtk_eth_soc):
               - improve PCS implementation, add DSA untag support
               - enable flow offload support
            - Renesas:
               - add rswitch R-Car Gen4 gPTP support
            - Microchip (lan966x):
               - add full XDP support
               - add TC H/W offload via VCAP
               - enable PTP on bridge interfaces
            - Microchip (ksz8):
               - add MTU support for KSZ8 series
      
         - Qualcomm 802.11ax WiFi (ath11k):
            - support configuring channel dwell time during scan
      
         - MediaTek WiFi (mt76):
            - enable Wireless Ethernet Dispatch (WED) offload support
            - add ack signal support
            - enable coredump support
            - remain_on_channel support
      
         - Intel WiFi (iwlwifi):
            - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
            - 320 MHz channels support
      
         - RealTek WiFi (rtw89):
            - new dynamic header firmware format support
            - wake-over-WLAN support"
      
      * tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits)
        ipvs: fix type warning in do_div() on 32 bit
        net: lan966x: Remove a useless test in lan966x_ptp_add_trap()
        net: ipa: add IPA v4.7 support
        dt-bindings: net: qcom,ipa: Add SM6350 compatible
        bnxt: Use generic HBH removal helper in tx path
        IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
        selftests: forwarding: Add bridge MDB test
        selftests: forwarding: Rename bridge_mdb test
        bridge: mcast: Support replacement of MDB port group entries
        bridge: mcast: Allow user space to specify MDB entry routing protocol
        bridge: mcast: Allow user space to add (*, G) with a source list and filter mode
        bridge: mcast: Add support for (*, G) with a source list and filter mode
        bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source
        bridge: mcast: Add a flag for user installed source entries
        bridge: mcast: Expose __br_multicast_del_group_src()
        bridge: mcast: Expose br_multicast_new_group_src()
        bridge: mcast: Add a centralized error path
        bridge: mcast: Place netlink policy before validation functions
        bridge: mcast: Split (*, G) and (S, G) addition into different functions
        bridge: mcast: Do not derive entry type from its filter mode
        ...
      7e68dd7d
    • Linus Torvalds's avatar
      Merge tag 'xtensa-20221213' of https://github.com/jcmvbkbc/linux-xtensa · 1ca06f1c
      Linus Torvalds authored
      Pull Xtensa updates from Max Filippov:
      
       - fix kernel build with gcc-13
      
       - various minor fixes
      
      * tag 'xtensa-20221213' of https://github.com/jcmvbkbc/linux-xtensa:
        xtensa: add __umulsidi3 helper
        xtensa: update config files
        MAINTAINERS: update the 'T:' entry for xtensa
      1ca06f1c
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · 4cb1fc6f
      Linus Torvalds authored
      Pull ARM updates from Russell King:
      
       - update unwinder to cope with module PLTs
      
       - enable UBSAN on ARM
      
       - improve kernel fault message
      
       - update UEFI runtime page tables dump
      
       - avoid clang's __aeabi_uldivmod generated in NWFPE code
      
       - disable FIQs on CPU shutdown paths
      
       - update XOR register usage
      
       - a number of build updates (using .arch, thread pointer, removal of
         lazy evaluation in Makefile)
      
       - conversion of stacktrace code to stackwalk
      
       - findbit assembly updates
      
       - hwcap feature updates for ARMv8 CPUs
      
       - instruction dump updates for big-endian platforms
      
       - support for function error injection
      
      * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (31 commits)
        ARM: 9279/1: support function error injection
        ARM: 9277/1: Make the dumped instructions are consistent with the disassembled ones
        ARM: 9276/1: Refactor dump_instr()
        ARM: 9275/1: Drop '-mthumb' from AFLAGS_ISA
        ARM: 9274/1: Add hwcap for Speculative Store Bypassing Safe
        ARM: 9273/1: Add hwcap for Speculation Barrier(SB)
        ARM: 9272/1: vfp: Add hwcap for FEAT_AA32I8MM
        ARM: 9271/1: vfp: Add hwcap for FEAT_AA32BF16
        ARM: 9270/1: vfp: Add hwcap for FEAT_FHM
        ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd
        ARM: 9268/1: vfp: Add hwcap FPHP and ASIMDHP for FEAT_FP16
        ARM: 9267/1: Define Armv8 registers in AArch32 state
        ARM: findbit: add unwinder information
        ARM: findbit: operate by words
        ARM: findbit: convert to macros
        ARM: findbit: provide more efficient ARMv7 implementation
        ARM: findbit: document ARMv5 bit offset calculation
        ARM: 9259/1: stacktrace: Convert stacktrace to generic ARCH_STACKWALK
        ARM: 9258/1: stacktrace: Make stack walk callback consistent with generic code
        ARM: 9265/1: pass -march= only to compiler
        ...
      4cb1fc6f
    • Linus Torvalds's avatar
      Merge tag 'x86_sev_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 740afa4d
      Linus Torvalds authored
      Pull x86 sev updates from Borislav Petkov:
      
       - Two minor fixes to the sev-guest driver
      
      * tag 'x86_sev_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        virt/sev-guest: Add a MODULE_ALIAS
        virt/sev-guest: Remove unnecessary free in init_crypto()
      740afa4d
    • Linus Torvalds's avatar
      Merge tag 'x86_paravirt_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 82c72902
      Linus Torvalds authored
      Pull x86 paravirt update from Borislav Petkov:
      
       - Simplify paravirt patching machinery by removing the now unused
         clobber mask
      
      * tag 'x86_paravirt_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/paravirt: Remove clobber bitmask from .parainstructions
      82c72902
    • Linus Torvalds's avatar
      Merge tag 'x86_microcode_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a70210f4
      Linus Torvalds authored
      Pull x86 microcode and IFS updates from Borislav Petkov:
       "The IFS (In-Field Scan) stuff goes through tip because the IFS driver
        uses the same structures and similar functionality as the microcode
        loader and it made sense to route it all through this branch so that
        there are no conflicts.
      
         - Add support for multiple testing sequences to the Intel In-Field
           Scan driver in order to be able to run multiple different test
           patterns. Rework things and remove the BROKEN dependency so that
           the driver can be enabled (Jithu Joseph)
      
         - Remove the subsys interface usage in the microcode loader because
           it is not really needed
      
         - A couple of smaller fixes and cleanups"
      
      * tag 'x86_microcode_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
        x86/microcode/intel: Do not retry microcode reloading on the APs
        x86/microcode/intel: Do not print microcode revision and processor flags
        platform/x86/intel/ifs: Add missing kernel-doc entry
        Revert "platform/x86/intel/ifs: Mark as BROKEN"
        Documentation/ABI: Update IFS ABI doc
        platform/x86/intel/ifs: Add current_batch sysfs entry
        platform/x86/intel/ifs: Remove reload sysfs entry
        platform/x86/intel/ifs: Add metadata validation
        platform/x86/intel/ifs: Use generic microcode headers and functions
        platform/x86/intel/ifs: Add metadata support
        x86/microcode/intel: Use a reserved field for metasize
        x86/microcode/intel: Add hdr_type to intel_microcode_sanity_check()
        x86/microcode/intel: Reuse microcode_sanity_check()
        x86/microcode/intel: Use appropriate type in microcode_sanity_check()
        x86/microcode/intel: Reuse find_matching_signature()
        platform/x86/intel/ifs: Remove memory allocation from load path
        platform/x86/intel/ifs: Remove image loading during init
        platform/x86/intel/ifs: Return a more appropriate error code
        platform/x86/intel/ifs: Remove unused selection
        x86/microcode: Drop struct ucode_cpu_info.valid
        ...
      a70210f4
    • Linus Torvalds's avatar
      Merge tag 'x86_cpu_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3ef3ace4
      Linus Torvalds authored
      Pull x86 cpu updates from Borislav Petkov:
      
       - Split MTRR and PAT init code to accomodate at least Xen PV and TDX
         guests which do not get MTRRs exposed but only PAT. (TDX guests do
         not support the cache disabling dance when setting up MTRRs so they
         fall under the same category)
      
         This is a cleanup work to remove all the ugly workarounds for such
         guests and init things separately (Juergen Gross)
      
       - Add two new Intel CPUs to the list of CPUs with "normal" Energy
         Performance Bias, leading to power savings
      
       - Do not do bus master arbitration in C3 (ARB_DISABLE) on modern
         Centaur CPUs
      
      * tag 'x86_cpu_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
        x86/mtrr: Make message for disabled MTRRs more descriptive
        x86/pat: Handle TDX guest PAT initialization
        x86/cpuid: Carve out all CPUID functionality
        x86/cpu: Switch to cpu_feature_enabled() for X86_FEATURE_XENPV
        x86/cpu: Remove X86_FEATURE_XENPV usage in setup_cpu_entry_area()
        x86/cpu: Drop 32-bit Xen PV guest code in update_task_stack()
        x86/cpu: Remove unneeded 64-bit dependency in arch_enter_from_user_mode()
        x86/cpufeatures: Add X86_FEATURE_XENPV to disabled-features.h
        x86/acpi/cstate: Optimize ARB_DISABLE on Centaur CPUs
        x86/mtrr: Simplify mtrr_ops initialization
        x86/cacheinfo: Switch cache_ap_init() to hotplug callback
        x86: Decouple PAT and MTRR handling
        x86/mtrr: Add a stop_machine() handler calling only cache_cpu_init()
        x86/mtrr: Let cache_aps_delayed_init replace mtrr_aps_delayed_init
        x86/mtrr: Get rid of __mtrr_enabled bool
        x86/mtrr: Simplify mtrr_bp_init()
        x86/mtrr: Remove set_all callback from struct mtrr_ops
        x86/mtrr: Disentangle MTRR init from PAT init
        x86/mtrr: Move cache control code to cacheinfo.c
        x86/mtrr: Split MTRR-specific handling from cache dis/enabling
        ...
      3ef3ace4
    • Linus Torvalds's avatar
      Merge tag 'x86_boot_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4eb77fa1
      Linus Torvalds authored
      Pull x86 boot updates from Borislav Petkov:
       "A  of early boot cleanups and fixes.
      
         - Do some spring cleaning to the compressed boot code by moving the
           EFI mixed-mode code to a separate compilation unit, the AMD memory
           encryption early code where it belongs and fixing up build
           dependencies. Make the deprecated EFI handover protocol optional
           with the goal of removing it at some point (Ard Biesheuvel)
      
         - Skip realmode init code on Xen PV guests as it is not needed there
      
         - Remove an old 32-bit PIC code compiler workaround"
      
      * tag 'x86_boot_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/boot: Remove x86_32 PIC using %ebx workaround
        x86/boot: Skip realmode init code when running as Xen PV guest
        x86/efi: Make the deprecated EFI handover protocol optional
        x86/boot/compressed: Only build mem_encrypt.S if AMD_MEM_ENCRYPT=y
        x86/boot/compressed: Adhere to calling convention in get_sev_encryption_bit()
        x86/boot/compressed: Move startup32_check_sev_cbit() out of head_64.S
        x86/boot/compressed: Move startup32_check_sev_cbit() into .text
        x86/boot/compressed: Move startup32_load_idt() out of head_64.S
        x86/boot/compressed: Move startup32_load_idt() into .text section
        x86/boot/compressed: Pull global variable reference into startup32_load_idt()
        x86/boot/compressed: Avoid touching ECX in startup32_set_idt_entry()
        x86/boot/compressed: Simplify IDT/GDT preserve/restore in the EFI thunk
        x86/boot/compressed, efi: Merge multiple definitions of image_offset into one
        x86/boot/compressed: Move efi32_pe_entry() out of head_64.S
        x86/boot/compressed: Move efi32_entry out of head_64.S
        x86/boot/compressed: Move efi32_pe_entry into .text section
        x86/boot/compressed: Move bootargs parsing out of 32-bit startup code
        x86/boot/compressed: Move 32-bit entrypoint code into .text section
        x86/boot/compressed: Rename efi_thunk_64.S to efi-mixed.S
      4eb77fa1
    • Linus Torvalds's avatar
      Merge tag 'x86_asm_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8b9ed79c
      Linus Torvalds authored
      Pull x86 asm updates from Borislav Petkov:
      
       - Move the 32-bit memmove() asm implementation out-of-line in order to
         fix a 32-bit full LTO build failure with clang where it would fail at
         register allocation.
      
         Move it to an asm file and clean it up while at it, similar to what
         has been already done on 64-bit
      
      * tag 'x86_asm_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mem: Move memmove to out of line assembler
      8b9ed79c
    • Linus Torvalds's avatar
      Merge tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · fc4c9f45
      Linus Torvalds authored
      Pull EFI updates from Ard Biesheuvel:
       "Another fairly sizable pull request, by EFI subsystem standards.
      
        Most of the work was done by me, some of it in collaboration with the
        distro and bootloader folks (GRUB, systemd-boot), where the main focus
        has been on removing pointless per-arch differences in the way EFI
        boots a Linux kernel.
      
         - Refactor the zboot code so that it incorporates all the EFI stub
           logic, rather than calling the decompressed kernel as a EFI app.
      
         - Add support for initrd= command line option to x86 mixed mode.
      
         - Allow initrd= to be used with arbitrary EFI accessible file systems
           instead of just the one the kernel itself was loaded from.
      
         - Move some x86-only handling and manipulation of the EFI memory map
           into arch/x86, as it is not used anywhere else.
      
         - More flexible handling of any random seeds provided by the boot
           environment (i.e., systemd-boot) so that it becomes available much
           earlier during the boot.
      
         - Allow improved arch-agnostic EFI support in loaders, by setting a
           uniform baseline of supported features, and adding a generic magic
           number to the DOS/PE header. This should allow loaders such as GRUB
           or systemd-boot to reduce the amount of arch-specific handling
           substantially.
      
         - (arm64) Run EFI runtime services from a dedicated stack, and use it
           to recover from synchronous exceptions that might occur in the
           firmware code.
      
         - (arm64) Ensure that we don't allocate memory outside of the 48-bit
           addressable physical range.
      
         - Make EFI pstore record size configurable
      
         - Add support for decoding CXL specific CPER records"
      
      * tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (43 commits)
        arm64: efi: Recover from synchronous exceptions occurring in firmware
        arm64: efi: Execute runtime services from a dedicated stack
        arm64: efi: Limit allocations to 48-bit addressable physical region
        efi: Put Linux specific magic number in the DOS header
        efi: libstub: Always enable initrd command line loader and bump version
        efi: stub: use random seed from EFI variable
        efi: vars: prohibit reading random seed variables
        efi: random: combine bootloader provided RNG seed with RNG protocol output
        efi/cper, cxl: Decode CXL Error Log
        efi/cper, cxl: Decode CXL Protocol Error Section
        efi: libstub: fix efi_load_initrd_dev_path() kernel-doc comment
        efi: x86: Move EFI runtime map sysfs code to arch/x86
        efi: runtime-maps: Clarify purpose and enable by default for kexec
        efi: pstore: Add module parameter for setting the record size
        efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures
        efi: memmap: Move manipulation routines into x86 arch tree
        efi: memmap: Move EFI fake memmap support into x86 arch tree
        efi: libstub: Undeprecate the command line initrd loader
        efi: libstub: Add mixed mode support to command line initrd loader
        efi: libstub: Permit mixed mode return types other than efi_status_t
        ...
      fc4c9f45
    • Linus Torvalds's avatar
      Merge tag 'integrity-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity · 717e6eb4
      Linus Torvalds authored
      Pull integrity updates from Mimi Zohar:
       "Aside from the one cleanup, the other changes are bug fixes:
      
        Cleanup:
      
         - Include missing iMac Pro 2017 in list of Macs with T2 security chip
      
        Bug fixes:
      
         - Improper instantiation of "encrypted" keys with user provided data
      
         - Not handling delay in updating LSM label based IMA policy rules
           (-ESTALE)
      
         - IMA and integrity memory leaks on error paths
      
         - CONFIG_IMA_DEFAULT_HASH_SM3 hash algorithm renamed"
      
      * tag 'integrity-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
        ima: Fix hash dependency to correct algorithm
        ima: Fix misuse of dereference of pointer in template_desc_init_fields()
        integrity: Fix memory leakage in keyring allocation error path
        ima: Fix memory leak in __ima_inode_hash()
        ima: Handle -ESTALE returned by ima_filter_rule_match()
        ima: Simplify ima_lsm_copy_rule
        ima: Fix a potential NULL pointer access in ima_restore_measurement_list
        efi: Add iMac Pro 2017 to uefi skip cert quirk
        KEYS: encrypted: fix key instantiation with user-provided data
      717e6eb4
    • Linus Torvalds's avatar
      Merge tag 'sysctl-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux · 8fa37a68
      Linus Torvalds authored
      Pull sysctl updates from Luis Chamberlain:
       "Only a small step forward on the sysctl cleanups for this cycle"
      
      * tag 'sysctl-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
        sched: Move numa_balancing sysctls to its own file
      8fa37a68
    • Linus Torvalds's avatar
      Merge tag 'modules-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux · 3ba2c3ff
      Linus Torvalds authored
      Pull modules updates from Luis Chamberlain:
       "Tux gets for xmas an improvement to the average lookup performance of
        kallsyms_lookup_name() by 715x thanks to the work by Zhen Lei, which
        upgraded our old implementation from being O(n) to O(log(n)), while
        also retaining the old implementation support on /proc/kallsyms.
      
        The only penalty was increasing the memory footprint by 3 *
        kallsyms_num_syms. Folks who want to improve this further now also
        have a dedicated selftest facility through KALLSYMS_SELFTEST.
      
        Stephen Boyd added zstd in-kernel decompression support, but the only
        users of this would be folks using the load-pin LSM because otherwise
        we do module decompression in userspace.
      
        The only other thing with mentioning is a minor boot time optimization
        by Rasmus Villemoes which deferes param_sysfs_init() to late init. The
        rest is cleanups and minor fixes"
      
      * tag 'modules-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
        livepatch: Call klp_match_callback() in klp_find_callback() to avoid code duplication
        module/decompress: Support zstd in-kernel decompression
        kallsyms: Remove unneeded semicolon
        kallsyms: Add self-test facility
        livepatch: Use kallsyms_on_each_match_symbol() to improve performance
        kallsyms: Add helper kallsyms_on_each_match_symbol()
        kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[]
        kallsyms: Correctly sequence symbols when CONFIG_LTO_CLANG=y
        kallsyms: Improve the performance of kallsyms_lookup_name()
        scripts/kallsyms: rename build_initial_tok_table()
        module: Fix NULL vs IS_ERR checking for module_get_next_page
        kernel/params.c: defer most of param_sysfs_init() to late_initcall time
        module: Remove unused macros module_addr_min/max
        module: remove redundant module_sysfs_initialized variable
      3ba2c3ff
    • Linus Torvalds's avatar
      Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 0015edd6
      Linus Torvalds authored
      Pull clk driver updates from Stephen Boyd:
       "A pile of clk driver updates with a small tracepoint patch to the clk
        core this time around.
      
        The core framework is effectively unchanged, with the majority of the
        diff going to the Qualcomm clk driver directory because they added two
        3k line files that are almost all clk data (Abel Vesa from Linaro
        tried to shrink the number of lines down, but it doesn't seem to be
        possible without sacrificing readability).
      
        The second big driver this time around is the Rockchip rk3588 clk and
        reset unit, at _only_ 2.5k lines.
      
        Ignoring the big clk drivers from the familiar SoC vendors, there's
        just a bunch of little clk driver updates and fixes throughout here.
      
        It's the usual set of clk data fixups to describe proper parents, or
        add frequencies to frequency tables, or plug memory leaks when
        function calls fail. Also, some drivers are converted to use modern
        clk_hw APIs, which is always nice to see. And data is deduplicated,
        leading to a smaller kernel Image.
      
        Overall this batch has a larger collection of cleanups than it
        typically does. Maybe that means there are less new SoCs right now
        that need supporting, and the focus has shifted to quality and
        reliability. I can dream.
      
        New Drivers:
         - Frequency hopping controller hardware on MediaTek MT8186
         - Global clock controller for Qualcomm SM8550
         - Display clock controller for Qualcomm SC8280XP
         - RPMh clock controller for Qualcomm QDU1000 and QRU1000 SoCs
         - CPU PLL on MStar/SigmaStar SoCs
         - Support for the clock and reset unit of the Rockchip rk3588
      
        Updates:
         - Tracepoints for clk_rate_request structures
         - Debugfs support for fractional divider clk
         - Make MxL's CGU driver secure compatible
         - Ingenic JZ4755 SoC clk support
         - Support audio clks on X1000 SoCs
         - Remove flags from univ/main/syspll child fixed factor clocks across
           MediaTek platforms
         - Fix clock dependency for ADC on MediaTek MT7986
         - Fix parent for FlexSPI clock for i.MX93
         - Add USB suspend clock on i.MX8MP
         - Unmap anatop base on error for i.MX93 driver
         - Change enet clock parent to wakeup_axi_root for i.MX93
         - Drop LPIT1, LPIT2, TPM1 and TPM3 clocks for i.MX93
         - Mark HSIO bus clock and SYS_CNT clock as critical on i.MX93
         - Add 320MHz and 640MHz entries to PLL146x
         - Add audio shared gate and SAI clocks for i.MX8MP
         - Fix a possible memory leak in the error path of rockchip PLL
           creation
         - Fix header guard for V3S clocks
         - Add IR module clock for f1c100s
         - Correct the parent clocks for the (High Speed) Serial Communication
           Interfaces with FIFO ((H)SCIF) modules and the mixed-up Ethernet
           Switch clocks on Renesas R-Car S4-8
         - Add timer (TMU, CMT) and Cortex-A76 CPU core (Z0) clocks on Renesas
           R-Car V4H
         - Two PLL driver fixups for the Amlogic clk driver
         - Round SD clock rate to improve parent clock selection
         - Add Ethernet Switch and internal SASYNCPER clocks on Renesas R-Car
           S4-8
         - Add DMA (SYS-DMAC), SPI (MSIOF), external interrupt (INTC-EX)
           serial (SCIF), PWM (PWM and TPU), SDHI, and HyperFLASH/QSPI
           (RPC-IF) clocks on Renesas R-Car V4H
         - Add Multi-Function Timer Pulse Unit (MTU3a) clock and reset on
           Renesas RZ/G2L
         - Fix endless loop on Renesas RZ/N1
         - Correct the parent clocks for the High Speed Serial Communication
           Interfaces with FIFO (HSCIF) modules on the Renesas R-Car V4H SoC
           Note: HSCIF0 is used for the serial console on the White-Hawk
           development board
         - Various clk DT binding improvements and conversions to YAML
         - Qualcomm SM8150/SM8250 display clock controller cleaned up
         - Some missing clocks for Qualcomm SM8350 added
         - Qualcomm MSM8974 Global and Multimedia clock controllers
           transitioned to parent_data and parent_hws
         - Use parent_data and add network resets for Qualcomm IPQ8074
         - Qualcomm Krait clock controller modernized
         - Fix pm_runtime usage in Qualcomm SC7180 and SC7280 LPASS clock
           controllers
         - Enable retention mode on Qualcomm SM8250 USB GDSCs
         - Cleanup Qualcomm RPM and RPMh clock drivers to avoid duplicating
           clocks which definition could be shared between platforms
         - Various NULL pointer checks added for allocations"
      
      * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (188 commits)
        clk: nomadik: correct struct name kernel-doc warning
        clk: lmk04832: fix kernel-doc warnings
        clk: lmk04832: drop superfluous #include
        clk: lmk04832: drop unnecessary semicolons
        clk: lmk04832: declare variables as const when possible
        clk: socfpga: Fix memory leak in socfpga_gate_init()
        clk: microchip: enable the MPFS clk driver by default if SOC_MICROCHIP_POLARFIRE
        clk: st: Fix memory leak in st_of_quadfs_setup()
        clk: samsung: Fix memory leak in _samsung_clk_register_pll()
        clk: Add trace events for rate requests
        clk: Store clk_core for clk_rate_request
        clk: qcom: rpmh: add support for SM6350 rpmh IPA clock
        clk: qcom: mmcc-msm8974: use parent_hws/_data instead of parent_names
        clk: qcom: mmcc-msm8974: move clock parent tables down
        clk: qcom: mmcc-msm8974: use ARRAY_SIZE instead of specifying num_parents
        clk: qcom: gcc-msm8974: use parent_hws/_data instead of parent_names
        clk: qcom: gcc-msm8974: move clock parent tables down
        clk: qcom: gcc-msm8974: use ARRAY_SIZE instead of specifying num_parents
        dt-bindings: clocks: qcom,mmcc: define clocks/clock-names for MSM8974
        dt-bindings: clock: split qcom,gcc-msm8974,-msm8226 to the separate file
        ...
      0015edd6