  1. 28 Sep, 2022 2 commits
    • random: use expired timer rather than wq for mixing fast pool · 748bc4dd
      Jason A. Donenfeld authored
      Previously, the fast pool was dumped into the main pool periodically in
      the fast pool's hard IRQ handler. This worked fine and there weren't
      problems with it, until RT came around. Since RT converts spinlocks into
      sleeping locks, problems cropped up. Rather than switching to raw
      spinlocks, the RT developers preferred we make the transformation from
      originally doing:
      
          do_some_stuff()
          spin_lock()
          do_some_other_stuff()
          spin_unlock()
      
      to doing:
      
          do_some_stuff()
          queue_work_on(some_other_stuff_worker)
      
      This is an ordinary pattern done all over the kernel. However, Sherry
      noticed a 10% performance regression in qperf TCP over a 40 Gb/s
      InfiniBand card. Quoting her message:
      
      > MT27500 Family [ConnectX-3] cards:
      > Infiniband device 'mlx4_0' port 1 status:
      > default gid: fe80:0000:0000:0000:0010:e000:0178:9eb1
      > base lid: 0x6
      > sm lid: 0x1
      > state: 4: ACTIVE
      > phys state: 5: LinkUp
      > rate: 40 Gb/sec (4X QDR)
      > link_layer: InfiniBand
      >
      > Cards are configured with IP addresses on private subnet for IPoIB
      > performance testing.
      > Regression identified in this bug is in TCP latency in this stack as reported
      > by qperf tcp_lat metric:
      >
      > We have one system listen as a qperf server:
      > [root@yourQperfServer ~]# qperf
      >
      > Have the other system connect to qperf server as a client (in this
      > case, it’s X7 server with Mellanox card):
      > [root@yourQperfClient ~]# numactl -m0 -N0 qperf 20.20.20.101 -v -uu -ub --time 60 --wait_server 20 -oo msg_size:4K:1024K:*2 tcp_lat
      
      Rather than incur the scheduling latency from queue_work_on, we can
      instead switch to running on the next timer tick, on the same core. This
      also batches things a bit more -- once per jiffy -- which is okay now
      that mix_interrupt_randomness() can credit multiple bits at once.

      Reported-by: Sherry Yang <sherry.yang@oracle.com>
      Tested-by: Paul Webb <paul.x.webb@oracle.com>
      Cc: Sherry Yang <sherry.yang@oracle.com>
      Cc: Phillip Goerl <phillip.goerl@oracle.com>
      Cc: Jack Vogel <jack.vogel@oracle.com>
      Cc: Nicky Veitch <nicky.veitch@oracle.com>
      Cc: Colm Harrington <colm.harrington@oracle.com>
      Cc: Ramanan Govindarajan <ramanan.govindarajan@oracle.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Sultan Alsawaf <sultan@kerneltoast.com>
      Cc: stable@vger.kernel.org
      Fixes: 58340f8e ("random: defer fast pool mixing to worker")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: avoid reading two cache lines on irq randomness · 9ee0507e
      Jason A. Donenfeld authored
      In order to avoid reading and dirtying two cache lines on every IRQ,
      move the work_struct to the bottom of the fast_pool struct.
      add_interrupt_randomness() always touches .pool and .count, which are
      currently split, because .mix pushes everything down. Instead, move .mix
      to the bottom, so that .pool and .count are always in the first cache
      line, since .mix is only accessed when the pool is full.
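
The layout effect described above can be illustrated with a small userspace sketch. This is not the kernel's struct definition: `struct work_stand_in` is a hypothetical 32-byte placeholder for the `.mix` member, field widths are pinned with fixed-size integer types, a 64-byte cache line is assumed, and the struct is assumed to start on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; real hardware may differ. */
#define CACHE_LINE 64u

/* Hypothetical stand-in for the deferred-work member (.mix). */
struct work_stand_in {
	uint64_t opaque[4];
};

/* Sketch with .mix first: it pushes .pool and .count down. */
struct fast_pool_before {
	struct work_stand_in mix;
	uint64_t pool[4];
	uint64_t last;
	uint32_t count;
};

/* Sketch with .mix moved to the bottom. */
struct fast_pool_after {
	uint64_t pool[4];
	uint64_t last;
	uint32_t count;
	struct work_stand_in mix;
};

/* Number of cache lines spanned by the byte range [first, last]. */
static size_t lines_spanned(size_t first, size_t last)
{
	assert(first <= last);
	return last / CACHE_LINE - first / CACHE_LINE + 1;
}

/* Cache lines touched by .pool through .count when .mix comes first. */
static size_t hot_lines_before(void)
{
	return lines_spanned(offsetof(struct fast_pool_before, pool),
			     offsetof(struct fast_pool_before, count) +
			     sizeof(uint32_t) - 1);
}

/* Cache lines touched by .pool through .count when .mix comes last. */
static size_t hot_lines_after(void)
{
	return lines_spanned(offsetof(struct fast_pool_after, pool),
			     offsetof(struct fast_pool_after, count) +
			     sizeof(uint32_t) - 1);
}
```

Under these assumptions, the hot fields span two cache lines in the first layout and one in the second, which is the saving the commit message describes.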
      
      Fixes: 58340f8e ("random: defer fast pool mixing to worker")
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  2. 23 Sep, 2022 4 commits
  3. 22 Sep, 2022 20 commits
    • Merge tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 504c25cb
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from wifi, netfilter and can.
      
        A handful of awaited fixes here - revert of the FEC changes, bluetooth
        fix, fixes for iwlwifi spew.
      
        We added a warning in PHY/MDIO code which is triggering on a couple of
        platforms in a false-positive-ish way. If we can't iron that out over
        the week we'll drop it and re-add for 6.1.
      
        I've added a new "follow up fixes" section for fixes to fixes in
        6.0-rcs but it may actually give the false impression that those are
        problematic or that more testing time would have caught them. So
        likely a one time thing.
      
        Follow up fixes:
      
         - nf_tables_addchain: fix nft_counters_enabled underflow
      
         - ebtables: fix memory leak when blob is malformed
      
         - nf_ct_ftp: fix deadlock when nat rewrite is needed
      
        Current release - regressions:
      
         - Revert "fec: Restart PPS after link state change" and the related
           "net: fec: Use a spinlock to guard `fep->ptp_clk_on`"
      
         - Bluetooth: fix HCIGETDEVINFO regression
      
         - wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2
      
         - mptcp: fix fwd memory accounting on coalesce
      
         - rwlock removal fall out:
            - ipmr: always call ip{,6}_mr_forward() from RCU read-side
              critical section
            - ipv6: fix crash when IPv6 is administratively disabled
      
         - tcp: read multiple skbs in tcp_read_skb()
      
         - mdio_bus_phy_resume state warning fallout:
            - eth: ravb: fix PHY state warning splat during system resume
            - eth: sh_eth: fix PHY state warning splat during system resume
      
        Current release - new code bugs:
      
         - wifi: iwlwifi: don't spam logs with NSS>2 messages
      
         - eth: mtk_eth_soc: enable XDP support just for MT7986 SoC
      
        Previous releases - regressions:
      
         - bonding: fix NULL deref in bond_rr_gen_slave_id
      
         - wifi: iwlwifi: mark IWLMEI as broken
      
        Previous releases - always broken:
      
         - nf_conntrack helpers:
            - irc: tighten matching on DCC message
            - sip: fix ct_sip_walk_headers
            - osf: fix possible bogus match in nf_osf_find()
      
         - ipvlan: fix out-of-bound bugs caused by unset skb->mac_header
      
         - core: fix flow symmetric hash
      
         - bonding, team: unsync device addresses on ndo_stop
      
         - phy: micrel: fix shared interrupt on LAN8814"
      
      * tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
        selftests: forwarding: add shebang for sch_red.sh
        bnxt: prevent skb UAF after handing over to PTP worker
        net: marvell: Fix refcounting bugs in prestera_port_sfp_bind()
        net: sched: fix possible refcount leak in tc_new_tfilter()
        net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD
        udp: Use WARN_ON_ONCE() in udp_read_skb()
        selftests: bonding: cause oops in bond_rr_gen_slave_id
        bonding: fix NULL deref in bond_rr_gen_slave_id
        net: phy: micrel: fix shared interrupt on LAN8814
        net/smc: Stop the CLC flow if no link to map buffers on
        ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient
        net: atlantic: fix potential memory leak in aq_ndev_close()
        can: gs_usb: gs_usb_set_phys_id(): return with error if identify is not supported
        can: gs_usb: gs_can_open(): fix race dev->can.state condition
        can: flexcan: flexcan_mailbox_read() fix return value for drop = true
        net: sh_eth: Fix PHY state warning splat during system resume
        net: ravb: Fix PHY state warning splat during system resume
        netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed
        netfilter: ebtables: fix memory leak when blob is malformed
        netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain()
        ...
    • Merge tag 'efi-urgent-for-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · 129e7152
      Linus Torvalds authored
      Pull EFI fixes from Ard Biesheuvel:
      
       - Use the right variable to check for shim insecure mode
      
       - Wipe setup_data field when booting via EFI
      
       - Add missing error check to efibc driver
      
      * tag 'efi-urgent-for-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        efi: libstub: check Shim mode using MokSBStateRT
        efi: x86: Wipe setup_data on pure EFI boot
        efi: efibc: Guard against allocation failure
    • Merge tag 'gpio-fixes-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 5e0a93e4
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - fix a NULL-pointer dereference at driver unbind and a potential
         resource leak in error path in gpio-mockup
      
       - make the irqchip immutable in gpio-ftgpio010
      
       - fix dereferencing a potentially uninitialized variable in gpio-tqmx86
      
       - fix interrupt registering in gpiolib's character device code
      
      * tag 'gpio-fixes-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpiolib: cdev: Set lineevent_state::irq after IRQ register successfully
        gpio: tqmx86: fix uninitialized variable girq
        gpio: ftgpio010: Make irqchip immutable
        gpio: mockup: Fix potential resource leakage when register a chip
        gpio: mockup: fix NULL pointer dereference when removing debugfs
    • Merge tag 'perf-tools-fixes-for-v6.0-2022-09-21' of... · 9597f088
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v6.0-2022-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix polling of system-wide events related to mixing per-cpu and
         per-thread events.
      
       - Do not check if /proc/modules is unchanged when copying /proc/kcore,
         that doesn't get in the way of post processing analysis.
      
       - Include program header in ELF files generated for JIT files, so that
         they can be opened by tools using elfutils libraries.
      
       - Enter namespaces when synthesizing build-ids.
      
       - Fix some bugs related to a recent cpu_map overhaul where we should be
         using an index and not the cpu number.
      
       - Fix BPF program ELF section name, using the naming expected by libbpf
         when using BPF counters in 'perf stat'.
      
       - Add a new test for perf stat cgroup BPF counter.
      
       - Adjust check on 'perf test wp' for older kernels, where the
         PERF_EVENT_IOC_MODIFY_ATTRIBUTES ioctl isn't supported.
      
       - Sync x86 cpufeatures with the kernel sources, no changes in tooling.
      
      * tag 'perf-tools-fixes-for-v6.0-2022-09-21' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf tools: Honor namespace when synthesizing build-ids
        tools headers cpufeatures: Sync with the kernel sources
        perf kcore_copy: Do not check /proc/modules is unchanged
        libperf evlist: Fix polling of system-wide events
        perf record: Fix cpu mask bit setting for mixed mmaps
        perf test: Skip wp modify test on old kernels
        perf jit: Include program header in ELF files
        perf test: Add a new test for perf stat cgroup BPF counter
        perf stat: Use evsel->core.cpus to iterate cpus in BPF cgroup counters
        perf stat: Fix cpu map index in bperf cgroup code
        perf stat: Fix BPF program section name
    • selftests: forwarding: add shebang for sch_red.sh · 83e4b196
      Hangbin Liu authored
      RHEL/Fedora RPM build checks are stricter, and complain when executable
      files don't have a shebang line, e.g.
      
      *** WARNING: ./kselftests/net/forwarding/sch_red.sh is executable but has no shebang, removing executable bit
      
      Fix it by adding shebang line.
      
      Fixes: 6cf0291f ("selftests: forwarding: Add a RED test for SW datapath")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Link: https://lore.kernel.org/r/20220922024453.437757-1-liuhangbin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt: prevent skb UAF after handing over to PTP worker · c31f26c8
      Jakub Kicinski authored
      When reading the timestamp is required, bnxt_tx_int() hands
      over the ownership of the completed skb to the PTP worker.
      The skb should not be used afterwards, as the worker may
      run before the rest of our code and free the skb, leading
      to a use-after-free.
      
      Since dev_kfree_skb_any() accepts NULL, make the loss of
      ownership more obvious and set skb to NULL.
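
The hand-over pattern described above can be sketched in userspace C. This is only an illustration, not the driver code: `struct packet`, `worker_slot`, `hand_to_worker()`, `worker_run()` and `complete_tx()` are hypothetical names, and `free()` stands in for the kernel free routine because it also accepts NULL.

```c
#include <assert.h>
#include <stdlib.h>

struct packet {
	int len;
};

/* Hypothetical one-slot queue standing in for the worker's input. */
static struct packet *worker_slot;

/* Ownership of pkt moves to the worker from this point on. */
static void hand_to_worker(struct packet *pkt)
{
	assert(worker_slot == NULL);
	worker_slot = pkt;
}

/* The worker may run at any time after the hand-over and frees what it owns. */
static void worker_run(void)
{
	free(worker_slot);
	worker_slot = NULL;
}

/* Returns 1 if the packet was handed to the worker, 0 if it was freed here. */
static int complete_tx(struct packet *pkt, int needs_timestamp)
{
	int handed_over = 0;

	if (needs_timestamp) {
		hand_to_worker(pkt);
		pkt = NULL;	/* we no longer own it; make that explicit */
		handed_over = 1;
	}

	/* free(NULL) is a no-op, so the common exit path stays safe. */
	free(pkt);
	return handed_over;
}
```

Clearing the local pointer means any later accidental use in the completion path dereferences NULL immediately instead of touching memory the worker may already have freed.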
      
      Fixes: 83bb623c ("bnxt_en: Transmit and retrieve packet timestamps")
      Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/20220921201005.335390-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: marvell: Fix refcounting bugs in prestera_port_sfp_bind() · 3aac7ada
      Liang He authored
      In prestera_port_sfp_bind(), there are two refcounting bugs:
      (1) we should call of_node_get() before of_find_node_by_name(), as
      it will automatically decrease the refcount of the 'from' argument;
      (2) we should call of_node_put() when breaking out of the
      for_each_child_of_node() iteration, as the iterator automatically
      increases and decreases the refcount of 'child'.
      
      Fixes: 52323ef7 ("net: marvell: prestera: add phylink support")
      Signed-off-by: Liang He <windhl@126.com>
      Reviewed-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
      Link: https://lore.kernel.org/r/20220921133245.4111672-1-windhl@126.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sched: fix possible refcount leak in tc_new_tfilter() · c2e1cfef
      Hangyu Hua authored
      tfilter_put() needs to be called to drop the refcount taken by
      tp->ops->get(), to avoid a possible refcount leak when
      chain->tmplt_ops != NULL and chain->tmplt_ops != tp->ops.
      
      Fixes: 7d5509fa ("net: sched: extend proto ops with 'put' callback")
      Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Link: https://lore.kernel.org/r/20220921092734.31700-1-hbh25y@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD · 878e2405
      Sean Anderson authored
      There is a separate receive path for small packets (under 256 bytes).
      Instead of allocating a new dma-capable skb to be used for the next packet,
      this path allocates a skb and copies the data into it (reusing the existing
      skb for the next packet). There are two bytes of junk data at the beginning
      of every packet. I believe these are inserted in order to allow aligned DMA
      and IP headers. We skip over them using skb_reserve. Before copying over
      the data, we must use a barrier to ensure we see the whole packet. The
      current code only synchronizes len bytes, starting from the beginning of
      the packet, including the junk bytes. However, this leaves off the final
      two bytes in the packet. Synchronize the whole packet.
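
The off-by-two described above can be reproduced in a small userspace model. This is a sketch under stated assumptions, not the driver code: `struct rx_buffer`, `sync_for_cpu()` and `receive_small_packet()` are hypothetical, and a `memcpy()` from a "device" view to a "CPU" view stands in for the real DMA synchronization.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RX_OFFSET 2	/* junk bytes in front of every packet */
#define BUF_SIZE 64

/* Hypothetical model: what the device wrote vs. what the CPU can see. */
struct rx_buffer {
	unsigned char device[BUF_SIZE];
	unsigned char cpu[BUF_SIZE];
};

/* Stand-in for a DMA sync: only the first nbytes become visible. */
static void sync_for_cpu(struct rx_buffer *rx, size_t nbytes)
{
	assert(nbytes <= BUF_SIZE);
	memcpy(rx->cpu, rx->device, nbytes);
}

/*
 * Receive a packet of len bytes after syncing sync_len bytes and
 * return how many payload bytes the CPU sees correctly.
 * The payload must not contain zero bytes, since zero marks stale data here.
 */
static size_t receive_small_packet(const unsigned char *payload, size_t len,
				   size_t sync_len)
{
	struct rx_buffer rx;
	size_t i, good = 0;

	assert(RX_OFFSET + len <= BUF_SIZE);
	memset(&rx, 0, sizeof(rx));		/* CPU view starts out stale */
	memset(rx.device, 0xAA, RX_OFFSET);	/* junk prefix */
	memcpy(rx.device + RX_OFFSET, payload, len);

	sync_for_cpu(&rx, sync_len);

	for (i = 0; i < len; i++)
		if (rx.cpu[RX_OFFSET + i] == payload[i])
			good++;
	return good;
}
```

In this model, syncing only `len` bytes from the start of the buffer leaves the last two payload bytes stale, while syncing `len + RX_OFFSET` bytes covers the whole packet.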
      
      To reproduce this problem, ping a HME with a payload size between 17 and
      214:
      
      	$ ping -s 17 <hme_address>
      
      which will complain rather loudly about the data mismatch. Small packets
      (below 60 bytes on the wire) do not have this issue. I suspect this is
      related to the padding added to increase the minimum packet size.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Sean Anderson <seanga2@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220920235018.1675956-1-seanga2@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • udp: Use WARN_ON_ONCE() in udp_read_skb() · db39dfdc
      Peilin Ye authored
      Prevent udp_read_skb() from flooding the syslog.

      Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Link: https://lore.kernel.org/r/20220921005915.2697-1-yepeilin.cs@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'bonding-fix-null-deref-in-bond_rr_gen_slave_id' · c5da4b68
      Jakub Kicinski authored
      Jonathan Toppins says:
      
      ====================
      bonding: fix NULL deref in bond_rr_gen_slave_id
      
      Fix a NULL dereference of the struct bonding.rr_tx_counter member. If a
      bond is initially created with a mode other than zero (Round Robin), the
      memory required for the counter is never allocated, and when the mode is
      later changed, nothing verifies that the memory has been allocated.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1663694476.git.jtoppins@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: bonding: cause oops in bond_rr_gen_slave_id · 2ffd5732
      Jonathan Toppins authored
      This bonding selftest used to cause a kernel oops on aarch64
      and should be architecture agnostic.

      Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bonding: fix NULL deref in bond_rr_gen_slave_id · 0e400d60
      Jonathan Toppins authored
      Fix a NULL dereference of the struct bonding.rr_tx_counter member. If a
      bond is initially created with a mode other than zero (Round Robin), the
      memory required for the counter is never allocated, and when the mode is
      later changed, nothing verifies that the memory has been allocated.
      
      This causes the following Oops on an aarch64 machine:
          [  334.686773] Unable to handle kernel paging request at virtual address ffff2c91ac905000
          [  334.694703] Mem abort info:
          [  334.697486]   ESR = 0x0000000096000004
          [  334.701234]   EC = 0x25: DABT (current EL), IL = 32 bits
          [  334.706536]   SET = 0, FnV = 0
          [  334.709579]   EA = 0, S1PTW = 0
          [  334.712719]   FSC = 0x04: level 0 translation fault
          [  334.717586] Data abort info:
          [  334.720454]   ISV = 0, ISS = 0x00000004
          [  334.724288]   CM = 0, WnR = 0
          [  334.727244] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000008044d662000
          [  334.733944] [ffff2c91ac905000] pgd=0000000000000000, p4d=0000000000000000
          [  334.740734] Internal error: Oops: 96000004 [#1] SMP
          [  334.745602] Modules linked in: bonding tls veth rfkill sunrpc arm_spe_pmu vfat fat acpi_ipmi ipmi_ssif ixgbe igb i40e mdio ipmi_devintf ipmi_msghandler arm_cmn arm_dsu_pmu cppc_cpufreq acpi_tad fuse zram crct10dif_ce ast ghash_ce sbsa_gwdt nvme drm_vram_helper drm_ttm_helper nvme_core ttm xgene_hwmon
          [  334.772217] CPU: 7 PID: 2214 Comm: ping Not tainted 6.0.0-rc4-00133-g64ae13ed #4
          [  334.779950] Hardware name: GIGABYTE R272-P31-00/MP32-AR1-00, BIOS F18v (SCP: 1.08.20211002) 12/01/2021
          [  334.789244] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
          [  334.796196] pc : bond_rr_gen_slave_id+0x40/0x124 [bonding]
          [  334.801691] lr : bond_xmit_roundrobin_slave_get+0x38/0xdc [bonding]
          [  334.807962] sp : ffff8000221733e0
          [  334.811265] x29: ffff8000221733e0 x28: ffffdbac8572d198 x27: ffff80002217357c
          [  334.818392] x26: 000000000000002a x25: ffffdbacb33ee000 x24: ffff07ff980fa000
          [  334.825519] x23: ffffdbacb2e398ba x22: ffff07ff98102000 x21: ffff07ff981029c0
          [  334.832646] x20: 0000000000000001 x19: ffff07ff981029c0 x18: 0000000000000014
          [  334.839773] x17: 0000000000000000 x16: ffffdbacb1004364 x15: 0000aaaabe2f5a62
          [  334.846899] x14: ffff07ff8e55d968 x13: ffff07ff8e55db30 x12: 0000000000000000
          [  334.854026] x11: ffffdbacb21532e8 x10: 0000000000000001 x9 : ffffdbac857178ec
          [  334.861153] x8 : ffff07ff9f6e5a28 x7 : 0000000000000000 x6 : 000000007c2b3742
          [  334.868279] x5 : ffff2c91ac905000 x4 : ffff2c91ac905000 x3 : ffff07ff9f554400
          [  334.875406] x2 : ffff2c91ac905000 x1 : 0000000000000001 x0 : ffff07ff981029c0
          [  334.882532] Call trace:
          [  334.884967]  bond_rr_gen_slave_id+0x40/0x124 [bonding]
          [  334.890109]  bond_xmit_roundrobin_slave_get+0x38/0xdc [bonding]
          [  334.896033]  __bond_start_xmit+0x128/0x3a0 [bonding]
          [  334.901001]  bond_start_xmit+0x54/0xb0 [bonding]
          [  334.905622]  dev_hard_start_xmit+0xb4/0x220
          [  334.909798]  __dev_queue_xmit+0x1a0/0x720
          [  334.913799]  arp_xmit+0x3c/0xbc
          [  334.916932]  arp_send_dst+0x98/0xd0
          [  334.920410]  arp_solicit+0xe8/0x230
          [  334.923888]  neigh_probe+0x60/0xb0
          [  334.927279]  __neigh_event_send+0x3b0/0x470
          [  334.931453]  neigh_resolve_output+0x70/0x90
          [  334.935626]  ip_finish_output2+0x158/0x514
          [  334.939714]  __ip_finish_output+0xac/0x1a4
          [  334.943800]  ip_finish_output+0x40/0xfc
          [  334.947626]  ip_output+0xf8/0x1a4
          [  334.950931]  ip_send_skb+0x5c/0x100
          [  334.954410]  ip_push_pending_frames+0x3c/0x60
          [  334.958758]  raw_sendmsg+0x458/0x6d0
          [  334.962325]  inet_sendmsg+0x50/0x80
          [  334.965805]  sock_sendmsg+0x60/0x6c
          [  334.969286]  __sys_sendto+0xc8/0x134
          [  334.972853]  __arm64_sys_sendto+0x34/0x4c
          [  334.976854]  invoke_syscall+0x78/0x100
          [  334.980594]  el0_svc_common.constprop.0+0x4c/0xf4
          [  334.985287]  do_el0_svc+0x38/0x4c
          [  334.988591]  el0_svc+0x34/0x10c
          [  334.991724]  el0t_64_sync_handler+0x11c/0x150
          [  334.996072]  el0t_64_sync+0x190/0x194
          [  334.999726] Code: b9001062 f9403c02 d53cd044 8b040042 (b8210040)
          [  335.005810] ---[ end trace 0000000000000000 ]---
          [  335.010416] Kernel panic - not syncing: Oops: Fatal exception in interrupt
          [  335.017279] SMP: stopping secondary CPUs
          [  335.021374] Kernel Offset: 0x5baca8eb0000 from 0xffff800008000000
          [  335.027456] PHYS_OFFSET: 0x80000000
          [  335.030932] CPU features: 0x0000,0085c029,19805c82
          [  335.035713] Memory Limit: none
          [  335.038756] Rebooting in 180 seconds..
      
      The fix is to allocate the memory in bond_open() which is guaranteed
      to be called before any packets are processed.
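
The idea of allocating in the open path can be sketched in userspace C. This is an illustration only, not the bonding driver: `struct bond_sketch` and all function and constant names below are hypothetical, and `calloc()` stands in for the kernel's allocator.

```c
#include <assert.h>
#include <stdlib.h>

enum { MODE_ROUNDROBIN = 0, MODE_ACTIVEBACKUP = 1 };

struct bond_sketch {
	int mode;
	unsigned int *rr_tx_counter;	/* NULL until allocated */
};

/*
 * Open path: allocate the counter if the current mode needs it and it
 * does not exist yet. Returns 0 on success, -1 on allocation failure.
 */
static int bond_open_sketch(struct bond_sketch *bond)
{
	if (bond->mode == MODE_ROUNDROBIN && !bond->rr_tx_counter) {
		bond->rr_tx_counter = calloc(1, sizeof(*bond->rr_tx_counter));
		if (!bond->rr_tx_counter)
			return -1;
	}
	return 0;
}

/* Transmit path: relies on the open path having run first. */
static unsigned int bond_next_slave_id(struct bond_sketch *bond)
{
	assert(bond->rr_tx_counter != NULL);
	return (*bond->rr_tx_counter)++;
}

/* Teardown: release the counter if it was ever allocated. */
static void bond_destroy_sketch(struct bond_sketch *bond)
{
	free(bond->rr_tx_counter);
	bond->rr_tx_counter = NULL;
}
```

Because the check runs on every open and is idempotent, it does not matter which mode the bond was created in; the counter exists by the time the transmit path can run.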
      
      Fixes: 848ca918 ("net: bonding: Use per-cpu rr_tx_counter")
      CC: Jussi Maki <joamaki@gmail.com>
      Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: micrel: fix shared interrupt on LAN8814 · 2002fbac
      Michael Walle authored
      Since commit ece19502 ("net: phy: micrel: 1588 support for LAN8814
      phy") the handler always returns IRQ_HANDLED, except in an error case.
      Before that commit, the interrupt status register was checked and if
      it was empty, IRQ_NONE was returned. Restore that behavior to play nice
      with the interrupt line being shared with others.
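
Why the return value matters on a shared line can be shown with a userspace sketch. The names here (`enum irq_result`, `struct device_sketch`, `device_irq_handler()`, `deliver_shared_irq()`) are hypothetical and only model the convention; they are not the kernel's types or the PHY driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the "not mine" / "handled" convention. */
enum irq_result {
	IRQ_RESULT_NONE = 0,
	IRQ_RESULT_HANDLED = 1
};

struct device_sketch {
	int status;	/* pending interrupt bits; negative models a read error */
};

/* Claim the interrupt only if this device's status register is non-empty. */
static enum irq_result device_irq_handler(struct device_sketch *dev)
{
	if (dev->status < 0)
		return IRQ_RESULT_NONE;	/* could not read the status */
	if (dev->status == 0)
		return IRQ_RESULT_NONE;	/* not ours: let other handlers look */

	dev->status = 0;		/* acknowledge what we handled */
	return IRQ_RESULT_HANDLED;
}

/* Deliver one shared interrupt; returns how many devices claimed it. */
static size_t deliver_shared_irq(struct device_sketch *devs, size_t n)
{
	size_t i, claimed = 0;

	for (i = 0; i < n; i++)
		if (device_irq_handler(&devs[i]) == IRQ_RESULT_HANDLED)
			claimed++;
	return claimed;
}
```

A handler that always reports "handled" would make the claimed count meaningless, so a caller could no longer tell a genuinely serviced interrupt from one that no device on the line recognized.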
      
      Fixes: ece19502 ("net: phy: micrel: 1588 support for LAN8814 phy")
      Signed-off-by: Michael Walle <michael@walle.cc>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: Divya Koppera <Divya.Koppera@microchip.com>
      Link: https://lore.kernel.org/r/20220920141619.808117-1-michael@walle.cc
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/smc: Stop the CLC flow if no link to map buffers on · e738455b
      Wen Gu authored
      There might be a potential race between SMC-R buffer map and
      link group termination.
      
      smc_smcr_terminate_all()     | smc_connect_rdma()
      --------------------------------------------------------------
                                   | smc_conn_create()
      for links in smcibdev        |
              schedule links down  |
                                   | smc_buf_create()
                                   |  \- smcr_buf_map_usable_links()
                                   |      \- no usable links found,
                                   |         (rmb->mr = NULL)
                                   |
                                   | smc_clc_send_confirm()
                                   |  \- access conn->rmb_desc->mr[]->rkey
                                   |     (panic)
      
      During reboot and IB device module removal, all links will be set
      down and no usable links remain in link groups. In this situation,
      smcr_buf_map_usable_links() should return an error and stop the
      CLC flow from accessing the uninitialized mr.
      
      Fixes: b9247544 ("net/smc: convert static link ID instances to support multiple links")
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Link: https://lore.kernel.org/r/1663656189-32090-1-git-send-email-guwen@linux.alibaba.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • efi: libstub: check Shim mode using MokSBStateRT · 5f56a74c
      Ard Biesheuvel authored
      We currently check the MokSBState variable to decide whether we should
      treat UEFI secure boot as being disabled, even if the firmware thinks
      otherwise. This is used by shim to indicate that it is not checking
      signatures on boot images. In the kernel, we use this to relax lockdown
      policies.
      
      However, in cases where shim is not even being used, we don't want this
      variable to interfere with lockdown, given that the variable may be
      non-volatile and therefore persist across a reboot. This means setting
      it once will persistently disable lockdown checks on a given system.
      
      So switch to the mirrored version of this variable, called MokSBStateRT,
      which is supposed to be volatile, and this is something we can check.
      
      Cc: <stable@vger.kernel.org> # v4.19+
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Reviewed-by: Peter Jones <pjones@redhat.com>
    • efi: x86: Wipe setup_data on pure EFI boot · 63bf28ce
      Ard Biesheuvel authored
      When booting the x86 kernel via EFI using the LoadImage/StartImage boot
      services [as opposed to the deprecated EFI handover protocol], the setup
      header is taken from the image directly, and given that EFI's LoadImage
      has no Linux/x86 specific knowledge regarding struct bootparams or
      struct setup_header, any absolute addresses in the setup header must
      originate from the file and not from a prior loading stage.
      
      Since we cannot generally predict where LoadImage() decides to load an
      image (*), such absolute addresses must be treated as suspect: even if a
      prior boot stage intended to make them point somewhere inside the
      [signed] image, there is no way to validate that, and if they point at
      an arbitrary location in memory, the setup_data nodes will not be
      covered by any signatures or TPM measurements either, and could be made
      to contain an arbitrary sequence of SETUP_xxx nodes, which could
      interfere quite badly with the early x86 boot sequence.
      
      (*) Note that, while LoadImage() does take a buffer/size tuple in
      addition to a device path, which can be used to provide the image
      contents directly, it will re-allocate such images, as the memory
      footprint of an image is generally larger than the PE/COFF file
      representation.
      
      Cc: <stable@vger.kernel.org> # v5.10+
      Link: https://lore.kernel.org/all/20220904165321.1140894-1-Jason@zx2c4.com/
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 624aea6b
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-09-20 (ice)
      
      Michal re-sets TC configuration when changing number of queues.
      
      Mateusz moves the check and call for link-down-on-close to the specific
      path for downing/closing the interface.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: Fix interface being down after reset with link-down-on-close flag on
        ice: config netdev tc before setting queues number
      ====================
      
      Link: https://lore.kernel.org/r/20220920205344.1860934-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient · 114f398d
      Larysa Zaremba authored
      The original patch added a static branch to handle the situation where
      assigning an XDP TX queue to every CPU is not possible, so the queues
      have to be shared.
      
      However, in the XDP transmit handler ice_xdp_xmit(), an error was
      returned in such cases even before the static branch condition was
      checked, thus making queue sharing still impossible.
      
      Fixes: 22bf877e ("ice: introduce XDP_TX fallback path")
      Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Link: https://lore.kernel.org/r/20220919134346.25030-1-larysa.zaremba@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · f64780e3
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-09-19 (iavf, i40e)
      
      Norbert adds checking of buffer size for Rx buffer checks in iavf.
      
      Michal corrects setting of max MTU in iavf to account for MTU data provided
      by PF, fixes i40e to set VF max MTU, and resolves lack of rate limiting
      when value was less than divisor for i40e.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        i40e: Fix set max_tx_rate when it is lower than 1 Mbps
        i40e: Fix VF set max MTU size
        iavf: Fix set max MTU size with port VLAN and jumbo frames
        iavf: Fix bad page state
      ====================
      
      Link: https://lore.kernel.org/r/20220919223428.572091-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 21 Sep, 2022 14 commits