1. 21 Apr, 2023 2 commits
    • net/mlx5e: Release the label when replacing existing ct entry · 8ac04a28
      Vlad Buslov authored
      The cited commit doesn't release the label mapping when replacing an existing
      ct entry, which leads to the following memleak report:
      
      unreferenced object 0xffff8881854cf280 (size 96):
        comm "kworker/u48:74", pid 23093, jiffies 4296664564 (age 175.944s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000002722d368>] __kmalloc+0x4b/0x1c0
          [<00000000cc44e18f>] mapping_add+0x6e8/0xc90 [mlx5_core]
          [<000000003ad942a7>] mlx5_get_label_mapping+0x66/0xe0 [mlx5_core]
          [<00000000266308ac>] mlx5_tc_ct_entry_create_mod_hdr+0x1c4/0xf50 [mlx5_core]
          [<000000009a768b4f>] mlx5_tc_ct_entry_add_rule+0x16f/0xaf0 [mlx5_core]
          [<00000000a178f3e5>] mlx5_tc_ct_block_flow_offload_add+0x10cb/0x1f90 [mlx5_core]
          [<000000007b46c496>] mlx5_tc_ct_block_flow_offload+0x14a/0x630 [mlx5_core]
          [<00000000a9a18ac5>] nf_flow_offload_tuple+0x1a3/0x390 [nf_flow_table]
          [<00000000d0881951>] flow_offload_work_handler+0x257/0xd30 [nf_flow_table]
          [<000000009e4935a4>] process_one_work+0x7c2/0x13e0
          [<00000000f5cd36a7>] worker_thread+0x59d/0xec0
          [<00000000baed1daf>] kthread+0x28f/0x330
          [<0000000063d282a4>] ret_from_fork+0x1f/0x30
      
      Fix the issue by correctly releasing the label mapping.
      
      Fixes: 94ceffb4 ("net/mlx5e: Implement CT entry update")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Don't clone flow post action attributes second time · e9fce818
      Vlad Buslov authored
      The code already clones post action attributes in
      mlx5e_clone_flow_attr_for_post_act(). Creating another copy in
      mlx5e_tc_post_act_add() is an erroneous leftover from the original
      implementation. Instead, assign handle->attribute to the post_attr provided
      by the caller. Note that cloning the attribute a second time is not just
      wasteful but also causes issues, such as the second copy not being properly
      updated in the neigh update code, which leads to the following use-after-free:
      
      Feb 21 09:02:00 c-237-177-40-045 kernel: BUG: KASAN: use-after-free in mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_report+0xbb/0x1a0
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_save_stack+0x1e/0x40
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_set_track+0x21/0x30
      Feb 21 09:02:00 c-237-177-40-045 kernel:  __kasan_kmalloc+0x7a/0x90
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_save_stack+0x1e/0x40
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_set_track+0x21/0x30
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_save_free_info+0x2a/0x40
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ____kasan_slab_free+0x11a/0x1b0
      Feb 21 09:02:00 c-237-177-40-045 kernel: page dumped because: kasan: bad access detected
      Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 8833): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xf2ff71), err(-22)
      Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0 enp8s0f0: Failed to add post action rule
      Feb 21 09:02:00 c-237-177-40-045 kernel: mlx5_core 0000:08:00.0: mlx5e_tc_encap_flows_add:190:(pid 8833): Failed to update flow post acts, -22
      Feb 21 09:02:00 c-237-177-40-045 kernel: Call Trace:
      Feb 21 09:02:00 c-237-177-40-045 kernel:  <TASK>
      Feb 21 09:02:00 c-237-177-40-045 kernel:  dump_stack_lvl+0x57/0x7d
      Feb 21 09:02:00 c-237-177-40-045 kernel:  print_report+0x170/0x471
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_report+0xbb/0x1a0
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ? mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5_cmd_set_fte+0x200d/0x24c0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ? __module_address.part.0+0x62/0x200
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ? mlx5_cmd_stub_create_flow_table+0xd0/0xd0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ? __raw_spin_lock_init+0x3b/0x110
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5_cmd_create_fte+0x80/0xb0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  add_rule_fg+0xe80/0x19c0 [mlx5_core]
      --
      Feb 21 09:02:00 c-237-177-40-045 kernel: Allocated by task 13476:
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_save_stack+0x1e/0x40
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_set_track+0x21/0x30
      Feb 21 09:02:00 c-237-177-40-045 kernel:  __kasan_kmalloc+0x7a/0x90
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5_packet_reformat_alloc+0x7b/0x230 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_tc_tun_create_header_ipv4+0x977/0xf10 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_attach_encap+0x15b4/0x1e10 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  post_process_attr+0x305/0xa30 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_tc_add_fdb_flow+0x4c0/0xcf0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_configure_flower+0xcaa/0x4b90 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_rep_setup_tc_cls_flower+0x99/0x1b0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_rep_setup_tc_cb+0x133/0x1e0 [mlx5_core]
      --
      Feb 21 09:02:00 c-237-177-40-045 kernel: Freed by task 8833:
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_save_stack+0x1e/0x40
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_set_track+0x21/0x30
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kasan_save_free_info+0x2a/0x40
      Feb 21 09:02:00 c-237-177-40-045 kernel:  ____kasan_slab_free+0x11a/0x1b0
      Feb 21 09:02:00 c-237-177-40-045 kernel:  __kmem_cache_free+0x1de/0x400
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5_packet_reformat_dealloc+0xad/0x100 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_tc_encap_flows_del+0x3c0/0x500 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_rep_update_flows+0x40c/0xa80 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  mlx5e_rep_neigh_update+0x473/0x7a0 [mlx5_core]
      Feb 21 09:02:00 c-237-177-40-045 kernel:  process_one_work+0x7c2/0x1310
      Feb 21 09:02:00 c-237-177-40-045 kernel:  worker_thread+0x59d/0xec0
      Feb 21 09:02:00 c-237-177-40-045 kernel:  kthread+0x28f/0x330
      
      Fixes: 8300f225 ("net/mlx5e: Create new flow attr for multi table actions")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  2. 20 Apr, 2023 12 commits
  3. 19 Apr, 2023 24 commits
    • Merge tag 'spi-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 23990b1a
      Linus Torvalds authored
      Pull spi fix from Mark Brown:
       "A small fix in the error handling for the rockchip driver, ensuring we
        don't leak clock enables if we fail to request the interrupt for the
        device"
      
      * tag 'spi-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: spi-rockchip: Fix missing unwind goto in rockchip_sfc_probe()
    • Merge tag 'regulator-fix-v6.3-rc7' of... · 72b4fb4c
      Linus Torvalds authored
      Merge tag 'regulator-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fixes from Mark Brown:
       "A few driver specific fixes, one build coverage issue and a couple of
        'someone typed in the wrong number' style errors in describing devices
        to the subsystem"
      
      * tag 'regulator-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: sm5703: Fix missing n_voltages for fixed regulators
        regulator: fan53555: Fix wrong TCS_SLEW_MASK
        regulator: fan53555: Explicitly include bits header
    • net: dsa: microchip: ksz8795: Correctly handle huge frame configuration · 3d2f8f1f
      Christophe JAILLET authored
      Because of the logic in place, SW_HUGE_PACKET can never be set.
      (If the first condition is true, then the 2nd one is also true, but is not
      executed)
      
      Change the logic and update each bit individually.
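
The difference between a chained and an independent check can be sketched in plain C. This is an illustration only, not the driver code: the struct, the function names, and both threshold values are assumptions chosen so that the second threshold is the higher one. Under that assumption, any frame size that passes the second check has already passed the first, so an `else if` branch never runs for it.

```c
#include <stdbool.h>

/* Illustrative thresholds (assumptions, not taken from the driver). */
#define LEGAL_PACKET_SIZE  1518
#define NORMAL_PACKET_SIZE 1536

/* Hypothetical stand-in for the two control bits discussed above. */
struct mtu_bits {
	bool legal_packet_disable;
	bool huge_packet;
};

/* Chained form: the second branch is unreachable whenever the first is taken. */
struct mtu_bits mtu_bits_chained(int frame_size)
{
	struct mtu_bits b = { false, false };

	if (frame_size > LEGAL_PACKET_SIZE)
		b.legal_packet_disable = true;
	else if (frame_size > NORMAL_PACKET_SIZE)
		b.huge_packet = true;	/* never reached with these thresholds */
	return b;
}

/* Independent form: each bit is decided by its own condition. */
struct mtu_bits mtu_bits_independent(int frame_size)
{
	struct mtu_bits b = { false, false };

	b.legal_packet_disable = frame_size > LEGAL_PACKET_SIZE;
	b.huge_packet = frame_size > NORMAL_PACKET_SIZE;
	return b;
}
```

With a frame size of 2000, the chained variant leaves `huge_packet` clear while the independent variant sets both bits.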
      
      Fixes: 29d1e85f ("net: dsa: microchip: ksz8: add MTU configuration support")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/43107d9e8b5b8b05f0cbd4e1f47a2bb88c8747b2.1681755535.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • rust: allow to use INIT_STACK_ALL_ZERO · d966c3ca
      Andrea Righi authored
      With CONFIG_INIT_STACK_ALL_ZERO enabled, bindgen passes
      -ftrivial-auto-var-init=zero to clang, which triggers the following
      error:
      
       error: '-ftrivial-auto-var-init=zero' hasn't been enabled; enable it at your own peril for benchmarking purpose only with '-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang'
      
      However, this additional option that is currently required by clang has
      been deprecated since clang-16 and is going to be removed in the future,
      likely with clang-18.
      
      So, make sure bindgen is using this extra option if the major version of
      the libclang used by bindgen is < 16.
      
      In this way we can enable CONFIG_INIT_STACK_ALL_ZERO with CONFIG_RUST
      without triggering any build error.
      
      Link: https://github.com/llvm/llvm-project/issues/44842
      Link: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.0-rc2/clang/docs/ReleaseNotes.rst#deprecated-compiler-flags
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      [Changed to < 16, added link and reworded]
      Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
    • rust: fix regexp in scripts/is_rust_module.sh · ccc45054
      Andrea Righi authored
      nm can use "R" or "r" to show read-only data sections, but
      scripts/is_rust_module.sh can only recognize "r", so with some versions
      of binutils it can fail to detect if a module is a Rust module or not.
      
      Right now we're using this script only to determine if we need to skip
      BTF generation (that is disabled globally if CONFIG_RUST is enabled),
      but it's still nice to fix this script to do the proper job.
      
      Moreover, with this patch applied I can also relax the constraint of
      "RUST depends on !DEBUG_INFO_BTF" and build a kernel with Rust and BTF
      enabled at the same time (of course BTF generation is still skipped for
      Rust modules).
      
      [ Miguel: The actual reason is likely to be a change on the Rust
        compiler between 1.61.0 and 1.62.0:
      
          echo '#[used] static S: () = ();' |
              rustup run 1.61.0 rustc --emit=obj --crate-type=lib - &&
              nm rust_out.o
      
          echo '#[used] static S: () = ();' |
              rustup run 1.62.0 rustc --emit=obj --crate-type=lib - &&
              nm rust_out.o
      
        Gives:
      
          0000000000000000 r _ZN8rust_out1S17h48027ce0da975467E
          0000000000000000 R _ZN8rust_out1S17h58e1f3d9c0e97cefE
      
        See https://godbolt.org/z/KE6jneoo4. ]
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
      Reviewed-by: Eric Curtin <ecurtin@redhat.com>
      Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
      Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
    • bpf: Fix incorrect verifier pruning due to missing register precision taints · 71b547f5
      Daniel Borkmann authored
      Juan Jose et al reported an issue found via fuzzing where the verifier's
      pruning logic prematurely marks a program path as safe.
      
      Consider the following program:
      
         0: (b7) r6 = 1024
         1: (b7) r7 = 0
         2: (b7) r8 = 0
         3: (b7) r9 = -2147483648
         4: (97) r6 %= 1025
         5: (05) goto pc+0
         6: (bd) if r6 <= r9 goto pc+2
         7: (97) r6 %= 1
         8: (b7) r9 = 0
         9: (bd) if r6 <= r9 goto pc+1
        10: (b7) r6 = 0
        11: (b7) r0 = 0
        12: (63) *(u32 *)(r10 -4) = r0
        13: (18) r4 = 0xffff888103693400 // map_ptr(ks=4,vs=48)
        15: (bf) r1 = r4
        16: (bf) r2 = r10
        17: (07) r2 += -4
        18: (85) call bpf_map_lookup_elem#1
        19: (55) if r0 != 0x0 goto pc+1
        20: (95) exit
        21: (77) r6 >>= 10
        22: (27) r6 *= 8192
        23: (bf) r1 = r0
        24: (0f) r0 += r6
        25: (79) r3 = *(u64 *)(r0 +0)
        26: (7b) *(u64 *)(r1 +0) = r3
        27: (95) exit
      
      The verifier treats this as safe, leading to oob read/write access due
      to an incorrect verifier conclusion:
      
        func#0 @0
        0: R1=ctx(off=0,imm=0) R10=fp0
        0: (b7) r6 = 1024                     ; R6_w=1024
        1: (b7) r7 = 0                        ; R7_w=0
        2: (b7) r8 = 0                        ; R8_w=0
        3: (b7) r9 = -2147483648              ; R9_w=-2147483648
        4: (97) r6 %= 1025                    ; R6_w=scalar()
        5: (05) goto pc+0
        6: (bd) if r6 <= r9 goto pc+2         ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff00000000; 0xffffffff)) R9_w=-2147483648
        7: (97) r6 %= 1                       ; R6_w=scalar()
        8: (b7) r9 = 0                        ; R9=0
        9: (bd) if r6 <= r9 goto pc+1         ; R6=scalar(umin=1) R9=0
        10: (b7) r6 = 0                       ; R6_w=0
        11: (b7) r0 = 0                       ; R0_w=0
        12: (63) *(u32 *)(r10 -4) = r0
        last_idx 12 first_idx 9
        regs=1 stack=0 before 11: (b7) r0 = 0
        13: R0_w=0 R10=fp0 fp-8=0000????
        13: (18) r4 = 0xffff8ad3886c2a00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
        17: (07) r2 += -4                     ; R2_w=fp-4
        18: (85) call bpf_map_lookup_elem#1   ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0)
        19: (55) if r0 != 0x0 goto pc+1       ; R0=0
        20: (95) exit
      
        from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
        21: (77) r6 >>= 10                    ; R6_w=0
        22: (27) r6 *= 8192                   ; R6_w=0
        23: (bf) r1 = r0                      ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
        24: (0f) r0 += r6
        last_idx 24 first_idx 19
        regs=40 stack=0 before 23: (bf) r1 = r0
        regs=40 stack=0 before 22: (27) r6 *= 8192
        regs=40 stack=0 before 21: (77) r6 >>= 10
        regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
        parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
        last_idx 18 first_idx 9
        regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
        regs=40 stack=0 before 17: (07) r2 += -4
        regs=40 stack=0 before 16: (bf) r2 = r10
        regs=40 stack=0 before 15: (bf) r1 = r4
        regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00
        regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
        regs=40 stack=0 before 11: (b7) r0 = 0
        regs=40 stack=0 before 10: (b7) r6 = 0
        25: (79) r3 = *(u64 *)(r0 +0)         ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
        26: (7b) *(u64 *)(r1 +0) = r3         ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
        27: (95) exit
      
        from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
        11: (b7) r0 = 0                       ; R0_w=0
        12: (63) *(u32 *)(r10 -4) = r0
        last_idx 12 first_idx 11
        regs=1 stack=0 before 11: (b7) r0 = 0
        13: R0_w=0 R10=fp0 fp-8=0000????
        13: (18) r4 = 0xffff8ad3886c2a00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
        17: (07) r2 += -4                     ; R2_w=fp-4
        18: (85) call bpf_map_lookup_elem#1
        frame 0: propagating r6
        last_idx 19 first_idx 11
        regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
        regs=40 stack=0 before 17: (07) r2 += -4
        regs=40 stack=0 before 16: (bf) r2 = r10
        regs=40 stack=0 before 15: (bf) r1 = r4
        regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00
        regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
        regs=40 stack=0 before 11: (b7) r0 = 0
        parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0
        last_idx 9 first_idx 9
        regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
        parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=0 R10=fp0
        last_idx 8 first_idx 0
        regs=40 stack=0 before 8: (b7) r9 = 0
        regs=40 stack=0 before 7: (97) r6 %= 1
        regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
        regs=40 stack=0 before 5: (05) goto pc+0
        regs=40 stack=0 before 4: (97) r6 %= 1025
        regs=40 stack=0 before 3: (b7) r9 = -2147483648
        regs=40 stack=0 before 2: (b7) r8 = 0
        regs=40 stack=0 before 1: (b7) r7 = 0
        regs=40 stack=0 before 0: (b7) r6 = 1024
        19: safe
        frame 0: propagating r6
        last_idx 9 first_idx 0
        regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
        regs=40 stack=0 before 5: (05) goto pc+0
        regs=40 stack=0 before 4: (97) r6 %= 1025
        regs=40 stack=0 before 3: (b7) r9 = -2147483648
        regs=40 stack=0 before 2: (b7) r8 = 0
        regs=40 stack=0 before 1: (b7) r7 = 0
        regs=40 stack=0 before 0: (b7) r6 = 1024
      
        from 6 to 9: safe
        verification time 110 usec
        stack depth 4
        processed 36 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 2
      
      The verifier considers this program as safe by mistakenly pruning unsafe
      code paths. In the above func#0, code lines 0-10 are of interest. In lines
      0-3 registers r6 to r9 are initialized with known scalar values. In line 4
      the register r6 is reset to an unknown scalar given the verifier does not
      track modulo operations. Due to this, the verifier also cannot determine
      precisely which branches in lines 6 and 9 are taken, therefore it needs to
      explore them both.
      
      As can be seen, the verifier starts with exploring the false/fall-through
      paths first. The 'from 19 to 21' path has both r6=0 and r9=0 and the pointer
      arithmetic on r0 += r6 is therefore considered safe. Given the arithmetic,
      r6 is correctly marked for precision tracking, and backtracking kicks in,
      walking back the current path all the way to where r6 was set to 0 in
      the fall-through branch.
      
      Next, the pruning logic pops the path 'from 9 to 11' from the stack. Also
      here, the state of the registers is the same, that is, r6=0 and r9=0, so
      that at line 19 the path can be pruned as it is considered safe. It is
      interesting to note that the conditional in line 9 turned r6 into a more
      precise state, that is, in the fall-through path at the beginning of line
      10, it is R6=scalar(umin=1), and in the branch-taken path (which is analyzed
      here) at the beginning of line 11, r6 turned into a known const r6=0 as
      r9=0 prior to that and therefore (unsigned) r6 <= 0 concludes that r6 must
      be 0 (**):
      
        [...]                                 ; R6_w=scalar()
        9: (bd) if r6 <= r9 goto pc+1         ; R6=scalar(umin=1) R9=0
        [...]
      
        from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
        [...]
      
      The next path is 'from 6 to 9'. The verifier considers the old and current
      state equivalent, and therefore prunes the search incorrectly. Looking into
      the two states which are being compared by the pruning logic at line 9, the
      old state consists of R6_rwD=Pscalar() R9_rwD=0 R10=fp0 and the new state
      consists of R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968)
      R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0. While r6 had the reg->precise flag
      correctly set in the old state, r9 did not. Both r6 states are considered
      equivalent given the old one is a superset of the current, more precise one,
      however, r9's actual values (0 vs 0x80000000) mismatch. Given the old r9
      did not have reg->precise flag set, the verifier does not consider the
      register as contributing to the precision state of r6, and therefore it
      considered both r9 states as equivalent. However, for this specific pruned
      path (which is also the actual path taken at runtime), register r6 will be
      0x400 and r9 0x80000000 when reaching line 21, thus oob-accessing the map.
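
The runtime values on the pruned path can be replayed with plain integer arithmetic. The following is a minimal sketch in C, not kernel code: `runtime_offset()` and `access_in_bounds()` are hypothetical names, the 48-byte value size is taken from `vs=48` in the log above, and it assumes the `(b7)` move sign-extends its 32-bit immediate to 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAP_VALUE_SIZE 48u	/* vs=48 in the verifier log */

/* Replays instructions 0-10 and 21-22 of the program with concrete values. */
uint64_t runtime_offset(void)
{
	uint64_t r6 = 1024;				/* 0: r6 = 1024 */
	uint64_t r9 = (uint64_t)(int64_t)INT32_MIN;	/* 3: r9 = -2147483648 */

	r6 %= 1025;					/* 4: r6 stays 1024 */
	if (!(r6 <= r9)) {				/* 6: taken at runtime, skips 7-8 */
		r6 %= 1;				/* 7 */
		r9 = 0;					/* 8 */
	}
	if (!(r6 <= r9))				/* 9: taken at runtime, skips 10 */
		r6 = 0;					/* 10 */

	r6 >>= 10;					/* 21: 1024 >> 10 == 1 */
	r6 *= 8192;					/* 22: 1 * 8192 */
	return r6;
}

/* True if an access of 'size' bytes at 'off' stays inside the map value. */
bool access_in_bounds(uint64_t off, uint64_t size)
{
	return off <= MAP_VALUE_SIZE && size <= MAP_VALUE_SIZE - off;
}
```

Under these assumptions the offset added to r0 at instruction 24 is 8192 bytes, far beyond the 48-byte map value, so the 8-byte load at instruction 25 is out of bounds.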
      
      The purpose of precision tracking is to initially mark registers (including
      spilled ones) as imprecise to help the verifier's pruning logic find equivalent
      states it can then prune if they don't contribute to the program's safety
      aspects. For example, if registers are used for pointer arithmetic or to pass
      constant length to a helper, then the verifier sets reg->precise flag and
      backtracks the BPF program instruction sequence and chain of verifier states
      to ensure that the given register or stack slot including their dependencies
      are marked as precisely tracked scalar. This also includes any other registers
      and slots that contribute to a tracked state of given registers/stack slot.
      This backtracking relies on recorded jmp_history and is able to traverse
      the entire chain of parent states. This process ends only when all the necessary
      registers/slots and their transitive dependencies are marked as precise.
      
      The backtrack_insn() is called from the current instruction up to the first
      instruction, and its purpose is to compute a bitmask of registers and stack
      slots that need precision tracking in the parent's verifier state. For example,
      if a current instruction is r6 = r7, then r6 needs precision after this
      instruction and r7 needs precision before this instruction, that is, in the
      parent state. Hence for the latter r7 is marked and r6 unmarked.
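
The r6 = r7 example can be pictured as a bitmask update. The sketch below is a simplified illustration in C, not the kernel's backtrack_insn(): the function name `backtrack_mov()` is hypothetical, and it only assumes that bit N of the mask stands for register rN, which matches the `regs=40` (r6) values printed in the logs above.

```c
#include <stdint.h>

/*
 * Hypothetical helper: given the mask of registers that need precision
 * after "dst = src", return the mask needed before that instruction.
 */
uint32_t backtrack_mov(uint32_t reg_mask, unsigned int dst, unsigned int src)
{
	if (!(reg_mask & (1u << dst)))
		return reg_mask;	/* dst is not tracked: nothing to propagate */

	reg_mask &= ~(1u << dst);	/* dst is overwritten here, so unmark it */
	reg_mask |= 1u << src;		/* its value came from src, so mark src */
	return reg_mask;
}
```

For `r6 = r7` with r6 marked (mask 0x40), the result is 0x80, that is, only r7 remains marked for the parent state.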
      
      For the class of jmp/jmp32 instructions, backtrack_insn() today only looks
      at call and exit instructions and for all other conditionals the masks
      remain as-is. However, in the given situation register r6 has a dependency
      on r9 (as described above in **), so that one also needs to be marked for
      precision tracking. In other words, if an imprecise register influences a
      precise one, then the imprecise register should also be marked precise.
      Meaning, in the parent state both dest and src register need to be tracked
      for precision and therefore the marking must be more conservative by setting
      reg->precise flag for both. The precision propagation needs to cover both
      cases for the conditional: the src reg was marked but not the dst reg, and
      vice versa.
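
The rule for conditional jumps can be sketched the same way as a bitmask operation. This is a simplified model in C and not the actual patch: `backtrack_cond_jmp()` is a hypothetical name, only the register-to-register form of the comparison is modelled, and bit N of the mask is assumed to stand for register rN, as in the `regs=40` and `regs=240` values of the logs.

```c
#include <stdint.h>

/*
 * Hypothetical helper: for "if dst <op> src goto ...", if either operand
 * already needs precision, both must be tracked in the parent state.
 */
uint32_t backtrack_cond_jmp(uint32_t reg_mask, unsigned int dst, unsigned int src)
{
	uint32_t pair = (1u << dst) | (1u << src);

	if (reg_mask & pair)
		reg_mask |= pair;	/* taint the other operand as well */
	return reg_mask;
}
```

For `if r6 <= r9` with only r6 marked (0x40), the mask becomes 0x240, which corresponds to the `regs=240` entries that show up in the log once the fix is applied.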
      
      After the fix the program is correctly rejected:
      
        func#0 @0
        0: R1=ctx(off=0,imm=0) R10=fp0
        0: (b7) r6 = 1024                     ; R6_w=1024
        1: (b7) r7 = 0                        ; R7_w=0
        2: (b7) r8 = 0                        ; R8_w=0
        3: (b7) r9 = -2147483648              ; R9_w=-2147483648
        4: (97) r6 %= 1025                    ; R6_w=scalar()
        5: (05) goto pc+0
        6: (bd) if r6 <= r9 goto pc+2         ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff80000000; 0x7fffffff),u32_min=-2147483648) R9_w=-2147483648
        7: (97) r6 %= 1                       ; R6_w=scalar()
        8: (b7) r9 = 0                        ; R9=0
        9: (bd) if r6 <= r9 goto pc+1         ; R6=scalar(umin=1) R9=0
        10: (b7) r6 = 0                       ; R6_w=0
        11: (b7) r0 = 0                       ; R0_w=0
        12: (63) *(u32 *)(r10 -4) = r0
        last_idx 12 first_idx 9
        regs=1 stack=0 before 11: (b7) r0 = 0
        13: R0_w=0 R10=fp0 fp-8=0000????
        13: (18) r4 = 0xffff9290dc5bfe00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
        17: (07) r2 += -4                     ; R2_w=fp-4
        18: (85) call bpf_map_lookup_elem#1   ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0)
        19: (55) if r0 != 0x0 goto pc+1       ; R0=0
        20: (95) exit
      
        from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
        21: (77) r6 >>= 10                    ; R6_w=0
        22: (27) r6 *= 8192                   ; R6_w=0
        23: (bf) r1 = r0                      ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
        24: (0f) r0 += r6
        last_idx 24 first_idx 19
        regs=40 stack=0 before 23: (bf) r1 = r0
        regs=40 stack=0 before 22: (27) r6 *= 8192
        regs=40 stack=0 before 21: (77) r6 >>= 10
        regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
        parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
        last_idx 18 first_idx 9
        regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
        regs=40 stack=0 before 17: (07) r2 += -4
        regs=40 stack=0 before 16: (bf) r2 = r10
        regs=40 stack=0 before 15: (bf) r1 = r4
        regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
        regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
        regs=40 stack=0 before 11: (b7) r0 = 0
        regs=40 stack=0 before 10: (b7) r6 = 0
        25: (79) r3 = *(u64 *)(r0 +0)         ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
        26: (7b) *(u64 *)(r1 +0) = r3         ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
        27: (95) exit
      
        from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
        11: (b7) r0 = 0                       ; R0_w=0
        12: (63) *(u32 *)(r10 -4) = r0
        last_idx 12 first_idx 11
        regs=1 stack=0 before 11: (b7) r0 = 0
        13: R0_w=0 R10=fp0 fp-8=0000????
        13: (18) r4 = 0xffff9290dc5bfe00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
        17: (07) r2 += -4                     ; R2_w=fp-4
        18: (85) call bpf_map_lookup_elem#1
        frame 0: propagating r6
        last_idx 19 first_idx 11
        regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
        regs=40 stack=0 before 17: (07) r2 += -4
        regs=40 stack=0 before 16: (bf) r2 = r10
        regs=40 stack=0 before 15: (bf) r1 = r4
        regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
        regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
        regs=40 stack=0 before 11: (b7) r0 = 0
        parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0
        last_idx 9 first_idx 9
        regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
        parent didn't have regs=240 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=P0 R10=fp0
        last_idx 8 first_idx 0
        regs=240 stack=0 before 8: (b7) r9 = 0
        regs=40 stack=0 before 7: (97) r6 %= 1
        regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
        regs=240 stack=0 before 5: (05) goto pc+0
        regs=240 stack=0 before 4: (97) r6 %= 1025
        regs=240 stack=0 before 3: (b7) r9 = -2147483648
        regs=40 stack=0 before 2: (b7) r8 = 0
        regs=40 stack=0 before 1: (b7) r7 = 0
        regs=40 stack=0 before 0: (b7) r6 = 1024
        19: safe
      
        from 6 to 9: R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0
        9: (bd) if r6 <= r9 goto pc+1
        last_idx 9 first_idx 0
        regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
        regs=240 stack=0 before 5: (05) goto pc+0
        regs=240 stack=0 before 4: (97) r6 %= 1025
        regs=240 stack=0 before 3: (b7) r9 = -2147483648
        regs=40 stack=0 before 2: (b7) r8 = 0
        regs=40 stack=0 before 1: (b7) r7 = 0
        regs=40 stack=0 before 0: (b7) r6 = 1024
        last_idx 9 first_idx 0
        regs=200 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
        regs=240 stack=0 before 5: (05) goto pc+0
        regs=240 stack=0 before 4: (97) r6 %= 1025
        regs=240 stack=0 before 3: (b7) r9 = -2147483648
        regs=40 stack=0 before 2: (b7) r8 = 0
        regs=40 stack=0 before 1: (b7) r7 = 0
        regs=40 stack=0 before 0: (b7) r6 = 1024
        11: R6=scalar(umax=18446744071562067968) R9=-2147483648
        11: (b7) r0 = 0                       ; R0_w=0
        12: (63) *(u32 *)(r10 -4) = r0
        last_idx 12 first_idx 11
        regs=1 stack=0 before 11: (b7) r0 = 0
        13: R0_w=0 R10=fp0 fp-8=0000????
        13: (18) r4 = 0xffff9290dc5bfe00      ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        15: (bf) r1 = r4                      ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
        16: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
        17: (07) r2 += -4                     ; R2_w=fp-4
        18: (85) call bpf_map_lookup_elem#1   ; R0_w=map_value_or_null(id=3,off=0,ks=4,vs=48,imm=0)
        19: (55) if r0 != 0x0 goto pc+1       ; R0_w=0
        20: (95) exit
      
        from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=scalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm????
        21: (77) r6 >>= 10                    ; R6_w=scalar(umax=18014398507384832,var_off=(0x0; 0x3fffffffffffff))
        22: (27) r6 *= 8192                   ; R6_w=scalar(smax=9223372036854767616,umax=18446744073709543424,var_off=(0x0; 0xffffffffffffe000),s32_max=2147475456,u32_max=-8192)
        23: (bf) r1 = r0                      ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
        24: (0f) r0 += r6
        last_idx 24 first_idx 21
        regs=40 stack=0 before 23: (bf) r1 = r0
        regs=40 stack=0 before 22: (27) r6 *= 8192
        regs=40 stack=0 before 21: (77) r6 >>= 10
        parent didn't have regs=40 stack=0 marks: R0_rw=map_value(off=0,ks=4,vs=48,imm=0) R6_r=Pscalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm????
        last_idx 19 first_idx 11
        regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
        regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
        regs=40 stack=0 before 17: (07) r2 += -4
        regs=40 stack=0 before 16: (bf) r2 = r10
        regs=40 stack=0 before 15: (bf) r1 = r4
        regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
        regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
        regs=40 stack=0 before 11: (b7) r0 = 0
        parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0
        last_idx 9 first_idx 0
        regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
        regs=240 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
        regs=240 stack=0 before 5: (05) goto pc+0
        regs=240 stack=0 before 4: (97) r6 %= 1025
        regs=240 stack=0 before 3: (b7) r9 = -2147483648
        regs=40 stack=0 before 2: (b7) r8 = 0
        regs=40 stack=0 before 1: (b7) r7 = 0
        regs=40 stack=0 before 0: (b7) r6 = 1024
        math between map_value pointer and register with unbounded min value is not allowed
        verification time 886 usec
        stack depth 4
        processed 49 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 2
      
      Fixes: b5dc0163 ("bpf: precise scalar_value tracking")
      Reported-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com>
      Reported-by: Meador Inge <meadori@google.com>
      Reported-by: Simon Scannell <simonscannell@google.com>
      Reported-by: Nenad Stojanovski <thenenadx@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com>
      Reviewed-by: Meador Inge <meadori@google.com>
      Reviewed-by: Simon Scannell <simonscannell@google.com>
      71b547f5
    • Linus Torvalds's avatar
      Merge tag 'nfsd-6.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · 789b4a41
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
      
       - Address two issues with the new GSS krb5 Kunit tests
      
      * tag 'nfsd-6.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        SUNRPC: Fix failures of checksum Kunit tests
        sunrpc: Fix RFC6803 encryption test
      789b4a41
    • Linus Torvalds's avatar
      Merge tag 'loongarch-fixes-6.3-1' of... · 40aacb31
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Some bug fixes, some build fixes, a comment fix and a trivial cleanup"
      
      * tag 'loongarch-fixes-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        tools/loongarch: Use __SIZEOF_LONG__ to define __BITS_PER_LONG
        LoongArch: Replace hard-coded values in comments with VALEN
        LoongArch: Clean up plat_swiotlb_setup() related code
        LoongArch: Check unwind_error() in arch_stack_walk()
        LoongArch: Adjust user_regset_copyin parameter to the correct offset
        LoongArch: Adjust user_watch_state for explicit alignment
        LoongArch: module: set section addresses to 0x0
        LoongArch: Mark 3 symbol exports as non-GPL
        LoongArch: Enable PG when wakeup from suspend
        LoongArch: Fix _CONST64_(x) as unsigned
        LoongArch: Fix build error if CONFIG_SUSPEND is not set
        LoongArch: Fix probing of the CRC32 feature
        LoongArch: Make WriteCombine configurable for ioremap()
      40aacb31
    • Li Lanzhe's avatar
      spi: spi-rockchip: Fix missing unwind goto in rockchip_sfc_probe() · 359f5b0d
      Li Lanzhe authored
      If devm_request_irq() fails, we return 'ret' directly without calling
      clk_disable_unprepare(sfc->clk) and clk_disable_unprepare(sfc->hclk).
      
      Fix this by changing direct return to a goto 'err_irq'.
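
      The unwind pattern described above can be sketched as follows. This is
      a minimal, self-contained model and not the driver's actual code:
      struct fake_clk, fake_clk_enable(), fake_clk_disable(), probe_sketch()
      and the irq_ok parameter are hypothetical stand-ins for the clock
      framework and the IRQ request.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a clock with an enable count. */
struct fake_clk {
	int enable_count;
};

static void fake_clk_enable(struct fake_clk *clk)
{
	clk->enable_count++;
}

static void fake_clk_disable(struct fake_clk *clk)
{
	clk->enable_count--;
}

/*
 * Sketch of a probe routine: enable two clocks, then request an IRQ.
 * irq_ok simulates whether the IRQ request succeeds (an assumption made
 * for illustration). Returns 0 on success, a negative value on failure.
 */
static int probe_sketch(struct fake_clk *hclk, struct fake_clk *clk, bool irq_ok)
{
	int ret;

	fake_clk_enable(hclk);
	fake_clk_enable(clk);

	if (!irq_ok) {
		ret = -1;
		/* A bare "return ret;" here would leave both clocks enabled. */
		goto err_irq;
	}

	return 0;

err_irq:
	/* Release resources in the reverse order of acquisition. */
	fake_clk_disable(clk);
	fake_clk_disable(hclk);
	return ret;
}
```

      With the goto in place, a failed call leaves both enable counts at
      the value they had before the call.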
      
      Fixes: 0b89fc0a ("spi: rockchip-sfc: add rockchip serial flash controller")
      Signed-off-by: Li Lanzhe <u202212060@hust.edu.cn>
      Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
      Link: https://lore.kernel.org/r/20230419115030.6029-1-u202212060@hust.edu.cn
      Signed-off-by: Mark Brown <broonie@kernel.org>
      359f5b0d
    • Arnd Bergmann's avatar
      hamradio: drop ISA_DMA_API dependency · fcd4843a
      Arnd Bergmann authored
      It looks like the dependency got added accidentally in commit a5532606
      ("[PATCH] ISA DMA Kconfig fixes - part 3"). Unlike the previously removed
      dmascc driver, the scc driver never used DMA.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fcd4843a
    • Ido Schimmel's avatar
      mlxsw: pci: Fix possible crash during initialization · 1f64757e
      Ido Schimmel authored
      During initialization the driver issues a reset command via its command
      interface in order to remove previous configuration from the device.
      
      After issuing the reset, the driver waits for 200ms before polling on
      the "system_status" register using memory-mapped IO until the device
      reaches a ready state (0x5E). The wait is necessary because the reset
      command only triggers the reset, but the reset itself happens
      asynchronously. If the driver starts polling too soon, the read of the
      "system_status" register will never return and the system will crash
      [1].
      
      The issue was discovered when the device was flashed with a development
      firmware version where the reset routine took longer to complete. The
      issue was fixed in the firmware, but it exposed the fact that the
      current wait time is borderline.
      
      Fix by increasing the wait time from 200ms to 400ms. With this patch and
      the buggy firmware version, the issue did not reproduce in 10 reboots
      whereas without the patch the issue is reproduced quite consistently.
      
      [1]
      mce: CPUs not responding to MCE broadcast (may include false positives): 0,4
      mce: CPUs not responding to MCE broadcast (may include false positives): 0,4
      Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
      Shutting down cpus with NMI
      Kernel Offset: 0x12000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      
      Fixes: ac004e84 ("mlxsw: pci: Wait longer before accessing the device after reset")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f64757e
    • David S. Miller's avatar
      Merge branch 'mptcp-fixes' · ed7f9c01
      David S. Miller authored
      Matthieu Baerts says:
      
      ====================
      mptcp: fixes around listening sockets and the MPTCP worker
      
      Christoph Paasch reported a couple of issues found by syzkaller and
      linked to operations done by the MPTCP worker on (un)accepted sockets.
      
      Fixing these issues was not obvious and rather complex but Paolo Abeni
      nicely managed to propose these excellent patches that seem to satisfy
      syzkaller.
      
      Patch 1 partially reverts a recent fix. While still providing a
      solution for the previous issue, it also prevents the MPTCP worker from
      running concurrently with inet_csk_listen_stop(), which avoids a
      warning. The partially reverted patch was introduced in v6.3-rc3,
      backported up to v6.1, and fixes an issue visible from v5.18.
      
      Patch 2 prevents the MPTCP worker from racing with mptcp_accept(), which
      causes a UaF when a fallback to TCP is done while, in parallel, the
      socket is being accepted by userspace. This is also a fix of a previous
      fix introduced in v6.3-rc3 and backported up to v6.1, but here it fixes
      an issue that has in theory been present since v5.7. There is no need to
      backport it that far, as the issue appears to be visible only later,
      around v5.18; see the previous cover letter linked to this original fix.
      ====================
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      ed7f9c01
    • Paolo Abeni's avatar
      mptcp: fix accept vs worker race · 63740448
      Paolo Abeni authored
      The mptcp worker and mptcp_accept() can race, as reported by Christoph:
      
      refcount_t: addition on 0; use-after-free.
      WARNING: CPU: 1 PID: 14351 at lib/refcount.c:25 refcount_warn_saturate+0x105/0x1b0 lib/refcount.c:25
      Modules linked in:
      CPU: 1 PID: 14351 Comm: syz-executor.2 Not tainted 6.3.0-rc1-gde5e8fd0123c #11
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      RIP: 0010:refcount_warn_saturate+0x105/0x1b0 lib/refcount.c:25
      Code: 02 31 ff 89 de e8 1b f0 a7 ff 84 db 0f 85 6e ff ff ff e8 3e f5 a7 ff 48 c7 c7 d8 c7 34 83 c6 05 6d 2d 0f 02 01 e8 cb 3d 90 ff <0f> 0b e9 4f ff ff ff e8 1f f5 a7 ff 0f b6 1d 54 2d 0f 02 31 ff 89
      RSP: 0018:ffffc90000a47bf8 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff88802eae98c0 RSI: ffffffff81097d4f RDI: 0000000000000001
      RBP: ffff88802e712180 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000001 R11: ffff88802eaea148 R12: ffff88802e712100
      R13: ffff88802e712a88 R14: ffff888005cb93a8 R15: ffff88802e712a88
      FS:  0000000000000000(0000) GS:ffff88803ed00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f277fd89120 CR3: 0000000035486002 CR4: 0000000000370ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       __refcount_add include/linux/refcount.h:199 [inline]
       __refcount_inc include/linux/refcount.h:250 [inline]
       refcount_inc include/linux/refcount.h:267 [inline]
       sock_hold include/net/sock.h:775 [inline]
       __mptcp_close+0x4c6/0x4d0 net/mptcp/protocol.c:3051
       mptcp_close+0x24/0xe0 net/mptcp/protocol.c:3072
       inet_release+0x56/0xa0 net/ipv4/af_inet.c:429
       __sock_release+0x51/0xf0 net/socket.c:653
       sock_close+0x18/0x20 net/socket.c:1395
       __fput+0x113/0x430 fs/file_table.c:321
       task_work_run+0x96/0x100 kernel/task_work.c:179
       exit_task_work include/linux/task_work.h:38 [inline]
       do_exit+0x4fc/0x10c0 kernel/exit.c:869
       do_group_exit+0x51/0xf0 kernel/exit.c:1019
       get_signal+0x12b0/0x1390 kernel/signal.c:2859
       arch_do_signal_or_restart+0x25/0x260 arch/x86/kernel/signal.c:306
       exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
       exit_to_user_mode_prepare+0x131/0x1a0 kernel/entry/common.c:203
       __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
       syscall_exit_to_user_mode+0x19/0x40 kernel/entry/common.c:296
       do_syscall_64+0x46/0x90 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7fec4b4926a9
      Code: Unable to access opcode bytes at 0x7fec4b49267f.
      RSP: 002b:00007fec49f9dd78 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
      RAX: fffffffffffffe00 RBX: 00000000006bc058 RCX: 00007fec4b4926a9
      RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00000000006bc058
      RBP: 00000000006bc050 R08: 00000000007df998 R09: 00000000007df998
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bc05c
      R13: fffffffffffffea8 R14: 000000000000000b R15: 000000000001fe40
       </TASK>
      
      The root cause is that the worker can force the first mptcp subflow to
      fall back to TCP, actually deleting the unaccepted msk socket.
      
      We can explicitly prevent the race by delaying the unaccepted msk
      deletion until listener shutdown time. If the closed subflow is later
      accepted, just drop the mptcp context and let user space deal with the
      paired mptcp socket.
      
      Fixes: b6985b9b ("mptcp: use the workqueue to destroy unaccepted sockets")
      Cc: stable@vger.kernel.org
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/375
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Tested-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      63740448
    • Paolo Abeni's avatar
      mptcp: stops worker on unaccepted sockets at listener close · 2a6a870e
      Paolo Abeni authored
      This is a partial revert of the blamed commit, with a relevant
      change: mptcp_subflow_queue_clean() now just changes the msk
      socket status and stops the worker, so that the UaF issue addressed
      by the blamed commit is not re-introduced.
      
      The above prevents the mptcp worker from running concurrently with
      inet_csk_listen_stop(), as such race would trigger a warning, as
      reported by Christoph:
      
      RSP: 002b:00007f784fe09cd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      WARNING: CPU: 0 PID: 25807 at net/ipv4/inet_connection_sock.c:1387 inet_csk_listen_stop+0x664/0x870 net/ipv4/inet_connection_sock.c:1387
      RAX: ffffffffffffffda RBX: 00000000006bc050 RCX: 00007f7850afd6a9
      RDX: 0000000000000000 RSI: 0000000020000340 RDI: 0000000000000004
      Modules linked in:
      RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bc05c
      R13: fffffffffffffea8 R14: 00000000006bc050 R15: 000000000001fe40
      
       </TASK>
      CPU: 0 PID: 25807 Comm: syz-executor.7 Not tainted 6.2.0-g778e54711659 #7
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      RIP: 0010:inet_csk_listen_stop+0x664/0x870 net/ipv4/inet_connection_sock.c:1387
      RAX: 0000000000000000 RBX: ffff888100dfbd40 RCX: 0000000000000000
      RDX: ffff8881363aab80 RSI: ffffffff81c494f4 RDI: 0000000000000005
      RBP: ffff888126dad080 R08: 0000000000000005 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000000 R12: ffff888100dfe040
      R13: 0000000000000001 R14: 0000000000000000 R15: ffff888100dfbdd8
      FS:  00007f7850a2c800(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b32d26000 CR3: 000000012fdd8006 CR4: 0000000000770ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       __tcp_close+0x5b2/0x620 net/ipv4/tcp.c:2875
       __mptcp_close_ssk+0x145/0x3d0 net/mptcp/protocol.c:2427
       mptcp_destroy_common+0x8a/0x1c0 net/mptcp/protocol.c:3277
       mptcp_destroy+0x41/0x60 net/mptcp/protocol.c:3304
       __mptcp_destroy_sock+0x56/0x140 net/mptcp/protocol.c:2965
       __mptcp_close+0x38f/0x4a0 net/mptcp/protocol.c:3057
       mptcp_close+0x24/0xe0 net/mptcp/protocol.c:3072
       inet_release+0x53/0xa0 net/ipv4/af_inet.c:429
       __sock_release+0x4e/0xf0 net/socket.c:651
       sock_close+0x15/0x20 net/socket.c:1393
       __fput+0xff/0x420 fs/file_table.c:321
       task_work_run+0x8b/0xe0 kernel/task_work.c:179
       resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0x113/0x120 kernel/entry/common.c:203
       __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
       syscall_exit_to_user_mode+0x1d/0x40 kernel/entry/common.c:296
       do_syscall_64+0x46/0x90 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7f7850af70dc
      RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f7850af70dc
      RDX: 00007f7850a2c800 RSI: 0000000000000002 RDI: 0000000000000003
      RBP: 00000000006bd980 R08: 0000000000000000 R09: 00000000000018a0
      R10: 00000000316338a4 R11: 0000000000000293 R12: 0000000000211e31
      R13: 00000000006bc05c R14: 00007f785062c000 R15: 0000000000211af0
      
      Fixes: 0a3f4f1f ("mptcp: fix UaF in listener shutdown")
      Cc: stable@vger.kernel.org
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/371
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a6a870e
    • Alexander Aring's avatar
      net: rpl: fix rpl header size calculation · 4e006c7a
      Alexander Aring authored
      This patch adds 8 bytes that were missing from the header size
      calculation. ipv6_rpl_srh_size() is used to check a skb_pull() on
      skb->data, which points to skb_transport_header(). Currently we only
      account for the addresses field, whose size is calculated from the CmprI
      and CmprE fields, see:
      
      https://www.rfc-editor.org/rfc/rfc6554#section-3
      
      The calculation, however, omits the 8 bytes occupied by the fields that
      precede the addresses field. Those 8 bytes are represented by the
      sizeof(struct ipv6_rpl_sr_hdr) expression.
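
      The arithmetic can be illustrated with a small sketch. It is not the
      kernel function: rpl_srh_size_sketch() and the SKETCH_* constants are
      hypothetical names, the parameter n is assumed to be the number of
      addresses compressed with CmprI (all but the last one), and trailing
      padding is ignored.

```c
#include <stddef.h>

#define SKETCH_IN6_ADDR_LEN  16 /* bytes in a full IPv6 address */
#define SKETCH_RPL_FIXED_LEN 8  /* fixed fields preceding the addresses */

/*
 * Hypothetical helper: size in bytes of an RPL source routing header
 * with n addresses that each elide cmpri prefix bytes, followed by one
 * final address that elides cmpre prefix bytes. Padding is not counted.
 */
static size_t rpl_srh_size_sketch(unsigned int n, unsigned int cmpri,
				  unsigned int cmpre)
{
	size_t addrs = (size_t)n * (SKETCH_IN6_ADDR_LEN - cmpri)
		     + (SKETCH_IN6_ADDR_LEN - cmpre);

	/* Omitting the fixed part here is the kind of bug being fixed. */
	return SKETCH_RPL_FIXED_LEN + addrs;
}
```

      For example, with n = 2, CmprI = 8 and CmprE = 4, the addresses take
      2 * 8 + 12 = 28 bytes and the whole header takes 36 bytes.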
      
      Fixes: 8610c7c6 ("net: ipv6: add support for rpl sr exthdr")
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Reported-by: maxpl0it <maxpl0it@protonmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e006c7a
    • Seiji Nishikawa's avatar
      net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete() · 6f483338
      Seiji Nishikawa authored
      When vmxnet3_rq_create() fails to allocate rq->data_ring.base due to a page
      allocation failure, a subsequent call to vmxnet3_rq_rx_complete() can result
      in a NULL pointer dereference.
      
      To fix this bug, check not only that rxDataRingUsed is true but also that
      adapter->rxdataring_enabled is true before calling memcpy() in
      vmxnet3_rq_rx_complete().
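
      The guard described above can be sketched as follows. This is a
      simplified model, not the driver code: struct rx_queue_sketch and
      copy_from_data_ring() are hypothetical names, and the ring_used
      argument stands in for the per-descriptor rxDataRingUsed condition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the state relevant to the check. */
struct rx_queue_sketch {
	bool data_ring_enabled;    /* false once the ring allocation failed */
	const unsigned char *base; /* NULL when the allocation failed */
};

/*
 * Copy len bytes out of the data ring only when the descriptor says the
 * ring was used AND the ring is actually enabled. Returns true if the
 * copy was performed.
 */
static bool copy_from_data_ring(const struct rx_queue_sketch *rq,
				bool ring_used, unsigned char *dst, size_t len)
{
	if (!(ring_used && rq->data_ring_enabled))
		return false; /* checking ring_used alone could reach a NULL base */

	memcpy(dst, rq->base, len);
	return true;
}
```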
      
      [1728352.477993] ethtool: page allocation failure: order:9, mode:0x6000c0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
      ...
      [1728352.478009] Call Trace:
      [1728352.478028]  dump_stack+0x41/0x60
      [1728352.478035]  warn_alloc.cold.120+0x7b/0x11b
      [1728352.478038]  ? _cond_resched+0x15/0x30
      [1728352.478042]  ? __alloc_pages_direct_compact+0x15f/0x170
      [1728352.478043]  __alloc_pages_slowpath+0xcd3/0xd10
      [1728352.478047]  __alloc_pages_nodemask+0x2e2/0x320
      [1728352.478049]  __dma_direct_alloc_pages.constprop.25+0x8a/0x120
      [1728352.478053]  dma_direct_alloc+0x5a/0x2a0
      [1728352.478056]  vmxnet3_rq_create.part.57+0x17c/0x1f0 [vmxnet3]
      ...
      [1728352.478188] vmxnet3 0000:0b:00.0 ens192: rx data ring will be disabled
      ...
      [1728352.515347] BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
      ...
      [1728352.515440] RIP: 0010:memcpy_orig+0x54/0x130
      ...
      [1728352.515655] Call Trace:
      [1728352.515665]  <IRQ>
      [1728352.515672]  vmxnet3_rq_rx_complete+0x419/0xef0 [vmxnet3]
      [1728352.515690]  vmxnet3_poll_rx_only+0x31/0xa0 [vmxnet3]
      ...
      Signed-off-by: Seiji Nishikawa <snishika@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f483338
    • Ido Schimmel's avatar
      bonding: Fix memory leak when changing bond type to Ethernet · c484fcc0
      Ido Schimmel authored
      When a net device is put administratively up, its 'IFF_UP' flag is set
      (if not set already) and a 'NETDEV_UP' notification is emitted, which
      causes the 8021q driver to add VLAN ID 0 on the device. The reverse
      happens when a net device is put administratively down.
      
      When changing the type of a bond to Ethernet, its 'IFF_UP' flag is
      incorrectly cleared, resulting in the kernel skipping the above process
      and VLAN ID 0 being leaked [1].
      
      Fix by restoring the flag when changing the type to Ethernet, in a
      similar fashion to the restoration of the 'IFF_SLAVE' flag.
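
      The save-and-restore idea can be sketched like this. It is a model
      under stated assumptions rather than the bonding driver's code: the
      SK_IFF_* values are arbitrary illustrative bits, and struct
      netdev_sketch, ether_setup_sketch() and bond_ether_setup_sketch() are
      hypothetical names.

```c
/* Illustrative flag bits; the values are arbitrary, not the kernel's. */
#define SK_IFF_UP        0x1u
#define SK_IFF_SLAVE     0x2u
#define SK_IFF_MASTER    0x4u
#define SK_IFF_BROADCAST 0x8u

struct netdev_sketch {
	unsigned int flags;
};

/* Stand-in for a generic Ethernet setup that overwrites all flags. */
static void ether_setup_sketch(struct netdev_sketch *dev)
{
	dev->flags = SK_IFF_BROADCAST;
}

/*
 * Switch the device type to Ethernet while preserving the flags that
 * must survive the change: remember them, run the generic setup, then
 * put them back together with the master flag.
 */
static void bond_ether_setup_sketch(struct netdev_sketch *dev)
{
	unsigned int saved = dev->flags & (SK_IFF_SLAVE | SK_IFF_UP);

	ether_setup_sketch(dev);
	dev->flags |= SK_IFF_MASTER | saved;
}
```

      If SK_IFF_UP were left out of the saved mask, a device that was up
      before the type change would appear down afterwards, which matches
      the symptom visible in the output in [3].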
      
      The issue can be reproduced using the script in [2], with example output
      before and after the fix in [3].
      
      [1]
      unreferenced object 0xffff888103479900 (size 256):
        comm "ip", pid 329, jiffies 4294775225 (age 28.561s)
        hex dump (first 32 bytes):
          00 a0 0c 15 81 88 ff ff 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81a6051a>] kmalloc_trace+0x2a/0xe0
          [<ffffffff8406426c>] vlan_vid_add+0x30c/0x790
          [<ffffffff84068e21>] vlan_device_event+0x1491/0x21a0
          [<ffffffff81440c8e>] notifier_call_chain+0xbe/0x1f0
          [<ffffffff8372383a>] call_netdevice_notifiers_info+0xba/0x150
          [<ffffffff837590f2>] __dev_notify_flags+0x132/0x2e0
          [<ffffffff8375ad9f>] dev_change_flags+0x11f/0x180
          [<ffffffff8379af36>] do_setlink+0xb96/0x4060
          [<ffffffff837adf6a>] __rtnl_newlink+0xc0a/0x18a0
          [<ffffffff837aec6c>] rtnl_newlink+0x6c/0xa0
          [<ffffffff837ac64e>] rtnetlink_rcv_msg+0x43e/0xe00
          [<ffffffff839a99e0>] netlink_rcv_skb+0x170/0x440
          [<ffffffff839a738f>] netlink_unicast+0x53f/0x810
          [<ffffffff839a7fcb>] netlink_sendmsg+0x96b/0xe90
          [<ffffffff8369d12f>] ____sys_sendmsg+0x30f/0xa70
          [<ffffffff836a6d7a>] ___sys_sendmsg+0x13a/0x1e0
      unreferenced object 0xffff88810f6a83e0 (size 32):
        comm "ip", pid 329, jiffies 4294775225 (age 28.561s)
        hex dump (first 32 bytes):
          a0 99 47 03 81 88 ff ff a0 99 47 03 81 88 ff ff  ..G.......G.....
          81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc  ................
        backtrace:
          [<ffffffff81a6051a>] kmalloc_trace+0x2a/0xe0
          [<ffffffff84064369>] vlan_vid_add+0x409/0x790
          [<ffffffff84068e21>] vlan_device_event+0x1491/0x21a0
          [<ffffffff81440c8e>] notifier_call_chain+0xbe/0x1f0
          [<ffffffff8372383a>] call_netdevice_notifiers_info+0xba/0x150
          [<ffffffff837590f2>] __dev_notify_flags+0x132/0x2e0
          [<ffffffff8375ad9f>] dev_change_flags+0x11f/0x180
          [<ffffffff8379af36>] do_setlink+0xb96/0x4060
          [<ffffffff837adf6a>] __rtnl_newlink+0xc0a/0x18a0
          [<ffffffff837aec6c>] rtnl_newlink+0x6c/0xa0
          [<ffffffff837ac64e>] rtnetlink_rcv_msg+0x43e/0xe00
          [<ffffffff839a99e0>] netlink_rcv_skb+0x170/0x440
          [<ffffffff839a738f>] netlink_unicast+0x53f/0x810
          [<ffffffff839a7fcb>] netlink_sendmsg+0x96b/0xe90
          [<ffffffff8369d12f>] ____sys_sendmsg+0x30f/0xa70
          [<ffffffff836a6d7a>] ___sys_sendmsg+0x13a/0x1e0
      
      [2]
      ip link add name t-nlmon type nlmon
      ip link add name t-dummy type dummy
      ip link add name t-bond type bond mode active-backup
      
      ip link set dev t-bond up
      ip link set dev t-nlmon master t-bond
      ip link set dev t-nlmon nomaster
      ip link show dev t-bond
      ip link set dev t-dummy master t-bond
      ip link show dev t-bond
      
      ip link del dev t-bond
      ip link del dev t-dummy
      ip link del dev t-nlmon
      
      [3]
      Before:
      
      12: t-bond: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
          link/netlink
      12: t-bond: <BROADCAST,MULTICAST,MASTER,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
          link/ether 46:57:39:a4:46:a2 brd ff:ff:ff:ff:ff:ff
      
      After:
      
      12: t-bond: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
          link/netlink
      12: t-bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
          link/ether 66:48:7b:74:b6:8a brd ff:ff:ff:ff:ff:ff
      
      Fixes: e36b9d16 ("bonding: clean muticast addresses when device changes type")
      Fixes: 75c78500 ("bonding: remap muticast addresses without using dev_close() and dev_open()")
      Fixes: 9ec7eb60 ("bonding: restore IFF_MASTER/SLAVE flags on bond enslave ether type change")
      Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
      Link: https://lore.kernel.org/netdev/78a8a03b-6070-3e6b-5042-f848dab16fb8@alu.unizg.hr/
      Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c484fcc0
    • Tiezhu Yang's avatar
      tools/loongarch: Use __SIZEOF_LONG__ to define __BITS_PER_LONG · b5533e99
      Tiezhu Yang authored
      Although __SIZEOF_POINTER__ is equal to __SIZEOF_LONG__ on LoongArch,
      it is better to use __SIZEOF_LONG__ to define __BITS_PER_LONG to stay
      consistent between arch/loongarch/include/uapi/asm/bitsperlong.h and
      tools/arch/loongarch/include/uapi/asm/bitsperlong.h.
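
      As a small illustration of deriving the bit width from
      __SIZEOF_LONG__ (a macro predefined by GCC and Clang), the sketch
      below uses the hypothetical names SKETCH_BITS_PER_LONG and
      bits_per_long_sketch(); it does not reproduce the header itself.

```c
#include <limits.h>

/* __SIZEOF_LONG__ is sizeof(long) in bytes, provided by the compiler. */
#define SKETCH_BITS_PER_LONG (__SIZEOF_LONG__ * 8)

/* Compile-time check that the macro agrees with the actual type width. */
_Static_assert(SKETCH_BITS_PER_LONG == sizeof(long) * CHAR_BIT,
	       "bit width derived from __SIZEOF_LONG__ must match long");

/* Hypothetical accessor so the value can be inspected at run time. */
static int bits_per_long_sketch(void)
{
	return SKETCH_BITS_PER_LONG;
}
```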
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      b5533e99
    • Enze Li's avatar
      LoongArch: Replace hard-coded values in comments with VALEN · 213ef669
      Enze Li authored
      According to LoongArch documentation [1], CSR.PGDL and CSR.PGDH are
      concerned with the VA's MSB which is VALEN-1 instead of always being 47.
      Fix comments to avoid misleading others.
      
      [1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#page-global-directory-base-address-for-lower-half-address-space
      
      Reviewed-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Enze Li <lienze@kylinos.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      213ef669
    • Tiezhu Yang's avatar
      LoongArch: Clean up plat_swiotlb_setup() related code · afca6e06
      Tiezhu Yang authored
      After commit c78c43fe ("LoongArch: Use acpi_arch_dma_setup() and
      remove ARCH_HAS_PHYS_TO_DMA"), plat_swiotlb_setup() has been deleted,
      so clean up the related code.
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      afca6e06
    • Tiezhu Yang's avatar
      LoongArch: Check unwind_error() in arch_stack_walk() · 370a3b8f
      Tiezhu Yang authored
      We can see the following messages with CONFIG_PROVE_LOCKING=y on
      LoongArch:
      
        BUG: MAX_STACK_TRACE_ENTRIES too low!
        turning off the locking correctness validator.
      
      This is because stack_trace_save() returns a large value after calling
      arch_stack_walk(); here is the call trace:
      
        save_trace()
          stack_trace_save()
            arch_stack_walk()
              stack_trace_consume_entry()
      
      arch_stack_walk() should return immediately if unwind_next_frame()
      fails. There is no need to keep looping uselessly, increasing the value
      of c->len in stack_trace_consume_entry(). Returning early fixes the
      above problem.
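
      The effect of checking for an unwind error can be modelled with a toy
      walker. Everything here is hypothetical (struct unwind_sketch,
      unwind_step_sketch(), walk_sketch() and the check_error switch); it
      only shows why a loop that ignores the error flag keeps inflating
      the entry count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy unwinder state; not the LoongArch unwinder. */
struct unwind_sketch {
	size_t pos;     /* index of the current frame */
	size_t fail_at; /* frame index that cannot be reached */
	bool error;     /* set once stepping has failed */
};

/* Try to advance one frame; on failure set the error flag and stay put. */
static void unwind_step_sketch(struct unwind_sketch *s)
{
	if (s->pos + 1 >= s->fail_at)
		s->error = true;
	else
		s->pos++;
}

/*
 * Record frames until max_entries is reached. When check_error is true
 * the walk stops as soon as the unwinder reports an error; when false it
 * keeps recording the same stale frame, inflating the returned length.
 */
static size_t walk_sketch(struct unwind_sketch *s, size_t max_entries,
			  bool check_error)
{
	size_t len = 0;

	while (len < max_entries) {
		if (check_error && s->error)
			break;
		len++; /* "consume" the current frame */
		unwind_step_sketch(s);
	}
	return len;
}
```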
      
      Cc: stable@vger.kernel.org
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/all/8a44ad71-68d2-4926-892f-72bfc7a67e2a@roeck-us.net/
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      370a3b8f
    • Qing Zhang's avatar
      LoongArch: Adjust user_regset_copyin parameter to the correct offset · e32b3b82
      Qing Zhang authored
      Ensure that user_watch_state can be set correctly by the user.
      Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      e32b3b82
    • Qing Zhang's avatar
      LoongArch: Adjust user_watch_state for explicit alignment · ff9f3d7a
      Qing Zhang authored
      This is done in order to easily calculate the number of breakpoints in
      hw_break_get()/hw_break_set().
      Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      ff9f3d7a
    • Jakub Kicinski's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 92e8c732
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Unbreak br_netfilter physdev match support, from Florian Westphal.
      
      2) Use GFP_KERNEL_ACCOUNT for stateful/policy objects, from Chen Aotian.
      
      3) Use IS_ENABLED() in nf_reset_trace(), from Florian Westphal.
      
      4) Fix validation of catch-all set element.
      
      5) Tighten requirements for catch-all set elements.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements
        netfilter: nf_tables: validate catch-all set elements
        netfilter: nf_tables: fix ifdef to also consider nf_tables=m
        netfilter: nf_tables: Modify nla_memdup's flag to GFP_KERNEL_ACCOUNT
        netfilter: br_netfilter: fix recent physdev match breakage
      ====================
      
      Link: https://lore.kernel.org/r/20230418145048.67270-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      92e8c732
  4. 18 Apr, 2023 2 commits
    • Ryusuke Konishi's avatar
      nilfs2: initialize unused bytes in segment summary blocks · ef832747
      Ryusuke Konishi authored
      Syzbot still reports uninit-value in nilfs_add_checksums_on_logs() for
      KMSAN enabled kernels after applying commit 73970316 ("nilfs2:
      initialize "struct nilfs_binfo_dat"->bi_pad field").
      
      This is because the unused bytes at the end of each block in segment
      summaries are not initialized.  So this fixes the issue by padding the
      unused bytes with null bytes.
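
      The padding idea can be sketched in user-space C. This is only an
      illustration under stated assumptions, not the nilfs2 change itself:
      pad_block_tail() and its parameters are hypothetical names, and the
      block is modeled as a plain byte buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the nilfs2 code): zero every byte of a block
 * that lies past the last valid entry, so that a checksum computed over
 * the whole block never reads uninitialized memory.
 *
 * block      - start of the block buffer
 * block_size - total size of the block in bytes
 * used       - number of bytes already filled with valid data
 */
static void pad_block_tail(unsigned char *block, size_t block_size, size_t used)
{
	if (used < block_size)
		memset(block + used, 0, block_size - used);
}
```

      Calling the helper once per block, after the last entry has been
      written and before the checksum is taken, would leave no
      uninitialized bytes inside the checksummed range.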
      
      Link: https://lkml.kernel.org/r/20230417173513.12598-1-konishi.ryusuke@gmail.com
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: syzbot+048585f3f4227bb2b49b@syzkaller.appspotmail.com
      Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b
      Cc: Alexander Potapenko <glider@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ef832747
    • Mel Gorman's avatar
      mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages · 4d73ba5f
      Mel Gorman authored
      A bug was reported by Yuanxi Liu where allocating 1G pages at runtime is
      taking an excessive amount of time for large amounts of memory.  Further
      testing showed that the cost is linear, i.e. if allocating
      1G pages in batches of 10 then the time to allocate nr_hugepages from
      10->20->30->etc increases linearly even though 10 pages are allocated at
      each step.  Profiles indicated that much of the time is spent checking the
      validity within already existing huge pages and then attempting a
      migration that fails after isolating the range, draining pages and a whole
      lot of other useless work.
      
      Commit eb14d4ee ("mm,page_alloc: drop unnecessary checks from
      pfn_range_valid_contig") removed two checks, one which ignored huge pages
      for contiguous allocations as huge pages can sometimes migrate.  While
      there may be value in migrating a 2M page to satisfy a 1G allocation, it's
      potentially expensive if the 1G allocation fails and it's pointless to try
      moving a 1G page for a new 1G allocation or scan the tail pages for valid
      PFNs.
      
      Reintroduce the PageHuge check and assume any contiguous region with
      hugetlbfs pages is unsuitable for a new 1G allocation.
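
      A simplified user-space model of the reintroduced check is sketched
      below. It is not the kernel implementation: struct fake_page, its
      fields and range_valid_contig() are hypothetical stand-ins, with the
      huge flag playing the role of PageHuge().

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct page; both fields are hypothetical. */
struct fake_page {
	bool reserved;	/* models a page that can never be reallocated */
	bool huge;	/* models PageHuge(page) */
};

/*
 * Return true only if no page in [start, start + nr) disqualifies the
 * range. Any hugetlbfs page makes the whole range unsuitable, so the
 * caller can skip it without attempting isolation or migration.
 */
static bool range_valid_contig(const struct fake_page *pages,
			       size_t start, size_t nr)
{
	for (size_t i = start; i < start + nr; i++) {
		if (pages[i].reserved)
			return false;
		if (pages[i].huge)
			return false;
	}
	return true;
}
```

      In this model a candidate range that overlaps an existing huge page
      is rejected during the cheap scan, which corresponds to avoiding the
      isolate, drain and failed-migration work described above.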
      
      The hpagealloc test allocates huge pages in batches and reports the
      average latency per page over time.  This test happens just after boot
      when fragmentation is not an issue.  Units are in milliseconds.
      
      hpagealloc
                                     6.3.0-rc6              6.3.0-rc6              6.3.0-rc6
                                       vanilla   hugeallocrevert-v1r1   hugeallocsimple-v1r2
      Min       Latency       26.42 (   0.00%)        5.07 (  80.82%)       18.94 (  28.30%)
      1st-qrtle Latency      356.61 (   0.00%)        5.34 (  98.50%)       19.85 (  94.43%)
      2nd-qrtle Latency      697.26 (   0.00%)        5.47 (  99.22%)       20.44 (  97.07%)
      3rd-qrtle Latency      972.94 (   0.00%)        5.50 (  99.43%)       20.81 (  97.86%)
      Max-1     Latency       26.42 (   0.00%)        5.07 (  80.82%)       18.94 (  28.30%)
      Max-5     Latency       82.14 (   0.00%)        5.11 (  93.78%)       19.31 (  76.49%)
      Max-10    Latency      150.54 (   0.00%)        5.20 (  96.55%)       19.43 (  87.09%)
      Max-90    Latency     1164.45 (   0.00%)        5.53 (  99.52%)       20.97 (  98.20%)
      Max-95    Latency     1223.06 (   0.00%)        5.55 (  99.55%)       21.06 (  98.28%)
      Max-99    Latency     1278.67 (   0.00%)        5.57 (  99.56%)       22.56 (  98.24%)
      Max       Latency     1310.90 (   0.00%)        8.06 (  99.39%)       26.62 (  97.97%)
      Amean     Latency      678.36 (   0.00%)        5.44 *  99.20%*       20.44 *  96.99%*
      
                         6.3.0-rc6   6.3.0-rc6   6.3.0-rc6
                           vanilla   revert-v1   hugeallocfix-v2
      Duration User           0.28        0.27        0.30
      Duration System       808.66       17.77       35.99
      Duration Elapsed      830.87       18.08       36.33
      
      The vanilla kernel is poor, taking up to 1.3 seconds to allocate a huge
      page and almost 10 minutes in total to run the test.  Reverting the
      problematic commit reduces it to 8ms at worst and the patch takes 26ms.
      This patch fixes the main issue by skipping huge pages but leaves the
      page_count() check out because a page with an elevated count can
      potentially migrate.
      
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=217022
      Link: https://lkml.kernel.org/r/20230414141429.pwgieuwluxwez3rj@techsingularity.net
      Fixes: eb14d4ee ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig")
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Reported-by: Yuanxi Liu <y.liu@naruida.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4d73ba5f