1. 25 Jan, 2023 1 commit
  2. 20 Jan, 2023 1 commit
  3. 19 Jan, 2023 5 commits
    • Merge branch 'bpf: Fix to preserve reg parent/live fields when copying range info' · 2eecf81e
      Alexei Starovoitov authored
      Eduard Zingerman says:
      
      ====================
      
      Struct bpf_reg_state is copied directly in several places including:
      - check_stack_write_fixed_off() (via save_register_state());
      - check_stack_read_fixed_off();
      - find_equal_scalars().
      
      However, a literal copy of this struct also copies the following fields:
      
      struct bpf_reg_state {
      	...
      	struct bpf_reg_state *parent;
      	...
      	enum bpf_reg_liveness live;
      	...
      };
      
      This breaks register parentage chain and liveness marking logic.
      The commit message for the first patch has a detailed example.
      This patch-set replaces direct copies with a call to a function
      copy_register_state(dst,src), which preserves 'parent' and 'live'
      fields of the 'dst'.
      
      The fix comes with a significant verifier runtime penalty for some
      selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg
      and cilium BPF binaries (see [1]):
      
      $ ./veristat -e file,prog,states -C -f 'states_diff>10' master-baseline.log current.log
      File                        Program                           States (A)  States (B)  States   (DIFF)
      --------------------------  --------------------------------  ----------  ----------  ---------------
      bpf_host.o                  tail_handle_ipv4_from_host               231         299    +68 (+29.44%)
      bpf_host.o                  tail_handle_nat_fwd_ipv4                1088        1320   +232 (+21.32%)
      bpf_host.o                  tail_handle_nat_fwd_ipv6                 716         729     +13 (+1.82%)
      bpf_host.o                  tail_nodeport_nat_ingress_ipv4           281         314    +33 (+11.74%)
      bpf_host.o                  tail_nodeport_nat_ingress_ipv6           245         256     +11 (+4.49%)
      bpf_lxc.o                   tail_handle_nat_fwd_ipv4                1088        1320   +232 (+21.32%)
      bpf_lxc.o                   tail_handle_nat_fwd_ipv6                 716         729     +13 (+1.82%)
      bpf_lxc.o                   tail_ipv4_ct_egress                      239         262     +23 (+9.62%)
      bpf_lxc.o                   tail_ipv4_ct_ingress                     239         262     +23 (+9.62%)
      bpf_lxc.o                   tail_ipv4_ct_ingress_policy_only         239         262     +23 (+9.62%)
      bpf_lxc.o                   tail_ipv6_ct_egress                      181         195     +14 (+7.73%)
      bpf_lxc.o                   tail_ipv6_ct_ingress                     181         195     +14 (+7.73%)
      bpf_lxc.o                   tail_ipv6_ct_ingress_policy_only         181         195     +14 (+7.73%)
      bpf_lxc.o                   tail_nodeport_nat_ingress_ipv4           281         314    +33 (+11.74%)
      bpf_lxc.o                   tail_nodeport_nat_ingress_ipv6           245         256     +11 (+4.49%)
      bpf_overlay.o               tail_handle_nat_fwd_ipv4                 799         829     +30 (+3.75%)
      bpf_overlay.o               tail_nodeport_nat_ingress_ipv4           281         314    +33 (+11.74%)
      bpf_overlay.o               tail_nodeport_nat_ingress_ipv6           245         256     +11 (+4.49%)
      bpf_sock.o                  cil_sock4_connect                         47          70    +23 (+48.94%)
      bpf_sock.o                  cil_sock4_sendmsg                         45          68    +23 (+51.11%)
      bpf_sock.o                  cil_sock6_post_bind                       31          42    +11 (+35.48%)
      bpf_xdp.o                   tail_lb_ipv4                            4413        6457  +2044 (+46.32%)
      bpf_xdp.o                   tail_lb_ipv6                            6876        7249    +373 (+5.42%)
      test_cls_redirect.bpf.o     cls_redirect                            4704        4799     +95 (+2.02%)
      test_tcp_hdr_options.bpf.o  estab                                    180         206    +26 (+14.44%)
      xdp_synproxy_kern.bpf.o     syncookie_tc                           21059       21485    +426 (+2.02%)
      xdp_synproxy_kern.bpf.o     syncookie_xdp                          21857       23122   +1265 (+5.79%)
      --------------------------  --------------------------------  ----------  ----------  ---------------
      
      I looked through verification log for bpf_xdp.o tail_lb_ipv4 program in
      order to identify the reason for ~50% visited states increase.
      The slowdown is triggered by a difference in handling of three stack slots:
      fp-56, fp-72 and fp-80, with the main difference coming from fp-72.
      In fact the following change removes all the difference:
      
      @@ -3256,7 +3256,10 @@ static void save_register_state(struct bpf_func_state *state,
       {
              int i;
      
      -       copy_register_state(&state->stack[spi].spilled_ptr, reg);
      +       if ((spi == 6 /*56*/ || spi == 8 /*72*/ || spi == 9 /*80*/) && size != BPF_REG_SIZE)
      +               state->stack[spi].spilled_ptr = *reg;
      +       else
      +               copy_register_state(&state->stack[spi].spilled_ptr, reg);
      
      For fp-56 I found the following pattern for divergences between
      verification logs with and w/o this patch:
      
      - At some point insn 1862 is reached and checkpoint is created;
      - At some other point insn 1862 is reached again:
        - with this patch:
          - the current state is considered *not* equivalent to the old checkpoint;
          - the reason for mismatch is the state of fp-56:
            - current state: fp-56=????mmmm
            - checkpoint: fp-56_rD=mmmmmmmm
        - without this patch the current state is considered equivalent to the
          checkpoint, the fp-56 is not present in the checkpoint.
      
      Here is a fragment of the verification log from the point where the
      checkpoint in question is created at insn 1862:
      
      checkpoint 1862:  ... fp-56=mmmmmmmm ...
      1862: ...
      1863: ...
      1864: (61) r1 = *(u32 *)(r0 +0)
      1865: ...
      1866: (63) *(u32 *)(r10 -56) = r1     ; R1_w=scalar(...) R10=fp0 fp-56=
      1867: (bf) r2 = r10                   ; R2_w=fp0 R10=fp0
      1868: (07) r2 += -56                  ; R2_w=fp-56
      ; return map_lookup_elem(&LB4_BACKEND_MAP_V2, &backend_id);
      1869: (18) r1 = 0xffff888100286000    ; R1_w=map_ptr(off=0,ks=4,vs=8,imm=0)
      1871: (85) call bpf_map_lookup_elem#1
      
      - Without this patch:
        - at insn 1864 r1 liveness is set to REG_LIVE_WRITTEN;
        - at insn 1866 fp-56 liveness is set to REG_LIVE_WRITTEN because
          of the direct r1 copy in save_register_state();
        - at insn 1871 REG_LIVE_READ is not propagated to fp-56 at
          checkpoint 1862 because of the REG_LIVE_WRITTEN mark;
        - eventually fp-56 is pruned from checkpoint at 1862 in
          clean_func_state().
      - With this patch:
        - at insn 1864 r1 liveness is set to REG_LIVE_WRITTEN;
        - at insn 1866 fp-56 liveness is *not* set to REG_LIVE_WRITTEN
          because the write size is not equal to BPF_REG_SIZE;
        - at insn 1871 REG_LIVE_READ is propagated to fp-56 at checkpoint 1862.
      
      Hence more states have to be visited by verifier with this patch compared
      to current master.
      
      Similar patterns can be found for both fp-72 and fp-80, although these
      are harder to track through the log because of the large number of insns
      between the slot write and the bpf_map_lookup_elem() call that triggers
      the read mark. They boil down to the following C code:
      
      	struct ipv4_frag_id frag_id = {
      		.daddr = ip4->daddr,
      		.saddr = ip4->saddr,
      		.id = ip4->id,
      		.proto = ip4->protocol,
      		.pad = 0,
      	};
          ...
          map_lookup_elem(..., &frag_id);
      
      Where:
      - .id is mapped to fp-72, write of size u16;
      - .saddr is mapped to fp-80, write of size u32.
      
      This patch-set is a continuation of discussion from [2].
      
      Changes v1 -> v2 (no changes in the code itself):
      - added analysis for the tail_lb_ipv4 verification slowdown;
      - rebase against fresh master branch.
      
      [1] git@github.com:anakryiko/cilium.git
      [2] https://lore.kernel.org/bpf/517af2c57ee4b9ce2d96a8cf33f7295f2d2dfe13.camel@gmail.com/
      ====================
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Verify copy_register_state() preserves parent/live fields · b9fa9bc8
      Eduard Zingerman authored
      A testcase to check that verifier.c:copy_register_state() preserves
      register parentage chain and liveness information.
      Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/r/20230106142214.1040390-3-eddyz87@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix to preserve reg parent/live fields when copying range info · 71f656a5
      Eduard Zingerman authored
      Register range information is copied in several places. The intent is
      to transfer range/id information from one register/stack spill to
      another. Currently this is done using direct register assignment, e.g.:
      
      static void find_equal_scalars(..., struct bpf_reg_state *known_reg)
      {
      	...
      	struct bpf_reg_state *reg;
      	...
      			*reg = *known_reg;
      	...
      }
      
      However, such assignments also copy the following bpf_reg_state fields:
      
      struct bpf_reg_state {
      	...
      	struct bpf_reg_state *parent;
      	...
      	enum bpf_reg_liveness live;
      	...
      };
      
      Copying of these fields is accidental and incorrect, as could be
      demonstrated by the following example:
      
           0: call ktime_get_ns()
           1: r6 = r0
           2: call ktime_get_ns()
           3: r7 = r0
           4: if r0 > r6 goto +1             ; r0 & r6 are unbound thus generated
                                             ; branch states are identical
           5: *(u64 *)(r10 - 8) = 0xdeadbeef ; 64-bit write to fp[-8]
          --- checkpoint ---
           6: r1 = 42                        ; r1 marked as written
           7: *(u8 *)(r10 - 8) = r1          ; 8-bit write, fp[-8] parent & live
                                             ; overwritten
           8: r2 = *(u64 *)(r10 - 8)
           9: r0 = 0
          10: exit
      
      This example is unsafe because 64-bit write to fp[-8] at (5) is
      conditional, thus not all bytes of fp[-8] are guaranteed to be set
      when it is read at (8). However, currently the example passes
      verification.
      
      First, the execution path 1-10 is examined by verifier.
      Suppose that a new checkpoint is created by is_state_visited() at (6).
      After checkpoint creation:
      - r1.parent points to checkpoint.r1,
      - fp[-8].parent points to checkpoint.fp[-8].
      At (6) the r1.live is set to REG_LIVE_WRITTEN.
      At (7) the fp[-8].parent is set to r1.parent and fp[-8].live is set to
      REG_LIVE_WRITTEN, because of the following code called in
      check_stack_write_fixed_off():
      
      static void save_register_state(struct bpf_func_state *state,
      				int spi, struct bpf_reg_state *reg,
      				int size)
      {
      	...
      	state->stack[spi].spilled_ptr = *reg;  // <--- parent & live copied
      	if (size == BPF_REG_SIZE)
      		state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN;
      	...
      }
      
      Note the intent to mark stack spill as written only if 8 bytes are
      spilled to a slot, however this intent is spoiled by a 'live' field copy.
      At (8) the checkpoint.fp[-8] should be marked as REG_LIVE_READ but
      this does not happen:
      - fp[-8] in a current state is already marked as REG_LIVE_WRITTEN;
      - fp[-8].parent points to checkpoint.r1, parentage chain is used by
        mark_reg_read() to mark checkpoint states.
      At (10) the verification is finished for path 1-10 and jump 4-6 is
      examined. The checkpoint.fp[-8] never gets REG_LIVE_READ mark and this
      spill is pruned from the cached states by clean_live_states(). Hence
      the verifier state obtained via path 1-4,6 is deemed identical to the one
      obtained via path 1-6 and the program is marked as safe.
      
      Note: the example should be executed with BPF_F_TEST_STATE_FREQ flag
      set to force creation of intermediate verifier states.
      
      This commit revisits the locations where bpf_reg_state instances are
      copied and replaces the direct copies with a call to a function
      copy_register_state(dst, src) that preserves 'parent' and 'live'
      fields of the 'dst'.
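
The copy described above can be modelled in user space as follows. This is a minimal sketch, not the kernel source: `struct reg_state` and `enum reg_liveness` are simplified, hypothetical stand-ins for `struct bpf_reg_state` and `enum bpf_reg_liveness`, and the `umin`/`umax` pair stands in for all range/id information.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
enum reg_liveness { REG_LIVE_NONE = 0, REG_LIVE_READ = 1, REG_LIVE_WRITTEN = 4 };

struct reg_state {
	long long umin, umax;      /* stands in for range/id information */
	struct reg_state *parent;  /* link in the parentage chain */
	enum reg_liveness live;    /* liveness marks */
};

/* Copy everything from src to dst except dst's own parent and live. */
static void copy_register_state(struct reg_state *dst,
				const struct reg_state *src)
{
	struct reg_state *parent = dst->parent;
	enum reg_liveness live = dst->live;

	*dst = *src;           /* literal copy, clobbers parent and live */
	dst->parent = parent;  /* restore the destination's chain link */
	dst->live = live;      /* restore the destination's liveness */
}
```

With this shape, spilling a register marked REG_LIVE_WRITTEN into a stack slot transfers the range but leaves the slot attached to its own checkpoint entry, so a later read can still propagate a read mark along the slot's chain.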
      
      Fixes: 679c782d ("bpf/verifier: per-register parent pointers")
      Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/r/20230106142214.1040390-2-eddyz87@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix a possible task gone issue with bpf_send_signal[_thread]() helpers · bdb7fdb0
      Yonghong Song authored
      In current bpf_send_signal() and bpf_send_signal_thread() helper
      implementation, irq_work is used to handle nmi context. Hao Sun
      reported in [1] that the current task at the entry of the helper
      might be gone during irq_work callback processing. To fix the issue,
      a reference is acquired for the current task before enqueuing into
      the irq_work so that the queued task is still available during
      irq_work callback processing.
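
The reference-holding pattern described above can be sketched in user space as follows. All names here (`struct task`, `get_task()`, `put_task()`, `struct signal_work`) are hypothetical stand-ins rather than kernel APIs, and the plain integer counter is an assumption made for brevity; a real refcount would be atomic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a refcounted task; not the kernel's task_struct. */
struct task { int refcount; };

struct signal_work {
	struct task *task;  /* task the deferred callback will signal */
	int sig;
};

static struct task *get_task(struct task *t) { t->refcount++; return t; }
static void put_task(struct task *t) { t->refcount--; }

/* Enqueue side: pin the task before handing it to the deferred work. */
static void queue_signal(struct signal_work *w, struct task *cur, int sig)
{
	w->task = get_task(cur);
	w->sig = sig;
}

/* Callback side: use the task, then drop the reference taken at enqueue. */
static int run_signal_work(struct signal_work *w)
{
	int sig = w->sig;  /* the signal would be delivered to w->task here */

	put_task(w->task);
	w->task = NULL;
	return sig;
}
```

Because the enqueue side owns a reference, the task's count cannot reach zero between `queue_signal()` and `run_signal_work()`, even if every other holder drops its reference in the meantime.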
      
        [1] https://lore.kernel.org/bpf/20230109074425.12556-1-sunhao.th@gmail.com/
      
      Fixes: 8b401f9e ("bpf: implement bpf_send_signal() helper")
      Tested-by: Hao Sun <sunhao.th@gmail.com>
      Reported-by: Hao Sun <sunhao.th@gmail.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20230118204815.3331855-1-yhs@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix off-by-one error in bpf_mem_cache_idx() · 36024d02
      Hou Tao authored
      According to the definition of sizes[NUM_CACHES], when the size passed
      to bpf_mem_cache_idx() is 256, it should return 6 instead of 7.
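
The expected index can be cross-checked with a brute-force search over the size table. This is only a sketch: the contents of `sizes[]` below are an assumption about the allocator's table (the message above only states that 256 lives at index 6), and `cache_idx_by_search()` is a hypothetical helper, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CACHES 11

/* Assumed layout of the size table; 256 sits at index 6. */
static const unsigned short sizes[NUM_CACHES] = {
	96, 192, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096
};

/* Index of the smallest cache that can hold `size` bytes, or -1. */
static int cache_idx_by_search(size_t size)
{
	int best = -1;

	if (size == 0)
		return -1;
	for (int i = 0; i < NUM_CACHES; i++) {
		if (sizes[i] >= size && (best < 0 || sizes[i] < sizes[best]))
			best = i;
	}
	return best;
}
```

A request of exactly 256 bytes fits the 256-byte cache, so the search returns 6; only 257 bytes and above should spill into index 7.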
      
      Fixes: 7c8199e2 ("bpf: Introduce any context BPF specific memory allocator.")
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20230118084630.3750680-1-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  4. 18 Jan, 2023 8 commits
  5. 17 Jan, 2023 13 commits
    • Bluetooth: Fix possible deadlock in rfcomm_sk_state_change · 1d80d57f
      Ying Hsu authored
      syzbot reports a possible deadlock in rfcomm_sk_state_change [1].
      While rfcomm_sock_connect acquires the sk lock and waits for
      the rfcomm lock, rfcomm_sock_release could have the rfcomm
      lock and hit a deadlock for acquiring the sk lock.
      Here's a simplified flow:
      
      rfcomm_sock_connect:
        lock_sock(sk)
        rfcomm_dlc_open:
          rfcomm_lock()
      
      rfcomm_sock_release:
        rfcomm_sock_shutdown:
          rfcomm_lock()
          __rfcomm_dlc_close:
            rfcomm_sk_state_change:
              lock_sock(sk)
      
      This patch drops the sk lock before calling rfcomm_dlc_open to
      avoid the possible deadlock and holds sk's reference count to
      prevent use-after-free after rfcomm_dlc_open completes.
      
      Reported-by: syzbot+d7ce59...@syzkaller.appspotmail.com
      Fixes: 1804fdf6 ("Bluetooth: btintel: Combine setting up MSFT extension")
      Link: https://syzkaller.appspot.com/bug?extid=d7ce59b06b3eb14fd218 [1]
      Signed-off-by: Ying Hsu <yinghsu@chromium.org>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: ISO: Fix possible circular locking dependency · 506d9b40
      Luiz Augusto von Dentz authored
      This attempts to fix the following trace:
      
      iso-tester/52 is trying to acquire lock:
      ffff8880024e0070 (&hdev->lock){+.+.}-{3:3}, at:
      iso_sock_listen+0x29e/0x440
      
      but task is already holding lock:
      ffff888001978130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at:
      iso_sock_listen+0x8b/0x440
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
             lock_acquire+0x176/0x3d0
             lock_sock_nested+0x32/0x80
             iso_connect_cfm+0x1a3/0x630
             hci_cc_le_setup_iso_path+0x195/0x340
             hci_cmd_complete_evt+0x1ae/0x500
             hci_event_packet+0x38e/0x7c0
             hci_rx_work+0x34c/0x980
             process_one_work+0x5a5/0x9a0
             worker_thread+0x89/0x6f0
             kthread+0x14e/0x180
             ret_from_fork+0x22/0x30
      
      -> #1 (hci_cb_list_lock){+.+.}-{3:3}:
             lock_acquire+0x176/0x3d0
             __mutex_lock+0x13b/0xf50
             hci_le_remote_feat_complete_evt+0x17e/0x320
             hci_event_packet+0x38e/0x7c0
             hci_rx_work+0x34c/0x980
             process_one_work+0x5a5/0x9a0
             worker_thread+0x89/0x6f0
             kthread+0x14e/0x180
             ret_from_fork+0x22/0x30
      
      -> #0 (&hdev->lock){+.+.}-{3:3}:
             check_prev_add+0xfc/0x1190
             __lock_acquire+0x1e27/0x2750
             lock_acquire+0x176/0x3d0
             __mutex_lock+0x13b/0xf50
             iso_sock_listen+0x29e/0x440
             __sys_listen+0xe6/0x160
             __x64_sys_listen+0x25/0x30
             do_syscall_64+0x42/0x90
             entry_SYSCALL_64_after_hwframe+0x62/0xcc
      
      other info that might help us debug this:
      
      Chain exists of:
        &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_ISO
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
                                     lock(hci_cb_list_lock);
                                     lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
        lock(&hdev->lock);
      
       *** DEADLOCK ***
      
      1 lock held by iso-tester/52:
       #0: ffff888001978130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at:
       iso_sock_listen+0x8b/0x440
      
      Fixes: f764a6c2 ("Bluetooth: ISO: Add broadcast support")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: hci_event: Fix Invalid wait context · e9d50f76
      Luiz Augusto von Dentz authored
      This fixes the following trace caused by attempting to lock
      cmd_sync_work_lock while holding the rcu_read_lock:
      
      kworker/u3:2/212 is trying to lock:
      ffff888002600910 (&hdev->cmd_sync_work_lock){+.+.}-{3:3}, at:
      hci_cmd_sync_queue+0xad/0x140
      other info that might help us debug this:
      context-{4:4}
      4 locks held by kworker/u3:2/212:
       #0: ffff8880028c6530 ((wq_completion)hci0#2){+.+.}-{0:0}, at:
       process_one_work+0x4dc/0x9a0
       #1: ffff888001aafde0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0},
       at: process_one_work+0x4dc/0x9a0
       #2: ffff888002600070 (&hdev->lock){+.+.}-{3:3}, at:
       hci_cc_le_set_cig_params+0x64/0x4f0
       #3: ffffffffa5994b00 (rcu_read_lock){....}-{1:2}, at:
       hci_cc_le_set_cig_params+0x2f9/0x4f0
      
      Fixes: 26afbd82 ("Bluetooth: Add initial implementation of CIS connections")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: ISO: Fix possible circular locking dependency · 6a5ad251
      Luiz Augusto von Dentz authored
      This attempts to fix the following trace:
      
      kworker/u3:1/184 is trying to acquire lock:
      ffff888001888130 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}, at:
      iso_connect_cfm+0x2de/0x690
      
      but task is already holding lock:
      ffff8880028d1c20 (&conn->lock){+.+.}-{2:2}, at:
      iso_connect_cfm+0x265/0x690
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&conn->lock){+.+.}-{2:2}:
             lock_acquire+0x176/0x3d0
             _raw_spin_lock+0x2a/0x40
             __iso_sock_close+0x1dd/0x4f0
             iso_sock_release+0xa0/0x1b0
             sock_close+0x5e/0x120
             __fput+0x102/0x410
             task_work_run+0xf1/0x160
             exit_to_user_mode_prepare+0x170/0x180
             syscall_exit_to_user_mode+0x19/0x50
             do_syscall_64+0x4e/0x90
             entry_SYSCALL_64_after_hwframe+0x62/0xcc
      
      -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
             check_prev_add+0xfc/0x1190
             __lock_acquire+0x1e27/0x2750
             lock_acquire+0x176/0x3d0
             lock_sock_nested+0x32/0x80
             iso_connect_cfm+0x2de/0x690
             hci_cc_le_setup_iso_path+0x195/0x340
             hci_cmd_complete_evt+0x1ae/0x500
             hci_event_packet+0x38e/0x7c0
             hci_rx_work+0x34c/0x980
             process_one_work+0x5a5/0x9a0
             worker_thread+0x89/0x6f0
             kthread+0x14e/0x180
             ret_from_fork+0x22/0x30
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&conn->lock);
                                     lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
                                     lock(&conn->lock);
        lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
      
       *** DEADLOCK ***
      
      Fixes: ccf74f23 ("Bluetooth: Add BTPROTO_ISO socket type")
      Fixes: f764a6c2 ("Bluetooth: ISO: Add broadcast support")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: hci_sync: fix memory leak in hci_update_adv_data() · 1ed8b37c
      Zhengchao Shao authored
      When hci_cmd_sync_queue() fails in hci_update_adv_data(), inst_ptr is
      not freed, which causes a memory leak. Convert to ERR_PTR/PTR_ERR to
      pass the instance to the callback so that no memory needs to be allocated.
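
The idea of carrying a small integer inside the pointer argument, instead of allocating storage for it, can be sketched in user space as follows. The helpers `err_ptr()` and `ptr_err()` are local stand-ins written for this example (they mimic the role of the kernel's ERR_PTR/PTR_ERR), and `queue_update()` and `adv_data_sync_cb()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins: encode an integer in a pointer value and back. */
static inline void *err_ptr(long value) { return (void *)(intptr_t)value; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }

static unsigned char seen_instance;

/* Callback: recover the instance number from the opaque pointer. */
static int adv_data_sync_cb(void *data)
{
	seen_instance = (unsigned char)ptr_err(data);
	return 0;
}

/* Hypothetical queueing function: nothing is allocated, so there is
 * nothing to free if queueing fails. */
static int queue_update(unsigned char instance, int (*cb)(void *))
{
	return cb(err_ptr(instance));
}
```

Since the pointer never refers to heap memory, the error path has no cleanup to forget, which is the property the fix relies on.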
      
      Fixes: 651cd3d6 ("Bluetooth: convert hci_update_adv_data to hci_sync")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: hci_qca: Fix driver shutdown on closed serdev · 272970be
      Krzysztof Kozlowski authored
      The driver shutdown callback (which sends EDL_SOC_RESET to the device
      over serdev) should not be invoked when HCI device is not open (e.g. if
      hci_dev_open_sync() failed), because the serdev and its TTY are not open
      either.  Also skip this step if device is powered off
      (qca_power_shutdown()).
      
      The shutdown callback causes use-after-free during system reboot with
      Qualcomm Atheros Bluetooth:
      
        Unable to handle kernel paging request at virtual address
        0072662f67726fd7
        ...
        CPU: 6 PID: 1 Comm: systemd-shutdow Tainted: G        W
        6.1.0-rt5-00325-g8a5f56bcfcca #8
        Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
        Call trace:
         tty_driver_flush_buffer+0x4/0x30
         serdev_device_write_flush+0x24/0x34
         qca_serdev_shutdown+0x80/0x130 [hci_uart]
         device_shutdown+0x15c/0x260
         kernel_restart+0x48/0xac
      
      KASAN report:
      
        BUG: KASAN: use-after-free in tty_driver_flush_buffer+0x1c/0x50
        Read of size 8 at addr ffff16270c2e0018 by task systemd-shutdow/1
      
        CPU: 7 PID: 1 Comm: systemd-shutdow Not tainted
        6.1.0-next-20221220-00014-gb85aaf97fb01-dirty #28
        Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
        Call trace:
         dump_backtrace.part.0+0xdc/0xf0
         show_stack+0x18/0x30
         dump_stack_lvl+0x68/0x84
         print_report+0x188/0x488
         kasan_report+0xa4/0xf0
         __asan_load8+0x80/0xac
         tty_driver_flush_buffer+0x1c/0x50
         ttyport_write_flush+0x34/0x44
         serdev_device_write_flush+0x48/0x60
         qca_serdev_shutdown+0x124/0x274
         device_shutdown+0x1e8/0x350
         kernel_restart+0x48/0xb0
         __do_sys_reboot+0x244/0x2d0
         __arm64_sys_reboot+0x54/0x70
         invoke_syscall+0x60/0x190
         el0_svc_common.constprop.0+0x7c/0x160
         do_el0_svc+0x44/0xf0
         el0_svc+0x2c/0x6c
         el0t_64_sync_handler+0xbc/0x140
         el0t_64_sync+0x190/0x194
      
      Fixes: 7e7bbddd ("Bluetooth: hci_qca: Fix qca6390 enable failure after warm reboot")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: hci_conn: Fix memory leaks · 3aa21311
      Zhengchao Shao authored
      When hci_cmd_sync_queue() fails in hci_le_terminate_big() or
      hci_le_big_terminate(), the memory pointed to by variable d is not freed,
      which causes a memory leak. Free it on the error path.
      
      Fixes: eca0ae4a ("Bluetooth: Add initial implementation of BIS connections")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: hci_sync: Fix use HCI_OP_LE_READ_BUFFER_SIZE_V2 · 3a4d29b6
      Luiz Augusto von Dentz authored
      Don't try to use HCI_OP_LE_READ_BUFFER_SIZE_V2 if the controller doesn't
      support ISO channels. To check whether ISO channels are supported,
      HCI_OP_LE_READ_LOCAL_FEATURES needs to be done earlier so that the
      feature bits can be checked in hci_le_read_buffer_size_sync.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216817
      Fixes: c1631dbc ("Bluetooth: hci_sync: Fix hci_read_buffer_size_sync")
      Cc: stable@vger.kernel.org # 6.1
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: Fix a buffer overflow in mgmt_mesh_add() · 2185e0fd
      Harshit Mogalapalli authored
      Smatch Warning:
      net/bluetooth/mgmt_util.c:375 mgmt_mesh_add() error: __memcpy()
      'mesh_tx->param' too small (48 vs 50)
      
      Analysis:
      
      'mesh_tx->param' is array of size 48. This is the destination.
      u8 param[sizeof(struct mgmt_cp_mesh_send) + 29]; // 19 + 29 = 48.
      
      But in the caller 'mesh_send' we reject only when len > 50.
      len > (MGMT_MESH_SEND_SIZE + 31) // 19 + 31 = 50.
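
The mismatch in the analysis above (a 48-byte destination, a caller that admits up to 50 bytes) can be illustrated with a small model. This is a sketch under stated assumptions: `struct mesh_tx`, `mesh_add()` and the macro names are hypothetical, and checking against `sizeof` the destination is just one way to keep the two limits in agreement; the actual patch may instead resize the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MESH_SEND_HDR_SIZE 19                       /* header size from the analysis */
#define MESH_PARAM_SIZE (MESH_SEND_HDR_SIZE + 29)   /* 19 + 29 = 48 */
#define MESH_CALLER_LIMIT (MESH_SEND_HDR_SIZE + 31) /* 19 + 31 = 50 */

struct mesh_tx {
	unsigned char param[MESH_PARAM_SIZE];
};

/* Copy len bytes into tx->param, refusing anything that does not fit. */
static int mesh_add(struct mesh_tx *tx, const void *data, size_t len)
{
	if (len > sizeof(tx->param))
		return -1;  /* 49 and 50 pass the caller's check but not this one */
	memcpy(tx->param, data, len);
	return 0;
}
```

Lengths 49 and 50 are the two values that slip past the caller's `len > 50` test while exceeding the 48-byte array, which matches the "48 vs 50" in the Smatch warning.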
      
      Fixes: b338d917 ("Bluetooth: Implement support for Mesh")
      Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
      Signed-off-by: Brian Gix <brian.gix@intel.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • netfilter: conntrack: handle tcp challenge acks during connection reuse · c410cb97
      Florian Westphal authored
      When a connection is re-used, the following can happen:
      [ connection starts to close, fin sent in either direction ]
       > syn   # initiator quickly reuses connection
       < ack   # peer sends a challenge ack
       > rst   # rst, sequence number == ack_seq of previous challenge ack
       > syn   # this syn is expected to pass
      
      Problem is that the rst will fail window validation, so it gets
      tagged as invalid.
      
      If the ruleset drops such packets, we get repeated syn-retransmits until
      the initiator gives up or the peer starts responding with syn/ack.
      
      Before the commit indicated in the "Fixes" tag below this used to work:
      
      The challenge-ack made conntrack re-init state based on the challenge
      ack itself, so the following rst would pass window validation.
      
      Add challenge-ack support: If we get ack for syn, record the ack_seq,
      and then check if the rst sequence number matches the last ack number
      seen in reverse direction.
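
The bookkeeping described above can be sketched as follows. This is a simplified user-space model, not conntrack code: `struct conn`, `struct dir_state`, `note_ack_for_syn()` and `rst_matches_challenge_ack()` are hypothetical names, and all other window validation is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-direction state: last ack number sent in reply to a syn. */
struct dir_state {
	uint32_t last_ack;
	bool challenge_ack_seen;
};

struct conn {
	struct dir_state dir[2];  /* 0 = original direction, 1 = reply */
};

/* An ack arrived in direction `dir` in response to a syn: remember it. */
static void note_ack_for_syn(struct conn *ct, int dir, uint32_t ack_seq)
{
	ct->dir[dir].last_ack = ack_seq;
	ct->dir[dir].challenge_ack_seen = true;
}

/* A rst in direction `dir` is accepted if its sequence number equals the
 * last ack number recorded in the reverse direction. */
static bool rst_matches_challenge_ack(const struct conn *ct, int dir,
				      uint32_t rst_seq)
{
	const struct dir_state *rev = &ct->dir[!dir];

	return rev->challenge_ack_seen && rev->last_ack == rst_seq;
}
```

In the packet sequence from the message, the peer's challenge ack is recorded for the reply direction, and the initiator's following rst passes only if its sequence number equals that recorded ack number.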
      
      Fixes: c7aab4f1 ("netfilter: nf_conntrack_tcp: re-init for syn packets only")
      Reported-by: Michal Tesar <mtesar@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: stmmac: fix invalid call to mdiobus_get_phy() · 1f3bd64a
      Heiner Kallweit authored
      In a number of cases the driver assigns a default value of -1 to
      priv->plat->phy_addr. This may result in calling mdiobus_get_phy()
      with addr parameter being -1. Therefore check for this scenario and
      bail out before calling mdiobus_get_phy().
      
      Fixes: 42e87024 ("net: stmmac: Fix case when PHY handle is not present")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/669f9671-ecd1-a41b-2727-7b73e3003985@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: mdio: validate parameter addr in mdiobus_get_phy() · 867dbe78
      Heiner Kallweit authored
      The caller may pass any value as addr, which may result in an out-of-bounds
      access to array mdio_map. One existing case is stmmac_init_phy() that
      may pass -1 as addr. Therefore validate addr before using it.
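
The validation described above amounts to a range check before indexing the map. The following is a user-space sketch, not the kernel function: `struct mii_bus` and `struct mdio_device` are reduced to the fields needed here, `bus_get_dev()` is a hypothetical name, and the map size of 32 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define PHY_MAX_ADDR 32  /* assumed number of addresses on the bus */

struct mdio_device { int addr; };

struct mii_bus {
	struct mdio_device *mdio_map[PHY_MAX_ADDR];
};

/* Look up the device at addr; reject addresses outside the map. */
static struct mdio_device *bus_get_dev(struct mii_bus *bus, int addr)
{
	if (addr < 0 || addr >= PHY_MAX_ADDR)
		return NULL;  /* -1 would otherwise read before the array */
	return bus->mdio_map[addr];
}
```

A caller passing -1 now gets NULL back, the same result as for an unpopulated address, instead of an out-of-bounds read.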
      
      Fixes: 7f854420 ("phy: Add API for {un}registering an mdio device to a bus.")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/cdf664ea-3312-e915-73f8-021678d08887@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: usb: sr9700: Handle negative len · ecf7cf8e
      Szymon Heidrich authored
      Packet len computed as difference of length word extracted from
      skb data and four may result in a negative value. In such case
      processing of the buffer should be interrupted rather than
      setting sr_skb->len to an unexpectedly large value (due to cast
      from signed to unsigned integer) and passing sr_skb to
      usbnet_skb_return.
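
The hazard described above can be reproduced with a few lines of arithmetic. This sketch assumes a header layout in which the length word is stored little-endian in bytes 1 and 2, and uses 1514 as the maximum frame length; `payload_len()` is a hypothetical helper, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAME_LEN 1514  /* assumed upper bound for an Ethernet frame */

/* Return the payload length encoded in hdr, or -1 if it is unusable.
 * The length word minus four can be negative for a malformed packet. */
static int payload_len(const unsigned char *hdr, size_t avail)
{
	int len = (hdr[1] | (hdr[2] << 8)) - 4;

	if (len < 0 || len > MAX_FRAME_LEN || (size_t)len > avail)
		return -1;  /* stop processing instead of using a bogus length */
	return len;
}
```

Without the `len < 0` test, a length word of 2 would yield -2, which becomes a very large value once stored in an unsigned length field.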
      
      Fixes: e9da0b56 ("sr9700: sanity check for packet length")
      Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com>
      Link: https://lore.kernel.org/r/20230114182326.30479-1-szymon.heidrich@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  6. 16 Jan, 2023 7 commits
    • Geetha sowjanya's avatar
      octeontx2-pf: Avoid use of GFP_KERNEL in atomic context · 87b93b67
      Geetha sowjanya authored
      Using GFP_KERNEL in a preemption-disabled context causes the warning below
      when CONFIG_DEBUG_ATOMIC_SLEEP is enabled:
      
      [   32.542271] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
      [   32.550883] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
      [   32.558707] preempt_count: 1, expected: 0
      [   32.562710] RCU nest depth: 0, expected: 0
      [   32.566800] CPU: 3 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc2-00269-gae9dcb91 #7
      [   32.576188] Hardware name: Marvell CN106XX board (DT)
      [   32.581232] Call trace:
      [   32.583670]  dump_backtrace.part.0+0xe0/0xf0
      [   32.587937]  show_stack+0x18/0x30
      [   32.591245]  dump_stack_lvl+0x68/0x84
      [   32.594900]  dump_stack+0x18/0x34
      [   32.598206]  __might_resched+0x12c/0x160
      [   32.602122]  __might_sleep+0x48/0xa0
      [   32.605689]  __kmem_cache_alloc_node+0x2b8/0x2e0
      [   32.610301]  __kmalloc+0x58/0x190
      [   32.613610]  otx2_sq_aura_pool_init+0x1a8/0x314
      [   32.618134]  otx2_open+0x1d4/0x9d0
      
      To avoid use of GFP_ATOMIC for memory allocation, disable preemption
      after all memory allocation is done.
      
      Fixes: 4af1b64f ("octeontx2-pf: Fix lmtst ID used in aura free")
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge branch 'l2tp-races' · 4101971a
      David S. Miller authored
      Cong Wang says:
      
      ====================
      l2tp: fix race conditions in l2tp_tunnel_register()
      
      This patchset contains two patches, the first one is a preparation for
      the second one which is the actual fix. Please find more details in
      each patch description.
      
      I have run the l2tp test (https://github.com/katalix/l2tp-ktest),
      and all test cases pass.
      
      v3: preserve EEXIST errno for user-space
      v2: move IDR allocation to l2tp_tunnel_register()
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Cong Wang's avatar
      l2tp: close all race conditions in l2tp_tunnel_register() · 0b2c5972
      Cong Wang authored
      The code in l2tp_tunnel_register() is racy in several ways:
      
      1. It modifies the tunnel socket _after_ publishing it.
      
      2. It calls setup_udp_tunnel_sock() on an existing socket without
         locking.
      
      3. It changes the sock lock class on the fly, which triggers many syzbot
         reports.
      
      This patch amends all of them by moving socket initialization code
      before publishing and under sock lock. As suggested by Jakub, the
      l2tp lockdep class is not necessary as we can just switch to
      bh_lock_sock_nested().
      
      Fixes: 37159ef2 ("l2tp: fix a lockdep splat")
      Fixes: 6b9f3423 ("l2tp: fix races in tunnel creation")
      Reported-by: syzbot+52866e24647f9a23403f@syzkaller.appspotmail.com
      Reported-by: syzbot+94cc2a66fc228b23f360@syzkaller.appspotmail.com
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Guillaume Nault <gnault@redhat.com>
      Cc: Jakub Sitnicki <jakub@cloudflare.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Cong Wang's avatar
      l2tp: convert l2tp_tunnel_list to idr · c4d48a58
      Cong Wang authored
      l2tp uses l2tp_tunnel_list to track all registered tunnels and
      to allocate tunnel IDs. IDR can do the same job.
      
      More importantly, with IDR we can hold the ID before a successful
      registration, so that we don't need to worry about late error
      handling; it is not easy to roll back socket changes.
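
The "hold the ID first, publish later" idea can be sketched with a tiny userspace table. This is only a model under stated assumptions: it does not use the kernel IDR API, and `tunnel_id_reserve()`, `tunnel_publish()`, `tunnel_id_release()` and `tunnel_lookup()` are hypothetical names chosen for the illustration.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_TUNNELS 8 /* arbitrary size for the model */

struct tunnel {
    unsigned int id;
};

static struct tunnel *slots[MAX_TUNNELS]; /* what lookups can see */
static int reserved[MAX_TUNNELS];         /* which IDs are held */

/* Step 1: hold the ID. Lookups still see NULL for it. */
int tunnel_id_reserve(unsigned int id)
{
    if (id >= MAX_TUNNELS)
        return -EINVAL;
    if (reserved[id])
        return -EEXIST; /* duplicate IDs fail early */
    reserved[id] = 1;
    slots[id] = NULL;
    return 0;
}

/* Step 2: publish only after all other setup has succeeded. */
void tunnel_publish(unsigned int id, struct tunnel *t)
{
    slots[id] = t;
}

/* Error path: dropping a reserved ID needs no socket rollback. */
void tunnel_id_release(unsigned int id)
{
    reserved[id] = 0;
    slots[id] = NULL;
}

struct tunnel *tunnel_lookup(unsigned int id)
{
    return id < MAX_TUNNELS ? slots[id] : NULL;
}
```

The point of the split is that a conflicting ID is detected before any socket state is touched, and a failure between the two steps only requires releasing the reserved slot.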
      
      This is a preparation for the following fix.
      
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Guillaume Nault <gnault@redhat.com>
      Cc: Jakub Sitnicki <jakub@cloudflare.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet's avatar
      net/sched: sch_taprio: fix possible use-after-free · 3a415d59
      Eric Dumazet authored
      syzbot reported a nasty crash [1] in net_tx_action() which
      made little sense until we got a repro.
      
      This repro installs a taprio qdisc, but provides an
      invalid TCA_RATE attribute.
      
      qdisc_create() has to destroy the just initialized
      taprio qdisc, and taprio_destroy() is called.
      
      However, the hrtimer used by taprio had already fired,
      therefore advance_sched() called __netif_schedule().
      
      Then net_tx_action was trying to use a destroyed qdisc.
      
      We can not undo the __netif_schedule(), so we must wait
      until one cpu serviced the qdisc before we can proceed.
      
      Many thanks to Alexander Potapenko for his help.
      
      [1]
      BUG: KMSAN: uninit-value in queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
      BUG: KMSAN: uninit-value in do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
      BUG: KMSAN: uninit-value in __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
      BUG: KMSAN: uninit-value in _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
       queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
       do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
       __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
       _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
       spin_trylock include/linux/spinlock.h:359 [inline]
       qdisc_run_begin include/net/sch_generic.h:187 [inline]
       qdisc_run+0xee/0x540 include/net/pkt_sched.h:125
       net_tx_action+0x77c/0x9a0 net/core/dev.c:5086
       __do_softirq+0x1cc/0x7fb kernel/softirq.c:571
       run_ksoftirqd+0x2c/0x50 kernel/softirq.c:934
       smpboot_thread_fn+0x554/0x9f0 kernel/smpboot.c:164
       kthread+0x31b/0x430 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:732 [inline]
       slab_alloc_node mm/slub.c:3258 [inline]
       __kmalloc_node_track_caller+0x814/0x1250 mm/slub.c:4970
       kmalloc_reserve net/core/skbuff.c:358 [inline]
       __alloc_skb+0x346/0xcf0 net/core/skbuff.c:430
       alloc_skb include/linux/skbuff.h:1257 [inline]
       nlmsg_new include/net/netlink.h:953 [inline]
       netlink_ack+0x5f3/0x12b0 net/netlink/af_netlink.c:2436
       netlink_rcv_skb+0x55d/0x6c0 net/netlink/af_netlink.c:2507
       rtnetlink_rcv+0x30/0x40 net/core/rtnetlink.c:6108
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0xf3b/0x1270 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x1288/0x1440 net/netlink/af_netlink.c:1921
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg net/socket.c:734 [inline]
       ____sys_sendmsg+0xabc/0xe90 net/socket.c:2482
       ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2536
       __sys_sendmsg net/socket.c:2565 [inline]
       __do_sys_sendmsg net/socket.c:2574 [inline]
       __se_sys_sendmsg net/socket.c:2572 [inline]
       __x64_sys_sendmsg+0x367/0x540 net/socket.c:2572
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 6.0.0-rc2-syzkaller-47461-gac3859c02d7f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
      
      Fixes: 5a781ccb ("tc: Add support for configuring the taprio scheduler")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 21705c77
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      
      The following patchset contains Netfilter fixes for net:
      
      1) Increase timeout to 120 seconds for netfilter selftests to fix
         nftables transaction tests, from Florian Westphal.
      
      2) Fix overflow in bitmap_ip_create() due to integer arithmetic
         in a 64-bit bitmask, from Gavrilov Ilia.
      
      3) Fix incorrect arithmetic in nft_payload with double-tagged
         vlan matching.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Kurt Kanzenbach's avatar
      net: stmmac: Fix queue statistics reading · c296c77e
      Kurt Kanzenbach authored
      Correct queue statistics reading. All queue statistics are stored as unsigned
      long values. The retrieval for ethtool fetches these values as u64. However, on
      some systems the counters are only 32 bits wide. That yields wrong queue
      statistic counters, e.g. on arm32 systems such as the stm32mp157. Fix it by
      using the correct data type.
      
      Tested on Olimex STMP157-OLinuXino-LIME2 by simply running linuxptp for a short
      period of time:
      
      Non-patched kernel:
      |root@st1:~# ethtool -S eth0 | grep q0
      |     q0_tx_pkt_n: 3775276254951 # ???
      |     q0_tx_irq_n: 879
      |     q0_rx_pkt_n: 1194000908909 # ???
      |     q0_rx_irq_n: 278
      
      Patched kernel:
      |root@st1:~# ethtool -S eth0 | grep q0
      |     q0_tx_pkt_n: 2434
      |     q0_tx_irq_n: 1274
      |     q0_rx_pkt_n: 1604
      |     q0_rx_irq_n: 846
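
The bogus values in the non-patched output above can be explained arithmetically: 3775276254951 equals 879 × 2^32 + 1767, and 879 is exactly the q0_tx_irq_n value printed next to it. That is consistent with an 8-byte read spanning two adjacent 32-bit counters on a little-endian system. This is an inference from the numbers, not something the commit states. The sketch below models that misread, with hypothetical function names and the assumption that the two counters sit next to each other in memory.

```c
#include <stdint.h>

/* Correct read: widen a single 32-bit counter to 64 bits. */
uint64_t read_counter(const uint32_t *counters, int idx)
{
    return (uint64_t)counters[idx];
}

/*
 * Model of the faulty read on a little-endian 32-bit system:
 * fetching 8 bytes at counters[idx] pulls the next counter into
 * the upper 32 bits of the result.
 */
uint64_t misread_counter_le(const uint32_t *counters, int idx)
{
    return ((uint64_t)counters[idx + 1] << 32) | counters[idx];
}
```

With `counters = {1767, 879}`, the misread at index 0 gives 3775276254951, matching the first `# ???` line, while the correct read gives 1767.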
      
      Fixes: 68e9c5de ("net: stmmac: add ethtool per-queue statistic framework")
      Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
      Cc: Vijayakannan Ayyathurai <vijayakannan.ayyathurai@intel.com>
      Cc: Wong Vee Khee <vee.khee.wong@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 14 Jan, 2023 5 commits
    • Rahul Rameshbabu's avatar
      sch_htb: Avoid grafting on htb_destroy_class_offload when destroying htb · a22b7388
      Rahul Rameshbabu authored
      Peek at old qdisc and graft only when deleting a leaf class in the htb,
      rather than when deleting the htb itself. Do not peek at the qdisc of the
      netdev queue when destroying the htb. The caller may already have grafted a
      new qdisc that is not part of the htb structure being destroyed.
      
      This fix resolves two use cases.
      
        1. Using tc to destroy the htb.
          - Netdev was being prematurely activated before the htb was fully
            destroyed.
        2. Using tc to replace the htb with another qdisc (which also leads to
           the htb being destroyed).
          - Premature netdev activation like previous case. Newly grafted qdisc
            was also getting accidentally overwritten when destroying the htb.
      
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
      Reviewed-by: Maxim Mikityanskiy <maxtram95@gmail.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20230113005528.302625-1-rrameshbabu@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Jakub Kicinski's avatar
      Merge branch 'mptcp-userspace-pm-create-sockets-for-the-right-family' · da263fcb
      Jakub Kicinski authored
      Matthieu Baerts says:
      
      ====================
      mptcp: userspace pm: create sockets for the right family
      
      Before these patches, the Userspace Path Manager would allow the
      creation of subflows with wrong families: taking the one of the MPTCP
      socket instead of the provided ones and resulting in the creation of
      subflows with likely not the right source and/or destination IPs. It
      would also allow the creation of subflows between different families or
      not respecting v4/v6-only socket attributes.
      
      Patch 1 lets the userspace PM select the proper family to avoid creating
      subflows with the wrong source and/or destination addresses because the
      family is not the expected one.
      
      Patch 2 makes sure the userspace PM doesn't allow the userspace to
      create subflows for a family that is not allowed.
      
      Patch 3 validates scenarios with a mix of v4 and v6 subflows for the
      same MPTCP connection.
      
      These patches fix issues introduced in v5.19, when the userspace path
      manager was introduced.
      ====================
      
      Link: https://lore.kernel.org/r/20230112-upstream-net-20230112-netlink-v4-v6-v1-0-6a8363a221d2@tessares.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: userspace: validate v4-v6 subflows mix · 4656d72c
      Matthieu Baerts authored
      MPTCP protocol supports having subflows in both IPv4 and IPv6. In Linux,
      it is possible to have that if the MPTCP socket has been created with
      AF_INET6 family without the IPV6_V6ONLY option.
      
      Here, a new IPv4 subflow is being added to the initial IPv6 connection,
      then being removed using Netlink commands.
      
      Cc: stable@vger.kernel.org # v5.19+
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      mptcp: netlink: respect v4/v6-only sockets · fb00ee4f
      Matthieu Baerts authored
      If an MPTCP socket has been created with AF_INET6 and the IPV6_V6ONLY
      option has been set, the userspace PM would allow creating subflows
      using IPv4 addresses, e.g. mapped in v6.
      
      The kernel side of userspace PM will also accept creating subflows with
      local and remote addresses having different families. Depending on the
      subflow socket's family, different behaviours are expected:
       - If AF_INET is forced with a v6 address, the kernel will take the last
         byte of the IP and try to connect to that: a new subflow is created
         but to an unexpected address.
       - If AF_INET6 is forced with a v4 address, the kernel will try to
         connect to a v4 address (v4-mapped-v6). A -EBADF error from the
         connect() part is then expected.
      
      It is then required to check the given families can be accepted. This is
      done by using a new helper for addresses family matching, taking care of
      IPv4 vs IPv4-mapped-IPv6 addresses. This helper will be re-used later by
      the in-kernel path-manager to use mixed IPv4 and IPv6 addresses.
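
A family-matching check that treats IPv4 and IPv4-mapped-IPv6 addresses as equivalent could look roughly like the sketch below. It is a userspace illustration, not the kernel helper: `struct addr_model` and `addr_families_match()` are hypothetical names, and the IPV6_V6ONLY handling mentioned above is deliberately left out. Only the POSIX macro `IN6_IS_ADDR_V4MAPPED` is a real API.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical address record: a family plus the IPv6 form, if any. */
struct addr_model {
    sa_family_t family;
    struct in6_addr addr6; /* only meaningful for AF_INET6 */
};

/* An address counts as "v4" if it is AF_INET or v4-mapped-v6. */
static bool is_effectively_v4(const struct addr_model *a)
{
    return a->family == AF_INET ||
           (a->family == AF_INET6 && IN6_IS_ADDR_V4MAPPED(&a->addr6));
}

/* Local and remote addresses match when both are v4-like or both v6. */
bool addr_families_match(const struct addr_model *loc,
                         const struct addr_model *rem)
{
    return is_effectively_v4(loc) == is_effectively_v4(rem);
}
```

Under this model a plain IPv4 local address paired with `::ffff:192.0.2.1` is accepted, while a plain IPv4 address paired with `2001:db8::1` is rejected, which is the kind of conflict the commit reports back to userspace.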
      
      While at it, a clear error message is now reported if there are some
      conflicts with the families that have been passed by the userspace.
      
      Fixes: 702c2f64 ("mptcp: netlink: allow userspace-driven subflow establishment")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Paolo Abeni's avatar
      mptcp: explicitly specify sock family at subflow creation time · 6bc1fe7d
      Paolo Abeni authored
      Let the caller specify the to-be-created subflow family.
      
      For a given MPTCP socket created with the AF_INET6 family, the current
      userspace PM can already ask the kernel to create subflows in v4 and v6.
      If "plain" IPv4 addresses are passed to the kernel, they are
      automatically mapped in v6 addresses "by accident". This can be
      problematic because the userspace will need to pass different addresses,
      now the v4-mapped-v6 addresses, to destroy this new subflow.
      
      On the other hand, if the MPTCP socket has been created with the AF_INET
      family, the command to create a subflow in v6 will be accepted, but the
      result will not be the expected one: the new subflow will be created in
      IPv4 using part of the v6 addresses passed to the kernel.
      
      No functional change intended for the in-kernel PM where an explicit
      enforcement is currently in place. This arbitrary enforcement will be
      leveraged by other patches in a future version.
      
      Fixes: 702c2f64 ("mptcp: netlink: allow userspace-driven subflow establishment")
      Cc: stable@vger.kernel.org
      Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>