1. 06 Jul, 2021 9 commits
    • bonding: Add struct bond_ipesc to manage SA · 9a560550
      Taehee Yoo authored
      Bonding supports IPsec offload. When an SA is added, bonding simply
      passes it to its current active real interface, but it does not
      manage the SA. So, when events occur (adding or deleting a real
      interface, a change of the active real interface, etc.), bonding
      cannot handle them properly, which leads to problems such as panics,
      use-after-free (UAF), and refcount leaks.
      
      To make this stable, bonding must manage its SAs; that is why struct
      bond_ipsec is added. When a new SA is added to a bonding interface, it
      is stored in the bond_ipsec list and passed to the current active real
      interface. When events occur, bonding uses the bond_ipsec data to
      handle them. bond->ipsec_list is protected by bond->ipsec_lock.
      
      If the current active real interface changes, the following logic runs:
      1. Delete all SAs from the old active real interface.
      2. Add all SAs to the new active real interface.
      3. If the new active real interface does not support IPsec offload or
         the SA's options, set real_dev to NULL.
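      The active-interface switch above can be modeled in userspace C. This is a minimal sketch under stated assumptions and is not the kernel code. struct real_dev, struct sa, struct bond, and the helpers dev_add_sa(), dev_del_sa(), bond_add_sa() and bond_change_active() are all hypothetical. A pthread mutex stands in for bond->ipsec_lock. The sketch also assumes that an SA is kept on the list only if the active device accepted it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; not the kernel's definitions. */
struct real_dev {
	bool offload_ok;  /* supports IPsec offload and the SA's options */
	int num_sas;      /* SAs currently installed on this device */
};

struct sa {
	struct real_dev *real_dev;  /* NULL when not offloaded */
};

struct bond_ipsec {               /* one entry per SA the bond manages */
	struct sa *xs;
	struct bond_ipsec *next;
};

struct bond {
	struct real_dev *curr_active;
	struct bond_ipsec *ipsec_list;
	pthread_mutex_t ipsec_lock;   /* stands in for bond->ipsec_lock */
};

static bool dev_add_sa(struct real_dev *dev, struct sa *xs)
{
	if (!dev || !dev->offload_ok)
		return false;
	dev->num_sas++;
	xs->real_dev = dev;
	return true;
}

static void dev_del_sa(struct sa *xs)
{
	if (xs->real_dev) {
		xs->real_dev->num_sas--;
		xs->real_dev = NULL;
	}
}

/* Offload the SA to the active device and remember it on success. */
static int bond_add_sa(struct bond *bond, struct sa *xs)
{
	struct bond_ipsec *ipsec = malloc(sizeof(*ipsec));
	int err = -1;

	if (!ipsec)
		return -1;
	xs->real_dev = NULL;
	pthread_mutex_lock(&bond->ipsec_lock);
	if (dev_add_sa(bond->curr_active, xs)) {
		ipsec->xs = xs;
		ipsec->next = bond->ipsec_list;
		bond->ipsec_list = ipsec;
		ipsec = NULL;   /* now owned by the list */
		err = 0;
	}
	pthread_mutex_unlock(&bond->ipsec_lock);
	free(ipsec);        /* no-op when the entry was listed */
	return err;
}

/* Active device change, following steps 1-3 above. */
static void bond_change_active(struct bond *bond, struct real_dev *new_active)
{
	pthread_mutex_lock(&bond->ipsec_lock);
	for (struct bond_ipsec *p = bond->ipsec_list; p; p = p->next)
		dev_del_sa(p->xs);                   /* 1. delete from old */
	bond->curr_active = new_active;
	for (struct bond_ipsec *p = bond->ipsec_list; p; p = p->next)
		if (!dev_add_sa(new_active, p->xs))  /* 2. add to new */
			p->xs->real_dev = NULL;      /* 3. unsupported */
	pthread_mutex_unlock(&bond->ipsec_lock);
}
```

      Keeping the list on the bond means that SA state does not depend on whichever device happens to be active. That decoupling is what allows the change path to re-add every SA to the new device.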
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: disallow setting nested bonding + ipsec offload · b1216933
      Taehee Yoo authored
      A bonding interface can be nested, and it supports IPsec offload, so
      the nested bonding + IPsec offload scenario can currently be configured.
      However, the code does not support this scenario, so it should be
      disallowed.
      
      interface graph:
      bond2
         |
      bond1
         |
      eth0
      
      Nested bonding with IPsec offload is unlikely to be a real use case,
      so disallowing this scenario is fine.
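      A check of the kind this commit describes can be sketched as a hypothetical userspace model. struct netdev and bond_offload_allowed() are illustrative names. The is_bond flag stands in for a test such as the kernel's netif_is_bond_master(). The sketch does not show where the real patch performs the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a network device in the graph above. */
struct netdev {
	bool is_bond;                 /* stands in for netif_is_bond_master() */
	struct netdev *active_slave;  /* only meaningful for bonds */
	bool supports_offload;
};

/* Returns true if an SA offloaded to `bond` may be passed down. */
static bool bond_offload_allowed(const struct netdev *bond)
{
	const struct netdev *slave = bond->active_slave;

	if (!slave)
		return false;  /* no real device to offload to */
	if (slave->is_bond)
		return false;  /* nested bonding + IPsec offload: reject */
	return slave->supports_offload;
}
```

      With the graph above, bond1 -> eth0 is accepted, while bond2 -> bond1 is rejected.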
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix suspicious RCU usage in bond_ipsec_del_sa() · a22c39b8
      Taehee Yoo authored
      bond_ipsec_del_sa() uses rcu_dereference() to dereference
      bond->curr_active_slave, but neither it nor its caller holds the RCU
      read lock, so a warning is triggered. Fix this by adding
      rcu_read_lock().
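      The following userspace model shows why the warning fires and what the fix changes. rcu_read_lock(), rcu_read_unlock() and rcu_dereference() below are hypothetical stand-ins. They only track nesting depth and count a "suspicious usage" report, roughly mimicking what rcu_dereference_check() reports under lockdep. They provide none of RCU's actual synchronization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel RCU API (no real RCU here). */
static int rcu_nesting;   /* depth of rcu_read_lock() nesting */
static int rcu_warnings;  /* counts "suspicious RCU usage" reports */

static void rcu_read_lock(void)   { rcu_nesting++; }
static void rcu_read_unlock(void) { rcu_nesting--; }

/* Like rcu_dereference_check(): report if no read lock is held. */
#define rcu_dereference(p) \
	(rcu_nesting > 0 ? (p) : (rcu_warnings++, (p)))

struct slave { int num_sas; };
struct bond  { struct slave *curr_active_slave; };

/* Before the fix: dereference without holding the RCU read lock. */
static void del_sa_buggy(struct bond *bond)
{
	struct slave *slave = rcu_dereference(bond->curr_active_slave);

	if (slave)
		slave->num_sas--;
}

/* After the fix: the access sits inside a read-side critical section. */
static void del_sa_fixed(struct bond *bond)
{
	rcu_read_lock();
	struct slave *slave = rcu_dereference(bond->curr_active_slave);

	if (slave)
		slave->num_sas--;
	rcu_read_unlock();
}
```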
      
      Test commands:
          ip netns add A
          ip netns exec A bash
          modprobe netdevsim
          echo "1 1" > /sys/bus/netdevsim/new_device
          ip link add bond0 type bond
          ip link set eth0 master bond0
          ip link set eth0 up
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
              transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
              0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
              dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
          ip x s f
      
      Splat looks like:
      =============================
      WARNING: suspicious RCU usage
      5.13.0-rc3+ #1168 Not tainted
      -----------------------------
      drivers/net/bonding/bond_main.c:448 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by ip/705:
       #0: ffff888106701780 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3},
      at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user]
       #1: ffff8880075b0098 (&x->lock){+.-.}-{2:2},
      at: xfrm_state_delete+0x16/0x30
      
      stack backtrace:
      CPU: 6 PID: 705 Comm: ip Not tainted 5.13.0-rc3+ #1168
      Call Trace:
       dump_stack+0xa4/0xe5
       bond_ipsec_del_sa+0x16a/0x1c0 [bonding]
       __xfrm_state_delete+0x51f/0x730
       xfrm_state_delete+0x1e/0x30
       xfrm_state_flush+0x22f/0x390
       xfrm_flush_sa+0xd8/0x260 [xfrm_user]
       ? xfrm_flush_policy+0x290/0x290 [xfrm_user]
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
      [ ... ]
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops · 2de7e4f6
      Taehee Yoo authored
      struct xfrm_state_offload has two device pointers, *dev and *real_dev,
      which are used in the callback functions of struct xfrmdev_ops.
      *dev points to either a bonding interface or a real interface: if
      bonding IPsec offload is used, it points to the bonding interface;
      otherwise, it points to the real interface. real_dev, on the other
      hand, always points to the real interface.
      So, ixgbevf should always use real_dev instead of dev.
      real_dev is never NULL in these callbacks.
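      A simplified model shows the distinction. The structs below are hypothetical and keep only the two pointers named above. drv_add_sa() is an illustrative driver callback. sa_slots_used stands in for driver-private state that exists only on the real NIC, so reaching it through dev would touch the bonding device instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the kernel's definitions. */
struct net_device {
	const char *name;
	int sa_slots_used;  /* stands in for driver-private offload state */
};

struct xfrm_state_offload {
	struct net_device *dev;       /* bond0 or the real NIC */
	struct net_device *real_dev;  /* always the real NIC */
};

/* A driver add_sa callback must program the hardware it owns, so it
 * uses real_dev; using dev here would hit bond0 when bonding offload
 * is in use. */
static int drv_add_sa(struct xfrm_state_offload *xso)
{
	struct net_device *netdev = xso->real_dev;

	netdev->sa_slots_used++;
	return 0;
}
```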
      
      Test commands:
          ip link add bond0 type bond
          # eth0 is an ixgbevf interface
          ip link set eth0 master bond0
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
              transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
              0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
              dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
      
      Splat looks like:
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      CPU: 6 PID: 688 Comm: ip Not tainted 5.13.0-rc3+ #1168
      RIP: 0010:ixgbevf_ipsec_find_empty_idx+0x28/0x1b0 [ixgbevf]
      Code: 00 00 0f 1f 44 00 00 55 53 48 89 fb 48 83 ec 08 40 84 f6 0f 84 9c
      00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02
      84 c0 74 08 3c 01 0f 8e 4c 01 00 00 66 81 3b 00 04 0f
      RSP: 0018:ffff8880089af390 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
      RBP: ffff8880089af4f8 R08: 0000000000000003 R09: fffffbfff4287e11
      R10: 0000000000000001 R11: ffff888005de8908 R12: 0000000000000000
      R13: ffff88810936a000 R14: ffff88810936a000 R15: ffff888004d78040
      FS:  00007fdf9883a680(0000) GS:ffff88811a400000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055bc14adbf40 CR3: 000000000b87c005 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       ixgbevf_ipsec_add_sa+0x1bf/0x9c0 [ixgbevf]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? ixgbevf_ipsec_parse_proto_keys.isra.9+0x280/0x280 [ixgbevf]
       ? lock_acquire+0x191/0x720
       ? bond_ipsec_add_sa+0x48/0x350 [bonding]
       ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
       ? rcu_read_lock_held+0x91/0xa0
       ? rcu_read_lock_sched_held+0xc0/0xc0
       bond_ipsec_add_sa+0x193/0x350 [bonding]
       xfrm_dev_state_add+0x2a9/0x770
       ? memcpy+0x38/0x60
       xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
       ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
       ? register_lock_class+0x1750/0x1750
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
      [ ... ]
      
      Fixes: 272c2330 ("xfrm: bail early on slave pass over skb")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: netdevsim: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops · 09adf756
      Taehee Yoo authored
      struct xfrm_state_offload has two device pointers, *dev and *real_dev,
      which are used in the callback functions of struct xfrmdev_ops.
      *dev points to either a bonding interface or a real interface: if
      bonding IPsec offload is used, it points to the bonding interface;
      otherwise, it points to the real interface. real_dev, on the other
      hand, always points to the real interface.
      So, netdevsim should always use real_dev instead of dev.
      real_dev is never NULL in these callbacks.
      
      Test commands:
          ip netns add A
          ip netns exec A bash
          modprobe netdevsim
          echo "1 1" > /sys/bus/netdevsim/new_device
          ip link add bond0 type bond mode active-backup
          ip link set eth0 master bond0
          ip link set eth0 up
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
              transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
              0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
              dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
      
      Splat looks like:
      BUG: spinlock bad magic on CPU#5, kworker/5:1/53
       lock: 0xffff8881068c2cc8, .magic: 11121314, .owner: <none>/-1,
      .owner_cpu: -235736076
      CPU: 5 PID: 53 Comm: kworker/5:1 Not tainted 5.13.0-rc3+ #1168
      Workqueue: events linkwatch_event
      Call Trace:
       dump_stack+0xa4/0xe5
       do_raw_spin_lock+0x20b/0x270
       ? rwlock_bug.part.1+0x90/0x90
       _raw_spin_lock_nested+0x5f/0x70
       bond_get_stats+0xe4/0x4c0 [bonding]
       ? rcu_read_lock_sched_held+0xc0/0xc0
       ? bond_neigh_init+0x2c0/0x2c0 [bonding]
       ? dev_get_alias+0xe2/0x190
       ? dev_get_port_parent_id+0x14a/0x360
       ? rtnl_unregister+0x190/0x190
       ? dev_get_phys_port_name+0xa0/0xa0
       ? memset+0x1f/0x40
       ? memcpy+0x38/0x60
       ? rtnl_phys_switch_id_fill+0x91/0x100
       dev_get_stats+0x8c/0x270
       rtnl_fill_stats+0x44/0xbe0
       ? nla_put+0xbe/0x140
       rtnl_fill_ifinfo+0x1054/0x3ad0
      [ ... ]
      
      Fixes: 272c2330 ("xfrm: bail early on slave pass over skb")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix null dereference in bond_ipsec_add_sa() · 105cd17a
      Taehee Yoo authored
      If a bond has no real device, bond->curr_active_slave is NULL, but
      bond_ipsec_add_sa() dereferences it without a NULL check, so a
      null-ptr-deref occurs.
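      The fix amounts to an early NULL check before the active slave is used. The following is a minimal model with hypothetical struct and function names. Returning -ENODEV when there is no active slave is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's definitions. */
struct slave_model { int num_sas; };
struct bond_model  { struct slave_model *curr_active_slave; };

/* Model of the add path after the fix. */
static int add_sa_model(struct bond_model *bond)
{
	struct slave_model *slave = bond->curr_active_slave;

	/* A bond without a real device has no active slave: bail out
	 * instead of dereferencing NULL (-ENODEV is an assumed code). */
	if (!slave)
		return -ENODEV;
	slave->num_sas++;
	return 0;
}
```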
      
      Test commands:
          ip link add bond0 type bond
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
              transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
              0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
              dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
      
      Splat looks like:
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      CPU: 4 PID: 680 Comm: ip Not tainted 5.13.0-rc3+ #1168
      RIP: 0010:bond_ipsec_add_sa+0xc4/0x2e0 [bonding]
      Code: 85 21 02 00 00 4d 8b a6 48 0c 00 00 e8 75 58 44 ce 85 c0 0f 85 14
      01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02
      00 0f 85 fc 01 00 00 48 8d bb e0 02 00 00 4d 8b 2c 24 48
      RSP: 0018:ffff88810946f508 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: ffff88810b4e8040 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: ffffffff8fe34280 RDI: ffff888115abe100
      RBP: ffff88810946f528 R08: 0000000000000003 R09: fffffbfff2287e11
      R10: 0000000000000001 R11: ffff888115abe0c8 R12: 0000000000000000
      R13: ffffffffc0aea9a0 R14: ffff88800d7d2000 R15: ffff88810b4e8330
      FS:  00007efc5552e680(0000) GS:ffff888119c00000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055c2530dbf40 CR3: 0000000103056004 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       xfrm_dev_state_add+0x2a9/0x770
       ? memcpy+0x38/0x60
       xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
       ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
       ? register_lock_class+0x1750/0x1750
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? netlink_ack+0x9d0/0x9d0
       ? netlink_deliver_tap+0x17c/0xa50
       xfrm_netlink_rcv+0x68/0x80 [xfrm_user]
       netlink_unicast+0x41c/0x610
       ? netlink_attachskb+0x710/0x710
       netlink_sendmsg+0x6b9/0xb70
      [ ... ]
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix suspicious RCU usage in bond_ipsec_add_sa() · b648eba4
      Taehee Yoo authored
      bond_ipsec_add_sa() uses rcu_dereference() to dereference
      bond->curr_active_slave, but neither it nor its caller holds the RCU
      read lock, so a warning is triggered. Fix this by adding
      rcu_read_lock().
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add bond0 type bond
          ip link set dummy0 master bond0
          ip link set dummy0 up
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 \
      	    mode transport \
      	    reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
      	    0x44434241343332312423222114131211f4f3f2f1 128 sel \
      	    src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload \
      	    dev bond0 dir in
      
      Splat looks like:
      =============================
      WARNING: suspicious RCU usage
      5.13.0-rc3+ #1168 Not tainted
      -----------------------------
      drivers/net/bonding/bond_main.c:411 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by ip/684:
       #0: ffffffff9a2757c0 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3},
      at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user]
      
      stack backtrace:
      CPU: 0 PID: 684 Comm: ip Not tainted 5.13.0-rc3+ #1168
      Call Trace:
       dump_stack+0xa4/0xe5
       bond_ipsec_add_sa+0x18c/0x1f0 [bonding]
       xfrm_dev_state_add+0x2a9/0x770
       ? memcpy+0x38/0x60
       xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
       ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
       ? register_lock_class+0x1750/0x1750
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? netlink_ack+0x9d0/0x9d0
       ? netlink_deliver_tap+0x17c/0xa50
       xfrm_netlink_rcv+0x68/0x80 [xfrm_user]
       netlink_unicast+0x41c/0x610
       ? netlink_attachskb+0x710/0x710
       netlink_sendmsg+0x6b9/0xb70
      [ ... ]
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix tcp_init_transfer() to not reset icsk_ca_initialized · be5d1b61
      Nguyen Dinh Phi authored
      This commit fixes a bug (found by syzkaller) that could cause spurious
      double-initializations for congestion control modules, which could cause
      memory leaks or other problems for congestion control modules (like CDG)
      that allocate memory in their init functions.
      
      The buggy scenario constructed by syzkaller was something like:
      
      (1) create a TCP socket
      (2) initiate a TFO connect via sendto()
      (3) while the socket is in TCP_SYN_SENT, call setsockopt(TCP_CONGESTION),
          which calls:
             tcp_set_congestion_control() ->
               tcp_reinit_congestion_control() ->
                 tcp_init_congestion_control()
      (4) receive the ACK; the connection is established, and
          tcp_init_transfer() sets icsk_ca_initialized=0 (without first
          calling cc->release()) and calls tcp_init_congestion_control() again.
      
      Note that in this sequence tcp_init_congestion_control() is called
      twice without a cc->release() call in between. Thus, for CC modules
      that allocate memory in their init() function, e.g., CDG, a memory leak
      may occur. The syzkaller tool managed to find a reproducer that
      triggered such a leak in CDG.
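      The sequence can be replayed with a small model that counts outstanding allocations. Every name here is hypothetical. sock_model, cc_init(), cc_release(), set_congestion_control() and the two init_transfer variants only mimic the flag handling described above. cc_init() plays the role of an init() that allocates memory, as CDG's does. The "fixed" variant reflects the intent stated in the subject line (do not reset icsk_ca_initialized) and is not the literal patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; a counter stands in for kmalloc()/kfree(). */
static int live_allocs;  /* allocations made by init() not yet released */

struct sock_model {
	bool ca_initialized;  /* models icsk_ca_initialized */
};

static void cc_init(struct sock_model *sk)
{
	live_allocs++;  /* e.g. CDG allocating its state */
	sk->ca_initialized = true;
}

static void cc_release(struct sock_model *sk)
{
	if (sk->ca_initialized) {
		live_allocs--;
		sk->ca_initialized = false;
	}
}

/* Step (3): setsockopt(TCP_CONGESTION) while in TCP_SYN_SENT. */
static void set_congestion_control(struct sock_model *sk)
{
	cc_release(sk);
	cc_init(sk);
}

/* Step (4) before the fix: the flag is reset without a release. */
static void init_transfer_buggy(struct sock_model *sk)
{
	sk->ca_initialized = false;
	if (!sk->ca_initialized)
		cc_init(sk);
}

/* Step (4) after the fix: init runs only if nothing did it already. */
static void init_transfer_fixed(struct sock_model *sk)
{
	if (!sk->ca_initialized)
		cc_init(sk);
}
```

      With the buggy variant, one allocation is still outstanding after the final release. With the fixed variant, none is.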
      
      The bug was introduced by commit 8919a9b3 ("tcp: Only init
      congestion control if not initialized already"), which introduced
      icsk_ca_initialized and set it to 0 in tcp_init_transfer(). That
      commit missed the possibility of a sequence like the one above, in
      which a process calls setsockopt(TCP_CONGESTION) in state
      TCP_SYN_SENT (i.e., after connect() or a TFO open via sendmsg()),
      which in turn calls tcp_init_congestion_control(). The commit did not
      intend to reset any initialization the user had already explicitly
      made; it simply missed this particular sequence (which syzkaller
      managed to find).
      
      Fixes: 8919a9b3 ("tcp: Only init congestion control if not initialized already")
      Reported-by: syzbot+f1e24a0594d4e3a895d3@syzkaller.appspotmail.com
      Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Tested-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: Release nfct refcount on napi stolen or re-used skbs · 8550ff8d
      Paul Blakey authored
      When multiple SKBs are merged into a new skb under NAPI GRO, or an
      SKB is reused by NAPI, any nfct the driver set on them is not
      released when their stolen head state is freed or when they are
      reused.
      
      Release nfct on NAPI's stolen or reused SKBs, and in
      gro_list_prepare(), also check for differences in conntrack metadata.
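      Both halves of the fix can be modeled in userspace C. The types and helpers below (ct_model, skb_model, ct_put(), napi_reuse_model(), gro_flows_differ()) are hypothetical. They show two things: a reused skb drops its conntrack reference, and GRO treats packets with different conntrack metadata as different flows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's sk_buff or nf_conntrack. */
struct ct_model { int refcnt; };

struct skb_model {
	struct ct_model *nfct;  /* reference set by the driver, may be NULL */
	int hash;               /* other flow fields compared by GRO */
};

static void ct_put(struct ct_model *ct)
{
	if (ct)
		ct->refcnt--;
}

/* Called when NAPI reuses an skb or frees a stolen head: the
 * conntrack reference must be dropped too, or it leaks. */
static void napi_reuse_model(struct skb_model *skb)
{
	ct_put(skb->nfct);
	skb->nfct = NULL;
}

/* gro_list_prepare()-style check: nonzero means "do not merge". */
static int gro_flows_differ(const struct skb_model *p,
			    const struct skb_model *skb)
{
	int diffs = p->hash != skb->hash;

	diffs |= p->nfct != skb->nfct;  /* conntrack metadata must match */
	return diffs;
}
```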
      
      Fixes: 5c6b9460 ("net/mlx5e: CT: Handle misses after executing CT action")
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 05 Jul, 2021 6 commits
  3. 03 Jul, 2021 1 commit
  4. 02 Jul, 2021 13 commits
  5. 01 Jul, 2021 11 commits