1. 08 Jul, 2021 10 commits
  2. 07 Jul, 2021 6 commits
  3. 06 Jul, 2021 19 commits
    • ipv6: fix 'disable_policy' for fwd packets · ccd27f05
      Nicolas Dichtel authored
      The goal of commit df789fe7 ("ipv6: Provide ipv6 version of
      "disable_policy" sysctl") was to make the IPv4 disable_policy sysctl
      available on IPv6.
      However, the mechanism is not exactly the same. On IPv4, all packets
      coming from an interface that has disable_policy set bypass the policy
      check. On IPv6, this is done only for local packets, i.e. packets
      destined for an address configured on the incoming interface.
      
      Let's align ipv6 with ipv4 so that the 'disable_policy' sysctl has the same
      effect for both protocols.
      
      My first approach was to create a new kind of route cache entry, to be
      able to set DST_NOPOLICY without modifying routes. This would have added
      a lot of code. Because the local delivery path is already handled, I
      chose to focus on the forwarding path to minimize code churn.
      
      Fixes: df789fe7 ("ipv6: Provide ipv6 version of "disable_policy" sysctl")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
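The behavioral difference described in this commit can be reduced to a small decision function. This is a toy model, not kernel code: `enum toy_proto`, `bypass_policy_old()` and `bypass_policy_new()` are hypothetical names, and the real checks live in the IPv4/IPv6 input and forwarding paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the policy-bypass decision; all names are hypothetical. */
enum toy_proto { TOY_IPV4, TOY_IPV6 };

/* Before the fix: IPv4 honors disable_policy for every packet received
 * on the interface, IPv6 only for locally delivered ones. */
static bool bypass_policy_old(enum toy_proto p, bool disable_policy,
			      bool is_local)
{
	if (!disable_policy)
		return false;
	if (p == TOY_IPV4)
		return true;
	return is_local;
}

/* After the fix: IPv6 also bypasses the check on the forwarding path,
 * so for both protocols only the sysctl value matters. */
static bool bypass_policy_new(enum toy_proto p, bool disable_policy,
			      bool is_local)
{
	(void)p;
	(void)is_local;
	return disable_policy;
}
```

A forwarded IPv6 packet (`is_local == false`) on an interface with disable_policy set is the one case whose outcome changes.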
    • octeontx2-pf: Fix assigned error return value that is never used · ad1f3797
      Colin Ian King authored
      Currently, when the call to otx2_mbox_alloc_msg_cgx_mac_addr_update()
      fails, the error return variable rc is assigned -ENOMEM but the function
      does not return early. rc is then reassigned, so the error case is not
      handled correctly. Fix this by returning -ENOMEM instead of assigning it
      to rc.
      
      Addresses-Coverity: ("Unused value")
      Fixes: 79d2be38 ("octeontx2-pf: offload DMAC filters to CGX/RPM block")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
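The "Unused value" pattern Coverity flagged can be shown in isolation. This is a sketch under assumptions: `toy_alloc_msg()` stands in for otx2_mbox_alloc_msg_cgx_mac_addr_update(), `toy_sync_msg()` for the later call whose result overwrites rc, and neither reflects the driver's real code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the mailbox helpers. */
struct toy_msg { int payload; };

static struct toy_msg *toy_alloc_msg(int fail)
{
	static struct toy_msg msg;

	return fail ? NULL : &msg;
}

static int toy_sync_msg(void)
{
	return 0;	/* pretend the mailbox exchange succeeds */
}

/* Buggy shape: -ENOMEM is stored in rc, then overwritten. */
static int toy_update_buggy(int fail)
{
	struct toy_msg *req = toy_alloc_msg(fail);
	int rc;

	if (!req)
		rc = -ENOMEM;	/* "Unused value": never returned */
	rc = toy_sync_msg();
	return rc;
}

/* Fixed shape: return the error immediately. */
static int toy_update_fixed(int fail)
{
	struct toy_msg *req = toy_alloc_msg(fail);

	if (!req)
		return -ENOMEM;
	req->payload = 1;	/* req is only touched once known valid */
	return toy_sync_msg();
}
```

With an allocation failure, the buggy version reports success (0) while the fixed one propagates -ENOMEM.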
    • Merge branch 'bonding-ipsec' · 5ddef2ad
      David S. Miller authored
      Taehee Yoo says:
      
      ====================
      net: fix bonding ipsec offload problems
      
      This series fixes some problems related to bonding ipsec offload.
      
      The 1st, 5th, and 8th patches add a missing rcu_read_lock().
      The 2nd patch adds a NULL check to bond_ipsec_add_sa().
      When the bonding interface doesn't have an active real interface, the
      bond->curr_active_slave pointer is NULL, but bond_ipsec_add_sa() uses
      that pointer without a NULL check, which results in a null-ptr-deref.
      The 3rd and 4th patches replace xs->xso.dev with xs->xso.real_dev.
      The 6th patch disallows setting up ipsec offload if a real interface's
      type is bonding.
      The 7th patch adds struct bond_ipsec to manage SAs.
      If the bond mode or the active real interface changes, SAs should be
      removed from the old active real interface and then added to the new
      active real interface, but this is impossible because bonding doesn't
      manage SAs.
      The 9th patch fixes the incorrect return value of bond_ipsec_offload_ok().
      
      v1 -> v2:
       - Add 9th patch.
       - Do not print warning when there is no SA in bond_ipsec_add_sa_all().
       - Add comment for ipsec_lock.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix incorrect return value of bond_ipsec_offload_ok() · 168e696a
      Taehee Yoo authored
      bond_ipsec_offload_ok() is called to check whether the interface
      supports ipsec offload.
      A bonding interface supports ipsec offload only in active-backup mode.
      So, if a bond interface is not in active-backup mode, the function
      should return false, but it returns true.
      
      Fixes: a3b658cf ("bonding: allow xfrm offload setup post-module-load")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
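The fix can be pictured as flipping the result for non-active-backup modes. This is a simplified sketch mirroring the behavior described in the commit message, not the actual bonding code: `struct toy_bond`, the `TOY_MODE_*` values and the `active_slave_can_offload` flag (standing in for the slave driver's own offload check) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified bond state. */
enum toy_bond_mode { TOY_MODE_ROUNDROBIN, TOY_MODE_ACTIVEBACKUP, TOY_MODE_XOR };

struct toy_bond {
	enum toy_bond_mode mode;
	bool active_slave_can_offload;	/* stand-in for the slave's check */
};

/* Buggy shape: a bond that is not active-backup claims offload support. */
static bool toy_offload_ok_buggy(const struct toy_bond *b)
{
	if (b->mode != TOY_MODE_ACTIVEBACKUP)
		return true;
	return b->active_slave_can_offload;
}

/* Fixed shape: offload is only possible in active-backup mode. */
static bool toy_offload_ok_fixed(const struct toy_bond *b)
{
	if (b->mode != TOY_MODE_ACTIVEBACKUP)
		return false;
	return b->active_slave_can_offload;
}
```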
    • bonding: fix suspicious RCU usage in bond_ipsec_offload_ok() · 955b785e
      Taehee Yoo authored
      bond_ipsec_offload_ok() uses rcu_dereference() to dereference
      bond->curr_active_slave, but neither it nor its caller holds
      rcu_read_lock(), so a warning is emitted.
      Fix this by adding rcu_read_lock().
      
      Splat looks like:
      WARNING: suspicious RCU usage
      5.13.0-rc6+ #1179 Not tainted
      drivers/net/bonding/bond_main.c:571 suspicious
      rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by ping/974:
       #0: ffff888109e7db70 (sk_lock-AF_INET){+.+.}-{0:0},
      at: raw_sendmsg+0x1303/0x2cb0
      
      stack backtrace:
      CPU: 2 PID: 974 Comm: ping Not tainted 5.13.0-rc6+ #1179
      Call Trace:
       dump_stack+0xa4/0xe5
       bond_ipsec_offload_ok+0x1f4/0x260 [bonding]
       xfrm_output+0x179/0x890
       xfrm4_output+0xfa/0x410
       ? __xfrm4_output+0x4b0/0x4b0
       ? __ip_make_skb+0xecc/0x2030
       ? xfrm4_udp_encap_rcv+0x800/0x800
       ? ip_local_out+0x21/0x3a0
       ip_send_skb+0x37/0xa0
       raw_sendmsg+0x1bfd/0x2cb0
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
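Why the splat appears, and why adding rcu_read_lock() silences it, can be shown with a toy model of the lockdep check behind rcu_dereference_check(). This is a single-threaded illustration only: the `toy_*` names are hypothetical, and the model tracks read-side nesting but provides none of RCU's real guarantees (no grace periods, no memory ordering). The same pattern recurs in the bond_ipsec_del_sa() and bond_ipsec_add_sa() fixes in this list.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: counts RCU read-side nesting the way lockdep does. */
static int toy_rcu_nesting;	/* per-CPU/per-task in the real kernel */
static int toy_rcu_warnings;	/* "suspicious RCU usage" splats */

static void toy_rcu_read_lock(void)
{
	toy_rcu_nesting++;
}

static void toy_rcu_read_unlock(void)
{
	assert(toy_rcu_nesting > 0);
	toy_rcu_nesting--;
}

static void toy_rcu_check(void)
{
	if (toy_rcu_nesting == 0)
		toy_rcu_warnings++;
}

/* Evaluates to p after checking that a read-side section is active. */
#define toy_rcu_dereference(p) (toy_rcu_check(), (p))

struct toy_slave { int id; };
static struct toy_slave *toy_curr_active_slave;

/* Buggy shape: no read-side critical section around the dereference. */
static int toy_active_id_buggy(void)
{
	struct toy_slave *s = toy_rcu_dereference(toy_curr_active_slave);

	return s ? s->id : -1;
}

/* Fixed shape: take the read lock and use the pointer only inside it. */
static int toy_active_id_fixed(void)
{
	struct toy_slave *s;
	int id;

	toy_rcu_read_lock();
	s = toy_rcu_dereference(toy_curr_active_slave);
	id = s ? s->id : -1;
	toy_rcu_read_unlock();
	return id;
}
```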
    • bonding: Add struct bond_ipesc to manage SA · 9a560550
      Taehee Yoo authored
      Bonding has supported ipsec offload, but it doesn't manage SAs: when an
      SA is added, bonding just passes it to its current active real
      interface.
      So, when events (adding/deleting a real interface, changing the active
      real interface, etc.) occur, bonding can't handle them well, and some
      problems (panic, UAF, refcnt leak) occur.
      
      To make it stable, bonding should manage SAs.
      That's the reason struct bond_ipsec is added.
      When a new SA is added to a bonding interface, it is stored in
      bond->ipsec_list, and the SA is passed to the current active real
      interface. When events occur, bonding uses the bond_ipsec data to
      handle them.
      bond->ipsec_list is protected by bond->ipsec_lock.
      
      If the current active real interface changes, the following logic
      applies:
      1. Delete all SAs from the old active real interface.
      2. Add all SAs to the new active real interface.
      3. If the new active real interface doesn't support ipsec offload or
         the SA's options, set real_dev to NULL.
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
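The three-step migration on an active-interface change can be sketched with a toy SA list. The sketch assumes simplified, hypothetical types (`struct toy_dev`, `struct toy_sa`, `struct toy_bond`). The real code keeps struct bond_ipsec entries on bond->ipsec_list under bond->ipsec_lock and calls the slave driver's xfrmdev_ops; locking is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; locking is omitted. */
struct toy_dev {
	bool supports_offload;
	int offloaded_sas;		/* SAs programmed into "hardware" */
};

struct toy_sa {
	struct toy_dev *real_dev;	/* NULL when not offloaded */
	struct toy_sa *next;
};

struct toy_bond {
	struct toy_dev *curr_active;
	struct toy_sa *ipsec_list;	/* every SA added through the bond */
};

static void toy_sa_add_to_dev(struct toy_sa *sa, struct toy_dev *dev)
{
	if (dev && dev->supports_offload) {
		dev->offloaded_sas++;
		sa->real_dev = dev;
	} else {
		sa->real_dev = NULL;	/* step 3: device can't take the SA */
	}
}

/* Adding an SA: remember it on the bond, then offload it. */
static void toy_bond_add_sa(struct toy_bond *bond, struct toy_sa *sa)
{
	sa->next = bond->ipsec_list;
	bond->ipsec_list = sa;
	toy_sa_add_to_dev(sa, bond->curr_active);
}

/* Active real interface change: steps 1-3 from the commit message. */
static void toy_bond_change_active(struct toy_bond *bond,
				   struct toy_dev *new_dev)
{
	struct toy_sa *sa;

	for (sa = bond->ipsec_list; sa; sa = sa->next) {	/* step 1 */
		if (sa->real_dev) {
			sa->real_dev->offloaded_sas--;
			sa->real_dev = NULL;
		}
	}
	bond->curr_active = new_dev;
	for (sa = bond->ipsec_list; sa; sa = sa->next)		/* steps 2-3 */
		toy_sa_add_to_dev(sa, new_dev);
}
```

Keeping the list on the bond is what makes step 1 possible at all: without it, the bond has no record of which SAs to remove from the old device.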
    • bonding: disallow setting nested bonding + ipsec offload · b1216933
      Taehee Yoo authored
      A bonding interface can be nested, and bonding supports ipsec offload,
      so the nested bonding + ipsec offload scenario can currently be set up.
      But the code does not support this scenario, so it should be
      disallowed.
      
      interface graph:
      bond2
         |
      bond1
         |
      eth0
      
      Nested bonding + ipsec offload is probably not a real use case, so
      disallowing this scenario is fine.
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
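The rule this commit enforces can be pictured as a simple guard. This is a sketch only: the commit message does not show where the real patch places its check. In the kernel, "is this device a bond?" can be answered with netif_is_bond_master(). Here a hypothetical flag in `struct toy_netdev` stands in for that, and the -EINVAL return is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy device: the flag stands in for netif_is_bond_master(). */
struct toy_netdev {
	bool is_bond_master;
};

/* Refuse ipsec offload through a bond whose real interface is itself a
 * bond (bond2 -> bond1 -> eth0 in the interface graph above). */
static int toy_bond_add_sa(const struct toy_netdev *real_dev)
{
	if (real_dev->is_bond_master)
		return -EINVAL;		/* error code assumed for the sketch */
	return 0;	/* would hand the SA to real_dev's xfrmdev_ops */
}
```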
    • bonding: fix suspicious RCU usage in bond_ipsec_del_sa() · a22c39b8
      Taehee Yoo authored
      bond_ipsec_del_sa() uses rcu_dereference() to dereference
      bond->curr_active_slave, but neither it nor its caller holds
      rcu_read_lock(), so a warning is emitted.
      Fix this by adding rcu_read_lock().
      
      Test commands:
          ip netns add A
          ip netns exec A bash
          modprobe netdevsim
          echo "1 1" > /sys/bus/netdevsim/new_device
          ip link add bond0 type bond
          ip link set eth0 master bond0
          ip link set eth0 up
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
      transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
      0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
      dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
          ip x s f
      
      Splat looks like:
      =============================
      WARNING: suspicious RCU usage
      5.13.0-rc3+ #1168 Not tainted
      -----------------------------
      drivers/net/bonding/bond_main.c:448 suspicious rcu_dereference_check()
      usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by ip/705:
       #0: ffff888106701780 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3},
      at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user]
       #1: ffff8880075b0098 (&x->lock){+.-.}-{2:2},
      at: xfrm_state_delete+0x16/0x30
      
      stack backtrace:
      CPU: 6 PID: 705 Comm: ip Not tainted 5.13.0-rc3+ #1168
      Call Trace:
       dump_stack+0xa4/0xe5
       bond_ipsec_del_sa+0x16a/0x1c0 [bonding]
       __xfrm_state_delete+0x51f/0x730
       xfrm_state_delete+0x1e/0x30
       xfrm_state_flush+0x22f/0x390
       xfrm_flush_sa+0xd8/0x260 [xfrm_user]
       ? xfrm_flush_policy+0x290/0x290 [xfrm_user]
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
      [ ... ]
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops · 2de7e4f6
      Taehee Yoo authored
      There are two pointers in struct xfrm_state_offload: *dev and
      *real_dev. They are used in the callback functions of struct
      xfrmdev_ops.
      *dev points to either a bonding interface or a real interface: if
      bonding ipsec offload is used, it points to the bonding interface;
      otherwise, it points to the real interface.
      real_dev always points to the real interface.
      So, ixgbevf should always use real_dev instead of dev.
      Of course, real_dev is never NULL.
      
      Test commands:
          ip link add bond0 type bond
          # eth0 is an ixgbevf interface
          ip link set eth0 master bond0
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
      transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
      0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
      dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
      
      Splat looks like:
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      CPU: 6 PID: 688 Comm: ip Not tainted 5.13.0-rc3+ #1168
      RIP: 0010:ixgbevf_ipsec_find_empty_idx+0x28/0x1b0 [ixgbevf]
      Code: 00 00 0f 1f 44 00 00 55 53 48 89 fb 48 83 ec 08 40 84 f6 0f 84 9c
      00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02
      84 c0 74 08 3c 01 0f 8e 4c 01 00 00 66 81 3b 00 04 0f
      RSP: 0018:ffff8880089af390 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
      RBP: ffff8880089af4f8 R08: 0000000000000003 R09: fffffbfff4287e11
      R10: 0000000000000001 R11: ffff888005de8908 R12: 0000000000000000
      R13: ffff88810936a000 R14: ffff88810936a000 R15: ffff888004d78040
      FS:  00007fdf9883a680(0000) GS:ffff88811a400000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055bc14adbf40 CR3: 000000000b87c005 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       ixgbevf_ipsec_add_sa+0x1bf/0x9c0 [ixgbevf]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? ixgbevf_ipsec_parse_proto_keys.isra.9+0x280/0x280 [ixgbevf]
       ? lock_acquire+0x191/0x720
       ? bond_ipsec_add_sa+0x48/0x350 [bonding]
       ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
       ? rcu_read_lock_held+0x91/0xa0
       ? rcu_read_lock_sched_held+0xc0/0xc0
       bond_ipsec_add_sa+0x193/0x350 [bonding]
       xfrm_dev_state_add+0x2a9/0x770
       ? memcpy+0x38/0x60
       xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
       ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
       ? register_lock_class+0x1750/0x1750
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
      [ ... ]
      
      Fixes: 272c2330 ("xfrm: bail early on slave pass over skb")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
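The dev/real_dev distinction, which also drives the netdevsim fix that follows, can be modeled with two pointers. This is a toy sketch. In the kernel, both fields are struct net_device pointers, and drivers reach their private data with netdev_priv(). Here a plain `priv` pointer stands in for that, and all `toy_*` types are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types standing in for net_device and its priv area. */
struct toy_netdev {
	const char *name;
	void *priv;
};

struct toy_state_offload {
	struct toy_netdev *dev;		/* bond0 when offloaded via bonding */
	struct toy_netdev *real_dev;	/* always the physical device */
};

struct toy_adapter { int free_sa_slots; };

/* Buggy shape: via bonding, dev is the bond, so this yields the bond's
 * private data instead of the driver's adapter. */
static struct toy_adapter *toy_adapter_buggy(const struct toy_state_offload *xso)
{
	return xso->dev->priv;
}

/* Fixed shape: driver callbacks always go through real_dev. */
static struct toy_adapter *toy_adapter_fixed(const struct toy_state_offload *xso)
{
	return xso->real_dev->priv;
}
```

Without bonding, dev and real_dev are the same device, which is why the buggy shape went unnoticed until bonding offload was used.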
    • net: netdevsim: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops · 09adf756
      Taehee Yoo authored
      There are two pointers in struct xfrm_state_offload: *dev and
      *real_dev. They are used in the callback functions of struct
      xfrmdev_ops.
      *dev points to either a bonding interface or a real interface: if
      bonding ipsec offload is used, it points to the bonding interface;
      otherwise, it points to the real interface.
      real_dev always points to the real interface.
      So, netdevsim should always use real_dev instead of dev.
      Of course, real_dev is never NULL.
      
      Test commands:
          ip netns add A
          ip netns exec A bash
          modprobe netdevsim
          echo "1 1" > /sys/bus/netdevsim/new_device
          ip link add bond0 type bond mode active-backup
          ip link set eth0 master bond0
          ip link set eth0 up
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
      transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
      0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
      dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
      
      Splat looks like:
      BUG: spinlock bad magic on CPU#5, kworker/5:1/53
       lock: 0xffff8881068c2cc8, .magic: 11121314, .owner: <none>/-1,
      .owner_cpu: -235736076
      CPU: 5 PID: 53 Comm: kworker/5:1 Not tainted 5.13.0-rc3+ #1168
      Workqueue: events linkwatch_event
      Call Trace:
       dump_stack+0xa4/0xe5
       do_raw_spin_lock+0x20b/0x270
       ? rwlock_bug.part.1+0x90/0x90
       _raw_spin_lock_nested+0x5f/0x70
       bond_get_stats+0xe4/0x4c0 [bonding]
       ? rcu_read_lock_sched_held+0xc0/0xc0
       ? bond_neigh_init+0x2c0/0x2c0 [bonding]
       ? dev_get_alias+0xe2/0x190
       ? dev_get_port_parent_id+0x14a/0x360
       ? rtnl_unregister+0x190/0x190
       ? dev_get_phys_port_name+0xa0/0xa0
       ? memset+0x1f/0x40
       ? memcpy+0x38/0x60
       ? rtnl_phys_switch_id_fill+0x91/0x100
       dev_get_stats+0x8c/0x270
       rtnl_fill_stats+0x44/0xbe0
       ? nla_put+0xbe/0x140
       rtnl_fill_ifinfo+0x1054/0x3ad0
      [ ... ]
      
      Fixes: 272c2330 ("xfrm: bail early on slave pass over skb")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix null dereference in bond_ipsec_add_sa() · 105cd17a
      Taehee Yoo authored
      If the bond doesn't have a real device, bond->curr_active_slave is NULL.
      But bond_ipsec_add_sa() dereferences bond->curr_active_slave without a
      NULL check, so a null-ptr-deref occurs.
      
      Test commands:
          ip link add bond0 type bond
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi \
      0x07 mode transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
      0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
      dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
      
      Splat looks like:
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      CPU: 4 PID: 680 Comm: ip Not tainted 5.13.0-rc3+ #1168
      RIP: 0010:bond_ipsec_add_sa+0xc4/0x2e0 [bonding]
      Code: 85 21 02 00 00 4d 8b a6 48 0c 00 00 e8 75 58 44 ce 85 c0 0f 85 14
      01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02
      00 0f 85 fc 01 00 00 48 8d bb e0 02 00 00 4d 8b 2c 24 48
      RSP: 0018:ffff88810946f508 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: ffff88810b4e8040 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: ffffffff8fe34280 RDI: ffff888115abe100
      RBP: ffff88810946f528 R08: 0000000000000003 R09: fffffbfff2287e11
      R10: 0000000000000001 R11: ffff888115abe0c8 R12: 0000000000000000
      R13: ffffffffc0aea9a0 R14: ffff88800d7d2000 R15: ffff88810b4e8330
      FS:  00007efc5552e680(0000) GS:ffff888119c00000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055c2530dbf40 CR3: 0000000103056004 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       xfrm_dev_state_add+0x2a9/0x770
       ? memcpy+0x38/0x60
       xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
       ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
       ? register_lock_class+0x1750/0x1750
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? netlink_ack+0x9d0/0x9d0
       ? netlink_deliver_tap+0x17c/0xa50
       xfrm_netlink_rcv+0x68/0x80 [xfrm_user]
       netlink_unicast+0x41c/0x610
       ? netlink_attachskb+0x710/0x710
       netlink_sendmsg+0x6b9/0xb70
      [ ...]
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
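The missing guard amounts to checking the active slave before use. This is a minimal sketch with hypothetical `toy_*` types. Returning -ENODEV is an assumption made for the sketch, and the RCU locking added by the neighboring patch is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy types. */
struct toy_slave { int id; };
struct toy_bond { struct toy_slave *curr_active_slave; };

/* Bail out when the bond has no active real interface instead of
 * dereferencing a NULL pointer. */
static int toy_bond_ipsec_add_sa(const struct toy_bond *bond, int *offloaded_to)
{
	const struct toy_slave *slave = bond->curr_active_slave;

	if (!slave)
		return -ENODEV;		/* error code assumed */
	*offloaded_to = slave->id;	/* stand-in for the slave's add_sa */
	return 0;
}
```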
    • bonding: fix suspicious RCU usage in bond_ipsec_add_sa() · b648eba4
      Taehee Yoo authored
      bond_ipsec_add_sa() uses rcu_dereference() to dereference
      bond->curr_active_slave, but neither it nor its caller holds
      rcu_read_lock(), so a warning is emitted.
      Fix this by adding rcu_read_lock().
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add bond0 type bond
          ip link set dummy0 master bond0
          ip link set dummy0 up
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 \
      	    mode transport \
      	    reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
      	    0x44434241343332312423222114131211f4f3f2f1 128 sel \
      	    src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload \
      	    dev bond0 dir in
      
      Splat looks like:
      =============================
      WARNING: suspicious RCU usage
      5.13.0-rc3+ #1168 Not tainted
      -----------------------------
      drivers/net/bonding/bond_main.c:411 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by ip/684:
       #0: ffffffff9a2757c0 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3},
      at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user]

      stack backtrace:
      CPU: 0 PID: 684 Comm: ip Not tainted 5.13.0-rc3+ #1168
      Call Trace:
       dump_stack+0xa4/0xe5
       bond_ipsec_add_sa+0x18c/0x1f0 [bonding]
       xfrm_dev_state_add+0x2a9/0x770
       ? memcpy+0x38/0x60
       xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
       ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
       ? register_lock_class+0x1750/0x1750
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? netlink_ack+0x9d0/0x9d0
       ? netlink_deliver_tap+0x17c/0xa50
       xfrm_netlink_rcv+0x68/0x80 [xfrm_user]
       netlink_unicast+0x41c/0x610
       ? netlink_attachskb+0x710/0x710
       netlink_sendmsg+0x6b9/0xb70
      [ ... ]
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix tcp_init_transfer() to not reset icsk_ca_initialized · be5d1b61
      Nguyen Dinh Phi authored
      This commit fixes a bug (found by syzkaller) that could cause spurious
      double-initializations for congestion control modules, which could cause
      memory leaks or other problems for congestion control modules (like CDG)
      that allocate memory in their init functions.
      
      The buggy scenario constructed by syzkaller was something like:
      
      (1) create a TCP socket
      (2) initiate a TFO connect via sendto()
      (3) while socket is in TCP_SYN_SENT, call setsockopt(TCP_CONGESTION),
          which calls:
             tcp_set_congestion_control() ->
               tcp_reinit_congestion_control() ->
                 tcp_init_congestion_control()
      (4) receive ACK, connection is established, call tcp_init_transfer(),
          set icsk_ca_initialized=0 (without first calling cc->release()),
          call tcp_init_congestion_control() again.
      
      Note that in this sequence tcp_init_congestion_control() is called
      twice without a cc->release() call in between. Thus, for CC modules
      that allocate memory in their init() function, e.g. CDG, a memory leak
      may occur. The syzkaller tool managed to find a reproducer that
      triggered such a leak in CDG.
      
      The bug was introduced when commit 8919a9b3 ("tcp: Only init
      congestion control if not initialized already")
      introduced icsk_ca_initialized and set icsk_ca_initialized to 0 in
      tcp_init_transfer(), missing the possibility for a sequence like the
      one above, where a process could call setsockopt(TCP_CONGESTION) in
      state TCP_SYN_SENT (i.e. after the connect() or TFO open sendmsg()),
      which would call tcp_init_congestion_control(). It did not intend to
      reset any initialization that the user had already explicitly made;
      it just missed the possibility of that particular sequence (which
      syzkaller managed to find).
      
      Fixes: 8919a9b3 ("tcp: Only init congestion control if not initialized already")
      Reported-by: syzbot+f1e24a0594d4e3a895d3@syzkaller.appspotmail.com
      Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Tested-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
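Steps (3) and (4) of the syzkaller sequence can be replayed on a toy socket to show the double init. This is a model, not TCP code. `toy_live_state` counts module-private state created by init() and freed by release(), standing in for what CDG allocates. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: counts module-private state created by init(). */
static int toy_live_state;

struct toy_sock {
	bool ca_initialized;	/* plays the role of icsk_ca_initialized */
};

static void toy_cc_init(struct toy_sock *sk)
{
	toy_live_state++;		/* "allocate" */
	sk->ca_initialized = true;
}

static void toy_cc_release(struct toy_sock *sk)
{
	toy_live_state--;		/* "free" */
	sk->ca_initialized = false;
}

/* Step (3): setsockopt(TCP_CONGESTION) while in TCP_SYN_SENT. */
static void toy_reinit_cc(struct toy_sock *sk)
{
	if (sk->ca_initialized)
		toy_cc_release(sk);
	toy_cc_init(sk);
}

/* Step (4), buggy: the flag is cleared without calling release(). */
static void toy_init_transfer_buggy(struct toy_sock *sk)
{
	sk->ca_initialized = false;
	if (!sk->ca_initialized)
		toy_cc_init(sk);	/* second init: earlier state leaks */
}

/* Step (4), fixed: keep an initialization the user already triggered. */
static void toy_init_transfer_fixed(struct toy_sock *sk)
{
	if (!sk->ca_initialized)
		toy_cc_init(sk);
}
```

In the buggy sequence, the counter ends at 2 for a single socket, one of which can never be released; the fixed sequence ends at 1.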
    • skbuff: Release nfct refcount on napi stolen or re-used skbs · 8550ff8d
      Paul Blakey authored
      When multiple SKBs are merged into a new skb under napi GRO, or an SKB
      is reused by napi, any nfct set on them by the driver is not released
      when their stolen head state is freed or when the SKB is reused.
      
      Release nfct on napi's stolen or reused SKBs, and check for conntrack
      metadata differences in gro_list_prepare().
      
      Fixes: 5c6b9460 ("net/mlx5e: CT: Handle misses after executing CT action")
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
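The two halves of this fix, dropping the conntrack reference on a stolen or reused skb and refusing to merge packets with different conntrack metadata, can be sketched with a toy refcount. This is a hypothetical model. In the kernel, the reference is dropped with nf_reset_ct(), and the real comparison in gro_list_prepare() is more detailed than a pointer check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types: a refcounted conntrack entry and an skb
 * that may hold one reference through its nfct pointer. */
struct toy_ct { int refcnt; };
struct toy_skb { struct toy_ct *nfct; };

/* Drop the skb's conntrack reference, if any. */
static void toy_reset_ct(struct toy_skb *skb)
{
	if (skb->nfct) {
		skb->nfct->refcnt--;
		skb->nfct = NULL;
	}
}

/* GRO should only merge packets whose conntrack metadata matches. */
static int toy_same_flow(const struct toy_skb *a, const struct toy_skb *b)
{
	return a->nfct == b->nfct;
}

/* After merging, the donor skb's head is freed (or the skb is reused),
 * so its reference must be released: the step missing before the fix. */
static void toy_free_stolen_head(struct toy_skb *donor)
{
	toy_reset_ct(donor);
}
```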
    • netfilter: nft_last: incorrect arithmetics when restoring last used · d1b5b80d
      Pablo Neira Ayuso authored
      Subtract the jiffies that have passed from the current jiffies to fix
      the restoration of the last-used time.
      
      Fixes: 836382dc ("netfilter: nf_tables: add last expression")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_last: honor NFTA_LAST_SET on restoration · 6ac4bac4
      Pablo Neira Ayuso authored
      NFTA_LAST_SET tells us whether this expression has ever seen a packet;
      do not ignore this attribute when restoring the ruleset.
      
      Fixes: 836382dc ("netfilter: nf_tables: add last expression")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
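Both nft_last fixes, subtracting the elapsed time and honoring NFTA_LAST_SET, can be expressed in one toy restore function. This is a sketch under assumptions: the dumped "time since last use" is taken as already converted to jiffies, and `struct toy_last` and `toy_last_restore()` are hypothetical. Because jiffies are unsigned and may wrap, plain unsigned subtraction still recovers the elapsed time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state of an nft "last" expression. */
struct toy_last {
	unsigned long jiffies;	/* when the rule last matched */
	bool set;		/* has the rule ever matched? (NFTA_LAST_SET) */
};

static struct toy_last toy_last_restore(unsigned long now,
					unsigned long elapsed, bool set)
{
	struct toy_last l = { 0, false };

	if (!set)		/* never matched: nothing to restore */
		return l;
	l.set = true;
	l.jiffies = now - elapsed;	/* subtract, do not add */
	return l;
}
```

Even when `now - elapsed` wraps below zero, `now - l.jiffies` still equals `elapsed`, so later "time since last use" computations remain correct.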
    • netfilter: conntrack: Mark access for KCSAN · cf4466ea
      Manfred Spraul authored
      KCSAN detected a data race in ipc/sem.c that is intentional.
      
      As nf_conntrack_lock() uses the same algorithm, update
      nf_conntrack_core as well:
      
      nf_conntrack_lock() contains
        a1) spin_lock()
        a2) smp_load_acquire(nf_conntrack_locks_all).
      
      a1) actually accesses one lock from an array of locks.
      
      nf_conntrack_all_lock() contains
        b1) nf_conntrack_locks_all=true (normal write)
        b2) spin_lock()
        b3) spin_unlock()
      
      b2 and b3 are done for every lock.
      
      This guarantees that nf_conntrack_all_lock() prevents any
      concurrent nf_conntrack_lock() owners:
      If a thread has passed a1), then b2) will block until that thread
      releases the lock.
      If the thread is before a1), then b3)+a1) ensure the write b1) is
      visible, thus a2) is guaranteed to see the updated value.
      
      But this is only the latest time at which b1) becomes visible.
      b1) may also become visible an undefined amount of time before b3),
      and thus KCSAN will notice a data race.
      
      In addition, the compiler might be too clever.
      
      Solution: Use WRITE_ONCE().
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
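The locking scheme analyzed in this commit can be sketched in userspace with C11 atomics. The mapping is approximate and is an assumption of the sketch:
- atomic_flag spinlocks stand in for spin_lock();
- a relaxed atomic store stands in for WRITE_ONCE();
- acquire/release operations stand in for smp_load_acquire()/smp_store_release().

All `toy_*` names and the lock count are hypothetical. The test below only exercises the API single-threaded; the concurrency argument is the one given in the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NLOCKS 4	/* arbitrary */

static atomic_flag toy_locks[TOY_NLOCKS] = {
	ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};
static atomic_flag toy_all_lock = ATOMIC_FLAG_INIT;
static atomic_bool toy_locks_all = false;	/* nf_conntrack_locks_all */

static void toy_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* busy-wait */
}

static void toy_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* nf_conntrack_lock() shape: fast path is a1) + a2). */
static void toy_ct_lock(int i)
{
	toy_spin_lock(&toy_locks[i]);					/* a1 */
	if (!atomic_load_explicit(&toy_locks_all, memory_order_acquire))	/* a2 */
		return;
	/* Slow path: a global locker is active; queue behind it. */
	toy_spin_unlock(&toy_locks[i]);
	toy_spin_lock(&toy_all_lock);
	toy_spin_lock(&toy_locks[i]);
	toy_spin_unlock(&toy_all_lock);
}

static void toy_ct_unlock(int i)
{
	toy_spin_unlock(&toy_locks[i]);
}

/* nf_conntrack_all_lock() shape: b1), then b2)+b3) on every lock. */
static void toy_ct_all_lock(void)
{
	toy_spin_lock(&toy_all_lock);
	/* b1: a marked (atomic) store, roughly what WRITE_ONCE() provides;
	 * a plain bool write here would be a data race in C11 terms. */
	atomic_store_explicit(&toy_locks_all, true, memory_order_relaxed);
	for (int i = 0; i < TOY_NLOCKS; i++) {
		toy_spin_lock(&toy_locks[i]);	/* b2: waits for owners */
		toy_spin_unlock(&toy_locks[i]);	/* b3: publishes b1 */
	}
}

static void toy_ct_all_unlock(void)
{
	atomic_store_explicit(&toy_locks_all, false, memory_order_release);
	toy_spin_unlock(&toy_all_lock);
}
```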
    • netfilter: conntrack: add new sysctl to disable RST check · 1da4cd82
      Ali Abdallah authored
      This patch adds a new sysctl, tcp_ignore_invalid_rst, to disable
      marking out-of-window RST segments as INVALID.
      Signed-off-by: Ali Abdallah <aabdallah@suse.de>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: improve RST handling when tuple is re-used · c4edc3cc
      Ali Abdallah authored
      If we receive a SYN packet in the original direction on an existing
      connection tracking entry, we let this SYN through because conntrack
      might be out of sync.
      
      Conntrack gets back in sync when the server responds with SYN/ACK and
      the state gets updated accordingly.
      
      However, if the server replies with RST, this packet might be marked as
      INVALID because the td_maxack value reflects the *old* conntrack state
      and not the state of the originator of the RST.
      
      Avoid td_maxack-based checks if the previous packet was a SYN.
      
      Unfortunately, that is not enough: an out-of-order ACK in the original
      direction updates last_index, so we still end up marking a valid RST
      as INVALID.
      
      Thus, disable the sequence check when we are not in the established
      state and the received RST has a sequence number of 0.
      
      Because marking RSTs as invalid usually leads to unwanted timeouts,
      also skip RST sequence checks if a conntrack entry is already closing.
      
      Such entries can already be evicted via GC in case the table is full.
      Co-developed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Ali Abdallah <aabdallah@suse.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
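The RST rules from this commit and from the tcp_ignore_invalid_rst sysctl commit above can be combined into one toy decision function. This is a hypothetical simplification that does not reproduce nf_conntrack_proto_tcp.c. The outcomes of the sequence-window and td_maxack checks are passed in as precomputed booleans, and `struct toy_rst_ctx` and its fields are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs to the decision; not the kernel's data layout. */
struct toy_rst_ctx {
	bool ignore_invalid_rst;	/* tcp_ignore_invalid_rst sysctl */
	bool established;		/* entry in established state */
	bool closing;			/* entry already closing */
	bool prev_was_syn;		/* previous packet was a SYN */
	bool seq_in_window;		/* outcome of the sequence check */
	bool ack_within_maxack;		/* outcome of the td_maxack check */
	unsigned int rst_seq;		/* sequence number of the RST */
};

/* Returns true if the RST would be marked INVALID. */
static bool toy_rst_is_invalid(const struct toy_rst_ctx *c)
{
	bool skip_seq;

	if (c->ignore_invalid_rst)
		return false;
	/* Skip the sequence check for closing entries, and for RSTs with
	 * sequence 0 outside the established state. */
	skip_seq = c->closing || (!c->established && c->rst_seq == 0);
	if (!skip_seq && !c->seq_in_window)
		return true;
	/* td_maxack reflects the old state right after a SYN: skip it. */
	if (!c->prev_was_syn && !c->ack_within_maxack)
		return true;
	return false;
}
```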
  4. 05 Jul, 2021 5 commits