1. 03 Jun, 2021 25 commits
    • Coco Li's avatar
      ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions · 821bbf79
      Coco Li authored
      Reported by syzbot:
      HEAD commit:    90c911ad Merge tag 'fixes' of git://git.kernel.org/pub/scm..
      git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
      dashboard link: https://syzkaller.appspot.com/bug?extid=123aa35098fd3c000eb7
      compiler:       Debian clang version 11.0.1-2
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in fib6_nh_get_excptn_bucket net/ipv6/route.c:1604 [inline]
      BUG: KASAN: slab-out-of-bounds in fib6_nh_flush_exceptions+0xbd/0x360 net/ipv6/route.c:1732
      Read of size 8 at addr ffff8880145c78f8 by task syz-executor.4/17760
      
      CPU: 0 PID: 17760 Comm: syz-executor.4 Not tainted 5.12.0-rc8-syzkaller #0
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x202/0x31e lib/dump_stack.c:120
       print_address_description+0x5f/0x3b0 mm/kasan/report.c:232
       __kasan_report mm/kasan/report.c:399 [inline]
       kasan_report+0x15c/0x200 mm/kasan/report.c:416
       fib6_nh_get_excptn_bucket net/ipv6/route.c:1604 [inline]
       fib6_nh_flush_exceptions+0xbd/0x360 net/ipv6/route.c:1732
       fib6_nh_release+0x9a/0x430 net/ipv6/route.c:3536
       fib6_info_destroy_rcu+0xcb/0x1c0 net/ipv6/ip6_fib.c:174
       rcu_do_batch kernel/rcu/tree.c:2559 [inline]
       rcu_core+0x8f6/0x1450 kernel/rcu/tree.c:2794
       __do_softirq+0x372/0x7a6 kernel/softirq.c:345
       invoke_softirq kernel/softirq.c:221 [inline]
       __irq_exit_rcu+0x22c/0x260 kernel/softirq.c:422
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:434
       sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1100
       </IRQ>
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
      RIP: 0010:lock_acquire+0x1f6/0x720 kernel/locking/lockdep.c:5515
      Code: f6 84 24 a1 00 00 00 02 0f 85 8d 02 00 00 f7 c3 00 02 00 00 49 bd 00 00 00 00 00 fc ff df 74 01 fb 48 c7 44 24 40 0e 36 e0 45 <4b> c7 44 3d 00 00 00 00 00 4b c7 44 3d 09 00 00 00 00 43 c7 44 3d
      RSP: 0018:ffffc90009e06560 EFLAGS: 00000206
      RAX: 1ffff920013c0cc0 RBX: 0000000000000246 RCX: dffffc0000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffffc90009e066e0 R08: dffffc0000000000 R09: fffffbfff1f992b1
      R10: fffffbfff1f992b1 R11: 0000000000000000 R12: 0000000000000000
      R13: dffffc0000000000 R14: 0000000000000000 R15: 1ffff920013c0cb4
       rcu_lock_acquire+0x2a/0x30 include/linux/rcupdate.h:267
       rcu_read_lock include/linux/rcupdate.h:656 [inline]
       ext4_get_group_info+0xea/0x340 fs/ext4/ext4.h:3231
       ext4_mb_prefetch+0x123/0x5d0 fs/ext4/mballoc.c:2212
       ext4_mb_regular_allocator+0x8a5/0x28f0 fs/ext4/mballoc.c:2379
       ext4_mb_new_blocks+0xc6e/0x24f0 fs/ext4/mballoc.c:4982
       ext4_ext_map_blocks+0x2be3/0x7210 fs/ext4/extents.c:4238
       ext4_map_blocks+0xab3/0x1cb0 fs/ext4/inode.c:638
       ext4_getblk+0x187/0x6c0 fs/ext4/inode.c:848
       ext4_bread+0x2a/0x1c0 fs/ext4/inode.c:900
       ext4_append+0x1a4/0x360 fs/ext4/namei.c:67
       ext4_init_new_dir+0x337/0xa10 fs/ext4/namei.c:2768
       ext4_mkdir+0x4b8/0xc00 fs/ext4/namei.c:2814
       vfs_mkdir+0x45b/0x640 fs/namei.c:3819
       ovl_do_mkdir fs/overlayfs/overlayfs.h:161 [inline]
       ovl_mkdir_real+0x53/0x1a0 fs/overlayfs/dir.c:146
       ovl_create_real+0x280/0x490 fs/overlayfs/dir.c:193
       ovl_workdir_create+0x425/0x600 fs/overlayfs/super.c:788
       ovl_make_workdir+0xed/0x1140 fs/overlayfs/super.c:1355
       ovl_get_workdir fs/overlayfs/super.c:1492 [inline]
       ovl_fill_super+0x39ee/0x5370 fs/overlayfs/super.c:2035
       mount_nodev+0x52/0xe0 fs/super.c:1413
       legacy_get_tree+0xea/0x180 fs/fs_context.c:592
       vfs_get_tree+0x86/0x270 fs/super.c:1497
       do_new_mount fs/namespace.c:2903 [inline]
       path_mount+0x196f/0x2be0 fs/namespace.c:3233
       do_mount fs/namespace.c:3246 [inline]
       __do_sys_mount fs/namespace.c:3454 [inline]
       __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3431
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x4665f9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f68f2b87188 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 00000000004665f9
      RDX: 00000000200000c0 RSI: 0000000020000000 RDI: 000000000040000a
      RBP: 00000000004bfbb9 R08: 0000000020000100 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf60
      R13: 00007ffe19002dff R14: 00007f68f2b87300 R15: 0000000000022000
      
      Allocated by task 17768:
       kasan_save_stack mm/kasan/common.c:38 [inline]
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:427 [inline]
       ____kasan_kmalloc+0xc2/0xf0 mm/kasan/common.c:506
       kasan_kmalloc include/linux/kasan.h:233 [inline]
       __kmalloc+0xb4/0x380 mm/slub.c:4055
       kmalloc include/linux/slab.h:559 [inline]
       kzalloc include/linux/slab.h:684 [inline]
       fib6_info_alloc+0x2c/0xd0 net/ipv6/ip6_fib.c:154
       ip6_route_info_create+0x55d/0x1a10 net/ipv6/route.c:3638
       ip6_route_add+0x22/0x120 net/ipv6/route.c:3728
       inet6_rtm_newroute+0x2cd/0x2260 net/ipv6/route.c:5352
       rtnetlink_rcv_msg+0xb34/0xe70 net/core/rtnetlink.c:5553
       netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2502
       netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
       netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1338
       netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1927
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x5a2/0x900 net/socket.c:2350
       ___sys_sendmsg net/socket.c:2404 [inline]
       __sys_sendmsg+0x319/0x400 net/socket.c:2433
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Last potentially related work creation:
       kasan_save_stack+0x27/0x50 mm/kasan/common.c:38
       kasan_record_aux_stack+0xee/0x120 mm/kasan/generic.c:345
       __call_rcu kernel/rcu/tree.c:3039 [inline]
       call_rcu+0x1b1/0xa30 kernel/rcu/tree.c:3114
       fib6_info_release include/net/ip6_fib.h:337 [inline]
       ip6_route_info_create+0x10c4/0x1a10 net/ipv6/route.c:3718
       ip6_route_add+0x22/0x120 net/ipv6/route.c:3728
       inet6_rtm_newroute+0x2cd/0x2260 net/ipv6/route.c:5352
       rtnetlink_rcv_msg+0xb34/0xe70 net/core/rtnetlink.c:5553
       netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2502
       netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
       netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1338
       netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1927
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x5a2/0x900 net/socket.c:2350
       ___sys_sendmsg net/socket.c:2404 [inline]
       __sys_sendmsg+0x319/0x400 net/socket.c:2433
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Second to last potentially related work creation:
       kasan_save_stack+0x27/0x50 mm/kasan/common.c:38
       kasan_record_aux_stack+0xee/0x120 mm/kasan/generic.c:345
       insert_work+0x54/0x400 kernel/workqueue.c:1331
       __queue_work+0x981/0xcc0 kernel/workqueue.c:1497
       queue_work_on+0x111/0x200 kernel/workqueue.c:1524
       queue_work include/linux/workqueue.h:507 [inline]
       call_usermodehelper_exec+0x283/0x470 kernel/umh.c:433
       kobject_uevent_env+0x1349/0x1730 lib/kobject_uevent.c:617
       kvm_uevent_notify_change+0x309/0x3b0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4809
       kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:877 [inline]
       kvm_put_kvm+0x9c/0xd10 arch/x86/kvm/../../../virt/kvm/kvm_main.c:920
       kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3120
       __fput+0x352/0x7b0 fs/file_table.c:280
       task_work_run+0x146/0x1c0 kernel/task_work.c:140
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
       exit_to_user_mode_prepare+0x10b/0x1e0 kernel/entry/common.c:208
       __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
       syscall_exit_to_user_mode+0x26/0x70 kernel/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff8880145c7800
       which belongs to the cache kmalloc-192 of size 192
      The buggy address is located 56 bytes to the right of
       192-byte region [ffff8880145c7800, ffff8880145c78c0)
      The buggy address belongs to the page:
      page:ffffea00005171c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x145c7
      flags: 0xfff00000000200(slab)
      raw: 00fff00000000200 ffffea00006474c0 0000000200000002 ffff888010c41a00
      raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880145c7780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8880145c7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff8880145c7880: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
                                                                      ^
       ffff8880145c7900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880145c7980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      ==================================================================
      
      In the ip6_route_info_create function, in the case that the nh pointer
      is not NULL, the fib6_nh in fib6_info has not been allocated.
      Therefore, when trying to free fib6_info in this error case using
      fib6_info_release, the function will call fib6_info_destroy_rcu,
      which it will access fib6_nh_release(f6i->fib6_nh);
      However, f6i->fib6_nh doesn't have any refcount yet given the lack of allocation
      causing the reported memory issue above.
      Therefore, releasing the empty pointer directly instead would be the solution.
      
      Fixes: f88d8ea6 ("ipv6: Plumb support for nexthop object in a fib6_info")
      Fixes: 706ec919 ("ipv6: Fix nexthop refcnt leak when creating ipv6 route info")
      Signed-off-by: default avatarCoco Li <lixiaoyan@google.com>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      821bbf79
    • David S. Miller's avatar
      Merge tag 'wireless-drivers-2021-06-03' of... · 5e7a2c64
      David S. Miller authored
      Merge tag 'wireless-drivers-2021-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.13
      
      We have only mt76 fixes this time, most important being the fix for
      A-MSDU injection attacks.
      
      mt76
      
      * mitigate A-MSDU injection attacks (CVE-2020-24588)
      
      * fix possible array out of bound access in mt7921_mcu_tx_rate_report
      
      * various aggregation and HE setting fixes
      
      * suspend/resume fix for pci devices
      
      * mt7615: fix crash when runtime-pm is not supported
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5e7a2c64
    • Zheng Yongjun's avatar
      fib: Return the correct errno code · 59607863
      Zheng Yongjun authored
      When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF.
      Signed-off-by: default avatarZheng Yongjun <zhengyongjun3@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      59607863
    • Zheng Yongjun's avatar
      net: Return the correct errno code · 49251cd0
      Zheng Yongjun authored
      When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF.
      Signed-off-by: default avatarZheng Yongjun <zhengyongjun3@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      49251cd0
    • Zheng Yongjun's avatar
      net/x25: Return the correct errno code · d7736958
      Zheng Yongjun authored
      When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF.
      Signed-off-by: default avatarZheng Yongjun <zhengyongjun3@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d7736958
    • Rahul Lakkireddy's avatar
      cxgb4: fix regression with HASH tc prio value update · a27fb314
      Rahul Lakkireddy authored
      commit db43b30c ("cxgb4: add ethtool n-tuple filter deletion")
      has moved searching for next highest priority HASH filter rule to
      cxgb4_flow_rule_destroy(), which searches the rhashtable before the
      the rule is removed from it and hence always finds at least 1 entry.
      Fix by removing the rule from rhashtable first before calling
      cxgb4_flow_rule_destroy() and hence avoid fetching stale info.
      
      Fixes: db43b30c ("cxgb4: add ethtool n-tuple filter deletion")
      Signed-off-by: default avatarRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a27fb314
    • David S. Miller's avatar
      Merge branch 'caif-fixes' · e0310182
      David S. Miller authored
      Pavel Skripkin says:
      
      ====================
      This patch series fix 2 memory leaks in caif
      interface.
      
      Syzbot reported memory leak in cfserl_create().
      The problem was in cfcnfg_add_phy_layer() function.
      This function accepts struct cflayer *link_support and
      assign it to corresponting structures, but it can fail
      in some cases.
      
      These cases must be handled to prevent leaking allocated
      struct cflayer *link_support pointer, because if error accured
      before assigning link_support pointer to somewhere, this pointer
      must be freed.
      
      Fail log:
      
      [   49.051872][ T7010] caif:cfcnfg_add_phy_layer(): Too many CAIF Link Layers (max 6)
      [   49.110236][ T7042] caif:cfcnfg_add_phy_layer(): Too many CAIF Link Layers (max 6)
      [   49.134936][ T7045] caif:cfcnfg_add_phy_layer(): Too many CAIF Link Layers (max 6)
      [   49.163083][ T7043] caif:cfcnfg_add_phy_layer(): Too many CAIF Link Layers (max 6)
      [   55.248950][ T6994] kmemleak: 4 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      int cfcnfg_add_phy_layer(..., struct cflayer *link_support, ...)
      {
      ...
      	/* CAIF protocol allow maximum 6 link-layers */
      	for (i = 0; i < 7; i++) {
      		phyid = (dev->ifindex + i) & 0x7;
      		if (phyid == 0)
      			continue;
      		if (cfcnfg_get_phyinfo_rcu(cnfg, phyid) == NULL)
      			goto got_phyid;
      	}
      	pr_warn("Too many CAIF Link Layers (max 6)\n");
      	goto out;
      ...
      	if (link_support != NULL) {
      		link_support->id = phyid;
      		layer_set_dn(frml, link_support);
      		layer_set_up(link_support, frml);
      		layer_set_dn(link_support, phy_layer);
      		layer_set_up(phy_layer, link_support);
      	}
      ...
      }
      
      As you can see, if cfcnfg_add_phy_layer fails before layer_set_*,
      link_support becomes leaked.
      
      So, in this series, I made cfcnfg_add_phy_layer()
      return an int and added error handling code to prevent
      leaking link_support pointer in caif_device_notify()
      and cfusbl_device_notify() functions.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e0310182
    • Pavel Skripkin's avatar
      net: caif: fix memory leak in cfusbl_device_notify · 7f5d8666
      Pavel Skripkin authored
      In case of caif_enroll_dev() fail, allocated
      link_support won't be assigned to the corresponding
      structure. So simply free allocated pointer in case
      of error.
      
      Fixes: 7ad65bf6 ("caif: Add support for CAIF over CDC NCM USB interface")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7f5d8666
    • Pavel Skripkin's avatar
      net: caif: fix memory leak in caif_device_notify · b53558a9
      Pavel Skripkin authored
      In case of caif_enroll_dev() fail, allocated
      link_support won't be assigned to the corresponding
      structure. So simply free allocated pointer in case
      of error
      
      Fixes: 7c18d220 ("caif: Restructure how link caif link layer enroll")
      Cc: stable@vger.kernel.org
      Reported-and-tested-by: syzbot+7ec324747ce876a29db6@syzkaller.appspotmail.com
      Signed-off-by: default avatarPavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b53558a9
    • Pavel Skripkin's avatar
      net: caif: add proper error handling · a2805dca
      Pavel Skripkin authored
      caif_enroll_dev() can fail in some cases. Ingnoring
      these cases can lead to memory leak due to not assigning
      link_support pointer to anywhere.
      
      Fixes: 7c18d220 ("caif: Restructure how link caif link layer enroll")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a2805dca
    • Pavel Skripkin's avatar
      net: caif: added cfserl_release function · bce130e7
      Pavel Skripkin authored
      Added cfserl_release() function.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bce130e7
    • David S. Miller's avatar
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 4189777c
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      This series contains updates to igb, igc, ixgbe, ixgbevf, i40e and ice
      drivers.
      
      Kurt Kanzenbach fixes XDP for igb when PTP is enabled by pulling the
      timestamp and adjusting appropriate values prior to XDP operations.
      
      Magnus adds missing exception tracing for XDP on igb, igc, ixgbe,
      ixgbevf, i40e and ice drivers.
      
      Maciej adds tracking of AF_XDP zero copy enabled queues to resolve an
      issue with copy mode Tx for the ice driver.
      
      Note: Patch 7 will conflict when merged with net-next. Please carry
      these changes forward. IGC_XDP_TX and IGC_XDP_REDIRECT will need to be
      changed to return to conform with the net-next changes. Let me know if
      you have issues.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4189777c
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 86b84066
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-06-02
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 2 non-merge commits during the last 7 day(s) which contain
      a total of 4 files changed, 19 insertions(+), 24 deletions(-).
      
      The main changes are:
      
      1) Fix pahole BTF generation when ccache is used, from Javier Martinez Canillas.
      
      2) Fix BPF lockdown hooks in bpf_probe_read_kernel{,_str}() helpers which caused
         a deadlock from bcc programs, triggered OOM killer from audit side and didn't
         work generally with SELinux policy rules due to pointing to wrong task struct,
         from Daniel Borkmann.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      86b84066
    • Pavel Skripkin's avatar
      net: kcm: fix memory leak in kcm_sendmsg · c47cc304
      Pavel Skripkin authored
      Syzbot reported memory leak in kcm_sendmsg()[1].
      The problem was in non-freed frag_list in case of error.
      
      In the while loop:
      
      	if (head == skb)
      		skb_shinfo(head)->frag_list = tskb;
      	else
      		skb->next = tskb;
      
      frag_list filled with skbs, but nothing was freeing them.
      
      backtrace:
        [<0000000094c02615>] __alloc_skb+0x5e/0x250 net/core/skbuff.c:198
        [<00000000e5386cbd>] alloc_skb include/linux/skbuff.h:1083 [inline]
        [<00000000e5386cbd>] kcm_sendmsg+0x3b6/0xa50 net/kcm/kcmsock.c:967 [1]
        [<00000000f1613a8a>] sock_sendmsg_nosec net/socket.c:652 [inline]
        [<00000000f1613a8a>] sock_sendmsg+0x4c/0x60 net/socket.c:672
      
      Reported-and-tested-by: syzbot+b039f5699bd82e1fb011@syzkaller.appspotmail.com
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c47cc304
    • zhang kai's avatar
      sit: set name of device back to struct parms · 261ba78c
      zhang kai authored
      addrconf_set_sit_dstaddr will use parms->name.
      Signed-off-by: default avatarzhang kai <zhangkaiheb@126.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      261ba78c
    • Jiapeng Chong's avatar
      rtnetlink: Fix missing error code in rtnl_bridge_notify() · a8db57c1
      Jiapeng Chong authored
      The error code is missing in this code scenario, add the error code
      '-EINVAL' to the return value 'err'.
      
      Eliminate the follow smatch warning:
      
      net/core/rtnetlink.c:4834 rtnl_bridge_notify() warn: missing error code
      'err'.
      Reported-by: default avatarAbaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: default avatarJiapeng Chong <jiapeng.chong@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a8db57c1
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 59717f39
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Do not allow to add conntrack helper extension for confirmed
         conntracks in the nf_tables ct expectation support.
      
      2) Fix bogus EBUSY in nfnetlink_cthelper when NFCTH_PRIV_DATA_LEN
         is passed on userspace helper updates.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      59717f39
    • Maciej Fijalkowski's avatar
      ice: track AF_XDP ZC enabled queues in bitmap · e102db78
      Maciej Fijalkowski authored
      Commit c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      silently introduced a regression and broke the Tx side of AF_XDP in copy
      mode. xsk_pool on ice_ring is set only based on the existence of the XDP
      prog on the VSI which in turn picks ice_clean_tx_irq_zc to be executed.
      That is not something that should happen for copy mode as it should use
      the regular data path ice_clean_tx_irq.
      
      This results in a following splat when xdpsock is run in txonly or l2fwd
      scenarios in copy mode:
      
      <snip>
      [  106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030
      [  106.057269] #PF: supervisor read access in kernel mode
      [  106.062493] #PF: error_code(0x0000) - not-present page
      [  106.067709] PGD 0 P4D 0
      [  106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [  106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45
      [  106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
      [  106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50
      [  106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00
      [  106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206
      [  106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800
      [  106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800
      [  106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800
      [  106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff
      [  106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018
      [  106.157117] FS:  0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000
      [  106.165332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0
      [  106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  106.192898] PKRU: 55555554
      [  106.195653] Call Trace:
      [  106.198143]  <IRQ>
      [  106.200196]  ice_clean_tx_irq_zc+0x183/0x2a0 [ice]
      [  106.205087]  ice_napi_poll+0x3e/0x590 [ice]
      [  106.209356]  __napi_poll+0x2a/0x160
      [  106.212911]  net_rx_action+0xd6/0x200
      [  106.216634]  __do_softirq+0xbf/0x29b
      [  106.220274]  irq_exit_rcu+0x88/0xc0
      [  106.223819]  common_interrupt+0x7b/0xa0
      [  106.227719]  </IRQ>
      [  106.229857]  asm_common_interrupt+0x1e/0x40
      </snip>
      
      Fix this by introducing the bitmap of queues that are zero-copy enabled,
      where each bit, corresponding to a queue id that xsk pool is being
      configured on, will be set/cleared within ice_xsk_pool_{en,dis}able and
      checked within ice_xsk_pool(). The latter is a function used for
      deciding which napi poll routine is executed.
      Idea is being taken from our other drivers such as i40e and ixgbe.
      
      Fixes: c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      Signed-off-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: default avatarKiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      e102db78
    • Magnus Karlsson's avatar
      igc: add correct exception tracing for XDP · 45ce0859
      Magnus Karlsson authored
      Add missing exception tracing to XDP when a number of different
      errors can occur. The support was only partial. Several errors
      where not logged which would confuse the user quite a lot not
      knowing where and why the packets disappeared.
      
      Fixes: 73f1071c ("igc: Add support for XDP_TX action")
      Fixes: 4ff32036 ("igc: Add support for XDP_REDIRECT action")
      Reported-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: default avatarDvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      45ce0859
    • Magnus Karlsson's avatar
      ixgbevf: add correct exception tracing for XDP · faae8142
      Magnus Karlsson authored
      Add missing exception tracing to XDP when a number of different
      errors can occur. The support was only partial. Several errors
      where not logged which would confuse the user quite a lot not
      knowing where and why the packets disappeared.
      
      Fixes: 21092e9c ("ixgbevf: Add support for XDP_TX action")
      Reported-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: default avatarVishakha Jambekar <vishakha.jambekar@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      faae8142
    • Magnus Karlsson's avatar
      igb: add correct exception tracing for XDP · 74431c40
      Magnus Karlsson authored
      Add missing exception tracing to XDP when a number of different
      errors can occur. The support was only partial. Several errors
      where not logged which would confuse the user quite a lot not
      knowing where and why the packets disappeared.
      
      Fixes: 9cbc948b ("igb: add XDP support")
      Reported-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: default avatarVishakha Jambekar <vishakha.jambekar@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      74431c40
    • Magnus Karlsson's avatar
      ixgbe: add correct exception tracing for XDP · 8281356b
      Magnus Karlsson authored
      Add missing exception tracing to XDP when a number of different
      errors can occur. The support was only partial. Several errors
      where not logged which would confuse the user quite a lot not
      knowing where and why the packets disappeared.
      
      Fixes: 33fdc82f ("ixgbe: add support for XDP_TX action")
      Fixes: d0bcacd0 ("ixgbe: add AF_XDP zero-copy Rx support")
      Reported-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: default avatarVishakha Jambekar <vishakha.jambekar@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      8281356b
    • Magnus Karlsson's avatar
      ice: add correct exception tracing for XDP · 89d65df0
      Magnus Karlsson authored
      Add missing exception tracing to XDP when a number of different
      errors can occur. The support was only partial. Several errors
      where not logged which would confuse the user quite a lot not
      knowing where and why the packets disappeared.
      
      Fixes: efc2214b ("ice: Add support for XDP")
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Reported-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: default avatarKiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      89d65df0
    • Magnus Karlsson's avatar
      i40e: add correct exception tracing for XDP · f6c10b48
      Magnus Karlsson authored
      Add missing exception tracing to XDP when a number of different errors
      can occur. The support was only partial. Several errors where not
      logged which would confuse the user quite a lot not knowing where and
      why the packets disappeared.
      
      Fixes: 74608d17 ("i40e: add support for XDP_TX action")
      Fixes: 0a714186 ("i40e: add AF_XDP zero-copy Rx support")
      Reported-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: default avatarKiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      f6c10b48
    • Kurt Kanzenbach's avatar
      igb: Fix XDP with PTP enabled · 53792608
      Kurt Kanzenbach authored
      When using native XDP with the igb driver, the XDP frame data doesn't point to
      the beginning of the packet. It's off by 16 bytes. Everything works as expected
      with XDP skb mode.
      
      Actually these 16 bytes are used to store the packet timestamps. Therefore, pull
      the timestamp before executing any XDP operations and adjust all other code
      accordingly. The igc driver does it like that as well.
      
      Tested with Intel i210 card and AF_XDP sockets.
      
      Fixes: 9cbc948b ("igb: add XDP support")
      Signed-off-by: Kurt Kanzenbach's avatarKurt Kanzenbach <kurt@linutronix.de>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Tested-by: default avatarSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      53792608
  2. 02 Jun, 2021 15 commits
    • Wong Vee Khee's avatar
      net: stmmac: fix issue where clk is being unprepared twice · ab00f3e0
      Wong Vee Khee authored
      In the case of MDIO bus registration failure due to no external PHY
      devices is connected to the MAC, clk_disable_unprepare() is called in
      stmmac_bus_clk_config() and intel_eth_pci_probe() respectively.
      
      The second call in intel_eth_pci_probe() will caused the following:-
      
      [   16.578605] intel-eth-pci 0000:00:1e.5: No PHY found
      [   16.583778] intel-eth-pci 0000:00:1e.5: stmmac_dvr_probe: MDIO bus (id: 2) registration failed
      [   16.680181] ------------[ cut here ]------------
      [   16.684861] stmmac-0000:00:1e.5 already disabled
      [   16.689547] WARNING: CPU: 13 PID: 2053 at drivers/clk/clk.c:952 clk_core_disable+0x96/0x1b0
      [   16.697963] Modules linked in: dwc3 iTCO_wdt mei_hdcp iTCO_vendor_support udc_core x86_pkg_temp_thermal kvm_intel marvell10g kvm sch_fq_codel nfsd irqbypass dwmac_intel(+) stmmac uio ax88179_178a pcs_xpcs phylink uhid spi_pxa2xx_platform usbnet mei_me pcspkr tpm_crb mii i2c_i801 dw_dmac dwc3_pci thermal dw_dmac_core intel_rapl_msr libphy i2c_smbus mei tpm_tis intel_th_gth tpm_tis_core tpm intel_th_acpi intel_pmc_core intel_th i915 fuse configfs snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_pcm snd_timer snd soundcore
      [   16.746785] CPU: 13 PID: 2053 Comm: systemd-udevd Tainted: G     U            5.13.0-rc3-intel-lts #76
      [   16.756134] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-S ADP-S DRR4 CRB, BIOS ADLIFSI1.R00.1494.B00.2012031421 12/03/2020
      [   16.769465] RIP: 0010:clk_core_disable+0x96/0x1b0
      [   16.774222] Code: 00 8b 05 45 96 17 01 85 c0 7f 24 48 8b 5b 30 48 85 db 74 a5 8b 43 7c 85 c0 75 93 48 8b 33 48 c7 c7 6e 32 cc b7 e8 b2 5d 52 00 <0f> 0b 5b 5d c3 65 8b 05 76 31 18 49 89 c0 48 0f a3 05 bc 92 1a 01
      [   16.793016] RSP: 0018:ffffa44580523aa0 EFLAGS: 00010086
      [   16.798287] RAX: 0000000000000000 RBX: ffff8d7d0eb70a00 RCX: 0000000000000000
      [   16.805435] RDX: 0000000000000002 RSI: ffffffffb7c62d5f RDI: 00000000ffffffff
      [   16.812610] RBP: 0000000000000287 R08: 0000000000000000 R09: ffffa445805238d0
      [   16.819759] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8d7d0eb70a00
      [   16.826904] R13: ffff8d7d027370c8 R14: 0000000000000006 R15: ffffa44580523ad0
      [   16.834047] FS:  00007f9882fa2600(0000) GS:ffff8d80a0940000(0000) knlGS:0000000000000000
      [   16.842177] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.847966] CR2: 00007f9882bea3d8 CR3: 000000010b126001 CR4: 0000000000370ee0
      [   16.855144] Call Trace:
      [   16.857614]  clk_core_disable_lock+0x1b/0x30
      [   16.861941]  intel_eth_pci_probe.cold+0x11d/0x136 [dwmac_intel]
      [   16.867913]  pci_device_probe+0xcf/0x150
      [   16.871890]  really_probe+0xf5/0x3e0
      [   16.875526]  driver_probe_device+0x64/0x150
      [   16.879763]  device_driver_attach+0x53/0x60
      [   16.883998]  __driver_attach+0x9f/0x150
      [   16.887883]  ? device_driver_attach+0x60/0x60
      [   16.892288]  ? device_driver_attach+0x60/0x60
      [   16.896698]  bus_for_each_dev+0x77/0xc0
      [   16.900583]  bus_add_driver+0x184/0x1f0
      [   16.904469]  driver_register+0x6c/0xc0
      [   16.908268]  ? 0xffffffffc07ae000
      [   16.911598]  do_one_initcall+0x4a/0x210
      [   16.915489]  ? kmem_cache_alloc_trace+0x305/0x4e0
      [   16.920247]  do_init_module+0x5c/0x230
      [   16.924057]  load_module+0x2894/0x2b70
      [   16.927857]  ? __do_sys_finit_module+0xb5/0x120
      [   16.932441]  __do_sys_finit_module+0xb5/0x120
      [   16.936845]  do_syscall_64+0x42/0x80
      [   16.940476]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   16.945586] RIP: 0033:0x7f98830e5ccd
      [   16.949177] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 31 0c 00 f7 d8 64 89 01 48
      [   16.967970] RSP: 002b:00007ffc66b60168 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [   16.975583] RAX: ffffffffffffffda RBX: 000055885de35ef0 RCX: 00007f98830e5ccd
      [   16.982725] RDX: 0000000000000000 RSI: 00007f98832541e3 RDI: 0000000000000012
      [   16.989868] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
      [   16.997042] R10: 0000000000000012 R11: 0000000000000246 R12: 00007f98832541e3
      [   17.004222] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffc66b60328
      [   17.011369] ---[ end trace df06a3dab26b988c ]---
      [   17.016062] ------------[ cut here ]------------
      [   17.020701] stmmac-0000:00:1e.5 already unprepared
      
      Removing the stmmac_bus_clks_config() call in stmmac_dvr_probe and let
      dwmac-intel to handle the unprepare and disable of the clk device.
      
      Fixes: 5ec55823 ("net: stmmac: add clocks management for gmac driver")
      Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: default avatarWong Vee Khee <vee.khee.wong@linux.intel.com>
      Reviewed-by: default avatarJoakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ab00f3e0
    • Josh Triplett's avatar
      net: ipconfig: Don't override command-line hostnames or domains · b508d5fb
      Josh Triplett authored
      If the user specifies a hostname or domain name as part of the ip=
      command-line option, preserve it and don't overwrite it with one
      supplied by DHCP/BOOTP.
      
      For instance, ip=::::myhostname::dhcp will use "myhostname" rather than
      ignoring and overwriting it.
      
      Fix the comment on ic_bootp_string that suggests it only copies a string
      "if not already set"; it doesn't have any such logic.
      Signed-off-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b508d5fb
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2021-06-01' of git://git.kernel.org/pub/scm/linu · dd627662
      David S. Miller authored
      x/kernel/git/saeed/linux
      
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2021-06-01
      
      This series introduces some fixes to mlx5 driver.
      Please pull and let me know if there is any problem.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dd627662
    • Daniel Borkmann's avatar
      bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks · ff40e510
      Daniel Borkmann authored
      Commit 59438b46 ("security,lockdown,selinux: implement SELinux lockdown")
      added an implementation of the locked_down LSM hook to SELinux, with the aim
      to restrict which domains are allowed to perform operations that would breach
      lockdown. This is indirectly also getting audit subsystem involved to report
      events. The latter is problematic, as reported by Ondrej and Serhei, since it
      can bring down the whole system via audit:
      
        1) The audit events that are triggered due to calls to security_locked_down()
           can OOM kill a machine, see below details [0].
      
        2) It also seems to be causing a deadlock via avc_has_perm()/slow_avc_audit()
           when trying to wake up kauditd, for example, when using trace_sched_switch()
           tracepoint, see details in [1]. Triggering this was not via some hypothetical
           corner case, but with existing tools like runqlat & runqslower from bcc, for
           example, which make use of this tracepoint. Rough call sequence goes like:
      
           rq_lock(rq) -> -------------------------+
             trace_sched_switch() ->               |
               bpf_prog_xyz() ->                   +-> deadlock
                 selinux_lockdown() ->             |
                   audit_log_end() ->              |
                     wake_up_interruptible() ->    |
                       try_to_wake_up() ->         |
                         rq_lock(rq) --------------+
      
      What's worse is that the intention of 59438b46 to further restrict lockdown
      settings for specific applications in respect to the global lockdown policy is
      completely broken for BPF. The SELinux policy rule for the current lockdown check
      looks something like this:
      
        allow <who> <who> : lockdown { <reason> };
      
      However, this doesn't match with the 'current' task where the security_locked_down()
      is executed, example: httpd does a syscall. There is a tracing program attached
      to the syscall which triggers a BPF program to run, which ends up doing a
      bpf_probe_read_kernel{,_str}() helper call. The selinux_lockdown() hook does
      the permission check against 'current', that is, httpd in this example. httpd
      has literally zero relation to this tracing program, and it would be nonsensical
      having to write an SELinux policy rule against httpd to let the tracing helper
      pass. The policy in this case needs to be against the entity that is installing
      the BPF program. For example, if bpftrace would generate a histogram of syscall
      counts by user space application:
      
        bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
      
      bpftrace would then go and generate a BPF program from this internally. One way
      of doing it [for the sake of the example] could be to call bpf_get_current_task()
      helper and then access current->comm via one of bpf_probe_read_kernel{,_str}()
      helpers. So the program itself has nothing to do with httpd or any other random
      app doing a syscall here. The BPF program _explicitly initiated_ the lockdown
      check. The allow/deny policy belongs in the context of bpftrace: meaning, you
      want to grant bpftrace access to use these helpers, but other tracers on the
      system like my_random_tracer _not_.
      
      Therefore fix all three issues at the same time by taking a completely different
      approach for the security_locked_down() hook, that is, move the check into the
      program verification phase where we actually retrieve the BPF func proto. This
      also reliably gets the task (current) that is trying to install the BPF tracing
      program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also fixes the OOM since
      we're moving this out of the BPF helper's fast-path which can be called several
      millions of times per second.
      
      The check is then also in line with other security_locked_down() hooks in the
      system where the enforcement is performed at open/load time, for example,
      open_kcore() for /proc/kcore access or module_sig_check() for module signatures
      just to pick few random ones. What's out of scope in the fix as well as in
      other security_locked_down() hook locations /outside/ of BPF subsystem is that
      if the lockdown policy changes on the fly there is no retrospective action.
      This requires a different discussion, potentially complex infrastructure, and
      it's also not clear whether this can be solved generically. Either way, it is
      out of scope for a suitable stable fix which this one is targeting. Note that
      the breakage is specifically on 59438b46 where it started to rely on 'current'
      as UAPI behavior, and _not_ earlier infrastructure such as 9d1f8be5 ("bpf:
      Restrict bpf when kernel lockdown is in confidentiality mode").
      
      [0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says:
      
        I starting seeing this with F-34. When I run a container that is traced with
        BPF to record the syscalls it is doing, auditd is flooded with messages like:
      
        type=AVC msg=audit(1619784520.593:282387): avc:  denied  { confidentiality }
          for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM"
            scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0
              tclass=lockdown permissive=0
      
        This seems to be leading to auditd running out of space in the backlog buffer
        and eventually OOMs the machine.
      
        [...]
        auditd running at 99% CPU presumably processing all the messages, eventually I get:
        Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
        Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
        Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64
        Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64
        Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64
        Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64
        Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000
        Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not tainted 5.11.12-300.fc34.x86_64 #1
        Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
        [...]
      
      [1] https://lore.kernel.org/linux-audit/CANYvDQN7H5tVp47fbYcRasv4XF07eUbsDwT_eDCHXJUj43J7jQ@mail.gmail.com/,
          Serhei Makarov says:
      
        Upstream kernel 5.11.0-rc7 and later was found to deadlock during a
        bpf_probe_read_compat() call within a sched_switch tracepoint. The problem
        is reproducible with the reg_alloc3 testcase from SystemTap's BPF backend
        testsuite on x86_64 as well as the runqlat, runqslower tools from bcc on
        ppc64le. Example stack trace:
      
        [...]
        [  730.868702] stack backtrace:
        [  730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 5.12.0-0.rc2.20210309git144c79ef.166.fc35.x86_64 #1
        [  730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
        [  730.873278] Call Trace:
        [  730.873770]  dump_stack+0x7f/0xa1
        [  730.874433]  check_noncircular+0xdf/0x100
        [  730.875232]  __lock_acquire+0x1202/0x1e10
        [  730.876031]  ? __lock_acquire+0xfc0/0x1e10
        [  730.876844]  lock_acquire+0xc2/0x3a0
        [  730.877551]  ? __wake_up_common_lock+0x52/0x90
        [  730.878434]  ? lock_acquire+0xc2/0x3a0
        [  730.879186]  ? lock_is_held_type+0xa7/0x120
        [  730.880044]  ? skb_queue_tail+0x1b/0x50
        [  730.880800]  _raw_spin_lock_irqsave+0x4d/0x90
        [  730.881656]  ? __wake_up_common_lock+0x52/0x90
        [  730.882532]  __wake_up_common_lock+0x52/0x90
        [  730.883375]  audit_log_end+0x5b/0x100
        [  730.884104]  slow_avc_audit+0x69/0x90
        [  730.884836]  avc_has_perm+0x8b/0xb0
        [  730.885532]  selinux_lockdown+0xa5/0xd0
        [  730.886297]  security_locked_down+0x20/0x40
        [  730.887133]  bpf_probe_read_compat+0x66/0xd0
        [  730.887983]  bpf_prog_250599c5469ac7b5+0x10f/0x820
        [  730.888917]  trace_call_bpf+0xe9/0x240
        [  730.889672]  perf_trace_run_bpf_submit+0x4d/0xc0
        [  730.890579]  perf_trace_sched_switch+0x142/0x180
        [  730.891485]  ? __schedule+0x6d8/0xb20
        [  730.892209]  __schedule+0x6d8/0xb20
        [  730.892899]  schedule+0x5b/0xc0
        [  730.893522]  exit_to_user_mode_prepare+0x11d/0x240
        [  730.894457]  syscall_exit_to_user_mode+0x27/0x70
        [  730.895361]  entry_SYSCALL_64_after_hwframe+0x44/0xae
        [...]
      
      Fixes: 59438b46 ("security,lockdown,selinux: implement SELinux lockdown")
      Reported-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Reported-by: default avatarJakub Hrozek <jhrozek@redhat.com>
      Reported-by: default avatarSerhei Makarov <smakarov@redhat.com>
      Reported-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Tested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: James Morris <jamorris@linux.microsoft.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Frank Eigler <fche@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lore.kernel.org/bpf/01135120-8bf7-df2e-cff0-1d73f1f841c3@iogearbox.net
      ff40e510
    • Pablo Neira Ayuso's avatar
      netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches · 8971ee8b
      Pablo Neira Ayuso authored
      The private helper data size cannot be updated. However, updates that
      contain NFCTH_PRIV_DATA_LEN might bogusly hit EBUSY even if the size is
      the same.
      
      Fixes: 12f7a505 ("netfilter: add user-space connection tracking helper infrastructure")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      8971ee8b
    • Pablo Neira Ayuso's avatar
      netfilter: nft_ct: skip expectations for confirmed conntrack · 1710eb91
      Pablo Neira Ayuso authored
      nft_ct_expect_obj_eval() calls nf_ct_ext_add() for a confirmed
      conntrack entry. However, nf_ct_ext_add() can only be called for
      !nf_ct_is_confirmed().
      
      [ 1825.349056] WARNING: CPU: 0 PID: 1279 at net/netfilter/nf_conntrack_extend.c:48 nf_ct_xt_add+0x18e/0x1a0 [nf_conntrack]
      [ 1825.351391] RIP: 0010:nf_ct_ext_add+0x18e/0x1a0 [nf_conntrack]
      [ 1825.351493] Code: 41 5c 41 5d 41 5e 41 5f c3 41 bc 0a 00 00 00 e9 15 ff ff ff ba 09 00 00 00 31 f6 4c 89 ff e8 69 6c 3d e9 eb 96 45 31 ed eb cd <0f> 0b e9 b1 fe ff ff e8 86 79 14 e9 eb bf 0f 1f 40 00 0f 1f 44 00
      [ 1825.351721] RSP: 0018:ffffc90002e1f1e8 EFLAGS: 00010202
      [ 1825.351790] RAX: 000000000000000e RBX: ffff88814f5783c0 RCX: ffffffffc0e4f887
      [ 1825.351881] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88814f578440
      [ 1825.351971] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88814f578447
      [ 1825.352060] R10: ffffed1029eaf088 R11: 0000000000000001 R12: ffff88814f578440
      [ 1825.352150] R13: ffff8882053f3a00 R14: 0000000000000000 R15: 0000000000000a20
      [ 1825.352240] FS:  00007f992261c900(0000) GS:ffff889faec00000(0000) knlGS:0000000000000000
      [ 1825.352343] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1825.352417] CR2: 000056070a4d1158 CR3: 000000015efe0000 CR4: 0000000000350ee0
      [ 1825.352508] Call Trace:
      [ 1825.352544]  nf_ct_helper_ext_add+0x10/0x60 [nf_conntrack]
      [ 1825.352641]  nft_ct_expect_obj_eval+0x1b8/0x1e0 [nft_ct]
      [ 1825.352716]  nft_do_chain+0x232/0x850 [nf_tables]
      
      Add the ct helper extension only for unconfirmed conntrack. Skip rule
      evaluation if the ct helper extension does not exist. Thus, you can
      only create expectations from the first packet.
      
      It should be possible to remove this limitation by adding a new action
      to attach a generic ct helper to the first packet. Then, use this ct
      helper extension from follow up packets to create the ct expectation.
      
      While at it, add a missing check to skip the template conntrack too
      and remove check for IPCT_UNTRACK which is implicit to !ct.
      
      Fixes: 857b4602 ("netfilter: nft_ct: add ct expectations support")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      1710eb91
    • Yevgeny Kliteynik's avatar
      net/mlx5: DR, Create multi-destination flow table with level less than 64 · 216214c6
      Yevgeny Kliteynik authored
      Flow table that contains flow pointing to multiple flow tables or multiple
      TIRs must have a level lower than 64. In our case it applies to muli-
      destination flow table.
      Fix the level of the created table to comply with HW Spec definitions, and
      still make sure that its level lower than SW-owned tables, so that it
      would be possible to point from the multi-destination FW table to SW
      tables.
      
      Fixes: 34583bee ("net/mlx5: DR, Create multi-destination table for SW-steering use")
      Signed-off-by: default avatarYevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: default avatarAlex Vesker <valex@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      216214c6
    • Aya Levin's avatar
      net/mlx5e: Fix conflict with HW TS and CQE compression · 5349cbba
      Aya Levin authored
      When a driver's profile doesn't support a dedicated PTP-RQ,
      configuration of CQE compression while HW TS is configured should fail.
      
      Fixes: 885b8cfb ("net/mlx5e: Update ethtool setting of CQE compression")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      5349cbba
    • Aya Levin's avatar
      net/mlx5e: Fix HW TS with CQE compression according to profile · 256f79d1
      Aya Levin authored
      When the driver's profile doesn't support a dedicated PTP-RQ, the PTP
      accuracy of HW TS is affected by the CQE compression. In this case,
      turn off CQE compression. Otherwise, the driver crashes:
      
      BUG: kernel NULL pointer dereference, address:0000000000000018
      ...
      ...
      RIP: 0010:mlx5e_ptp_rx_set_fs+0x25/0x1a0 [mlx5_core]
      ...
      ...
      Call Trace:
       mlx5e_ptp_activate_channel+0xb2/0xf0 [mlx5_core]
       mlx5e_activate_priv_channels+0x3b9/0x8c0 [mlx5_core]
       ? __mutex_unlock_slowpath+0x45/0x2a0
       ? mlx5e_refresh_tirs+0x151/0x1e0 [mlx5_core]
       mlx5e_switch_priv_channels+0x1cd/0x2d0 [mlx5_core]
       ? mlx5e_xdp_allowed+0x150/0x150 [mlx5_core]
       mlx5e_safe_switch_params+0x118/0x3c0 [mlx5_core]
       ? __mutex_lock+0x6e/0x8e0
       ? mlx5e_hwstamp_set+0xa9/0x300 [mlx5_core]
       mlx5e_hwstamp_set+0x194/0x300 [mlx5_core]
       ? dev_ioctl+0x9b/0x3d0
       mlx5i_ioctl+0x37/0x60 [mlx5_core]
       mlx5i_pkey_ioctl+0x12/0x20 [mlx5_core]
       dev_ioctl+0xa9/0x3d0
       sock_ioctl+0x268/0x420
       __x64_sys_ioctl+0x3d8/0x790
       ? lockdep_hardirqs_on_prepare+0xe4/0x190
       do_syscall_64+0x2d/0x40
      entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: 960fbfe2 ("net/mlx5e: Allow coexistence of CQE compression and HW TS PTP")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      256f79d1
    • Roi Dayan's avatar
      net/mlx5e: Fix adding encap rules to slow path · 2a2c84fa
      Roi Dayan authored
      On some devices the ignore flow level cap is not supported and we
      shouldn't use it. Setting the dest ft with mlx5_chains_get_tc_end_ft()
      already gives the correct end ft if ignore flow level cap is supported
      or not.
      
      Fixes: 39ac237c ("net/mlx5: E-Switch, Refactor chains and priorities")
      Signed-off-by: default avatarRoi Dayan <roid@nvidia.com>
      Reviewed-by: default avatarPaul Blakey <paulb@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      2a2c84fa
    • Roi Dayan's avatar
      net/mlx5e: Check for needed capability for cvlan matching · afe93f71
      Roi Dayan authored
      If not supported show an error and return instead of trying to offload
      to the hardware and fail.
      
      Fixes: 699e96dd ("net/mlx5e: Support offloading tc double vlan headers match")
      Reported-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      afe93f71
    • Moshe Shemesh's avatar
      net/mlx5: Check firmware sync reset requested is set before trying to abort it · 5940e642
      Moshe Shemesh authored
      In case driver sent NACK to firmware on sync reset request, it will get
      sync reset abort event while it didn't set sync reset requested mode.
      Thus, on abort sync reset event handler, driver should check reset
      requested is set before trying to stop sync reset poll.
      
      Fixes: 7dd6df32 ("net/mlx5: Handle sync reset abort event")
      Signed-off-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      5940e642
    • Roi Dayan's avatar
      net/mlx5e: Disable TLS offload for uplink representor · b38742e4
      Roi Dayan authored
      TLS offload is not supported in switchdev mode.
      
      Fixes: 7a9fb35e ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
      Signed-off-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      b38742e4
    • Aya Levin's avatar
      net/mlx5e: Fix incompatible casting · d8ec9200
      Aya Levin authored
      Device supports setting of a single fec mode at a time, enforce this
      by bitmap_weight == 1. Input from fec command is in u32, avoid cast to
      unsigned long and use bitmap_from_arr32 to populate bitmap safely.
      
      Fixes: 4bd9d507 ("net/mlx5e: Enforce setting of a single FEC mode")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      d8ec9200
    • Joe Perches's avatar
      MAINTAINERS: nfc mailing lists are subscribers-only · b0003726
      Joe Perches authored
      It looks as if the MAINTAINERS entries for the nfc mailing list
      should be updated as I just got a "rejected" bounce from the nfc list.
      
      -------
      Your message to the Linux-nfc mailing-list was rejected for the following
      reasons:
      
      The message is not from a list member
      -------
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0003726