1. 30 Sep, 2021 9 commits
    • Moshe Shemesh's avatar
      net/mlx5: E-Switch, Fix double allocation of acl flow counter · a586775f
      Moshe Shemesh authored
      Flow counter is allocated in eswitch legacy acl setting functions
      without checking if already allocated by previous setting. Add a check
      to avoid such double allocation.
      
      Fixes: 07bab950 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
      Fixes: ea651a86 ("net/mlx5: E-Switch, Refactor eswitch egress acl codes")
      Signed-off-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      a586775f
    • Tariq Toukan's avatar
      net/mlx5e: Improve MQPRIO resiliency · 7dbc849b
      Tariq Toukan authored
      * Add netdev->tc_to_txq rollback in case of failure in
        mlx5e_update_netdev_queues().
      * Fix broken transition between the two modes:
        MQPRIO DCB mode with tc==8, and MQPRIO channel mode.
      * Disable MQPRIO channel mode if re-attaching with a different number
        of channels.
      * Improve code sharing.
      
      Fixes: ec60c458 ("net/mlx5e: Support MQPRIO channel mode")
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Reviewed-by: default avatarMaxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      7dbc849b
    • Tariq Toukan's avatar
      net/mlx5e: Keep the value for maximum number of channels in-sync · 9d758d4a
      Tariq Toukan authored
      The value for maximum number of channels is first calculated based
      on the netdev's profile and current function resources (specifically,
      number of MSIX vectors, which depends among other things on the number
      of online cores in the system).
      This value is then used to calculate the netdev's number of rxqs/txqs.
      Once created (by alloc_etherdev_mqs), the number of netdev's rxqs/txqs
      is constant and we must not exceed it.
      
      To achieve this, keep the maximum number of channels in sync upon any
      netdevice re-attach.
      
      Use mlx5e_get_max_num_channels() for calculating the number of netdev's
      rxqs/txqs. After netdev is created, use mlx5e_calc_max_nch() (which
      coinsiders core device resources, profile, and netdev) to init or
      update priv->max_nch.
      
      Before this patch, the value of priv->max_nch might get out of sync,
      mistakenly allowing accesses to out-of-bounds objects, which would
      crash the system.
      
      Track the number of channels stats structures used in a separate
      field, as they are persistent to suspend/resume operations. All the
      collected stats of every channel index that ever existed should be
      preserved. They are reset only when struct mlx5e_priv is,
      in mlx5e_priv_cleanup(), which is part of the profile changing flow.
      
      There is no point anymore in blocking a profile change due to max_nch
      mismatch in mlx5e_netdev_change_profile(). Remove the limitation.
      
      Fixes: a1f240f1 ("net/mlx5e: Adjust to max number of channles when re-attaching")
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Reviewed-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarMaxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      9d758d4a
    • Raed Salem's avatar
      net/mlx5e: IPSEC RX, enable checksum complete · f9a10440
      Raed Salem authored
      Currently in Rx data path IPsec crypto offloaded packets uses
      csum_none flag, so checksum is handled by the stack, this naturally
      have some performance/cpu utilization impact on such flows. As Nvidia
      NIC starting from ConnectX6DX provides checksum complete value out of
      the box also for such flows there is no sense in taking csum_none path,
      furthermore the stack (xfrm) have the method to handle checksum complete
      corrections for such flows i.e. IPsec trailer removal and consequently
      checksum value adjustment.
      
      Because of the above and in addition the ConnectX6DX is the first HW
      which supports IPsec crypto offload then it is safe to report csum
      complete for IPsec offloaded traffic.
      
      Fixes: b2ac7541 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload")
      Signed-off-by: default avatarRaed Salem <raeds@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      f9a10440
    • Eric Dumazet's avatar
      af_unix: fix races in sk_peer_pid and sk_peer_cred accesses · 35306eb2
      Eric Dumazet authored
      Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
      are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.
      
      In order to fix this issue, this patch adds a new spinlock that needs
      to be used whenever these fields are read or written.
      
      Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
      reading sk->sk_peer_pid which makes no sense, as this field
      is only possibly set by AF_UNIX sockets.
      We will have to clean this in a separate patch.
      This could be done by reverting b48596d1 "Bluetooth: L2CAP: Add get_peer_pid callback"
      or implementing what was truly expected.
      
      Fixes: 109f6e39 ("af_unix: Allow SO_PEERCRED to work across namespaces.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarJann Horn <jannh@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      35306eb2
    • Wong Vee Khee's avatar
      net: stmmac: fix EEE init issue when paired with EEE capable PHYs · 656ed8b0
      Wong Vee Khee authored
      When STMMAC is paired with Energy-Efficient Ethernet(EEE) capable PHY,
      and the PHY is advertising EEE by default, we need to enable EEE on the
      xPCS side too, instead of having user to manually trigger the enabling
      config via ethtool.
      
      Fixed this by adding xpcs_config_eee() call in stmmac_eee_init().
      
      Fixes: 7617af3d ("net: pcs: Introducing support for DWC xpcs Energy Efficient Ethernet")
      Cc: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
      Signed-off-by: default avatarWong Vee Khee <vee.khee.wong@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      656ed8b0
    • Jakub Kicinski's avatar
      net: dev_addr_list: handle first address in __hw_addr_add_ex · a5b8fd65
      Jakub Kicinski authored
      struct dev_addr_list is used for device addresses, unicast addresses
      and multicast addresses. The first of those needs special handling
      of the main address - netdev->dev_addr points directly the data
      of the entry and drivers write to it freely, so we can't maintain
      it in the rbtree (for now, at least, to be fixed in net-next).
      
      Current work around sprinkles special handling of the first
      address on the list throughout the code but it missed the case
      where address is being added. First address will not be visible
      during subsequent adds.
      
      Syzbot found a warning where unicast addresses are modified
      without holding the rtnl lock, tl;dr is that team generates
      the same modification multiple times, not necessarily when
      right locks are held.
      
      In the repro we have:
      
        macvlan -> team -> veth
      
      macvlan adds a unicast address to the team. Team then pushes
      that address down to its memebers (veths). Next something unrelated
      makes team sync member addrs again, and because of the bug
      the addr entries get duplicated in the veths. macvlan gets
      removed, removes its addr from team which removes only one
      of the duplicated addresses from veths. This removal is done
      under rtnl. Next syzbot uses iptables to add a multicast addr
      to team (which does not hold rtnl lock). Team syncs veth addrs,
      but because veths' unicast list still has the duplicate it will
      also get sync, even though this update is intended for mc addresses.
      Again, uc address updates need rtnl lock, boom.
      
      Reported-by: syzbot+7a2ab2cdc14d134de553@syzkaller.appspotmail.com
      Fixes: 406f42fa ("net-next: When a bond have a massive amount of VLANs with IPv6 addresses, performance of changing link state, attaching a VRF, changing an IPv6 address, etc. go down dramtically.")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5b8fd65
    • Vlad Buslov's avatar
      net: sched: flower: protect fl_walk() with rcu · d5ef1906
      Vlad Buslov authored
      Patch that refactored fl_walk() to use idr_for_each_entry_continue_ul()
      also removed rcu protection of individual filters which causes following
      use-after-free when filter is deleted concurrently. Fix fl_walk() to obtain
      rcu read lock while iterating and taking the filter reference and temporary
      release the lock while calling arg->fn() callback that can sleep.
      
      KASAN trace:
      
      [  352.773640] ==================================================================
      [  352.775041] BUG: KASAN: use-after-free in fl_walk+0x159/0x240 [cls_flower]
      [  352.776304] Read of size 4 at addr ffff8881c8251480 by task tc/2987
      
      [  352.777862] CPU: 3 PID: 2987 Comm: tc Not tainted 5.15.0-rc2+ #2
      [  352.778980] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  352.781022] Call Trace:
      [  352.781573]  dump_stack_lvl+0x46/0x5a
      [  352.782332]  print_address_description.constprop.0+0x1f/0x140
      [  352.783400]  ? fl_walk+0x159/0x240 [cls_flower]
      [  352.784292]  ? fl_walk+0x159/0x240 [cls_flower]
      [  352.785138]  kasan_report.cold+0x83/0xdf
      [  352.785851]  ? fl_walk+0x159/0x240 [cls_flower]
      [  352.786587]  kasan_check_range+0x145/0x1a0
      [  352.787337]  fl_walk+0x159/0x240 [cls_flower]
      [  352.788163]  ? fl_put+0x10/0x10 [cls_flower]
      [  352.789007]  ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
      [  352.790102]  tcf_chain_dump+0x231/0x450
      [  352.790878]  ? tcf_chain_tp_delete_empty+0x170/0x170
      [  352.791833]  ? __might_sleep+0x2e/0xc0
      [  352.792594]  ? tfilter_notify+0x170/0x170
      [  352.793400]  ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
      [  352.794477]  tc_dump_tfilter+0x385/0x4b0
      [  352.795262]  ? tc_new_tfilter+0x1180/0x1180
      [  352.796103]  ? __mod_node_page_state+0x1f/0xc0
      [  352.796974]  ? __build_skb_around+0x10e/0x130
      [  352.797826]  netlink_dump+0x2c0/0x560
      [  352.798563]  ? netlink_getsockopt+0x430/0x430
      [  352.799433]  ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
      [  352.800542]  __netlink_dump_start+0x356/0x440
      [  352.801397]  rtnetlink_rcv_msg+0x3ff/0x550
      [  352.802190]  ? tc_new_tfilter+0x1180/0x1180
      [  352.802872]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
      [  352.803668]  ? tc_new_tfilter+0x1180/0x1180
      [  352.804344]  ? _copy_from_iter_nocache+0x800/0x800
      [  352.805202]  ? kasan_set_track+0x1c/0x30
      [  352.805900]  netlink_rcv_skb+0xc6/0x1f0
      [  352.806587]  ? rht_deferred_worker+0x6b0/0x6b0
      [  352.807455]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
      [  352.808324]  ? netlink_ack+0x4d0/0x4d0
      [  352.809086]  ? netlink_deliver_tap+0x62/0x3d0
      [  352.809951]  netlink_unicast+0x353/0x480
      [  352.810744]  ? netlink_attachskb+0x430/0x430
      [  352.811586]  ? __alloc_skb+0xd7/0x200
      [  352.812349]  netlink_sendmsg+0x396/0x680
      [  352.813132]  ? netlink_unicast+0x480/0x480
      [  352.813952]  ? __import_iovec+0x192/0x210
      [  352.814759]  ? netlink_unicast+0x480/0x480
      [  352.815580]  sock_sendmsg+0x6c/0x80
      [  352.816299]  ____sys_sendmsg+0x3a5/0x3c0
      [  352.817096]  ? kernel_sendmsg+0x30/0x30
      [  352.817873]  ? __ia32_sys_recvmmsg+0x150/0x150
      [  352.818753]  ___sys_sendmsg+0xd8/0x140
      [  352.819518]  ? sendmsg_copy_msghdr+0x110/0x110
      [  352.820402]  ? ___sys_recvmsg+0xf4/0x1a0
      [  352.821110]  ? __copy_msghdr_from_user+0x260/0x260
      [  352.821934]  ? _raw_spin_lock+0x81/0xd0
      [  352.822680]  ? __handle_mm_fault+0xef3/0x1b20
      [  352.823549]  ? rb_insert_color+0x2a/0x270
      [  352.824373]  ? copy_page_range+0x16b0/0x16b0
      [  352.825209]  ? perf_event_update_userpage+0x2d0/0x2d0
      [  352.826190]  ? __fget_light+0xd9/0xf0
      [  352.826941]  __sys_sendmsg+0xb3/0x130
      [  352.827613]  ? __sys_sendmsg_sock+0x20/0x20
      [  352.828377]  ? do_user_addr_fault+0x2c5/0x8a0
      [  352.829184]  ? fpregs_assert_state_consistent+0x52/0x60
      [  352.830001]  ? exit_to_user_mode_prepare+0x32/0x160
      [  352.830845]  do_syscall_64+0x35/0x80
      [  352.831445]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  352.832331] RIP: 0033:0x7f7bee973c17
      [  352.833078] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      [  352.836202] RSP: 002b:00007ffcbb368e28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  352.837524] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7bee973c17
      [  352.838715] RDX: 0000000000000000 RSI: 00007ffcbb368e50 RDI: 0000000000000003
      [  352.839838] RBP: 00007ffcbb36d090 R08: 00000000cea96d79 R09: 00007f7beea34a40
      [  352.841021] R10: 00000000004059bb R11: 0000000000000246 R12: 000000000046563f
      [  352.842208] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcbb36d088
      
      [  352.843784] Allocated by task 2960:
      [  352.844451]  kasan_save_stack+0x1b/0x40
      [  352.845173]  __kasan_kmalloc+0x7c/0x90
      [  352.845873]  fl_change+0x282/0x22db [cls_flower]
      [  352.846696]  tc_new_tfilter+0x6cf/0x1180
      [  352.847493]  rtnetlink_rcv_msg+0x471/0x550
      [  352.848323]  netlink_rcv_skb+0xc6/0x1f0
      [  352.849097]  netlink_unicast+0x353/0x480
      [  352.849886]  netlink_sendmsg+0x396/0x680
      [  352.850678]  sock_sendmsg+0x6c/0x80
      [  352.851398]  ____sys_sendmsg+0x3a5/0x3c0
      [  352.852202]  ___sys_sendmsg+0xd8/0x140
      [  352.852967]  __sys_sendmsg+0xb3/0x130
      [  352.853718]  do_syscall_64+0x35/0x80
      [  352.854457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  352.855830] Freed by task 7:
      [  352.856421]  kasan_save_stack+0x1b/0x40
      [  352.857139]  kasan_set_track+0x1c/0x30
      [  352.857854]  kasan_set_free_info+0x20/0x30
      [  352.858609]  __kasan_slab_free+0xed/0x130
      [  352.859348]  kfree+0xa7/0x3c0
      [  352.859951]  process_one_work+0x44d/0x780
      [  352.860685]  worker_thread+0x2e2/0x7e0
      [  352.861390]  kthread+0x1f4/0x220
      [  352.862022]  ret_from_fork+0x1f/0x30
      
      [  352.862955] Last potentially related work creation:
      [  352.863758]  kasan_save_stack+0x1b/0x40
      [  352.864378]  kasan_record_aux_stack+0xab/0xc0
      [  352.865028]  insert_work+0x30/0x160
      [  352.865617]  __queue_work+0x351/0x670
      [  352.866261]  rcu_work_rcufn+0x30/0x40
      [  352.866917]  rcu_core+0x3b2/0xdb0
      [  352.867561]  __do_softirq+0xf6/0x386
      
      [  352.868708] Second to last potentially related work creation:
      [  352.869779]  kasan_save_stack+0x1b/0x40
      [  352.870560]  kasan_record_aux_stack+0xab/0xc0
      [  352.871426]  call_rcu+0x5f/0x5c0
      [  352.872108]  queue_rcu_work+0x44/0x50
      [  352.872855]  __fl_put+0x17c/0x240 [cls_flower]
      [  352.873733]  fl_delete+0xc7/0x100 [cls_flower]
      [  352.874607]  tc_del_tfilter+0x510/0xb30
      [  352.886085]  rtnetlink_rcv_msg+0x471/0x550
      [  352.886875]  netlink_rcv_skb+0xc6/0x1f0
      [  352.887636]  netlink_unicast+0x353/0x480
      [  352.888285]  netlink_sendmsg+0x396/0x680
      [  352.888942]  sock_sendmsg+0x6c/0x80
      [  352.889583]  ____sys_sendmsg+0x3a5/0x3c0
      [  352.890311]  ___sys_sendmsg+0xd8/0x140
      [  352.891019]  __sys_sendmsg+0xb3/0x130
      [  352.891716]  do_syscall_64+0x35/0x80
      [  352.892395]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  352.893666] The buggy address belongs to the object at ffff8881c8251000
                      which belongs to the cache kmalloc-2k of size 2048
      [  352.895696] The buggy address is located 1152 bytes inside of
                      2048-byte region [ffff8881c8251000, ffff8881c8251800)
      [  352.897640] The buggy address belongs to the page:
      [  352.898492] page:00000000213bac35 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c8250
      [  352.900110] head:00000000213bac35 order:3 compound_mapcount:0 compound_pincount:0
      [  352.901541] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
      [  352.902908] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042f00
      [  352.904391] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
      [  352.905861] page dumped because: kasan: bad access detected
      
      [  352.907323] Memory state around the buggy address:
      [  352.908218]  ffff8881c8251380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  352.909471]  ffff8881c8251400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  352.910735] >ffff8881c8251480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  352.912012]                    ^
      [  352.912642]  ffff8881c8251500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  352.913919]  ffff8881c8251580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  352.915185] ==================================================================
      
      Fixes: d39d7149 ("idr: introduce idr_for_each_entry_continue_ul()")
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Acked-by: default avatarCong Wang <cong.wang@bytedance.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d5ef1906
    • Paolo Abeni's avatar
      net: introduce and use lock_sock_fast_nested() · 49054556
      Paolo Abeni authored
      Syzkaller reported a false positive deadlock involving
      the nl socket lock and the subflow socket lock:
      
      MPTCP: kernel_bind error, err=-98
      ============================================
      WARNING: possible recursive locking detected
      5.15.0-rc1-syzkaller #0 Not tainted
      --------------------------------------------
      syz-executor998/6520 is trying to acquire lock:
      ffff8880795718a0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738
      
      but task is already holding lock:
      ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
      ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(k-sk_lock-AF_INET);
        lock(k-sk_lock-AF_INET);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by syz-executor998/6520:
       #0: ffffffff8d176c50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 net/netlink/genetlink.c:802
       #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline]
       #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x3e0/0x580 net/netlink/genetlink.c:790
       #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
       #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720
      
      stack backtrace:
      CPU: 1 PID: 6520 Comm: syz-executor998 Not tainted 5.15.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_deadlock_bug kernel/locking/lockdep.c:2944 [inline]
       check_deadlock kernel/locking/lockdep.c:2987 [inline]
       validate_chain kernel/locking/lockdep.c:3776 [inline]
       __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5015
       lock_acquire kernel/locking/lockdep.c:5625 [inline]
       lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
       lock_sock_fast+0x36/0x100 net/core/sock.c:3229
       mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738
       inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
       __sock_release net/socket.c:649 [inline]
       sock_release+0x87/0x1b0 net/socket.c:677
       mptcp_pm_nl_create_listen_socket+0x238/0x2c0 net/mptcp/pm_netlink.c:900
       mptcp_nl_cmd_add_addr+0x359/0x930 net/mptcp/pm_netlink.c:1170
       genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731
       genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
       genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_no_sendpage+0x101/0x150 net/core/sock.c:2980
       kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3504
       kernel_sendpage net/socket.c:3501 [inline]
       sock_sendpage+0xe5/0x140 net/socket.c:1003
       pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364
       splice_from_pipe_feed fs/splice.c:418 [inline]
       __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562
       splice_from_pipe fs/splice.c:597 [inline]
       generic_splice_sendpage+0xd4/0x140 fs/splice.c:746
       do_splice_from fs/splice.c:767 [inline]
       direct_splice_actor+0x110/0x180 fs/splice.c:936
       splice_direct_to_actor+0x34b/0x8c0 fs/splice.c:891
       do_splice_direct+0x1b3/0x280 fs/splice.c:979
       do_sendfile+0xae9/0x1240 fs/read_write.c:1249
       __do_sys_sendfile64 fs/read_write.c:1314 [inline]
       __se_sys_sendfile64 fs/read_write.c:1300 [inline]
       __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1300
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f215cb69969
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffc96bb3868 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 00007f215cbad072 RCX: 00007f215cb69969
      RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005
      RBP: 0000000000000000 R08: 00007ffc96bb3a08 R09: 00007ffc96bb3a08
      R10: 0000000100000002 R11: 0000000000000246 R12: 00007ffc96bb387c
      R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000
      
      the problem originates from uncorrect lock annotation in the mptcp
      code and is only visible since commit 2dcb96ba ("net: core: Correct
      the sock::sk_lock.owned lockdep annotations"), but is present since
      the port-based endpoint support initial implementation.
      
      This patch addresses the issue introducing a nested variant of
      lock_sock_fast() and using it in the relevant code path.
      
      Fixes: 1729cf18 ("mptcp: create the listening socket for new port")
      Fixes: 2dcb96ba ("net: core: Correct the sock::sk_lock.owned lockdep annotations")
      Suggested-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reported-and-tested-by: syzbot+1dd53f7a89b299d59eaf@syzkaller.appspotmail.com
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      49054556
  2. 29 Sep, 2021 13 commits
    • Florian Fainelli's avatar
      net: phy: bcm7xxx: Fixed indirect MMD operations · d88fd1b5
      Florian Fainelli authored
      When EEE support was added to the 28nm EPHY it was assumed that it would
      be able to support the standard clause 45 over clause 22 register access
      method. It turns out that the PHY does not support that, which is the
      very reason for using the indirect shadow mode 2 bank 3 access method.
      
      Implement {read,write}_mmd to allow the standard PHY library routines
      pertaining to EEE querying and configuration to work correctly on these
      PHYs. This forces us to implement a __phy_set_clr_bits() function that
      does not grab the MDIO bus lock since the PHY driver's {read,write}_mmd
      functions are always called with that lock held.
      
      Fixes: 83ee102a ("net: phy: bcm7xxx: add support for 28nm EPHY")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d88fd1b5
    • David S. Miller's avatar
      Merge branch 'hns3-fixes' · 251ffc07
      David S. Miller authored
      Guangbin Huang says:
      
      ====================
      net: hns3: add some fixes for -net
      
      This series adds some fixes for the HNS3 ethernet driver.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      251ffc07
    • Guangbin Huang's avatar
      net: hns3: disable firmware compatible features when uninstall PF · 0178839c
      Guangbin Huang authored
      Currently, the firmware compatible features are enabled in PF driver
      initialization process, but they are not disabled in PF driver
      deinitialization process and firmware keeps these features in enabled
      status.
      
      In this case, if load an old PF driver (for example, in VM) which not
      support the firmware compatible features, firmware will still send mailbox
      message to PF when link status changed and PF will print
      "un-supported mailbox message, code = 201".
      
      To fix this problem, disable these firmware compatible features in PF
      driver deinitialization process.
      
      Fixes: ed8fb4b2 ("net: hns3: add link change event report")
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0178839c
    • Guangbin Huang's avatar
      net: hns3: fix always enable rx vlan filter problem after selftest · 27bf4af6
      Guangbin Huang authored
      Currently, the rx vlan filter will always be disabled before selftest and
      be enabled after selftest as the rx vlan filter feature is fixed on in
      old device earlier than V3.
      
      However, this feature is not fixed in some new devices and it can be
      disabled by user. In this case, it is wrong if rx vlan filter is enabled
      after selftest. So fix it.
      
      Fixes: bcc26e8d ("net: hns3: remove unused code in hns3_self_test()")
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      27bf4af6
    • Guangbin Huang's avatar
      net: hns3: PF enable promisc for VF when mac table is overflow · 276e6042
      Guangbin Huang authored
      If unicast mac address table is full, and user add a new mac address, the
      unicast promisc needs to be enabled for the new unicast mac address can be
      used. So does the multicast promisc.
      
      Now this feature has been implemented for PF, and VF should be implemented
      too. When the mac table of VF is overflow, PF will enable promisc for this
      VF.
      
      Fixes: 1e6e7610 ("net: hns3: configure promisc mode for VF asynchronously")
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      276e6042
    • Jian Shen's avatar
      net: hns3: fix show wrong state when add existing uc mac address · 108b3c78
      Jian Shen authored
      Currently, if function adds an existing unicast mac address, eventhough
      driver will not add this address into hardware, but it will return 0 in
      function hclge_add_uc_addr_common(). It will cause the state of this
      unicast mac address is ACTIVE in driver, but it should be in TO-ADD state.
      
      To fix this problem, function hclge_add_uc_addr_common() returns -EEXIST
      if mac address is existing, and delete two error log to avoid printing
      them all the time after this modification.
      
      Fixes: 72110b56 ("net: hns3: return 0 and print warning when hit duplicate MAC")
      Signed-off-by: default avatarJian Shen <shenjian15@huawei.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      108b3c78
    • Jian Shen's avatar
      net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE · 0472e95f
      Jian Shen authored
      HCLGE_FLAG_MQPRIO_ENABLE is supposed to set when enable
      multiple TCs with tc mqprio, and HCLGE_FLAG_DCB_ENABLE is
      supposed to set when enable multiple TCs with ets. But
      the driver mixed the flags when updating the tm configuration.
      
      Furtherly, PFC should be available when HCLGE_FLAG_MQPRIO_ENABLE
      too, so remove the unnecessary limitation.
      
      Fixes: 5a5c9091 ("net: hns3: add support for tc mqprio offload")
      Signed-off-by: default avatarJian Shen <shenjian15@huawei.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0472e95f
    • Jian Shen's avatar
      net: hns3: don't rollback when destroy mqprio fail · d82650be
      Jian Shen authored
      For destroy mqprio is irreversible in stack, so it's unnecessary
      to rollback the tc configuration when destroy mqprio failed.
      Otherwise, it may cause the configuration being inconsistent
      between driver and netstack.
      
      As the failure is usually caused by reset, and the driver will
      restore the configuration after reset, so it can keep the
      configuration being consistent between driver and hardware.
      
      Fixes: 5a5c9091 ("net: hns3: add support for tc mqprio offload")
      Signed-off-by: default avatarJian Shen <shenjian15@huawei.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d82650be
    • Jian Shen's avatar
      net: hns3: remove tc enable checking · a8e76fef
      Jian Shen authored
      Currently, in function hns3_nic_set_real_num_queue(), the
      driver doesn't report the queue count and offset for disabled
      tc. If user enables multiple TCs, but only maps user
      priorities to partial of them, it may cause the queue range
      of the unmapped TC being displayed abnormally.
      
      Fix it by removing the tc enable checking, ensure the queue
      count is not zero.
      
      With this change, the tc_en is useless now, so remove it.
      
      Fixes: a75a8efa ("net: hns3: Fix tc setup when netdev is first up")
      Signed-off-by: default avatarJian Shen <shenjian15@huawei.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a8e76fef
    • Jian Shen's avatar
      net: hns3: do not allow call hns3_nic_net_open repeatedly · 5b09e88e
      Jian Shen authored
      hns3_nic_net_open() is not allowed to called repeatly, but there
      is no checking for this. When doing device reset and setup tc
      concurrently, there is a small oppotunity to call hns3_nic_net_open
      repeatedly, and cause kernel bug by calling napi_enable twice.
      
      The calltrace information is like below:
      [ 3078.222780] ------------[ cut here ]------------
      [ 3078.230255] kernel BUG at net/core/dev.c:6991!
      [ 3078.236224] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
      [ 3078.243431] Modules linked in: hns3 hclgevf hclge hnae3 vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O)
      [ 3078.258880] CPU: 0 PID: 295 Comm: kworker/u8:5 Tainted: G           O      5.14.0-rc4+ #1
      [ 3078.269102] Hardware name:  , BIOS KpxxxFPGA 1P B600 V181 08/12/2021
      [ 3078.276801] Workqueue: hclge hclge_service_task [hclge]
      [ 3078.288774] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
      [ 3078.296168] pc : napi_enable+0x80/0x84
      tc qdisc sho[w  3d0e7v8 .e3t0h218 79] lr : hns3_nic_net_open+0x138/0x510 [hns3]
      
      [ 3078.314771] sp : ffff8000108abb20
      [ 3078.319099] x29: ffff8000108abb20 x28: 0000000000000000 x27: ffff0820a8490300
      [ 3078.329121] x26: 0000000000000001 x25: ffff08209cfc6200 x24: 0000000000000000
      [ 3078.339044] x23: ffff0820a8490300 x22: ffff08209cd76000 x21: ffff0820abfe3880
      [ 3078.349018] x20: 0000000000000000 x19: ffff08209cd76900 x18: 0000000000000000
      [ 3078.358620] x17: 0000000000000000 x16: ffffc816e1727a50 x15: 0000ffff8f4ff930
      [ 3078.368895] x14: 0000000000000000 x13: 0000000000000000 x12: 0000259e9dbeb6b4
      [ 3078.377987] x11: 0096a8f7e764eb40 x10: 634615ad28d3eab5 x9 : ffffc816ad8885b8
      [ 3078.387091] x8 : ffff08209cfc6fb8 x7 : ffff0820ac0da058 x6 : ffff0820a8490344
      [ 3078.396356] x5 : 0000000000000140 x4 : 0000000000000003 x3 : ffff08209cd76938
      [ 3078.405365] x2 : 0000000000000000 x1 : 0000000000000010 x0 : ffff0820abfe38a0
      [ 3078.414657] Call trace:
      [ 3078.418517]  napi_enable+0x80/0x84
      [ 3078.424626]  hns3_reset_notify_up_enet+0x78/0xd0 [hns3]
      [ 3078.433469]  hns3_reset_notify+0x64/0x80 [hns3]
      [ 3078.441430]  hclge_notify_client+0x68/0xb0 [hclge]
      [ 3078.450511]  hclge_reset_rebuild+0x524/0x884 [hclge]
      [ 3078.458879]  hclge_reset_service_task+0x3c4/0x680 [hclge]
      [ 3078.467470]  hclge_service_task+0xb0/0xb54 [hclge]
      [ 3078.475675]  process_one_work+0x1dc/0x48c
      [ 3078.481888]  worker_thread+0x15c/0x464
      [ 3078.487104]  kthread+0x160/0x170
      [ 3078.492479]  ret_from_fork+0x10/0x18
      [ 3078.498785] Code: c8027c81 35ffffa2 d50323bf d65f03c0 (d4210000)
      [ 3078.506889] ---[ end trace 8ebe0340a1b0fb44 ]---
      
      Once hns3_nic_net_open() is excute success, the flag
      HNS3_NIC_STATE_DOWN will be cleared. So add checking for this
      flag, directly return when HNS3_NIC_STATE_DOWN is no set.
      
      Fixes: e8884027 ("net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT")
      Signed-off-by: default avatarJian Shen <shenjian15@huawei.com>
      Signed-off-by: default avatarGuangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5b09e88e
    • Feng Zhou's avatar
      ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup · 513e605d
      Feng Zhou authored
      The ixgbe driver currently generates a NULL pointer dereference with
      some machine (online cpus < 63). This is due to the fact that the
      maximum value of num_xdp_queues is nr_cpu_ids. Code is in
      "ixgbe_set_rss_queues"".
      
      Here's how the problem repeats itself:
      Some machine (online cpus < 63), And user set num_queues to 63 through
      ethtool. Code is in the "ixgbe_set_channels",
      	adapter->ring_feature[RING_F_FDIR].limit = count;
      
      It becomes 63.
      
      When user use xdp, "ixgbe_set_rss_queues" will set queues num.
      	adapter->num_rx_queues = rss_i;
      	adapter->num_tx_queues = rss_i;
      	adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
      
      And rss_i's value is from
      	f = &adapter->ring_feature[RING_F_FDIR];
      	rss_i = f->indices = f->limit;
      
      So "num_rx_queues" > "num_xdp_queues", when run to "ixgbe_xdp_setup",
      	for (i = 0; i < adapter->num_rx_queues; i++)
      		if (adapter->xdp_ring[i]->xsk_umem)
      
      It leads to panic.
      
      Call trace:
      [exception RIP: ixgbe_xdp+368]
      RIP: ffffffffc02a76a0  RSP: ffff9fe16202f8d0  RFLAGS: 00010297
      RAX: 0000000000000000  RBX: 0000000000000020  RCX: 0000000000000000
      RDX: 0000000000000000  RSI: 000000000000001c  RDI: ffffffffa94ead90
      RBP: ffff92f8f24c0c18   R8: 0000000000000000   R9: 0000000000000000
      R10: ffff9fe16202f830  R11: 0000000000000000  R12: ffff92f8f24c0000
      R13: ffff9fe16202fc01  R14: 000000000000000a  R15: ffffffffc02a7530
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       7 [ffff9fe16202f8f0] dev_xdp_install at ffffffffa89fbbcc
       8 [ffff9fe16202f920] dev_change_xdp_fd at ffffffffa8a08808
       9 [ffff9fe16202f960] do_setlink at ffffffffa8a20235
      10 [ffff9fe16202fa88] rtnl_setlink at ffffffffa8a20384
      11 [ffff9fe16202fc78] rtnetlink_rcv_msg at ffffffffa8a1a8dd
      12 [ffff9fe16202fcf0] netlink_rcv_skb at ffffffffa8a717eb
      13 [ffff9fe16202fd40] netlink_unicast at ffffffffa8a70f88
      14 [ffff9fe16202fd80] netlink_sendmsg at ffffffffa8a71319
      15 [ffff9fe16202fdf0] sock_sendmsg at ffffffffa89df290
      16 [ffff9fe16202fe08] __sys_sendto at ffffffffa89e19c8
      17 [ffff9fe16202ff30] __x64_sys_sendto at ffffffffa89e1a64
      18 [ffff9fe16202ff38] do_syscall_64 at ffffffffa84042b9
      19 [ffff9fe16202ff50] entry_SYSCALL_64_after_hwframe at ffffffffa8c0008c
      
      So I fix ixgbe_max_channels so that it will not allow a setting of queues
      to be higher than the num_online_cpus(). And when run to ixgbe_xdp_setup,
      take the smaller value of num_rx_queues and num_xdp_queues.
      
      Fixes: 4a9b32f3 ("ixgbe: fix potential RX buffer starvation for AF_XDP")
      Signed-off-by: default avatarFeng Zhou <zhoufeng.zf@bytedance.com>
      Tested-by: default avatarSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      513e605d
    • Thomas Gleixner's avatar
      net: bridge: mcast: Associate the seqcount with its protecting lock. · f936bb42
      Thomas Gleixner authored
      The sequence count bridge_mcast_querier::seq is protected by
      net_bridge::multicast_lock but seqcount_init() does not associate the
      seqcount with the lock. This leads to a warning on PREEMPT_RT because
      preemption is still enabled.
      
      Let seqcount_init() associate the seqcount with lock that protects the
      write section. Remove lockdep_assert_held_once() because lockdep already checks
      whether the associated lock is held.
      
      Fixes: 67b746f9 ("net: bridge: mcast: make sure querier port/address updates are consistent")
      Reported-by: default avatarMike Galbraith <efault@gmx.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Tested-by: default avatarMike Galbraith <efault@gmx.de>
      Acked-by: default avatarNikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20210928141049.593833-1-bigeasy@linutronix.deSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f936bb42
    • Cai Huoqing's avatar
      net: mdio-ipq4019: Fix the error for an optional regs resource · 9e28cfea
      Cai Huoqing authored
      The second resource is optional which is only provided on the chipset
      IPQ5018. But the blamed commit ignores that and if the resource is
      not there it just fails.
      
      the resource is used like this,
      	if (priv->eth_ldo_rdy) {
      		val = readl(priv->eth_ldo_rdy);
      		val |= BIT(0);
      		writel(val, priv->eth_ldo_rdy);
      		fsleep(IPQ_PHY_SET_DELAY_US);
      	}
      
      This patch reverts that to still allow the second resource to be optional
      because other SoC have the some MDIO controller and doesn't need to
      second resource.
      
      Fixes: fa14d03e ("net: mdio-ipq4019: Make use of devm_platform_ioremap_resource()")
      Signed-off-by: default avatarCai Huoqing <caihuoqing@baidu.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20210928134849.2092-1-caihuoqing@baidu.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9e28cfea
  3. 28 Sep, 2021 18 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 4ccb9f03
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-09-28
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 10 non-merge commits during the last 14 day(s) which contain
      a total of 11 files changed, 139 insertions(+), 53 deletions(-).
      
      The main changes are:
      
      1) Fix MIPS JIT jump code emission for too large offsets, from Piotr Krysiuk.
      
      2) Fix x86 JIT atomic/fetch emission when dst reg maps to rax, from Johan Almbladh.
      
      3) Fix cgroup_sk_alloc corner case when called from interrupt, from Daniel Borkmann.
      
      4) Fix segfault in libbpf's linker for objects without BTF, from Kumar Kartikeya Dwivedi.
      
      5) Fix bpf_jit_charge_modmem for applications with CAP_BPF, from Lorenz Bauer.
      
      6) Fix return value handling for struct_ops BPF programs, from Hou Tao.
      
      7) Various fixes to BPF selftests, from Jiri Benc.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ,
      4ccb9f03
    • Arnd Bergmann's avatar
      net: hns3: fix hclge_dbg_dump_tm_pg() stack usage · c894b51e
      Arnd Bergmann authored
      This function copies strings around between multiple buffers
      including a large on-stack array that causes a build warning
      on 32-bit systems:
      
      drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c: In function 'hclge_dbg_dump_tm_pg':
      drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:782:1: error: the frame size of 1424 bytes is larger than 1400 bytes [-Werror=frame-larger-than=]
      
      The function can probably be cleaned up a lot, to go back to
      printing directly into the output buffer, but dynamically allocating
      the structure is a simpler workaround for now.
      
      Fixes: 04d96139 ("net: hns3: refine function hclge_dbg_dump_tm_pri()")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c894b51e
    • Horatiu Vultur's avatar
      net: mdio: mscc-miim: Fix the mdio controller · c6995117
      Horatiu Vultur authored
      According to the documentation the second resource is optional. But the
      blamed commit ignores that and if the resource is not there it just
      fails.
      
      This patch reverts that to still allow the second resource to be
      optional because other SoC have the some MDIO controller and doesn't
      need to second resource.
      
      Fixes: 672a1c39 ("net: mdio: mscc-miim: Make use of the helper function devm_platform_ioremap_resource()")
      Signed-off-by: default avatarHoratiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: default avatarCai Huoqing <caihuoqing@baidu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c6995117
    • Kuniyuki Iwashima's avatar
      af_unix: Return errno instead of NULL in unix_create1(). · f4bd73b5
      Kuniyuki Iwashima authored
      unix_create1() returns NULL on error, and the callers assume that it never
      fails for reasons other than out of memory.  So, the callers always return
      -ENOMEM when unix_create1() fails.
      
      However, it also returns NULL when the number of af_unix sockets exceeds
      twice the limit controlled by sysctl: fs.file-max.  In this case, the
      callers should return -ENFILE like alloc_empty_file().
      
      This patch changes unix_create1() to return the correct error value instead
      of NULL on error.
      
      Out of curiosity, the assumption has been wrong since 1999 due to this
      change introduced in 2.2.4 [0].
      
        diff -u --recursive --new-file v2.2.3/linux/net/unix/af_unix.c linux/net/unix/af_unix.c
        --- v2.2.3/linux/net/unix/af_unix.c	Tue Jan 19 11:32:53 1999
        +++ linux/net/unix/af_unix.c	Sun Mar 21 07:22:00 1999
        @@ -388,6 +413,9 @@
         {
         	struct sock *sk;
      
        +	if (atomic_read(&unix_nr_socks) >= 2*max_files)
        +		return NULL;
        +
         	MOD_INC_USE_COUNT;
         	sk = sk_alloc(PF_UNIX, GFP_KERNEL, 1);
         	if (!sk) {
      
      [0]: https://cdn.kernel.org/pub/linux/kernel/v2.2/patch-2.2.4.gz
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f4bd73b5
    • Eric Dumazet's avatar
      net: udp: annotate data race around udp_sk(sk)->corkflag · a9f59707
      Eric Dumazet authored
      up->corkflag field can be read or written without any lock.
      Annotate accesses to avoid possible syzbot/KCSAN reports.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a9f59707
    • Randy Dunlap's avatar
      net: sun: SUNVNET_COMMON should depend on INET · 103bde37
      Randy Dunlap authored
      When CONFIG_INET is not set, there are failing references to IPv4
      functions, so make this driver depend on INET.
      
      Fixes these build errors:
      
      sparc64-linux-ld: drivers/net/ethernet/sun/sunvnet_common.o: in function `sunvnet_start_xmit_common':
      sunvnet_common.c:(.text+0x1a68): undefined reference to `__icmp_send'
      sparc64-linux-ld: drivers/net/ethernet/sun/sunvnet_common.o: in function `sunvnet_poll_common':
      sunvnet_common.c:(.text+0x358c): undefined reference to `ip_send_check'
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Aaron Young <aaron.young@oracle.com>
      Cc: Rashmi Narasimhan <rashmi.narasimhan@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      103bde37
    • Shannon Nelson's avatar
      ionic: fix gathering of debug stats · c23bb54f
      Shannon Nelson authored
      Don't print stats for which we haven't reserved space as it can
      cause nasty memory bashing and related bad behaviors.
      
      Fixes: aa620993 ("ionic: pull per-q stats work out of queue loops")
      Signed-off-by: default avatarShannon Nelson <snelson@pensando.io>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c23bb54f
    • David S. Miller's avatar
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/t · 3fb2a54b
      David S. Miller authored
      nguy/net-queue
      
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-09-27
      
      This series contains updates to e100 driver only.
      
      Jake corrects under allocation of register buffer due to incorrect
      calculations and fixes buffer overrun of register dump.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3fb2a54b
    • Arnd Bergmann's avatar
      dmascc: add CONFIG_VIRT_TO_BUS dependency · 05e97b3d
      Arnd Bergmann authored
      Many architectures don't define virt_to_bus() any more, as drivers
      should be using the dma-mapping interfaces where possible:
      
      In file included from drivers/net/hamradio/dmascc.c:27:
      drivers/net/hamradio/dmascc.c: In function 'tx_on':
      drivers/net/hamradio/dmascc.c:976:30: error: implicit declaration of function 'virt_to_bus'; did you mean 'virt_to_fix'? [-Werror=implicit-function-declaration]
        976 |                              virt_to_bus(priv->tx_buf[priv->tx_tail]) + n);
            |                              ^~~~~~~~~~~
      arch/arm/include/asm/dma.h:109:52: note: in definition of macro 'set_dma_addr'
        109 |         __set_dma_addr(chan, (void *)__bus_to_virt(addr))
            |                                                    ^~~~
      
      Add the Kconfig dependency to prevent this from being built on
      architectures without virt_to_bus().
      
      Fixes: bc1abb9e ("dmascc: use proper 'virt_to_bus()' rather than casting to 'int'")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      05e97b3d
    • Arnd Bergmann's avatar
      net: ks8851: fix link error · 51bb08dd
      Arnd Bergmann authored
      An object file cannot be built for both loadable module and built-in
      use at the same time:
      
      arm-linux-gnueabi-ld: drivers/net/ethernet/micrel/ks8851_common.o: in function `ks8851_probe_common':
      ks8851_common.c:(.text+0xf80): undefined reference to `__this_module'
      
      Change the ks8851_common code to be a standalone module instead,
      and use Makefile logic to ensure this is built-in if at least one
      of its two users is.
      
      Fixes: 797047f8 ("net: ks8851: Implement Parallel bus operations")
      Link: https://lore.kernel.org/netdev/20210125121937.3900988-1-arnd@kernel.org/Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Acked-by: default avatarMarek Vasut <marex@denx.de>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      51bb08dd
    • Johan Almbladh's avatar
      bpf, x86: Fix bpf mapping of atomic fetch implementation · ced18582
      Johan Almbladh authored
      Fix the case where the dst register maps to %rax as otherwise this produces
      an incorrect mapping with the implementation in 981f94c3 ("bpf: Add
      bitwise atomic instructions") as %rax is clobbered given it's part of the
      cmpxchg as operand.
      
      The issue is similar to b29dd96b ("bpf, x86: Fix BPF_FETCH atomic and/or/
      xor with r0 as src") just that the case of dst register was missed.
      
      Before, dst=r0 (%rax) src=r2 (%rsi):
      
        [...]
        c5:   mov    %rax,%r10
        c8:   mov    0x0(%rax),%rax       <---+ (broken)
        cc:   mov    %rax,%r11                |
        cf:   and    %rsi,%r11                |
        d2:   lock cmpxchg %r11,0x0(%rax) <---+
        d8:   jne    0x00000000000000c8       |
        da:   mov    %rax,%rsi                |
        dd:   mov    %r10,%rax                |
        [...]                                 |
                                              |
      After, dst=r0 (%rax) src=r2 (%rsi):     |
                                              |
        [...]                                 |
        da:	mov    %rax,%r10                |
        dd:	mov    0x0(%r10),%rax       <---+ (fixed)
        e1:	mov    %rax,%r11                |
        e4:	and    %rsi,%r11                |
        e7:	lock cmpxchg %r11,0x0(%r10) <---+
        ed:	jne    0x00000000000000dd
        ef:	mov    %rax,%rsi
        f2:	mov    %r10,%rax
        [...]
      
      The remaining combinations were fine as-is though:
      
      After, dst=r9 (%r15) src=r0 (%rax):
      
        [...]
        dc:	mov    %rax,%r10
        df:	mov    0x0(%r15),%rax
        e3:	mov    %rax,%r11
        e6:	and    %r10,%r11
        e9:	lock cmpxchg %r11,0x0(%r15)
        ef:	jne    0x00000000000000df      _
        f1:	mov    %rax,%r10                | (unneeded, but
        f4:	mov    %r10,%rax               _|  not a problem)
        [...]
      
      After, dst=r9 (%r15) src=r4 (%rcx):
      
        [...]
        de:	mov    %rax,%r10
        e1:	mov    0x0(%r15),%rax
        e5:	mov    %rax,%r11
        e8:	and    %rcx,%r11
        eb:	lock cmpxchg %r11,0x0(%r15)
        f1:	jne    0x00000000000000e1
        f3:	mov    %rax,%rcx
        f6:	mov    %r10,%rax
        [...]
      
      The case of dst == src register is rejected by the verifier and
      therefore not supported, but x86 JIT also handles this case just
      fine.
      
      After, dst=r0 (%rax) src=r0 (%rax):
      
        [...]
        eb:	mov    %rax,%r10
        ee:	mov    0x0(%r10),%rax
        f2:	mov    %rax,%r11
        f5:	and    %r10,%r11
        f8:	lock cmpxchg %r11,0x0(%r10)
        fe:	jne    0x00000000000000ee
       100:	mov    %rax,%r10
       103:	mov    %r10,%rax
        [...]
      
      Fixes: 981f94c3 ("bpf: Add bitwise atomic instructions")
      Reported-by: default avatarJohan Almbladh <johan.almbladh@anyfinetworks.com>
      Signed-off-by: default avatarJohan Almbladh <johan.almbladh@anyfinetworks.com>
      Co-developed-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarBrendan Jackman <jackmanb@google.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      ced18582
    • Jiri Benc's avatar
      selftests, bpf: test_lwt_ip_encap: Really disable rp_filter · 79e2c306
      Jiri Benc authored
      It's not enough to set net.ipv4.conf.all.rp_filter=0, that does not override
      a greater rp_filter value on the individual interfaces. We also need to set
      net.ipv4.conf.default.rp_filter=0 before creating the interfaces. That way,
      they'll also get their own rp_filter value of zero.
      
      Fixes: 0fde56e4 ("selftests: bpf: add test_lwt_ip_encap selftest")
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/b1cdd9d469f09ea6e01e9c89a6071c79b7380f89.1632386362.git.jbenc@redhat.com
      79e2c306
    • Jiri Benc's avatar
      selftests, bpf: Fix makefile dependencies on libbpf · d888eaac
      Jiri Benc authored
      When building bpf selftest with make -j, I'm randomly getting build failures
      such as this one:
      
        In file included from progs/bpf_flow.c:19:
        [...]/tools/testing/selftests/bpf/tools/include/bpf/bpf_helpers.h:11:10: fatal error: 'bpf_helper_defs.h' file not found
        #include "bpf_helper_defs.h"
                 ^~~~~~~~~~~~~~~~~~~
      
      The file that fails the build varies between runs but it's always in the
      progs/ subdir.
      
      The reason is a missing make dependency on libbpf for the .o files in
      progs/. There was a dependency before commit 3ac2e20f but that commit
      removed it to prevent unneeded rebuilds. However, that only works if libbpf
      has been built already; the 'wildcard' prerequisite does not trigger when
      there's no bpf_helper_defs.h generated yet.
      
      Keep the libbpf as an order-only prerequisite to satisfy both goals. It is
      always built before the progs/ objects but it does not trigger unnecessary
      rebuilds by itself.
      
      Fixes: 3ac2e20f ("selftests/bpf: BPF object files should depend only on libbpf headers")
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/ee84ab66436fba05a197f952af23c98d90eb6243.1632758415.git.jbenc@redhat.com
      d888eaac
    • Daniel Borkmann's avatar
      bpf, test, cgroup: Use sk_{alloc,free} for test cases · 435b08ec
      Daniel Borkmann authored
      BPF test infra has some hacks in place which kzalloc() a socket and perform
      minimum init via sock_net_set() and sock_init_data(). As a result, the sk's
      skcd->cgroup is NULL since it didn't go through proper initialization as it
      would have been the case from sk_alloc(). Rather than re-adding a NULL test
      in sock_cgroup_ptr() just for this, use sk_{alloc,free}() pair for the test
      socket. The latter also allows to get rid of the bpf_sk_storage_free() special
      case.
      
      Fixes: 8520e224 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode")
      Fixes: b7a1848e ("bpf: add BPF_PROG_TEST_RUN support for flow dissector")
      Fixes: 2cb494a3 ("bpf: add tests for direct packet access from CGROUP_SKB")
      Reported-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com
      Reported-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Tested-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com
      Tested-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/bpf/20210927123921.21535-2-daniel@iogearbox.net
      435b08ec
    • Daniel Borkmann's avatar
      bpf, cgroup: Assign cgroup in cgroup_sk_alloc when called from interrupt · 78cc316e
      Daniel Borkmann authored
      If cgroup_sk_alloc() is called from interrupt context, then just assign the
      root cgroup to skcd->cgroup. Prior to commit 8520e224 ("bpf, cgroups:
      Fix cgroup v2 fallback on v1/v2 mixed mode") we would just return, and later
      on in sock_cgroup_ptr(), we were NULL-testing the cgroup in fast-path, and
      iff indeed NULL returning the root cgroup (v ?: &cgrp_dfl_root.cgrp). Rather
      than re-adding the NULL-test to the fast-path we can just assign it once from
      cgroup_sk_alloc() given v1/v2 handling has been simplified. The migration from
      NULL test with returning &cgrp_dfl_root.cgrp to assigning &cgrp_dfl_root.cgrp
      directly does /not/ change behavior for callers of sock_cgroup_ptr().
      
      syzkaller was able to trigger a splat in the legacy netrom code base, where
      the RX handler in nr_rx_frame() calls nr_make_new() which calls sk_alloc()
      and therefore cgroup_sk_alloc() with in_interrupt() condition. Thus the NULL
      skcd->cgroup, where it trips over on cgroup_sk_free() side given it expects
      a non-NULL object. There are a few other candidates aside from netrom which
      have similar pattern where in their accept-like implementation, they just call
      to sk_alloc() and thus cgroup_sk_alloc() instead of sk_clone_lock() with the
      corresponding cgroup_sk_clone() which then inherits the cgroup from the parent
      socket. None of them are related to core protocols where BPF cgroup programs
      are running from. However, in future, they should follow to implement a similar
      inheritance mechanism.
      
      Additionally, with a !CONFIG_CGROUP_NET_PRIO and !CONFIG_CGROUP_NET_CLASSID
      configuration, the same issue was exposed also prior to 8520e224 due to
      commit e876ecc6 ("cgroup: memcg: net: do not associate sock with unrelated
      cgroup") which added the early in_interrupt() return back then.
      
      Fixes: 8520e224 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode")
      Fixes: e876ecc6 ("cgroup: memcg: net: do not associate sock with unrelated cgroup")
      Reported-by: syzbot+df709157a4ecaf192b03@syzkaller.appspotmail.com
      Reported-by: syzbot+533f389d4026d86a2a95@syzkaller.appspotmail.com
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Tested-by: syzbot+df709157a4ecaf192b03@syzkaller.appspotmail.com
      Tested-by: syzbot+533f389d4026d86a2a95@syzkaller.appspotmail.com
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/bpf/20210927123921.21535-1-daniel@iogearbox.net
      78cc316e
    • Kumar Kartikeya Dwivedi's avatar
      libbpf: Fix segfault in static linker for objects without BTF · bcfd367c
      Kumar Kartikeya Dwivedi authored
      When a BPF object is compiled without BTF info (without -g),
      trying to link such objects using bpftool causes a SIGSEGV due to
      btf__get_nr_types accessing obj->btf which is NULL. Fix this by
      checking for the NULL pointer, and return error.
      
      Reproducer:
      $ cat a.bpf.c
      extern int foo(void);
      int bar(void) { return foo(); }
      $ cat b.bpf.c
      int foo(void) { return 0; }
      $ clang -O2 -target bpf -c a.bpf.c
      $ clang -O2 -target bpf -c b.bpf.c
      $ bpftool gen obj out a.bpf.o b.bpf.o
      Segmentation fault (core dumped)
      
      After fix:
      $ bpftool gen obj out a.bpf.o b.bpf.o
      libbpf: failed to find BTF info for object 'a.bpf.o'
      Error: failed to link 'a.bpf.o': Unknown error -22 (-22)
      
      Fixes: a4634922 (libbpf: Add linker extern resolution support for functions and global variables)
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210924023725.70228-1-memxor@gmail.com
      bcfd367c
    • Dave Marchevsky's avatar
      MAINTAINERS: Add btf headers to BPF · b3aa173d
      Dave Marchevsky authored
      BPF folks maintain these and they're not picked up by the current
      MAINTAINERS entries.
      
      Files caught by the added globs:
      
        include/linux/btf.h
        include/linux/btf_ids.h
        include/uapi/linux/btf.h
      Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210924193557.3081469-1-davemarchevsky@fb.com
      b3aa173d
    • Lorenz Bauer's avatar
      bpf: Exempt CAP_BPF from checks against bpf_jit_limit · 8a98ae12
      Lorenz Bauer authored
      When introducing CAP_BPF, bpf_jit_charge_modmem() was not changed to treat
      programs with CAP_BPF as privileged for the purpose of JIT memory allocation.
      This means that a program without CAP_BPF can block a program with CAP_BPF
      from loading a program.
      
      Fix this by checking bpf_capable() in bpf_jit_charge_modmem().
      
      Fixes: 2c78ee89 ("bpf: Implement CAP_BPF")
      Signed-off-by: default avatarLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210922111153.19843-1-lmb@cloudflare.com
      8a98ae12