1. 13 Mar, 2021 8 commits
  2. 12 Mar, 2021 15 commits
    • liuyacan's avatar
      net: correct sk_acceptq_is_full() · f211ac15
      liuyacan authored
      The "backlog" argument in listen() specifies
      the maximom length of pending connections,
      so the accept queue should be considered full
      if there are exactly "backlog" elements.
      Signed-off-by: default avatarliuyacan <yacanliu@163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f211ac15
    • David S. Miller's avatar
      Revert "net: bonding: fix error return code of bond_neigh_init()" · 080bfa1e
      David S. Miller authored
      This reverts commit 2055a99d.
      
      This change rejects legitimate configurations.
      
      A slave doesn't need to exist nor implement ndo_slave_setup.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      080bfa1e
    • Li RongQing's avatar
      igb: avoid premature Rx buffer reuse · 98dfb02a
      Li RongQing authored
      Igb needs a similar fix as commit 75aab4e1 ("i40e: avoid
      premature Rx buffer reuse")
      
      The page recycle code, incorrectly, relied on that a page fragment
      could not be freed inside xdp_do_redirect(). This assumption leads to
      that page fragments that are used by the stack/XDP redirect can be
      reused and overwritten.
      
      To avoid this, store the page count prior invoking xdp_do_redirect().
      
      Longer explanation:
      
      Intel NICs have a recycle mechanism. The main idea is that a page is
      split into two parts. One part is owned by the driver, one part might
      be owned by someone else, such as the stack.
      
      t0: Page is allocated, and put on the Rx ring
                    +---------------
      used by NIC ->| upper buffer
      (rx_buffer)   +---------------
                    | lower buffer
                    +---------------
        page count  == USHRT_MAX
        rx_buffer->pagecnt_bias == USHRT_MAX
      
      t1: Buffer is received, and passed to the stack (e.g.)
                    +---------------
                    | upper buff (skb)
                    +---------------
      used by NIC ->| lower buffer
      (rx_buffer)   +---------------
        page count  == USHRT_MAX
        rx_buffer->pagecnt_bias == USHRT_MAX - 1
      
      t2: Buffer is received, and redirected
                    +---------------
                    | upper buff (skb)
                    +---------------
      used by NIC ->| lower buffer
      (rx_buffer)   +---------------
      
      Now, prior calling xdp_do_redirect():
        page count  == USHRT_MAX
        rx_buffer->pagecnt_bias == USHRT_MAX - 2
      
      This means that buffer *cannot* be flipped/reused, because the skb is
      still using it.
      
      The problem arises when xdp_do_redirect() actually frees the
      segment. Then we get:
        page count  == USHRT_MAX - 1
        rx_buffer->pagecnt_bias == USHRT_MAX - 2
      
      From a recycle perspective, the buffer can be flipped and reused,
      which means that the skb data area is passed to the Rx HW ring!
      
      To work around this, the page count is stored prior calling
      xdp_do_redirect().
      
      Fixes: 9cbc948b ("igb: add XDP support")
      Signed-off-by: default avatarLi RongQing <lirongqing@baidu.com>
      Reviewed-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Tested-by: default avatarVishakha Jambekar <vishakha.jambekar@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      98dfb02a
    • Maciej Fijalkowski's avatar
      ixgbe: move headroom initialization to ixgbe_configure_rx_ring · 76064573
      Maciej Fijalkowski authored
      ixgbe_rx_offset(), that is supposed to initialize the Rx buffer headroom,
      relies on __IXGBE_RX_BUILD_SKB_ENABLED flag.
      
      Currently, the callsite of mentioned function is placed incorrectly
      within ixgbe_setup_rx_resources() where Rx ring's build skb flag is not
      set yet. This causes the XDP_REDIRECT to be partially broken due to
      inability to create xdp_frame in the headroom space, as the headroom is
      0.
      
      Fix this by moving ixgbe_rx_offset() to ixgbe_configure_rx_ring() after
      the flag setting, which happens to be set in ixgbe_set_rx_buffer_len.
      
      Fixes: c0d4e9d2 ("ixgbe: store the result of ixgbe_rx_offset() onto ixgbe_ring")
      Signed-off-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: default avatarVishakha Jambekar <vishakha.jambekar@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      76064573
    • Maciej Fijalkowski's avatar
      ice: move headroom initialization to ice_setup_rx_ctx · 89861c48
      Maciej Fijalkowski authored
      ice_rx_offset(), that is supposed to initialize the Rx buffer headroom,
      relies on ICE_RX_FLAGS_RING_BUILD_SKB flag as well as XDP prog presence.
      
      Currently, the callsite of mentioned function is placed incorrectly
      within ice_setup_rx_ring() where Rx ring's build skb flag is not
      set yet. This causes the XDP_REDIRECT to be partially broken due to
      inability to create xdp_frame in the headroom space, as the headroom is
      0.
      
      Fix this by moving ice_rx_offset() to ice_setup_rx_ctx() after the flag
      setting.
      
      Fixes: f1b1f409 ("ice: store the result of ice_rx_offset() onto ice_ring")
      Signed-off-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: default avatarKiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      89861c48
    • Maciej Fijalkowski's avatar
      i40e: move headroom initialization to i40e_configure_rx_ring · a8660626
      Maciej Fijalkowski authored
      i40e_rx_offset(), that is supposed to initialize the Rx buffer headroom,
      relies on I40E_RXR_FLAGS_BUILD_SKB_ENABLED flag.
      
      Currently, the callsite of mentioned function is placed incorrectly
      within i40e_setup_rx_descriptors() where Rx ring's build skb flag is not
      set yet. This causes the XDP_REDIRECT to be partially broken due to
      inability to create xdp_frame in the headroom space, as the headroom is
      0.
      
      For the record, below is the call graph:
      
      i40e_vsi_open
       i40e_vsi_setup_rx_resources
        i40e_setup_rx_descriptors
         i40e_rx_offset() <-- sets offset to 0 as build_skb flag is set below
      
       i40e_vsi_configure_rx
        i40e_configure_rx_ring
         set_ring_build_skb_enabled(ring) <-- set build_skb flag
      
      Fix this by moving i40e_rx_offset() to i40e_configure_rx_ring() after
      the flag setting.
      
      Fixes: f7bb0d71 ("i40e: store the result of i40e_rx_offset() onto i40e_ring")
      Reported-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Co-developed-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Tested-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Tested-by: default avatarKiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      a8660626
    • Magnus Karlsson's avatar
      ice: fix napi work done reporting in xsk path · ed0907e3
      Magnus Karlsson authored
      Fix the wrong napi work done reporting in the xsk path of the ice
      driver. The code in the main Rx processing loop was written to assume
      that the buffer allocation code returns true if all allocations where
      successful and false if not. In contrast with all other Intel NIC xsk
      drivers, the ice_alloc_rx_bufs_zc() has the inverted logic messing up
      the work done reporting in the napi loop.
      
      This can be fixed either by inverting the return value from
      ice_alloc_rx_bufs_zc() in the function that uses this in an incorrect
      way, or by changing the return value of ice_alloc_rx_bufs_zc(). We
      chose the latter as it makes all the xsk allocation functions for
      Intel NICs behave in the same way. My guess is that it was this
      unexpected discrepancy that gave rise to this bug in the first place.
      
      Fixes: 5bb0c4b5 ("ice, xsk: Move Rx allocation out of while-loop")
      Reported-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Tested-by: default avatarKiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      ed0907e3
    • David S. Miller's avatar
      Merge branch 'htb-fixes' · 451b2596
      David S. Miller authored
      Maxim Mikityanskiy says:
      
      ====================
      Bugfixes for HTB
      
      The HTB offload feature introduced a few bugs in HTB. One affects the
      non-offload mode, preventing attaching qdiscs to HTB classes, and the
      other affects the error flow, when the netdev doesn't support the
      offload, but it was requested. This short series fixes them.
      ====================
      Acked-by: default avatarCong Wang <cong.wang@bytedance.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      451b2596
    • Maxim Mikityanskiy's avatar
      sch_htb: Fix offload cleanup in htb_destroy on htb_init failure · fb3a3e37
      Maxim Mikityanskiy authored
      htb_init may fail to do the offload if it's not supported or if a
      runtime error happens when allocating direct qdiscs. In those cases
      TC_HTB_CREATE command is not sent to the driver, however, htb_destroy
      gets called anyway and attempts to send TC_HTB_DESTROY.
      
      It shouldn't happen, because the driver didn't receive TC_HTB_CREATE,
      and also because the driver may not support ndo_setup_tc at all, while
      q->offload is true, and htb_destroy mistakenly thinks the offload is
      supported. Trying to call ndo_setup_tc in the latter case will lead to a
      NULL pointer dereference.
      
      This commit fixes the issues with htb_destroy by deferring assignment of
      q->offload until after the TC_HTB_CREATE command. The necessary cleanup
      of the offload entities is already done in htb_init.
      
      Reported-by: syzbot+b53a709f04722ca12a3c@syzkaller.appspotmail.com
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fb3a3e37
    • Maxim Mikityanskiy's avatar
      sch_htb: Fix select_queue for non-offload mode · 93bde210
      Maxim Mikityanskiy authored
      htb_select_queue assumes it's always the offload mode, and it ends up in
      calling ndo_setup_tc without any checks. It may lead to a NULL pointer
      dereference if ndo_setup_tc is not implemented, or to an error returned
      from the driver, which will prevent attaching qdiscs to HTB classes in
      the non-offload mode.
      
      This commit fixes the bug by adding the missing check to
      htb_select_queue. In the non-offload mode it will return sch->dev_queue,
      mimicking tc_modify_qdisc's behavior for the case where select_queue is
      not implemented.
      
      Reported-by: syzbot+b53a709f04722ca12a3c@syzkaller.appspotmail.com
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      93bde210
    • Florian Fainelli's avatar
      net: phy: broadcom: Add power down exit reset state delay · 7a1468ba
      Florian Fainelli authored
      Per the datasheet, when we clear the power down bit, the PHY remains in
      an internal reset state for 40us and then resume normal operation.
      Account for that delay to avoid any issues in the future if
      genphy_resume() changes.
      
      Fixes: fe26821f ("net: phy: broadcom: Wire suspend/resume for BCM54810")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7a1468ba
    • Tong Zhang's avatar
      mISDN: fix crash in fritzpci · a9f81244
      Tong Zhang authored
      setup_fritz() in avmfritz.c might fail with -EIO and in this case the
      isac.type and isac.write_reg is not initialized and remains 0(NULL).
      A subsequent call to isac_release() will dereference isac->write_reg and
      crash.
      
      [    1.737444] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [    1.737809] #PF: supervisor instruction fetch in kernel mode
      [    1.738106] #PF: error_code(0x0010) - not-present page
      [    1.738378] PGD 0 P4D 0
      [    1.738515] Oops: 0010 [#1] SMP NOPTI
      [    1.738711] CPU: 0 PID: 180 Comm: systemd-udevd Not tainted 5.12.0-rc2+ #78
      [    1.739077] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-p
      rebuilt.qemu.org 04/01/2014
      [    1.739664] RIP: 0010:0x0
      [    1.739807] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
      [    1.740200] RSP: 0018:ffffc9000027ba10 EFLAGS: 00010202
      [    1.740478] RAX: 0000000000000000 RBX: ffff888102f41840 RCX: 0000000000000027
      [    1.740853] RDX: 00000000000000ff RSI: 0000000000000020 RDI: ffff888102f41800
      [    1.741226] RBP: ffffc9000027ba20 R08: ffff88817bc18440 R09: ffffc9000027b808
      [    1.741600] R10: 0000000000000001 R11: 0000000000000001 R12: ffff888102f41840
      [    1.741976] R13: 00000000fffffffb R14: ffff888102f41800 R15: ffff8881008b0000
      [    1.742351] FS:  00007fda3a38a8c0(0000) GS:ffff88817bc00000(0000) knlGS:0000000000000000
      [    1.742774] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    1.743076] CR2: ffffffffffffffd6 CR3: 00000001021ec000 CR4: 00000000000006f0
      [    1.743452] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    1.743828] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    1.744206] Call Trace:
      [    1.744339]  isac_release+0xcc/0xe0 [mISDNipac]
      [    1.744582]  fritzpci_probe.cold+0x282/0x739 [avmfritz]
      [    1.744861]  local_pci_probe+0x48/0x80
      [    1.745063]  pci_device_probe+0x10f/0x1c0
      [    1.745278]  really_probe+0xfb/0x420
      [    1.745471]  driver_probe_device+0xe9/0x160
      [    1.745693]  device_driver_attach+0x5d/0x70
      [    1.745917]  __driver_attach+0x8f/0x150
      [    1.746123]  ? device_driver_attach+0x70/0x70
      [    1.746354]  bus_for_each_dev+0x7e/0xc0
      [    1.746560]  driver_attach+0x1e/0x20
      [    1.746751]  bus_add_driver+0x152/0x1f0
      [    1.746957]  driver_register+0x74/0xd0
      [    1.747157]  ? 0xffffffffc00d8000
      [    1.747334]  __pci_register_driver+0x54/0x60
      [    1.747562]  AVM_init+0x36/0x1000 [avmfritz]
      [    1.747791]  do_one_initcall+0x48/0x1d0
      [    1.747997]  ? __cond_resched+0x19/0x30
      [    1.748206]  ? kmem_cache_alloc_trace+0x390/0x440
      [    1.748458]  ? do_init_module+0x28/0x250
      [    1.748669]  do_init_module+0x62/0x250
      [    1.748870]  load_module+0x23ee/0x26a0
      [    1.749073]  __do_sys_finit_module+0xc2/0x120
      [    1.749307]  ? __do_sys_finit_module+0xc2/0x120
      [    1.749549]  __x64_sys_finit_module+0x1a/0x20
      [    1.749782]  do_syscall_64+0x38/0x90
      Signed-off-by: default avatarTong Zhang <ztong0001@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a9f81244
    • Lv Yunlong's avatar
      net/qlcnic: Fix a use after free in qlcnic_83xx_get_minidump_template · db74623a
      Lv Yunlong authored
      In qlcnic_83xx_get_minidump_template, fw_dump->tmpl_hdr was freed by
      vfree(). But unfortunately, it is used when extended is true.
      
      Fixes: 7061b2bd ("qlogic: Deletion of unnecessary checks before two function calls")
      Signed-off-by: default avatarLv Yunlong <lyl2019@mail.ustc.edu.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      db74623a
    • David S. Miller's avatar
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · ce6c13e4
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-03-11
      
      This series contains updates to igc and e1000e drivers.
      
      Sasha adds locking to reset task to prevent race condition for igc.
      
      Muhammad fixes reporting of supported pause frame as well as advertised
      pause frame for Tx/Rx off for igc.
      
      Andre fixes timestamp retrieval from the wrong timer for igc.
      
      Vitaly adds locking to reset task to prevent race condition for e1000e.
      
      Dinghao Liu adds a missed check to return on error in
      e1000_set_d0_lplu_state_82571.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ce6c13e4
    • Tonghao Zhang's avatar
      net: sock: simplify tw proto registration · b80350f3
      Tonghao Zhang authored
      Introduce the new function tw_prot_init (inspired by
      req_prot_init) to simplify "proto_register" function.
      
      tw_prot_cleanup will take care of a partially initialized
      timewait_sock_ops.
      Signed-off-by: default avatarTonghao Zhang <xiangxia.m.yue@gmail.com>
      Reviewed-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b80350f3
  3. 11 Mar, 2021 7 commits
  4. 10 Mar, 2021 10 commits
    • Florian Fainelli's avatar
      net: dsa: b53: VLAN filtering is global to all users · d45c36ba
      Florian Fainelli authored
      The bcm_sf2 driver uses the b53 driver as a library but does not make
      usre of the b53_setup() function, this made it fail to inherit the
      vlan_filtering_is_global attribute. Fix this by moving the assignment to
      b53_switch_alloc() which is used by bcm_sf2.
      
      Fixes: 7228b23e ("net: dsa: b53: Let DSA handle mismatched VLAN filtering settings")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d45c36ba
    • Eric Dumazet's avatar
      net: sched: validate stab values · e323d865
      Eric Dumazet authored
      iproute2 package is well behaved, but malicious user space can
      provide illegal shift values and trigger UBSAN reports.
      
      Add stab parameter to red_check_params() to validate user input.
      
      syzbot reported:
      
      UBSAN: shift-out-of-bounds in ./include/net/red.h:312:18
      shift exponent 111 is too large for 64-bit type 'long unsigned int'
      CPU: 1 PID: 14662 Comm: syz-executor.3 Not tainted 5.12.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x141/0x1d7 lib/dump_stack.c:120
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
       __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
       red_calc_qavg_from_idle_time include/net/red.h:312 [inline]
       red_calc_qavg include/net/red.h:353 [inline]
       choke_enqueue.cold+0x18/0x3dd net/sched/sch_choke.c:221
       __dev_xmit_skb net/core/dev.c:3837 [inline]
       __dev_queue_xmit+0x1943/0x2e00 net/core/dev.c:4150
       neigh_hh_output include/net/neighbour.h:499 [inline]
       neigh_output include/net/neighbour.h:508 [inline]
       ip6_finish_output2+0x911/0x1700 net/ipv6/ip6_output.c:117
       __ip6_finish_output net/ipv6/ip6_output.c:182 [inline]
       __ip6_finish_output+0x4c1/0xe10 net/ipv6/ip6_output.c:161
       ip6_finish_output+0x35/0x200 net/ipv6/ip6_output.c:192
       NF_HOOK_COND include/linux/netfilter.h:290 [inline]
       ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:215
       dst_output include/net/dst.h:448 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       NF_HOOK include/linux/netfilter.h:295 [inline]
       ip6_xmit+0x127e/0x1eb0 net/ipv6/ip6_output.c:320
       inet6_csk_xmit+0x358/0x630 net/ipv6/inet6_connection_sock.c:135
       dccp_transmit_skb+0x973/0x12c0 net/dccp/output.c:138
       dccp_send_reset+0x21b/0x2b0 net/dccp/output.c:535
       dccp_finish_passive_close net/dccp/proto.c:123 [inline]
       dccp_finish_passive_close+0xed/0x140 net/dccp/proto.c:118
       dccp_terminate_connection net/dccp/proto.c:958 [inline]
       dccp_close+0xb3c/0xe60 net/dccp/proto.c:1028
       inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
       inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478
       __sock_release+0xcd/0x280 net/socket.c:599
       sock_close+0x18/0x20 net/socket.c:1258
       __fput+0x288/0x920 fs/file_table.c:280
       task_work_run+0xdd/0x1a0 kernel/task_work.c:140
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
      
      Fixes: 8afa10cb ("net_sched: red: Avoid illegal values")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e323d865
    • David S. Miller's avatar
    • Rafał Miłecki's avatar
      net: dsa: bcm_sf2: use 2 Gbps IMP port link on BCM4908 · 8373a0fe
      Rafał Miłecki authored
      BCM4908 uses 2 Gbps link between switch and the Ethernet interface.
      Without this BCM4908 devices were able to achieve only 2 x ~895 Mb/s.
      This allows handling e.g. NAT traffic with 940 Mb/s.
      Signed-off-by: default avatarRafał Miłecki <rafal@milecki.pl>
      Acked-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8373a0fe
    • Pavel Andrianov's avatar
      net: pxa168_eth: Fix a potential data race in pxa168_eth_remove · 0571a753
      Pavel Andrianov authored
      pxa168_eth_remove() firstly calls unregister_netdev(),
      then cancels a timeout work. unregister_netdev() shuts down a device
      interface and removes it from the kernel tables. If the timeout occurs
      in parallel, the timeout work (pxa168_eth_tx_timeout_task) performs stop
      and open of the device. It may lead to an inconsistent state and memory
      leaks.
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: default avatarPavel Andrianov <andrianov@ispras.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0571a753
    • Eric Dumazet's avatar
      macvlan: macvlan_count_rx() needs to be aware of preemption · dd4fa1da
      Eric Dumazet authored
      macvlan_count_rx() can be called from process context, it is thus
      necessary to disable preemption before calling u64_stats_update_begin()
      
      syzbot was able to spot this on 32bit arch:
      
      WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert include/linux/seqlock.h:271 [inline]
      WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269
      Modules linked in:
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 1 PID: 4632 Comm: kworker/1:3 Not tainted 5.12.0-rc2-syzkaller #0
      Hardware name: ARM-Versatile Express
      Workqueue: events macvlan_process_broadcast
      Backtrace:
      [<82740468>] (dump_backtrace) from [<827406dc>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
       r7:00000080 r6:60000093 r5:00000000 r4:8422a3c4
      [<827406c4>] (show_stack) from [<82751b58>] (__dump_stack lib/dump_stack.c:79 [inline])
      [<827406c4>] (show_stack) from [<82751b58>] (dump_stack+0xb8/0xe8 lib/dump_stack.c:120)
      [<82751aa0>] (dump_stack) from [<82741270>] (panic+0x130/0x378 kernel/panic.c:231)
       r7:830209b4 r6:84069ea4 r5:00000000 r4:844350d0
      [<82741140>] (panic) from [<80244924>] (__warn+0xb0/0x164 kernel/panic.c:605)
       r3:8404ec8c r2:00000000 r1:00000000 r0:830209b4
       r7:0000010f
      [<80244874>] (__warn) from [<82741520>] (warn_slowpath_fmt+0x68/0xd4 kernel/panic.c:628)
       r7:81363f70 r6:0000010f r5:83018e50 r4:00000000
      [<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert include/linux/seqlock.h:271 [inline])
      [<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269)
       r8:5a109000 r7:0000000f r6:a568dac0 r5:89802300 r4:00000001
      [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (u64_stats_update_begin include/linux/u64_stats_sync.h:128 [inline])
      [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_count_rx include/linux/if_macvlan.h:47 [inline])
      [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_broadcast+0x154/0x26c drivers/net/macvlan.c:291)
       r5:89802300 r4:8a927740
      [<8136499c>] (macvlan_broadcast) from [<81365020>] (macvlan_process_broadcast+0x258/0x2d0 drivers/net/macvlan.c:317)
       r10:81364f78 r9:8a86d000 r8:8a9c7e7c r7:8413aa5c r6:00000000 r5:00000000
       r4:89802840
      [<81364dc8>] (macvlan_process_broadcast) from [<802696a4>] (process_one_work+0x2d4/0x998 kernel/workqueue.c:2275)
       r10:00000008 r9:8404ec98 r8:84367a02 r7:ddfe6400 r6:ddfe2d40 r5:898dac80
       r4:8a86d43c
      [<802693d0>] (process_one_work) from [<80269dcc>] (worker_thread+0x64/0x54c kernel/workqueue.c:2421)
       r10:00000008 r9:8a9c6000 r8:84006d00 r7:ddfe2d78 r6:898dac94 r5:ddfe2d40
       r4:898dac80
      [<80269d68>] (worker_thread) from [<80271f40>] (kthread+0x184/0x1a4 kernel/kthread.c:292)
       r10:85247e64 r9:898dac80 r8:80269d68 r7:00000000 r6:8a9c6000 r5:89a2ee40
       r4:8a97bd00
      [<80271dbc>] (kthread) from [<80200114>] (ret_from_fork+0x14/0x20 arch/arm/kernel/entry-common.S:158)
      Exception stack(0x8a9c7fb0 to 0x8a9c7ff8)
      
      Fixes: 412ca155 ("macvlan: Move broadcasts into a work queue")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dd4fa1da
    • Ido Schimmel's avatar
      drop_monitor: Perform cleanup upon probe registration failure · 9398e9c0
      Ido Schimmel authored
      In the rare case that drop_monitor fails to register its probe on the
      'napi_poll' tracepoint, it will not deactivate its hysteresis timer as
      part of the error path. If the hysteresis timer was armed by the shortly
      lived 'kfree_skb' probe and user space retries to initiate tracing, a
      warning will be emitted for trying to initialize an active object [1].
      
      Fix this by properly undoing all the operations that were done prior to
      probe registration, in both software and hardware code paths.
      
      Note that syzkaller managed to fail probe registration by injecting a
      slab allocation failure [2].
      
      [1]
      ODEBUG: init active (active state 0) object type: timer_list hint: sched_send_work+0x0/0x60 include/linux/list.h:135
      WARNING: CPU: 1 PID: 8649 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      Modules linked in:
      CPU: 1 PID: 8649 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      [...]
      Call Trace:
       __debug_object_init+0x524/0xd10 lib/debugobjects.c:588
       debug_timer_init kernel/time/timer.c:722 [inline]
       debug_init kernel/time/timer.c:770 [inline]
       init_timer_key+0x2d/0x340 kernel/time/timer.c:814
       net_dm_trace_on_set net/core/drop_monitor.c:1111 [inline]
       set_all_monitor_traces net/core/drop_monitor.c:1188 [inline]
       net_dm_monitor_start net/core/drop_monitor.c:1295 [inline]
       net_dm_cmd_trace+0x720/0x1220 net/core/drop_monitor.c:1339
       genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:739
       genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
       genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:800
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
       netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
       netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2348
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2402
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2435
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [2]
       FAULT_INJECTION: forcing a failure.
       name failslab, interval 1, probability 0, space 0, times 1
       CPU: 1 PID: 8645 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       Call Trace:
        dump_stack+0xfa/0x151
        should_fail.cold+0x5/0xa
        should_failslab+0x5/0x10
        __kmalloc+0x72/0x3f0
        tracepoint_add_func+0x378/0x990
        tracepoint_probe_register+0x9c/0xe0
        net_dm_cmd_trace+0x7fc/0x1220
        genl_family_rcv_msg_doit+0x228/0x320
        genl_rcv_msg+0x328/0x580
        netlink_rcv_skb+0x153/0x420
        genl_rcv+0x24/0x40
        netlink_unicast+0x533/0x7d0
        netlink_sendmsg+0x856/0xd90
        sock_sendmsg+0xcf/0x120
        ____sys_sendmsg+0x6e8/0x810
        ___sys_sendmsg+0xf3/0x170
        __sys_sendmsg+0xe5/0x1b0
        do_syscall_64+0x2d/0x70
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: 70c69274 ("drop_monitor: Initialize timer and work item upon tracing enable")
      Fixes: 8ee2267a ("drop_monitor: Convert to using devlink tracepoint")
      Reported-by: syzbot+779559d6503f3a56213d@syzkaller.appspotmail.com
      Signed-off-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: default avatarJiri Pirko <jiri@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9398e9c0
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 547fd083
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-03-10
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 8 non-merge commits during the last 5 day(s) which contain
      a total of 11 files changed, 136 insertions(+), 17 deletions(-).
      
      The main changes are:
      
      1) Reject bogus use of vmlinux BTF as map/prog creation BTF, from Alexei Starovoitov.
      
      2) Fix allocation failure splat in x86 JIT for large progs. Also fix overwriting
         percpu cgroup storage from tracing programs when nested, from Yonghong Song.
      
      3) Fix rx queue retrieval in XDP for multi-queue veth, from Maciej Fijalkowski.
      
      4) Fix bpf_check_mtu() helper API before freeze to have mtu_len as custom skb/xdp
         L3 input length, from Jesper Dangaard Brouer.
      
      5) Fix inode_storage's lookup_elem return value upon having bad fd, from Tal Lossos.
      
      6) Fix bpftool and libbpf cross-build on MacOS, from Georgi Valkov.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      547fd083
    • Wei Wang's avatar
      ipv6: fix suspecious RCU usage warning · 28259bac
      Wei Wang authored
      Syzbot reported the suspecious RCU usage in nexthop_fib6_nh() when
      called from ipv6_route_seq_show(). The reason is ipv6_route_seq_start()
      calls rcu_read_lock_bh(), while nexthop_fib6_nh() calls
      rcu_dereference_rtnl().
      The fix proposed is to add a variant of nexthop_fib6_nh() to use
      rcu_dereference_bh_rtnl() for ipv6_route_seq_show().
      
      The reported trace is as follows:
      ./include/net/nexthop.h:416 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by syz-executor.0/17895:
           at: seq_read+0x71/0x12a0 fs/seq_file.c:169
           at: seq_file_net include/linux/seq_file_net.h:19 [inline]
           at: ipv6_route_seq_start+0xaf/0x300 net/ipv6/ip6_fib.c:2616
      
      stack backtrace:
      CPU: 1 PID: 17895 Comm: syz-executor.0 Not tainted 4.15.0-syzkaller #0
      Call Trace:
       [<ffffffff849edf9e>] __dump_stack lib/dump_stack.c:17 [inline]
       [<ffffffff849edf9e>] dump_stack+0xd8/0x147 lib/dump_stack.c:53
       [<ffffffff8480b7fa>] lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5745
       [<ffffffff8459ada6>] nexthop_fib6_nh include/net/nexthop.h:416 [inline]
       [<ffffffff8459ada6>] ipv6_route_native_seq_show net/ipv6/ip6_fib.c:2488 [inline]
       [<ffffffff8459ada6>] ipv6_route_seq_show+0x436/0x7a0 net/ipv6/ip6_fib.c:2673
       [<ffffffff81c556df>] seq_read+0xccf/0x12a0 fs/seq_file.c:276
       [<ffffffff81dbc62c>] proc_reg_read+0x10c/0x1d0 fs/proc/inode.c:231
       [<ffffffff81bc28ae>] do_loop_readv_writev fs/read_write.c:714 [inline]
       [<ffffffff81bc28ae>] do_loop_readv_writev fs/read_write.c:701 [inline]
       [<ffffffff81bc28ae>] do_iter_read+0x49e/0x660 fs/read_write.c:935
       [<ffffffff81bc81ab>] vfs_readv+0xfb/0x170 fs/read_write.c:997
       [<ffffffff81c88847>] kernel_readv fs/splice.c:361 [inline]
       [<ffffffff81c88847>] default_file_splice_read+0x487/0x9c0 fs/splice.c:416
       [<ffffffff81c86189>] do_splice_to+0x129/0x190 fs/splice.c:879
       [<ffffffff81c86f66>] splice_direct_to_actor+0x256/0x890 fs/splice.c:951
       [<ffffffff81c8777d>] do_splice_direct+0x1dd/0x2b0 fs/splice.c:1060
       [<ffffffff81bc4747>] do_sendfile+0x597/0xce0 fs/read_write.c:1459
       [<ffffffff81bca205>] SYSC_sendfile64 fs/read_write.c:1520 [inline]
       [<ffffffff81bca205>] SyS_sendfile64+0x155/0x170 fs/read_write.c:1506
       [<ffffffff81015fcf>] do_syscall_64+0x1ff/0x310 arch/x86/entry/common.c:305
       [<ffffffff84a00076>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: f88d8ea6 ("ipv6: Plumb support for nexthop object in a fib6_info")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Ido Schimmel <idosch@idosch.org>
      Cc: Petr Machata <petrm@nvidia.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28259bac
    • David S. Miller's avatar
      Merge branch 'ip6ip6-crash' · c89489b4
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      Fix ip6ip6 crash for collect_md skbs
      
      Fix a NULL pointer deref panic I ran into for regular ip6ip6 tunnel devices
      when collect_md populated skbs were redirected to them for xmit. See patches
      for further details, thanks!
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c89489b4