1. 19 Jun, 2020 29 commits
    • i40e: fix crash when Rx descriptor count is changed · 3995ecba
      Björn Töpel authored
      When the AF_XDP buffer allocator was introduced, allocation of the
      Rx SW ring "rx_bi" was moved out of i40e_setup_rx_descriptors()
      and into i40e_configure_rx_ring().
      
      This broke the ethtool set_ringparam() hook for changing the Rx
      descriptor count, which was relying on i40e_setup_rx_descriptors() to
      handle the allocation.
      
      Fix this by adding an explicit i40e_alloc_rx_bi() call to
      i40e_set_ringparam().
      
      Fixes: be1222b5 ("i40e: Separate kernel allocated rx_bi rings from AF_XDP rings")
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: protect ring accesses with WRITE_ONCE · b1d95cc2
      Ciara Loftus authored
      The READ_ONCE macro is used when reading rings prior to accessing the
      statistics pointer, but the corresponding WRITE_ONCE usage when allocating
      and freeing the rings was not in place, so access was not fully protected.
      Introduce it.
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: protect ring accesses with READ- and WRITE_ONCE · d59e2679
      Ciara Loftus authored
      READ_ONCE should be used when reading rings prior to accessing the
      statistics pointer. Introduce this as well as the corresponding WRITE_ONCE
      usage when allocating and freeing the rings, to ensure protected access.
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
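The READ_ONCE/WRITE_ONCE pairing described in the ring-access commits can be sketched in user space as follows. This is only an illustration: the macros are simplified stand-ins for the kernel ones (they rely on the GNU C `__typeof__` extension), and `struct ring`, `struct vsi`, `vsi_set_ring()` and `vsi_total_packets()` are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins for the kernel macros (GNU C __typeof__). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

#define NUM_RINGS 4

struct ring {				/* hypothetical ring carrying a stats counter */
	unsigned long packets;
};

struct vsi {				/* hypothetical owner of the ring pointers */
	struct ring *rx_rings[NUM_RINGS];
};

/* Publish (or clear, with r == NULL) a ring slot with one volatile store. */
static void vsi_set_ring(struct vsi *v, size_t i, struct ring *r)
{
	assert(i < NUM_RINGS);
	WRITE_ONCE(v->rx_rings[i], r);
}

/* Sum the counters; each slot is loaded exactly once and checked before use,
 * so the NULL check and the dereference see the same pointer value. */
static unsigned long vsi_total_packets(struct vsi *v)
{
	unsigned long total = 0;

	for (size_t i = 0; i < NUM_RINGS; i++) {
		struct ring *r = READ_ONCE(v->rx_rings[i]);

		if (r)
			total += r->packets;
	}
	return total;
}
```

The macros only stop the compiler from tearing or repeating the access. Keeping the ring itself alive while a reader uses it has to be guaranteed by other means, which this sketch does not model.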
    • ixgbe: protect ring accesses with READ- and WRITE_ONCE · f140ad9f
      Ciara Loftus authored
      READ_ONCE should be used when reading rings prior to accessing the
      statistics pointer. Introduce this as well as the corresponding WRITE_ONCE
      usage when allocating and freeing the rings, to ensure protected access.
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • net: increment xmit_recursion level in dev_direct_xmit() · 0ad6f6e7
      Eric Dumazet authored
      Back in commit f60e5990 ("ipv6: protect skb->sk accesses
      from recursive dereference inside the stack"), Hannes added code
      so that the IPv6 stack would not trust skb->sk for typical cases
      where a packet goes through the 'standard' xmit path (__dev_queue_xmit()).
      
      Alas, af_packet had a dev_direct_xmit() path that did not
      yet deal with the xmit_recursion level.
      
      Also change sk_mc_loop() to dump a stack once only.
      
      Without this patch, syzbot was able to trigger:
      
      [1]
      [  153.567378] WARNING: CPU: 7 PID: 11273 at net/core/sock.c:721 sk_mc_loop+0x51/0x70
      [  153.567378] Modules linked in: nfnetlink ip6table_raw ip6table_filter iptable_raw iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 iptable_filter macsec macvtap tap macvlan 8021q hsr wireguard libblake2s blake2s_x86_64 libblake2s_generic udp_tunnel ip6_udp_tunnel libchacha20poly1305 poly1305_x86_64 chacha_x86_64 libchacha curve25519_x86_64 libcurve25519_generic netdevsim batman_adv dummy team bridge stp llc w1_therm wire i2c_mux_pca954x i2c_mux cdc_acm ehci_pci ehci_hcd mlx4_en mlx4_ib ib_uverbs ib_core mlx4_core
      [  153.567386] CPU: 7 PID: 11273 Comm: b159172088 Not tainted 5.8.0-smp-DEV #273
      [  153.567387] RIP: 0010:sk_mc_loop+0x51/0x70
      [  153.567388] Code: 66 83 f8 0a 75 24 0f b6 4f 12 b8 01 00 00 00 31 d2 d3 e0 a9 bf ef ff ff 74 07 48 8b 97 f0 02 00 00 0f b6 42 3a 83 e0 01 5d c3 <0f> 0b b8 01 00 00 00 5d c3 0f b6 87 18 03 00 00 5d c0 e8 04 83 e0
      [  153.567388] RSP: 0018:ffff95c69bb93990 EFLAGS: 00010212
      [  153.567388] RAX: 0000000000000011 RBX: ffff95c6e0ee3e00 RCX: 0000000000000007
      [  153.567389] RDX: ffff95c69ae50000 RSI: ffff95c6c30c3000 RDI: ffff95c6c30c3000
      [  153.567389] RBP: ffff95c69bb93990 R08: ffff95c69a77f000 R09: 0000000000000008
      [  153.567389] R10: 0000000000000040 R11: 00003e0e00026128 R12: ffff95c6c30c3000
      [  153.567390] R13: ffff95c6cc4fd500 R14: ffff95c6f84500c0 R15: ffff95c69aa13c00
      [  153.567390] FS:  00007fdc3a283700(0000) GS:ffff95c6ff9c0000(0000) knlGS:0000000000000000
      [  153.567390] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  153.567391] CR2: 00007ffee758e890 CR3: 0000001f9ba20003 CR4: 00000000001606e0
      [  153.567391] Call Trace:
      [  153.567391]  ip6_finish_output2+0x34e/0x550
      [  153.567391]  __ip6_finish_output+0xe7/0x110
      [  153.567391]  ip6_finish_output+0x2d/0xb0
      [  153.567392]  ip6_output+0x77/0x120
      [  153.567392]  ? __ip6_finish_output+0x110/0x110
      [  153.567392]  ip6_local_out+0x3d/0x50
      [  153.567392]  ipvlan_queue_xmit+0x56c/0x5e0
      [  153.567393]  ? ksize+0x19/0x30
      [  153.567393]  ipvlan_start_xmit+0x18/0x50
      [  153.567393]  dev_direct_xmit+0xf3/0x1c0
      [  153.567393]  packet_direct_xmit+0x69/0xa0
      [  153.567394]  packet_sendmsg+0xbf0/0x19b0
      [  153.567394]  ? plist_del+0x62/0xb0
      [  153.567394]  sock_sendmsg+0x65/0x70
      [  153.567394]  sock_write_iter+0x93/0xf0
      [  153.567394]  new_sync_write+0x18e/0x1a0
      [  153.567395]  __vfs_write+0x29/0x40
      [  153.567395]  vfs_write+0xb9/0x1b0
      [  153.567395]  ksys_write+0xb1/0xe0
      [  153.567395]  __x64_sys_write+0x1a/0x20
      [  153.567395]  do_syscall_64+0x43/0x70
      [  153.567396]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  153.567396] RIP: 0033:0x453549
      [  153.567396] Code: Bad RIP value.
      [  153.567396] RSP: 002b:00007fdc3a282cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  153.567397] RAX: ffffffffffffffda RBX: 00000000004d32d0 RCX: 0000000000453549
      [  153.567397] RDX: 0000000000000020 RSI: 0000000020000300 RDI: 0000000000000003
      [  153.567398] RBP: 00000000004d32d8 R08: 0000000000000000 R09: 0000000000000000
      [  153.567398] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d32dc
      [  153.567398] R13: 00007ffee742260f R14: 00007fdc3a282dc0 R15: 00007fdc3a283700
      [  153.567399] ---[ end trace c1d5ae2b1059ec62 ]---
      
      Fixes: f60e5990 ("ipv6: protect skb->sk accesses from recursive dereference inside the stack")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
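The idea behind the dev_direct_xmit() change, bumping a recursion counter around the driver's transmit call so that code further down can tell it is running inside a transmit path, can be sketched roughly like this. It is a single-threaded user-space model: the plain `xmit_recursion` variable stands in for the kernel's per-CPU counter, the limit value is an assumption, and `direct_xmit()`, `probe_xmit()` and `reentrant_xmit()` are hypothetical names.

```c
#include <assert.h>

#define XMIT_RECURSION_LIMIT 8	/* assumed depth limit for this sketch */

/* Single-threaded stand-in for the kernel's per-CPU recursion counter. */
static int xmit_recursion;

typedef int (*xmit_fn)(void *pkt);

/* Non-zero while the caller is running underneath a transmit function. */
static int xmit_is_nested(void)
{
	return xmit_recursion > 0;
}

/* Direct transmit path: account for the nesting level around the callback. */
static int direct_xmit(xmit_fn start_xmit, void *pkt)
{
	int ret;

	if (xmit_recursion >= XMIT_RECURSION_LIMIT)
		return -1;	/* nested too deeply: refuse instead of recursing */

	xmit_recursion++;
	ret = start_xmit(pkt);
	xmit_recursion--;
	assert(xmit_recursion >= 0);
	return ret;
}

/* Hypothetical driver callback: records whether it saw a nested context. */
static int probe_xmit(void *pkt)
{
	*(int *)pkt = xmit_is_nested();
	return 0;
}

/* Hypothetical stacked device that keeps re-entering the direct path. */
static int reentrant_xmit(void *pkt)
{
	(*(int *)pkt)++;
	return direct_xmit(reentrant_xmit, pkt);
}
```

Without the increment, a callback such as `probe_xmit()` would see a level of zero and could wrongly conclude that it was not called from a transmit path.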
    • net: dsa: bcm_sf2: Fix node reference count · 8dbe4c5d
      Florian Fainelli authored
      of_find_node_by_name() will do an of_node_put() on the "from" argument.
      With CONFIG_OF_DYNAMIC enabled which checks for device_node reference
      counts, we would be getting a warning like this:
      
      [    6.347230] refcount_t: increment on 0; use-after-free.
      [    6.352498] WARNING: CPU: 3 PID: 77 at lib/refcount.c:156 refcount_inc_checked+0x38/0x44
      [    6.360601] Modules linked in:
      [    6.363661] CPU: 3 PID: 77 Comm: kworker/3:1 Tainted: G        W 5.4.46-gb78b3e9956e6 #13
      [    6.372546] Hardware name: BCM97278SV (DT)
      [    6.376649] Workqueue: events deferred_probe_work_func
      [    6.381796] pstate: 60000005 (nZCv daif -PAN -UAO)
      [    6.386595] pc : refcount_inc_checked+0x38/0x44
      [    6.391133] lr : refcount_inc_checked+0x38/0x44
      ...
      [    6.478791] Call trace:
      [    6.481243]  refcount_inc_checked+0x38/0x44
      [    6.485433]  kobject_get+0x3c/0x4c
      [    6.488840]  of_node_get+0x24/0x34
      [    6.492247]  of_irq_find_parent+0x3c/0xe0
      [    6.496263]  of_irq_parse_one+0xe4/0x1d0
      [    6.500191]  irq_of_parse_and_map+0x44/0x84
      [    6.504381]  bcm_sf2_sw_probe+0x22c/0x844
      [    6.508397]  platform_drv_probe+0x58/0xa8
      [    6.512413]  really_probe+0x238/0x3fc
      [    6.516081]  driver_probe_device+0x11c/0x12c
      [    6.520358]  __device_attach_driver+0xa8/0x100
      [    6.524808]  bus_for_each_drv+0xb4/0xd0
      [    6.528650]  __device_attach+0xd0/0x164
      [    6.532493]  device_initial_probe+0x24/0x30
      [    6.536682]  bus_probe_device+0x38/0x98
      [    6.540524]  deferred_probe_work_func+0xa8/0xd4
      [    6.545061]  process_one_work+0x178/0x288
      [    6.549078]  process_scheduled_works+0x44/0x48
      [    6.553529]  worker_thread+0x218/0x270
      [    6.557285]  kthread+0xdc/0xe4
      [    6.560344]  ret_from_fork+0x10/0x18
      [    6.563925] ---[ end trace 68f65caf69bb152a ]---
      
      Fix this by adding an of_node_get() to increment the reference count
      prior to the call.
      
      Fixes: afa3b592 ("net: dsa: bcm_sf2: Ensure correct sub-node is parsed")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
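The reference-count imbalance described in the bcm_sf2 commit can be modelled with a small user-space sketch. `struct node`, `node_get()`, `node_put()`, `find_node_by_name()` and `find_child_keep_parent()` are hypothetical stand-ins; the only property borrowed from of_find_node_by_name() is the one stated in the commit message, namely that it drops a reference on its "from" argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct node {			/* hypothetical stand-in for a device tree node */
	const char *name;
	int refcount;
	struct node *next;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n) {
		assert(n->refcount > 0);	/* a put without a matching get is a bug */
		n->refcount--;
	}
}

/* Search the nodes after "from", return the match with a reference held,
 * and drop one reference on "from" (simplified: a NULL "from" finds nothing). */
static struct node *find_node_by_name(struct node *from, const char *name)
{
	struct node *n;

	for (n = from ? from->next : NULL; n; n = n->next)
		if (strcmp(n->name, name) == 0)
			break;
	node_get(n);
	node_put(from);
	return n;
}

/* A caller that keeps using "parent" afterwards takes an extra reference
 * first, so the put inside the helper is balanced. */
static struct node *find_child_keep_parent(struct node *parent, const char *name)
{
	node_get(parent);
	return find_node_by_name(parent, name);
}
```

Calling `find_node_by_name()` directly on a node the caller still needs would leave that node one reference short, which is the pattern the warning above points at.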
    • net: ethtool: add missing NETIF_F_GSO_FRAGLIST feature string · eddbf5d0
      Alexander Lobakin authored
      Commit 3b335832 ("net: Add fraglist GRO/GSO feature flags") missed
      an entry for NETIF_F_GSO_FRAGLIST in the netdev_features_strings array. As
      a result, the fraglist GSO feature is not shown in 'ethtool -k' output and
      can't be toggled on/off.
      The fix is trivial.
      
      Fixes: 3b335832 ("net: Add fraglist GRO/GSO feature flags")
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
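A bit-indexed string table of the kind this commit fixes can be sketched as below, together with a check that catches a forgotten entry. The enum, the table and `first_unnamed_feature()` are hypothetical, and the strings are illustrative rather than quoted from the kernel.

```c
#include <assert.h>
#include <stddef.h>

enum feature_bit {		/* hypothetical subset of feature bits */
	FEAT_GSO_UDP_L4,
	FEAT_GSO_FRAGLIST,
	FEAT_COUNT
};

/* One name per bit; designated initializers keep index and name together.
 * Any bit without an entry is left as NULL. */
static const char *const feature_strings[FEAT_COUNT] = {
	[FEAT_GSO_UDP_L4]   = "tx-udp-segmentation",
	[FEAT_GSO_FRAGLIST] = "tx-gso-list",
};

/* Return the first bit that has no name, or -1 if the table is complete. */
static int first_unnamed_feature(void)
{
	for (int i = 0; i < FEAT_COUNT; i++)
		if (!feature_strings[i])
			return i;
	return -1;
}
```

A feature without a name cannot be listed or looked up by name, which matches the symptom described above. A check like `first_unnamed_feature()` run at start-up or in a test would flag the omission early.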
    • tg3: driver sleeps indefinitely when EEH errors exceed eeh_max_freezes · 3a2656a2
      David Christensen authored
      The driver function tg3_io_error_detected() calls napi_disable twice,
      without an intervening napi_enable, when the number of EEH errors exceeds
      eeh_max_freezes, resulting in an indefinite sleep while holding rtnl_lock.
      
      Add a check for pcierr_recovery that skips code already executed for the
      "Frozen" state.
      Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
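The guard described in the tg3 commit can be modelled as a "recovery already in progress" flag that makes a second error callback skip the quiesce step. This is a hypothetical user-space model, not the driver code: `struct nic`, `nic_quiesce()` and `nic_io_error_detected()` are invented names, and the assert stands in for the indefinite sleep that a second disable would cause.

```c
#include <assert.h>
#include <stdbool.h>

enum channel_state { CHANNEL_NORMAL, CHANNEL_FROZEN, CHANNEL_PERM_FAILURE };

struct nic {			/* hypothetical device state */
	bool running;
	bool napi_enabled;
	bool pcierr_recovery;	/* set once the frozen-state handling has run */
	int quiesce_calls;
};

/* Stand-in for the disable step: calling it twice in a row is the bug. */
static void nic_quiesce(struct nic *n)
{
	assert(n->napi_enabled);
	n->napi_enabled = false;
	n->quiesce_calls++;
}

static void nic_io_error_detected(struct nic *n, enum channel_state state)
{
	/* Second call while recovery is pending: the work is already done. */
	if (!n->running || n->pcierr_recovery)
		return;

	if (state == CHANNEL_FROZEN)
		n->pcierr_recovery = true;

	nic_quiesce(n);
}
```

In this model a "frozen" report followed by a "permanent failure" report quiesces the device only once.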
    • bareudp: Fixed multiproto mode configuration · 4c98045c
      Martin authored
      Code to handle multiproto configuration is missing.
      
      Fixes: 4b5f6723 ("net: Special handling for IP & MPLS")
      Signed-off-by: Martin <martin.varghese@nokia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 's390-qeth-fixes' · e807fa3f
      David S. Miller authored
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2020-06-17
      
      please apply the following patch series for qeth to netdev's net tree.
      
      The first patch fixes a regression in the error handling for a specific
      cmd type. I have some follow-ups queued up for net-next to clean this
      up properly...
      
      The second patch fine-tunes the HW offload restrictions that went in
      with this merge window. In some setups we don't need to apply them.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: let isolation mode override HW offload restrictions · 8cebedb6
      Julian Wiedmann authored
      When a device is configured with ISOLATION_MODE_FWD, traffic never goes
      through the internal switch. Don't apply the offload restrictions in
      this case.
      
      Fixes: c619e9a6 ("s390/qeth: don't use restricted offloads for local traffic")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix error handling for isolation mode cmds · e2dfcfba
      Julian Wiedmann authored
      Current(?) OSA devices also store their cmd-specific return codes for
      SET_ACCESS_CONTROL cmds into the top-level cmd->hdr.return_code.
      So once we added stricter checking for the top-level field a while ago,
      none of the error logic that rolls back the user's configuration to its
      old state is applied any longer.
      
      For this specific cmd, go back to the old model where we peek into the
      cmd structure even though the top-level field indicated an error.
      
      Fixes: 686c97ee ("s390/qeth: fix error handling in adapter command callbacks")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mptcp-cope-with-syncookie-on-MP_JOINs' · f3c7a6e0
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      mptcp: cope with syncookie on MP_JOINs
      
      Currently syncookies on MP_JOIN connections are not handled correctly: the
      connections fall back to TCP and are kept alive instead of being reset at
      fallback time.
      
      The first patch propagates the required information up to syn_recv_sock time,
      and the second patch unifies the error path for all MP_JOIN
      requests.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: drop MP_JOIN request sock on syn cookies · 9e365ff5
      Paolo Abeni authored
      Currently any MPTCP socket using syn cookies will fall back to
      TCP at 3rd ack time. In case of MP_JOIN requests, the RFC mandates
      closing the child and sockets, but the existing error paths
      do not handle the syncookie scenario correctly.
      
      Address the issue by always forcing the child shutdown in case of
      MP_JOIN fallback.
      
      Fixes: ae2dd716 ("mptcp: handle tcp fallback when using syn cookies")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: cache msk on MP_JOIN init_req · 8fd4de12
      Paolo Abeni authored
      The msk ownership is transferred to the child socket at
      3rd ack time, so that we avoid more lookups later. If the
      request does not reach the 3rd ack, the MSK reference is
      dropped at request sock release time.
      
      As a side effect, fallback is now tracked by a NULL msk
      reference instead of zeroed 'mp_join' field. This will
      simplify the next patch.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix the arp error in some cases · 5eea3a63
      guodeqing authored
      For example:
      $ ifconfig eth0 6.6.6.6 netmask 255.255.255.0
      
      $ ip rule add from 6.6.6.6 table 6666
      
      $ ip route add 9.9.9.9 via 6.6.6.6
      
      $ ping -I 6.6.6.6 9.9.9.9
      PING 9.9.9.9 (9.9.9.9) from 6.6.6.6 : 56(84) bytes of data.
      
      3 packets transmitted, 0 received, 100% packet loss, time 2079ms
      
      $ arp
      Address     HWtype  HWaddress           Flags Mask            Iface
      6.6.6.6             (incomplete)                              eth0
      
      The ARP request is sent for the wrong address. This is because
      fib_table_lookup() in fib_check_nh() looks up the nexthop for destination
      9.9.9.9, and the scope of the FIB result is RT_SCOPE_LINK, whereas the
      correct scope is RT_SCOPE_HOST.
      Add a check for whether the table is RT_TABLE_MAIN to solve this problem.
      
      Fixes: 3bfd8472 ("net: Use passed in table for nexthop lookups")
      Signed-off-by: guodeqing <geffrey.guo@huawei.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sja1105-fixes' · ad103e03
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fix VLAN checks for SJA1105 DSA tc-flower filters
      
      This fixes a ridiculous situation where the driver, in VLAN-unaware
      mode, would refuse to accept any tc filter:
      
      tc filter replace dev sw1p3 ingress flower skip_sw \
      	dst_mac 42:be:24:9b:76:20 \
      	action gate (...)
      Error: sja1105: Can only gate based on {DMAC, VID, PCP}.
      
      tc filter replace dev sw1p3 ingress protocol 802.1Q flower skip_sw \
      	vlan_id 1 vlan_prio 0 dst_mac 42:be:24:9b:76:20 \
      	action gate (...)
      Error: sja1105: Can only gate based on DMAC.
      
      So, without changing the VLAN awareness state, it says it doesn't want
      VLAN-aware rules, and it doesn't want VLAN-unaware rules either. One
      would say it's in Schrodinger's state...
      
      Now, the situation has been made worse by commit 7f14937f ("net:
      dsa: sja1105: keep the VLAN awareness state in a driver variable"),
      which made VLAN awareness a ternary attribute, but after inspecting the
      code from before that patch with a truth table, it looks like the
      logical bug was there even before.
      
      While attempting to fix this, I also noticed some leftover debugging
      code in one of the places that needed to be fixed. It would have
      appeared in the context of patch 3/3 anyway, so I decided to create a
      patch that removes it.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: fix checks for VLAN state in gate action · 5182a622
      Vladimir Oltean authored
      This action requires the VLAN awareness state of the switch to be of the
      same type as the key that's being added:
      
      - If the switch is unaware of VLAN, then the tc filter key must only
        contain the destination MAC address.
      - If the switch is VLAN-aware, the key must also contain the VLAN ID and
        PCP.
      
      But this check doesn't work unless we verify the VLAN awareness state on
      both the "if" and the "else" branches.
      
      Fixes: 834f8933 ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
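The rule stated in the sja1105 commits, that the key type must match the switch's VLAN awareness and that the state has to be tested on both branches, can be written out as a small truth-table check. This sketch treats VLAN awareness as a plain boolean, which simplifies the driver's state, and `enum key_type` and `check_key()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum key_type {
	KEY_DMAC_ONLY,		/* destination MAC only */
	KEY_DMAC_VID_PCP	/* destination MAC plus VLAN ID and PCP */
};

/* Return 0 if the key matches the VLAN awareness state, -1 otherwise.
 * Both the aware and the unaware case are tested explicitly, so every
 * combination has exactly one outcome. */
static int check_key(bool vlan_aware, enum key_type key)
{
	if (vlan_aware && key != KEY_DMAC_VID_PCP)
		return -1;	/* VLAN-aware: key must be {DMAC, VID, PCP} */
	if (!vlan_aware && key != KEY_DMAC_ONLY)
		return -1;	/* VLAN-unaware: key must be DMAC only */
	return 0;
}
```

With this shape, a VLAN-unaware switch accepts a DMAC-only key and a VLAN-aware switch accepts a {DMAC, VID, PCP} key, so no state rejects both kinds of rule.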
    • net: dsa: sja1105: fix checks for VLAN state in redirect action · c6ae970b
      Vladimir Oltean authored
      This action requires the VLAN awareness state of the switch to be of the
      same type as the key that's being added:
      
      - If the switch is unaware of VLAN, then the tc filter key must only
        contain the destination MAC address.
      - If the switch is VLAN-aware, the key must also contain the VLAN ID and
        PCP.
      
      But this check doesn't work unless we verify the VLAN awareness state on
      both the "if" and the "else" branches.
      
      Fixes: dfacc5a2 ("net: dsa: sja1105: support flow-based redirection via virtual links")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: remove debugging code in sja1105_vl_gate · 5b3b396c
      Vladimir Oltean authored
      This shouldn't be there.
      
      Fixes: 834f8933 ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'act_gate-fixes' · b64ee485
      David S. Miller authored
      Davide Caratti says:
      
      ====================
      two fixes for 'act_gate' control plane
      
      - patch 1/2 attempts to fix the error path of tcf_gate_init() when users
        try to configure 'act_gate' rules with wrong parameters
      - patch 2/2 is a follow-up of a recent fix for NULL dereference in
        the error path of tcf_gate_init()
      
      further work will introduce a tdc test for 'act_gate'.
      
      changes since v2:
        - fix undefined behavior in patch 1/2
        - improve comment in patch 2/2
      changes since v1:
        coding style fixes in patch 1/2 and 2/2
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_gate: fix configuration of the periodic timer · c362a06e
      Davide Caratti authored
      assigning a dummy value of 'clock_id' to avoid cancellation of the cycle
      timer before its initialization was a temporary solution, and we still
      need to handle the case where act_gate timer parameters are changed by
      commands like the following one:
      
       # tc action replace action gate <parameters>
      
      the fix consists in the following items:
      
      1) remove the workaround assignment of 'clock_id', and init the list of
         entries before the first error path after IDR atomic check/allocation
      2) validate 'clock_id' earlier: there is no need to do IDR atomic
         check/allocation if we know that 'clock_id' is a bad value
      3) use a dedicated function, 'gate_setup_timer()', to ensure that the
         timer is cancelled and re-initialized on action overwrite, and also
         ensure we initialize the timer in the error path of tcf_gate_init()
      
      v3: improve comment in the error path of tcf_gate_init() (thanks to
          Vladimir Oltean)
      v2: avoid 'goto' in gate_setup_timer (thanks to Cong Wang)
      
      CC: Ivan Vecera <ivecera@redhat.com>
      Fixes: a01c2454 ("net/sched: fix a couple of splats in the error path of tfc_gate_init()")
      Fixes: a51c328d ("net: qos: introduce a gate control flow action")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
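Items 2 and 3 of the periodic-timer fix (validate 'clock_id' before allocating anything, and route all timer setup through one helper that cancels before re-initializing on overwrite) can be sketched as a user-space model. Everything here is hypothetical: `enum gate_clock`, `struct gate`, the counters and `gate_init()` are invented for illustration, and `gate_setup_timer()` only borrows its name from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum gate_clock {		/* hypothetical set of accepted clocks */
	GATE_CLK_REALTIME,
	GATE_CLK_MONOTONIC,
	GATE_CLK_BOOTTIME,
	GATE_CLK_TAI
};

struct gate {			/* hypothetical action state */
	bool exists;
	int clock;
	int timer_inits;	/* how often the timer was (re)initialized */
	int timer_cancels;	/* how often a running timer was cancelled */
};

static int allocations;		/* counts the "expensive" allocation step */

static bool gate_clock_valid(int clock)
{
	return clock >= GATE_CLK_REALTIME && clock <= GATE_CLK_TAI;
}

/* Single place that touches the timer: on overwrite, cancel the old timer
 * before re-initializing it; skip everything if nothing changed. */
static void gate_setup_timer(struct gate *g, int clock, bool do_init)
{
	if (!do_init) {
		if (g->clock == clock)
			return;
		g->timer_cancels++;
	}
	g->clock = clock;
	g->timer_inits++;
}

static int gate_init(struct gate *g, int clock)
{
	bool is_new = !g->exists;

	/* Validate first: a bad clock must not cost an allocation. */
	if (!gate_clock_valid(clock))
		return -EINVAL;

	if (is_new) {
		allocations++;
		g->exists = true;
	}
	gate_setup_timer(g, clock, is_new);
	return 0;
}
```

In this model a rejected clock leaves the action untouched, and replacing the clock of an existing action cancels the timer once before initializing it again.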
    • net/sched: act_gate: fix NULL dereference in tcf_gate_init() · 7024339a
      Davide Caratti authored
      it is possible to see a KASAN use-after-free, immediately followed by a
      NULL dereference crash, with the following command:
      
       # tc action add action gate index 3 cycle-time 100000000ns \
       > cycle-time-ext 100000000ns clockid CLOCK_TAI
      
       BUG: KASAN: use-after-free in tcf_action_init_1+0x8eb/0x960
       Write of size 1 at addr ffff88810a5908bc by task tc/883
      
       CPU: 0 PID: 883 Comm: tc Not tainted 5.7.0+ #188
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       Call Trace:
        dump_stack+0x75/0xa0
        print_address_description.constprop.6+0x1a/0x220
        kasan_report.cold.9+0x37/0x7c
        tcf_action_init_1+0x8eb/0x960
        tcf_action_init+0x157/0x2a0
        tcf_action_add+0xd9/0x2f0
        tc_ctl_action+0x2a3/0x39d
        rtnetlink_rcv_msg+0x5f3/0x920
        netlink_rcv_skb+0x120/0x380
        netlink_unicast+0x439/0x630
        netlink_sendmsg+0x714/0xbf0
        sock_sendmsg+0xe2/0x110
        ____sys_sendmsg+0x5b4/0x890
        ___sys_sendmsg+0xe9/0x160
        __sys_sendmsg+0xd3/0x170
        do_syscall_64+0x9a/0x370
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [...]
      
       KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
       CPU: 0 PID: 883 Comm: tc Tainted: G    B             5.7.0+ #188
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       RIP: 0010:tcf_action_fill_size+0xa3/0xf0
       [....]
       RSP: 0018:ffff88813a48f250 EFLAGS: 00010212
       RAX: dffffc0000000000 RBX: 0000000000000094 RCX: ffffffffa47c3eb6
       RDX: 000000000000000e RSI: 0000000000000008 RDI: 0000000000000070
       RBP: ffff88810a590800 R08: 0000000000000004 R09: ffffed1027491e03
       R10: 0000000000000003 R11: ffffed1027491e03 R12: 0000000000000000
       R13: 0000000000000000 R14: dffffc0000000000 R15: ffff88810a590800
       FS:  00007f62cae8ce40(0000) GS:ffff888147c00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f62c9d20a10 CR3: 000000013a52a000 CR4: 0000000000340ef0
       Call Trace:
        tcf_action_init+0x172/0x2a0
        tcf_action_add+0xd9/0x2f0
        tc_ctl_action+0x2a3/0x39d
        rtnetlink_rcv_msg+0x5f3/0x920
        netlink_rcv_skb+0x120/0x380
        netlink_unicast+0x439/0x630
        netlink_sendmsg+0x714/0xbf0
        sock_sendmsg+0xe2/0x110
        ____sys_sendmsg+0x5b4/0x890
        ___sys_sendmsg+0xe9/0x160
        __sys_sendmsg+0xd3/0x170
        do_syscall_64+0x9a/0x370
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      this is caused by the test on 'cycletime_ext', that is still unassigned
      when the action is newly created. This makes the action .init() return 0
      without calling tcf_idr_insert(), hence the UAF + crash.
      
      rework the logic that prevents zero values of cycle-time, as follows:
      
      1) 'tcfg_cycletime_ext' seems to be unused in the action software path,
         and it was already possible by other means to obtain non-zero
         cycletime and zero cycletime-ext. So, removing that test should not
         cause any damage.
      2) while at it, we must prevent overwriting configuration data with wrong
         ones: use a temporary variable for 'tcfg_cycletime', and validate it
         preserving the original semantic (that allowed computing the cycle
         time as the sum of all intervals, when not specified by
         TCA_GATE_CYCLE_TIME).
      3) remove the test on 'tcfg_cycletime', which is no longer useful, and
         avoid returning -EFAULT, which did not seem an appropriate return
         value for a wrong netlink attribute.
      
      v3: fix uninitialized 'cycletime' (thanks to Vladimir Oltean)
      v2: remove useless 'return;' at the end of void gate_get_start_time()
      
      Fixes: a51c328d ("net: qos: introduce a gate control flow action")
      CC: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
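Item 2 of the rework above (validate the cycle time in a temporary variable, falling back to the sum of all intervals when none is given, and only then store it) can be sketched as follows. `struct gate_params` and `gate_set_cycletime()` are hypothetical, and treating a requested value of 0 as "attribute not specified" is an assumption made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct gate_params {		/* hypothetical stored configuration */
	uint64_t cycletime;
};

/* Compute the new cycle time in a temporary and commit it only if it is
 * valid, so a bad request never overwrites the existing configuration. */
static int gate_set_cycletime(struct gate_params *p, uint64_t requested,
			      const uint64_t *intervals, size_t n)
{
	uint64_t cycle = requested;	/* assumption: 0 means "not specified" */

	if (cycle == 0)
		for (size_t i = 0; i < n; i++)
			cycle += intervals[i];

	if (cycle == 0)
		return -EINVAL;		/* bad attribute value, not -EFAULT */

	p->cycletime = cycle;
	return 0;
}
```

A failed call leaves `p->cycletime` at its previous value, which is the "do not overwrite configuration data with wrong ones" property the commit message asks for.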
    • ip_tunnel: fix use-after-free in ip_tunnel_lookup() · ba61539c
      Taehee Yoo authored
      In the datapath, ip_tunnel_lookup() is used, and it internally uses the
      fallback tunnel device pointer, fb_tunnel_dev.
      This pointer should be set to NULL when the fb interface is deleted,
      but there is no routine that does so.
      As a result, the pointer is still used after the interface is deleted,
      which eventually results in a use-after-free.
      
      Test commands:
          ip netns add A
          ip netns add B
          ip link add eth0 type veth peer name eth1
          ip link set eth0 netns A
          ip link set eth1 netns B
      
          ip netns exec A ip link set lo up
          ip netns exec A ip link set eth0 up
          ip netns exec A ip link add gre1 type gre local 10.0.0.1 \
      	    remote 10.0.0.2
          ip netns exec A ip link set gre1 up
          ip netns exec A ip a a 10.0.100.1/24 dev gre1
          ip netns exec A ip a a 10.0.0.1/24 dev eth0
      
          ip netns exec B ip link set lo up
          ip netns exec B ip link set eth1 up
          ip netns exec B ip link add gre1 type gre local 10.0.0.2 \
      	    remote 10.0.0.1
          ip netns exec B ip link set gre1 up
          ip netns exec B ip a a 10.0.100.2/24 dev gre1
          ip netns exec B ip a a 10.0.0.2/24 dev eth1
          ip netns exec A hping3 10.0.100.2 -2 --flood -d 60000 &
          ip netns del B
      
      Splat looks like:
      [   77.793450][    C3] ==================================================================
      [   77.794702][    C3] BUG: KASAN: use-after-free in ip_tunnel_lookup+0xcc4/0xf30
      [   77.795573][    C3] Read of size 4 at addr ffff888060bd9c84 by task hping3/2905
      [   77.796398][    C3]
      [   77.796664][    C3] CPU: 3 PID: 2905 Comm: hping3 Not tainted 5.8.0-rc1+ #616
      [   77.797474][    C3] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   77.798453][    C3] Call Trace:
      [   77.798815][    C3]  <IRQ>
      [   77.799142][    C3]  dump_stack+0x9d/0xdb
      [   77.799605][    C3]  print_address_description.constprop.7+0x2cc/0x450
      [   77.800365][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.800908][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.801517][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.802145][    C3]  kasan_report+0x154/0x190
      [   77.802821][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.803503][    C3]  ip_tunnel_lookup+0xcc4/0xf30
      [   77.804165][    C3]  __ipgre_rcv+0x1ab/0xaa0 [ip_gre]
      [   77.804862][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   77.805621][    C3]  gre_rcv+0x304/0x1910 [ip_gre]
      [   77.806293][    C3]  ? lock_acquire+0x1a9/0x870
      [   77.806925][    C3]  ? gre_rcv+0xfe/0x354 [gre]
      [   77.807559][    C3]  ? erspan_xmit+0x2e60/0x2e60 [ip_gre]
      [   77.808305][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   77.809032][    C3]  ? rcu_read_lock_held+0x90/0xa0
      [   77.809713][    C3]  gre_rcv+0x1b8/0x354 [gre]
      [ ... ]
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
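The shape of the fix the ip_tunnel commit message calls for, clearing the cached fallback pointer when that device goes away and having the lookup tolerate NULL, can be sketched in user space. `struct tunnel_dev`, `struct tunnel_net`, `tunnel_dev_uninit()` and `tunnel_lookup_fallback()` are hypothetical names; only `fb_tunnel_dev` comes from the text above, and the concurrency handling a real datapath needs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct tunnel_dev {		/* hypothetical tunnel device */
	bool up;
};

struct tunnel_net {		/* hypothetical per-namespace state */
	struct tunnel_dev *fb_tunnel_dev;
};

/* Called when a device is torn down: if it is the fallback device, forget
 * the cached pointer so later lookups cannot reach freed memory. */
static void tunnel_dev_uninit(struct tunnel_net *itn, struct tunnel_dev *dev)
{
	if (itn->fb_tunnel_dev == dev)
		itn->fb_tunnel_dev = NULL;
}

/* Last-resort lookup: load the pointer once and check it before use. */
static struct tunnel_dev *tunnel_lookup_fallback(struct tunnel_net *itn)
{
	struct tunnel_dev *dev = itn->fb_tunnel_dev;

	if (dev && dev->up)
		return dev;
	return NULL;
}
```

After `tunnel_dev_uninit()` runs for the fallback device, the lookup reports "no tunnel" instead of handing out a stale pointer.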
    • ip6_gre: fix use-after-free in ip6gre_tunnel_lookup() · dafabb65
      Taehee Yoo authored
      In the datapath, ip6gre_tunnel_lookup() is used, and it internally uses
      the fallback tunnel device pointer, fb_tunnel_dev.
      This pointer should be set to NULL when the fallback interface is deleted,
      but there is no routine that does so.
      As a result, the pointer is still used after the interface is deleted,
      which eventually results in a use-after-free.
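
A minimal user-space sketch of the idea described above: clear the fallback pointer when the fallback device is deleted, and check it for NULL in the lookup. The names tunnel_net, tunnel_dev, tunnel_dev_create(), tunnel_dev_delete() and tunnel_lookup_fallback() are hypothetical stand-ins, not the kernel's definitions. The sketch is single-threaded, so it leaves out the synchronization a real datapath would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tunnel net device. */
struct tunnel_dev {
    int up; /* non-zero when the interface is up */
};

/* Hypothetical stand-in for the per-namespace tunnel state. */
struct tunnel_net {
    struct tunnel_dev *fb_tunnel_dev; /* fallback device, NULL once deleted */
};

/* Allocate a device; returns NULL on allocation failure. */
struct tunnel_dev *tunnel_dev_create(int up)
{
    struct tunnel_dev *dev = malloc(sizeof(*dev));

    if (dev)
        dev->up = up;
    return dev;
}

/* Delete a device. If it is the fallback device, clear the cached
 * pointer first so that later lookups cannot reach freed memory. */
void tunnel_dev_delete(struct tunnel_net *tn, struct tunnel_dev *dev)
{
    assert(dev != NULL);
    if (tn->fb_tunnel_dev == dev)
        tn->fb_tunnel_dev = NULL;
    free(dev);
}

/* Last step of a lookup: fall back to the fallback device, but only
 * if it still exists and is up. */
struct tunnel_dev *tunnel_lookup_fallback(const struct tunnel_net *tn)
{
    struct tunnel_dev *dev = tn->fb_tunnel_dev;

    if (dev && dev->up)
        return dev;
    return NULL;
}
```

Without the NULL assignment in tunnel_dev_delete(), tunnel_lookup_fallback() would dereference a freed device, which is the pattern the splat below reports.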
      
      Test commands:
          ip netns add A
          ip netns add B
          ip link add eth0 type veth peer name eth1
          ip link set eth0 netns A
          ip link set eth1 netns B
      
          ip netns exec A ip link set lo up
          ip netns exec A ip link set eth0 up
          ip netns exec A ip link add ip6gre1 type ip6gre local fc:0::1 \
      	    remote fc:0::2
          ip netns exec A ip -6 a a fc:100::1/64 dev ip6gre1
          ip netns exec A ip link set ip6gre1 up
          ip netns exec A ip -6 a a fc:0::1/64 dev eth0
          ip netns exec A ip link set ip6gre0 up
      
          ip netns exec B ip link set lo up
          ip netns exec B ip link set eth1 up
          ip netns exec B ip link add ip6gre1 type ip6gre local fc:0::2 \
      	    remote fc:0::1
          ip netns exec B ip -6 a a fc:100::2/64 dev ip6gre1
          ip netns exec B ip link set ip6gre1 up
          ip netns exec B ip -6 a a fc:0::2/64 dev eth1
          ip netns exec B ip link set ip6gre0 up
          ip netns exec A ping fc:100::2 -s 60000 &
          ip netns del B
      
      Splat looks like:
      [   73.087285][    C1] BUG: KASAN: use-after-free in ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.088361][    C1] Read of size 4 at addr ffff888040559218 by task ping/1429
      [   73.089317][    C1]
      [   73.089638][    C1] CPU: 1 PID: 1429 Comm: ping Not tainted 5.7.0+ #602
      [   73.090531][    C1] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   73.091725][    C1] Call Trace:
      [   73.092160][    C1]  <IRQ>
      [   73.092556][    C1]  dump_stack+0x96/0xdb
      [   73.093122][    C1]  print_address_description.constprop.6+0x2cc/0x450
      [   73.094016][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.094894][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.095767][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.096619][    C1]  kasan_report+0x154/0x190
      [   73.097209][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.097989][    C1]  ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.098750][    C1]  ? gre_del_protocol+0x60/0x60 [gre]
      [   73.099500][    C1]  gre_rcv+0x1c5/0x1450 [ip6_gre]
      [   73.100199][    C1]  ? ip6gre_header+0xf00/0xf00 [ip6_gre]
      [   73.100985][    C1]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   73.101830][    C1]  ? ip6_input_finish+0x5/0xf0
      [   73.102483][    C1]  ip6_protocol_deliver_rcu+0xcbb/0x1510
      [   73.103296][    C1]  ip6_input_finish+0x5b/0xf0
      [   73.103920][    C1]  ip6_input+0xcd/0x2c0
      [   73.104473][    C1]  ? ip6_input_finish+0xf0/0xf0
      [   73.105115][    C1]  ? rcu_read_lock_held+0x90/0xa0
      [   73.105783][    C1]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   73.106548][    C1]  ipv6_rcv+0x1f1/0x300
      [ ... ]
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: reduce recursion limit value · fb7861d1
      Taehee Yoo authored
      In the current code, ->ndo_start_xmit() can be executed recursively only
      10 times because of limited stack memory.
      However, in the case of vxlan, a recursion limit of 10 still results in
      a stack overflow.
      Nested interfaces are currently limited to a depth of 8.
      There is no critical reason for the recursion limit to be 10,
      so it makes sense to use the same value as the nesting depth limit
      for interfaces.
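
A rough user-space sketch of how a depth counter can bound nested transmits, assuming a limit of 8 as proposed above. The function stacked_xmit() and the counter xmit_depth are hypothetical, and the exact comparison used in the kernel may differ. The point is only that a packet is dropped once the nesting depth reaches the limit, rather than recursing until the stack overflows.

```c
#include <assert.h>

#define XMIT_RECURSION_LIMIT 8 /* value proposed in the commit message */

/* Current nesting depth of transmit calls (per CPU in a real kernel;
 * a single global is enough for this single-threaded sketch). */
static int xmit_depth;

/* Send a packet through `devices` stacked virtual devices. Each device
 * hands the packet to the next lower one by recursing.
 * Returns 0 if the packet reaches the bottom, -1 if it is dropped
 * because the recursion limit was hit. */
int stacked_xmit(int devices)
{
    int ret;

    if (devices == 0)
        return 0; /* reached the lowest device */

    if (xmit_depth >= XMIT_RECURSION_LIMIT)
        return -1; /* too deep: drop instead of recursing further */

    xmit_depth++;
    ret = stacked_xmit(devices - 1);
    xmit_depth--;
    assert(xmit_depth >= 0); /* increments and decrements stay balanced */

    return ret;
}
```

In this sketch, a stack of 8 devices is transmitted successfully, while a stack of 9 or more is dropped at the ninth level.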
      
      Test commands:
          ip link add vxlan10 type vxlan vni 10 dstport 4789 srcport 4789 4789
          ip link set vxlan10 up
          ip a a 192.168.10.1/24 dev vxlan10
          ip n a 192.168.10.2 dev vxlan10 lladdr fc:22:33:44:55:66 nud permanent
      
          for i in {9..0}
          do
              let A=$i+1
              ip link add vxlan$i type vxlan vni $i dstport 4789 srcport 4789 4789
              ip link set vxlan$i up
              ip a a 192.168.$i.1/24 dev vxlan$i
              ip n a 192.168.$i.2 dev vxlan$i lladdr fc:22:33:44:55:66 nud permanent
              bridge fdb add fc:22:33:44:55:66 dev vxlan$A dst 192.168.$i.2 self
          done
          hping3 192.168.10.2 -2 -d 60000
      
      Splat looks like:
      [  103.814237][ T1127] =============================================================================
      [  103.871955][ T1127] BUG kmalloc-2k (Tainted: G    B            ): Padding overwritten. 0x00000000897a2e4f-0x000
      [  103.873187][ T1127] -----------------------------------------------------------------------------
      [  103.873187][ T1127]
      [  103.874252][ T1127] INFO: Slab 0x000000005cccc724 objects=5 used=5 fp=0x0000000000000000 flags=0x10000000001020
      [  103.881323][ T1127] CPU: 3 PID: 1127 Comm: hping3 Tainted: G    B             5.7.0+ #575
      [  103.882131][ T1127] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  103.883006][ T1127] Call Trace:
      [  103.883324][ T1127]  dump_stack+0x96/0xdb
      [  103.883716][ T1127]  slab_err+0xad/0xd0
      [  103.884106][ T1127]  ? _raw_spin_unlock+0x1f/0x30
      [  103.884620][ T1127]  ? get_partial_node.isra.78+0x140/0x360
      [  103.885214][ T1127]  slab_pad_check.part.53+0xf7/0x160
      [  103.885769][ T1127]  ? pskb_expand_head+0x110/0xe10
      [  103.886316][ T1127]  check_slab+0x97/0xb0
      [  103.886763][ T1127]  alloc_debug_processing+0x84/0x1a0
      [  103.887308][ T1127]  ___slab_alloc+0x5a5/0x630
      [  103.887765][ T1127]  ? pskb_expand_head+0x110/0xe10
      [  103.888265][ T1127]  ? lock_downgrade+0x730/0x730
      [  103.888762][ T1127]  ? pskb_expand_head+0x110/0xe10
      [  103.889244][ T1127]  ? __slab_alloc+0x3e/0x80
      [  103.889675][ T1127]  __slab_alloc+0x3e/0x80
      [  103.890108][ T1127]  __kmalloc_node_track_caller+0xc7/0x420
      [ ... ]
      
      Fixes: 11a766ce ("net: Increase xmit RECURSION_LIMIT to 10.")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix memleak in register_netdevice() · 814152a8
      Yang Yingliang authored
      I got a memleak report while fuzz testing:
      
      unreferenced object 0xffff888112584000 (size 13599):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 32 bytes):
          74 61 70 30 00 00 00 00 00 00 00 00 00 00 00 00  tap0............
          00 ee d9 19 81 88 ff ff 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000002f60ba65>] __kmalloc_node+0x309/0x3a0
          [<0000000075b211ec>] kvmalloc_node+0x7f/0xc0
          [<00000000d3a97396>] alloc_netdev_mqs+0x76/0xfc0
          [<00000000609c3655>] __tun_chr_ioctl+0x1456/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      unreferenced object 0xffff888111845cc0 (size 8):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 8 bytes):
          74 61 70 30 00 88 ff ff                          tap0....
        backtrace:
          [<000000004c159777>] kstrdup+0x35/0x70
          [<00000000d8b496ad>] kstrdup_const+0x3d/0x50
          [<00000000494e884a>] kvasprintf_const+0xf1/0x180
          [<0000000097880a2b>] kobject_set_name_vargs+0x56/0x140
          [<000000008fbdfc7b>] dev_set_name+0xab/0xe0
          [<000000005b99e3b4>] netdev_register_kobject+0xc0/0x390
          [<00000000602704fe>] register_netdevice+0xb61/0x1250
          [<000000002b7ca244>] __tun_chr_ioctl+0x1cd1/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      unreferenced object 0xffff88811886d800 (size 512):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 32 bytes):
          00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
          ff ff ff ff ff ff ff ff c0 66 3d a3 ff ff ff ff  .........f=.....
        backtrace:
          [<0000000050315800>] device_add+0x61e/0x1950
          [<0000000021008dfb>] netdev_register_kobject+0x17e/0x390
          [<00000000602704fe>] register_netdevice+0xb61/0x1250
          [<000000002b7ca244>] __tun_chr_ioctl+0x1cd1/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      If call_netdevice_notifiers() fails, rollback_registered() calls
      netdev_unregister_kobject(), which holds the kobject. The reference
      cannot be put because the netdev won't be added to the todo list,
      so this leads to a memleak. We need to put the reference to avoid
      the memleak.
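
The reference-count imbalance described above can be modeled in user space as follows. This is a simplified sketch, not the kernel code: fake_netdev, fake_register_netdevice() and fake_free_netdev() are hypothetical names, and the put_on_error flag exists only so that the behavior with and without the extra put can be compared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a net device with an embedded kobject refcount. */
struct fake_netdev {
    int kobj_refs; /* outstanding kobject references */
    bool freed;    /* set once the last reference is dropped */
};

void kobj_get(struct fake_netdev *d)
{
    d->kobj_refs++;
}

void kobj_put(struct fake_netdev *d)
{
    assert(d->kobj_refs > 0);
    if (--d->kobj_refs == 0)
        d->freed = true;
}

/* A freshly allocated device starts with one reference. */
void fake_netdev_init(struct fake_netdev *d)
{
    d->kobj_refs = 1;
    d->freed = false;
}

/* Model of registration. If the notifier chain fails, the rollback path
 * takes an extra hold on the kobject. Nothing else will release that
 * hold on this path, so the error path has to put it itself.
 * Returns 0 on success, -1 on failure. */
int fake_register_netdevice(struct fake_netdev *d, bool notifier_fails,
                            bool put_on_error)
{
    if (notifier_fails) {
        kobj_get(d);     /* hold taken while unregistering the kobject */
        if (put_on_error)
            kobj_put(d); /* drop the hold to avoid the leak */
        return -1;
    }
    return 0;
}

/* The caller frees the device after a failed registration; this drops
 * the initial reference. */
void fake_free_netdev(struct fake_netdev *d)
{
    kobj_put(d);
}
```

With put_on_error set, the count reaches zero in fake_free_netdev() and the device is released. Without it, one reference remains and the device is never freed, which corresponds to the leak report above.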
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: mvneta: Add 2500BaseX support for SoCs without comphy · 1a642ca7
      Sascha Hauer authored
      Older SoCs like the Armada XP support a 2500BaseX mode, referred to in
      the datasheets as DR-SGMII (Double Rated SGMII) or HS-SGMII (High Speed
      SGMII). This is an upclocked 1000BaseX mode, thus
      PHY_INTERFACE_MODE_2500BASEX is the appropriate mode define for it.
      Adding support for it merely means writing the correct magic value into
      the MVNETA_SERDES_CFG register.
      Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: mvneta: Fix Serdes configuration for SoCs without comphy · b4748553
      Sascha Hauer authored
      The MVNETA_SERDES_CFG register is only available on older SoCs like the
      Armada XP. On newer SoCs like the Armada 38x the fields are moved to
      comphy. This patch moves the writes to this register next to the comphy
      initialization, so that depending on the SoC either comphy or
      MVNETA_SERDES_CFG is configured.
      With this, we no longer write to MVNETA_SERDES_CFG on SoCs where it
      doesn't exist.
      Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 17 Jun, 2020 11 commits