1. 18 Mar, 2020 39 commits
    • Hangbin Liu's avatar
      net/ipv6: remove the old peer route if change it to a new one · dfe7df51
      Hangbin Liu authored
      [ Upstream commit d0098e4c ]
      
      When we modify the peer route and changed it to a new one, we should
      remove the old route first. Before the fix:
      
      + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 256 pref medium
      2001:db8::2 proto kernel metric 256 pref medium
      + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 256 pref medium
      2001:db8::2 proto kernel metric 256 pref medium
      
      After the fix:
      + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 256 pref medium
      2001:db8::3 proto kernel metric 256 pref medium
      
      This patch depend on the previous patch "net/ipv6: need update peer route
      when modify metric" to update new peer route after delete old one.
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dfe7df51
    • Hangbin Liu's avatar
      net/ipv6: need update peer route when modify metric · d8cfddaf
      Hangbin Liu authored
      [ Upstream commit 61794012 ]
      
      When we modify the route metric, the peer address's route need also
      be updated. Before the fix:
      
      + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2 metric 60
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 60 pref medium
      2001:db8::2 proto kernel metric 60 pref medium
      + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 61 pref medium
      2001:db8::2 proto kernel metric 60 pref medium
      
      After the fix:
      + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 61 pref medium
      2001:db8::2 proto kernel metric 61 pref medium
      
      Fixes: 8308f3ff ("net/ipv6: Add support for specifying metric of connected routes")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8cfddaf
    • Hangbin Liu's avatar
      selftests/net/fib_tests: update addr_metric_test for peer route testing · b5a8261a
      Hangbin Liu authored
      [ Upstream commit 0d29169a ]
      
      This patch update {ipv4, ipv6}_addr_metric_test with
      1. Set metric of address with peer route and see if the route added
      correctly.
      2. Modify metric and peer address for peer route and see if the route
      changed correctly.
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5a8261a
    • Heiner Kallweit's avatar
      net: phy: fix MDIO bus PM PHY resuming · e37fc53a
      Heiner Kallweit authored
      [ Upstream commit 611d779a ]
      
      So far we have the unfortunate situation that mdio_bus_phy_may_suspend()
      is called in suspend AND resume path, assuming that function result is
      the same. After the original change this is no longer the case,
      resulting in broken resume as reported by Geert.
      
      To fix this call mdio_bus_phy_may_suspend() in the suspend path only,
      and let the phy_device store the info whether it was suspended by
      MDIO bus PM.
      
      Fixes: 503ba7c6 ("net: phy: Avoid multiple suspends")
      Reported-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Tested-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e37fc53a
    • Jakub Kicinski's avatar
      nfc: add missing attribute validation for vendor subcommand · 18164e79
      Jakub Kicinski authored
      [ Upstream commit 6ba3da44 ]
      
      Add missing attribute validation for vendor subcommand attributes
      to the netlink policy.
      
      Fixes: 9e58095f ("NFC: netlink: Implement vendor command support")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18164e79
    • Jakub Kicinski's avatar
      nfc: add missing attribute validation for deactivate target · b1980619
      Jakub Kicinski authored
      [ Upstream commit 88e706d5 ]
      
      Add missing attribute validation for NFC_ATTR_TARGET_INDEX
      to the netlink policy.
      
      Fixes: 4d63adfe ("NFC: Add NFC_CMD_DEACTIVATE_TARGET support")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b1980619
    • Jakub Kicinski's avatar
      nfc: add missing attribute validation for SE API · ae855e78
      Jakub Kicinski authored
      [ Upstream commit 361d23e4 ]
      
      Add missing attribute validation for NFC_ATTR_SE_INDEX
      to the netlink policy.
      
      Fixes: 5ce3f32b ("NFC: netlink: SE API implementation")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae855e78
    • Jakub Kicinski's avatar
      team: add missing attribute validation for array index · 98be4504
      Jakub Kicinski authored
      [ Upstream commit 669fcd77 ]
      
      Add missing attribute validation for TEAM_ATTR_OPTION_ARRAY_INDEX
      to the netlink policy.
      
      Fixes: b1303326 ("team: introduce array options")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      98be4504
    • Jakub Kicinski's avatar
      team: add missing attribute validation for port ifindex · 1c2dcaf8
      Jakub Kicinski authored
      [ Upstream commit dd25cb27 ]
      
      Add missing attribute validation for TEAM_ATTR_OPTION_PORT_IFINDEX
      to the netlink policy.
      
      Fixes: 80f7c668 ("team: add support for per-port options")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c2dcaf8
    • Jakub Kicinski's avatar
      net: fq: add missing attribute validation for orphan mask · 09ec15bb
      Jakub Kicinski authored
      [ Upstream commit 7e6dc03e ]
      
      Add missing attribute validation for TCA_FQ_ORPHAN_MASK
      to the netlink policy.
      
      Fixes: 06eb395f ("pkt_sched: fq: better control of DDOS traffic")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09ec15bb
    • Jakub Kicinski's avatar
      macsec: add missing attribute validation for port · 388b9d45
      Jakub Kicinski authored
      [ Upstream commit 31d9a1c5 ]
      
      Add missing attribute validation for IFLA_MACSEC_PORT
      to the netlink policy.
      
      Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      388b9d45
    • Jakub Kicinski's avatar
      can: add missing attribute validation for termination · 89b4332a
      Jakub Kicinski authored
      [ Upstream commit ab02ad66 ]
      
      Add missing attribute validation for IFLA_CAN_TERMINATION
      to the netlink policy.
      
      Fixes: 12a6075c ("can: dev: add CAN interface termination API")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Acked-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      89b4332a
    • Jakub Kicinski's avatar
      nl802154: add missing attribute validation for dev_type · 1768ebf3
      Jakub Kicinski authored
      [ Upstream commit b60673c4 ]
      
      Add missing attribute type validation for IEEE802154_ATTR_DEV_TYPE
      to the netlink policy.
      
      Fixes: 90c049b2 ("ieee802154: interface type to be added")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Acked-by: default avatarStefan Schmidt <stefan@datenfreihafen.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1768ebf3
    • Jakub Kicinski's avatar
      nl802154: add missing attribute validation · 4785c665
      Jakub Kicinski authored
      [ Upstream commit 9322cd7c ]
      
      Add missing attribute validation for several u8 types.
      
      Fixes: 2c21d115 ("net: add NL802154 interface for configuration of 802.15.4 devices")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Acked-by: default avatarStefan Schmidt <stefan@datenfreihafen.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4785c665
    • Jakub Kicinski's avatar
      fib: add missing attribute validation for tun_id · 4e4a3292
      Jakub Kicinski authored
      [ Upstream commit 4c16d64e ]
      
      Add missing netlink policy entry for FRA_TUN_ID.
      
      Fixes: e7030878 ("fib: Add fib rule match on tunnel id")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e4a3292
    • Jakub Kicinski's avatar
      devlink: validate length of param values · f9f04772
      Jakub Kicinski authored
      [ Upstream commit 8750939b ]
      
      DEVLINK_ATTR_PARAM_VALUE_DATA may have different types
      so it's not checked by the normal netlink policy. Make
      sure the attribute length is what we expect.
      
      Fixes: e3b7ca18 ("devlink: Add param set command")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9f04772
    • Eric Dumazet's avatar
      net: memcg: fix lockdep splat in inet_csk_accept() · 34636d24
      Eric Dumazet authored
      commit 06669ea3 upstream.
      
      Locking newsk while still holding the listener lock triggered
      a lockdep splat [1]
      
      We can simply move the memcg code after we release the listener lock,
      as this can also help if multiple threads are sharing a common listener.
      
      Also fix a typo while reading socket sk_rmem_alloc.
      
      [1]
      WARNING: possible recursive locking detected
      5.6.0-rc3-syzkaller #0 Not tainted
      --------------------------------------------
      syz-executor598/9524 is trying to acquire lock:
      ffff88808b5b8b90 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline]
      ffff88808b5b8b90 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x69f/0xd30 net/ipv4/inet_connection_sock.c:492
      
      but task is already holding lock:
      ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline]
      ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x8d/0xd30 net/ipv4/inet_connection_sock.c:445
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(sk_lock-AF_INET6);
        lock(sk_lock-AF_INET6);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      1 lock held by syz-executor598/9524:
       #0: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline]
       #0: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x8d/0xd30 net/ipv4/inet_connection_sock.c:445
      
      stack backtrace:
      CPU: 0 PID: 9524 Comm: syz-executor598 Not tainted 5.6.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       print_deadlock_bug kernel/locking/lockdep.c:2370 [inline]
       check_deadlock kernel/locking/lockdep.c:2411 [inline]
       validate_chain kernel/locking/lockdep.c:2954 [inline]
       __lock_acquire.cold+0x114/0x288 kernel/locking/lockdep.c:3954
       lock_acquire+0x197/0x420 kernel/locking/lockdep.c:4484
       lock_sock_nested+0xc5/0x110 net/core/sock.c:2947
       lock_sock include/net/sock.h:1541 [inline]
       inet_csk_accept+0x69f/0xd30 net/ipv4/inet_connection_sock.c:492
       inet_accept+0xe9/0x7c0 net/ipv4/af_inet.c:734
       __sys_accept4_file+0x3ac/0x5b0 net/socket.c:1758
       __sys_accept4+0x53/0x90 net/socket.c:1809
       __do_sys_accept4 net/socket.c:1821 [inline]
       __se_sys_accept4 net/socket.c:1818 [inline]
       __x64_sys_accept4+0x93/0xf0 net/socket.c:1818
       do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4445c9
      Code: e8 0c 0d 03 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffc35b37608 EFLAGS: 00000246 ORIG_RAX: 0000000000000120
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004445c9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 0000000000306777 R09: 0000000000306777
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00000000004053d0 R14: 0000000000000000 R15: 0000000000000000
      
      Fixes: d752a498 ("net: memcg: late association of sock to memcg")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34636d24
    • Shakeel Butt's avatar
      net: memcg: late association of sock to memcg · 9d914194
      Shakeel Butt authored
      [ Upstream commit d752a498 ]
      
      If a TCP socket is allocated in IRQ context or cloned from unassociated
      (i.e. not associated to a memcg) in IRQ context then it will remain
      unassociated for its whole life. Almost half of the TCPs created on the
      system are created in IRQ context, so, memory used by such sockets will
      not be accounted by the memcg.
      
      This issue is more widespread in cgroup v1 where network memory
      accounting is opt-in but it can happen in cgroup v2 if the source socket
      for the cloning was created in root memcg.
      
      To fix the issue, just do the association of the sockets at the accept()
      time in the process context and then force charge the memory buffer
      already used and reserved by the socket.
      Signed-off-by: default avatarShakeel Butt <shakeelb@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d914194
    • Shakeel Butt's avatar
      cgroup: memcg: net: do not associate sock with unrelated cgroup · 941464dc
      Shakeel Butt authored
      [ Upstream commit e876ecc6 ]
      
      We are testing network memory accounting in our setup and noticed
      inconsistent network memory usage and often unrelated cgroups network
      usage correlates with testing workload. On further inspection, it
      seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in
      irq context specially for cgroup v1.
      
      mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context
      and kind of assumes that this can only happen from sk_clone_lock()
      and the source sock object has already associated cgroup. However in
      cgroup v1, where network memory accounting is opt-in, the source sock
      can be unassociated with any cgroup and the new cloned sock can get
      associated with unrelated interrupted cgroup.
      
      Cgroup v2 can also suffer if the source sock object was created by
      process in the root cgroup or if sk_alloc() is called in irq context.
      The fix is to just do nothing in interrupt.
      
      WARNING: Please note that about half of the TCP sockets are allocated
      from the IRQ context, so, memory used by such sockets will not be
      accouted by the memcg.
      
      The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
      
      CPU: 70 PID: 12720 Comm: ssh Tainted:  5.6.0-smp-DEV #1
      Hardware name: ...
      Call Trace:
       <IRQ>
       dump_stack+0x57/0x75
       mem_cgroup_sk_alloc+0xe9/0xf0
       sk_clone_lock+0x2a7/0x420
       inet_csk_clone_lock+0x1b/0x110
       tcp_create_openreq_child+0x23/0x3b0
       tcp_v6_syn_recv_sock+0x88/0x730
       tcp_check_req+0x429/0x560
       tcp_v6_rcv+0x72d/0xa40
       ip6_protocol_deliver_rcu+0xc9/0x400
       ip6_input+0x44/0xd0
       ? ip6_protocol_deliver_rcu+0x400/0x400
       ip6_rcv_finish+0x71/0x80
       ipv6_rcv+0x5b/0xe0
       ? ip6_sublist_rcv+0x2e0/0x2e0
       process_backlog+0x108/0x1e0
       net_rx_action+0x26b/0x460
       __do_softirq+0x104/0x2a6
       do_softirq_own_stack+0x2a/0x40
       </IRQ>
       do_softirq.part.19+0x40/0x50
       __local_bh_enable_ip+0x51/0x60
       ip6_finish_output2+0x23d/0x520
       ? ip6table_mangle_hook+0x55/0x160
       __ip6_finish_output+0xa1/0x100
       ip6_finish_output+0x30/0xd0
       ip6_output+0x73/0x120
       ? __ip6_finish_output+0x100/0x100
       ip6_xmit+0x2e3/0x600
       ? ipv6_anycast_cleanup+0x50/0x50
       ? inet6_csk_route_socket+0x136/0x1e0
       ? skb_free_head+0x1e/0x30
       inet6_csk_xmit+0x95/0xf0
       __tcp_transmit_skb+0x5b4/0xb20
       __tcp_send_ack.part.60+0xa3/0x110
       tcp_send_ack+0x1d/0x20
       tcp_rcv_state_process+0xe64/0xe80
       ? tcp_v6_connect+0x5d1/0x5f0
       tcp_v6_do_rcv+0x1b1/0x3f0
       ? tcp_v6_do_rcv+0x1b1/0x3f0
       __release_sock+0x7f/0xd0
       release_sock+0x30/0xa0
       __inet_stream_connect+0x1c3/0x3b0
       ? prepare_to_wait+0xb0/0xb0
       inet_stream_connect+0x3b/0x60
       __sys_connect+0x101/0x120
       ? __sys_getsockopt+0x11b/0x140
       __x64_sys_connect+0x1a/0x20
       do_syscall_64+0x51/0x200
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
      Fixes: 2d758073 ("mm: memcontrol: consolidate cgroup socket tracking")
      Fixes: d979a39d ("cgroup: duplicate cgroup reference when cloning sockets")
      Signed-off-by: default avatarShakeel Butt <shakeelb@google.com>
      Reviewed-by: default avatarRoman Gushchin <guro@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      941464dc
    • Vasundhara Volam's avatar
      bnxt_en: reinitialize IRQs when MTU is modified · aadf2a72
      Vasundhara Volam authored
      [ Upstream commit a9b952d2 ]
      
      MTU changes may affect the number of IRQs so we must call
      bnxt_close_nic()/bnxt_open_nic() with the irq_re_init parameter
      set to true.  The reason is that a larger MTU may require
      aggregation rings not needed with smaller MTU.  We may not be
      able to allocate the required number of aggregation rings and
      so we reduce the number of channels which will change the number
      of IRQs.  Without this patch, it may crash eventually in
      pci_disable_msix() when the IRQs are not properly unwound.
      
      Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.")
      Signed-off-by: default avatarVasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aadf2a72
    • Edward Cree's avatar
      sfc: detach from cb_page in efx_copy_channel() · 989e5462
      Edward Cree authored
      [ Upstream commit 4b1bd9db ]
      
      It's a resource, not a parameter, so we can't copy it into the new
       channel's TX queues, otherwise aliasing will lead to resource-
       management bugs if the channel is subsequently torn down without
       being initialised.
      
      Before the Fixes:-tagged commit there was a similar bug with
       tsoh_page, but I'm not sure it's worth doing another fix for such
       old kernels.
      
      Fixes: e9117e50 ("sfc: Firmware-Assisted TSO version 2")
      Suggested-by: default avatarDerek Shute <Derek.Shute@stratus.com>
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      989e5462
    • You-Sheng Yang's avatar
      r8152: check disconnect status after long sleep · 8a207f28
      You-Sheng Yang authored
      [ Upstream commit d64c7a08 ]
      
      Dell USB Type C docking WD19/WD19DC attaches additional peripherals as:
      
        /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
            |__ Port 1: Dev 11, If 0, Class=Hub, Driver=hub/4p, 5000M
                |__ Port 3: Dev 12, If 0, Class=Hub, Driver=hub/4p, 5000M
                |__ Port 4: Dev 13, If 0, Class=Vendor Specific Class,
                    Driver=r8152, 5000M
      
      where usb 2-1-3 is a hub connecting all USB Type-A/C ports on the dock.
      
      When hotplugging such dock with additional usb devices already attached on
      it, the probing process may reset usb 2.1 port, therefore r8152 ethernet
      device is also reset. However, during r8152 device init there are several
      for-loops that, when it's unable to retrieve hardware registers due to
      being disconnected from USB, may take up to 14 seconds each in practice,
      and that has to be completed before USB may re-enumerate devices on the
      bus. As a result, devices attached to the dock will only be available
      after nearly 1 minute after the dock was plugged in:
      
        [ 216.388290] [250] r8152 2-1.4:1.0: usb_probe_interface
        [ 216.388292] [250] r8152 2-1.4:1.0: usb_probe_interface - got id
        [ 258.830410] r8152 2-1.4:1.0 (unnamed net_device) (uninitialized): PHY not ready
        [ 258.830460] r8152 2-1.4:1.0 (unnamed net_device) (uninitialized): Invalid header when reading pass-thru MAC addr
        [ 258.830464] r8152 2-1.4:1.0 (unnamed net_device) (uninitialized): Get ether addr fail
      
      This happens in, for example, r8153_init:
      
        static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size,
      			    void *data, u16 type)
        {
          if (test_bit(RTL8152_UNPLUG, &tp->flags))
            return -ENODEV;
          ...
        }
      
        static u16 ocp_read_word(struct r8152 *tp, u16 type, u16 index)
        {
          u32 data;
          ...
          generic_ocp_read(tp, index, sizeof(tmp), &tmp, type | byen);
      
          data = __le32_to_cpu(tmp);
          ...
          return (u16)data;
        }
      
        static void r8153_init(struct r8152 *tp)
        {
          ...
          if (test_bit(RTL8152_UNPLUG, &tp->flags))
            return;
      
          for (i = 0; i < 500; i++) {
            if (ocp_read_word(tp, MCU_TYPE_PLA, PLA_BOOT_CTRL) &
                AUTOLOAD_DONE)
              break;
            msleep(20);
          }
          ...
        }
      
      Since ocp_read_word() doesn't check the return status of
      generic_ocp_read(), and the only exit condition for the loop is to have
      a match in the returned value, such loops will only ends after exceeding
      its maximum runs when the device has been marked as disconnected, which
      takes 500 * 20ms = 10 seconds in theory, 14 in practice.
      
      To solve this long latency another test to RTL8152_UNPLUG flag should be
      added after those 20ms sleep to skip unnecessary loops, so that the device
      probe can complete early and proceed to parent port reset/reprobe process.
      
      This can be reproduced on all kernel versions up to latest v5.6-rc2, but
      after v5.5-rc7 the reproduce rate is dramatically lowered to 1/30 or less
      while it was around 1/2.
      Signed-off-by: default avatarYou-Sheng Yang <vicamo.yang@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a207f28
    • Colin Ian King's avatar
      net: systemport: fix index check to avoid an array out of bounds access · c8149165
      Colin Ian King authored
      [ Upstream commit c0368595 ]
      
      Currently the bounds check on index is off by one and can lead to
      an out of bounds access on array priv->filters_loc when index is
      RXCHK_BRCM_TAG_MAX.
      
      Fixes: bb9051a2 ("net: systemport: Add support for WAKE_FILTER")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8149165
    • Remi Pommarel's avatar
      net: stmmac: dwmac1000: Disable ACS if enhanced descs are not used · ba245652
      Remi Pommarel authored
      [ Upstream commit b723bd93 ]
      
      ACS (auto PAD/FCS stripping) removes FCS off 802.3 packets (LLC) so that
      there is no need to manually strip it for such packets. The enhanced DMA
      descriptors allow to flag LLC packets so that the receiving callback can
      use that to strip FCS manually or not. On the other hand, normal
      descriptors do not support that.
      
      Thus in order to not truncate LLC packet ACS should be disabled when
      using normal DMA descriptors.
      
      Fixes: 47dd7a54 ("net: add support for STMicroelectronics Ethernet controllers.")
      Signed-off-by: default avatarRemi Pommarel <repk@triplefau.lt>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba245652
    • Willem de Bruijn's avatar
      net/packet: tpacket_rcv: do not increment ring index on drop · 64fabf9b
      Willem de Bruijn authored
      [ Upstream commit 46e4c421 ]
      
      In one error case, tpacket_rcv drops packets after incrementing the
      ring producer index.
      
      If this happens, it does not update tp_status to TP_STATUS_USER and
      thus the reader is stalled for an iteration of the ring, causing out
      of order arrival.
      
      The only such error path is when virtio_net_hdr_from_skb fails due
      to encountering an unknown GSO type.
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      64fabf9b
    • Dan Carpenter's avatar
      net: nfc: fix bounds checking bugs on "pipe" · 7e78a7fd
      Dan Carpenter authored
      [ Upstream commit a3aefbfe ]
      
      This is similar to commit 674d9de0 ("NFC: Fix possible memory
      corruption when handling SHDLC I-Frame commands") and commit d7ee81ad
      ("NFC: nci: Add some bounds checking in nci_hci_cmd_received()") which
      added range checks on "pipe".
      
      The "pipe" variable comes skb->data[0] in nfc_hci_msg_rx_work().
      It's in the 0-255 range.  We're using it as the array index into the
      hdev->pipes[] array which has NFC_HCI_MAX_PIPES (128) members.
      
      Fixes: 118278f2 ("NFC: hci: Add pipes table to reference them with a tuple {gate, host}")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e78a7fd
    • Dmitry Bogdanov's avatar
      net: macsec: update SCI upon MAC address change. · 28cedae5
      Dmitry Bogdanov authored
      [ Upstream commit 6fc498bc ]
      
      SCI should be updated, because it contains MAC in its first 6 octets.
      
      Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
      Signed-off-by: default avatarDmitry Bogdanov <dbogdanov@marvell.com>
      Signed-off-by: default avatarMark Starovoytov <mstarovoitov@marvell.com>
      Signed-off-by: default avatarIgor Russkikh <irusskikh@marvell.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28cedae5
    • Pablo Neira Ayuso's avatar
      netlink: Use netlink header as base to calculate bad attribute offset · 7aa760f0
      Pablo Neira Ayuso authored
      [ Upstream commit 84b32680 ]
      
      Userspace might send a batch that is composed of several netlink
      messages. The netlink_ack() function must use the pointer to the netlink
      header as base to calculate the bad attribute offset.
      
      Fixes: 2d4bc933 ("netlink: extended ACK reporting")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7aa760f0
    • Hangbin Liu's avatar
      net/ipv6: use configured metric when add peer route · 53e404ed
      Hangbin Liu authored
      [ Upstream commit 07758eb9 ]
      
      When we add peer address with metric configured, IPv4 could set the dest
      metric correctly, but IPv6 do not. e.g.
      
      ]# ip addr add 192.0.2.1 peer 192.0.2.2/32 dev eth1 metric 20
      ]# ip route show dev eth1
      192.0.2.2 proto kernel scope link src 192.0.2.1 metric 20
      ]# ip addr add 2001:db8::1 peer 2001:db8::2/128 dev eth1 metric 20
      ]# ip -6 route show dev eth1
      2001:db8::1 proto kernel metric 20 pref medium
      2001:db8::2 proto kernel metric 256 pref medium
      
      Fix this by using configured metric instead of default one.
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Fixes: 8308f3ff ("net/ipv6: Add support for specifying metric of connected routes")
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53e404ed
    • Mahesh Bandewar's avatar
      ipvlan: don't deref eth hdr before checking it's set · eb273bb8
      Mahesh Bandewar authored
      [ Upstream commit ad819276 ]
      
      IPvlan in L3 mode discards outbound multicast packets but performs
      the check before ensuring the ether-header is set or not. This is
      an error that Eric found through code browsing.
      
      Fixes: 2ad7bf36 (“ipvlan: Initial check-in of the IPVLAN driver.”)
      Signed-off-by: default avatarMahesh Bandewar <maheshb@google.com>
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb273bb8
    • Eric Dumazet's avatar
      ipvlan: do not use cond_resched_rcu() in ipvlan_process_multicast() · cb9e7197
      Eric Dumazet authored
      [ Upstream commit afe207d8 ]
      
      Commit e18b353f ("ipvlan: add cond_resched_rcu() while
      processing muticast backlog") added a cond_resched_rcu() in a loop
      using rcu protection to iterate over slaves.
      
      This is breaking rcu rules, so lets instead use cond_resched()
      at a point we can reschedule
      
      Fixes: e18b353f ("ipvlan: add cond_resched_rcu() while processing muticast backlog")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb9e7197
    • Jiri Wiesner's avatar
      ipvlan: do not add hardware address of master to its unicast filter list · ee98e615
      Jiri Wiesner authored
      [ Upstream commit 63aae7b1 ]
      
      There is a problem when ipvlan slaves are created on a master device that
      is a vmxnet3 device (ipvlan in VMware guests). The vmxnet3 driver does not
      support unicast address filtering. When an ipvlan device is brought up in
      ipvlan_open(), the ipvlan driver calls dev_uc_add() to add the hardware
      address of the vmxnet3 master device to the unicast address list of the
      master device, phy_dev->uc. This inevitably leads to the vmxnet3 master
      device being forced into promiscuous mode by __dev_set_rx_mode().
      
      Promiscuous mode is switched on the master despite the fact that there is
      still only one hardware address that the master device should use for
      filtering in order for the ipvlan device to be able to receive packets.
      The comment above struct net_device describes the uc_promisc member as a
      "counter, that indicates, that promiscuous mode has been enabled due to
      the need to listen to additional unicast addresses in a device that does
      not implement ndo_set_rx_mode()". Moreover, the design of ipvlan
      guarantees that only the hardware address of a master device,
      phy_dev->dev_addr, will be used to transmit and receive all packets from
      its ipvlan slaves. Thus, the unicast address list of the master device
      should not be modified by ipvlan_open() and ipvlan_stop() in order to make
      ipvlan a workable option on masters that do not support unicast address
      filtering.
      
      Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver")
      Reported-by: default avatarPer Sundstrom <per.sundstrom@redqube.se>
      Signed-off-by: default avatarJiri Wiesner <jwiesner@suse.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarMahesh Bandewar <maheshb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee98e615
    • Mahesh Bandewar's avatar
      ipvlan: add cond_resched_rcu() while processing muticast backlog · 79a958d8
      Mahesh Bandewar authored
      [ Upstream commit e18b353f ]
      
      If there are substantial number of slaves created as simulated by
      Syzbot, the backlog processing could take much longer and result
      into the issue found in the Syzbot report.
      
      INFO: rcu_sched detected stalls on CPUs/tasks:
              (detected by 1, t=10502 jiffies, g=5049, c=5048, q=752)
      All QSes seen, last rcu_sched kthread activity 10502 (4294965563-4294955061), jiffies_till_next_fqs=1, root ->qsmask 0x0
      syz-executor.1  R  running task on cpu   1  10984 11210   3866 0x30020008 179034491270
      Call Trace:
       <IRQ>
       [<ffffffff81497163>] _sched_show_task kernel/sched/core.c:8063 [inline]
       [<ffffffff81497163>] _sched_show_task.cold+0x2fd/0x392 kernel/sched/core.c:8030
       [<ffffffff8146a91b>] sched_show_task+0xb/0x10 kernel/sched/core.c:8073
       [<ffffffff815c931b>] print_other_cpu_stall kernel/rcu/tree.c:1577 [inline]
       [<ffffffff815c931b>] check_cpu_stall kernel/rcu/tree.c:1695 [inline]
       [<ffffffff815c931b>] __rcu_pending kernel/rcu/tree.c:3478 [inline]
       [<ffffffff815c931b>] rcu_pending kernel/rcu/tree.c:3540 [inline]
       [<ffffffff815c931b>] rcu_check_callbacks.cold+0xbb4/0xc29 kernel/rcu/tree.c:2876
       [<ffffffff815e3962>] update_process_times+0x32/0x80 kernel/time/timer.c:1635
       [<ffffffff816164f0>] tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161
       [<ffffffff81616ae4>] tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193
       [<ffffffff815e75f7>] __run_hrtimer kernel/time/hrtimer.c:1393 [inline]
       [<ffffffff815e75f7>] __hrtimer_run_queues+0x307/0xd90 kernel/time/hrtimer.c:1455
       [<ffffffff815e90ea>] hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1513
       [<ffffffff844050f4>] local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline]
       [<ffffffff844050f4>] smp_apic_timer_interrupt+0x144/0x5e0 arch/x86/kernel/apic/apic.c:1056
       [<ffffffff84401cbe>] apic_timer_interrupt+0x8e/0xa0 arch/x86/entry/entry_64.S:778
      RIP: 0010:do_raw_read_lock+0x22/0x80 kernel/locking/spinlock_debug.c:153
      RSP: 0018:ffff8801dad07ab8 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff12
      RAX: 0000000000000000 RBX: ffff8801c4135680 RCX: 0000000000000000
      RDX: 1ffff10038826afe RSI: ffff88019d816bb8 RDI: ffff8801c41357f0
      RBP: ffff8801dad07ac0 R08: 0000000000004b15 R09: 0000000000310273
      R10: ffff88019d816bb8 R11: 0000000000000001 R12: ffff8801c41357e8
      R13: 0000000000000000 R14: ffff8801cfb19850 R15: ffff8801cfb198b0
       [<ffffffff8101460e>] __raw_read_lock_bh include/linux/rwlock_api_smp.h:177 [inline]
       [<ffffffff8101460e>] _raw_read_lock_bh+0x3e/0x50 kernel/locking/spinlock.c:240
       [<ffffffff840d78ca>] ipv6_chk_mcast_addr+0x11a/0x6f0 net/ipv6/mcast.c:1006
       [<ffffffff84023439>] ip6_mc_input+0x319/0x8e0 net/ipv6/ip6_input.c:482
       [<ffffffff840211c8>] dst_input include/net/dst.h:449 [inline]
       [<ffffffff840211c8>] ip6_rcv_finish+0x408/0x610 net/ipv6/ip6_input.c:78
       [<ffffffff840214de>] NF_HOOK include/linux/netfilter.h:292 [inline]
       [<ffffffff840214de>] NF_HOOK include/linux/netfilter.h:286 [inline]
       [<ffffffff840214de>] ipv6_rcv+0x10e/0x420 net/ipv6/ip6_input.c:278
       [<ffffffff83a29efa>] __netif_receive_skb_one_core+0x12a/0x1f0 net/core/dev.c:5303
       [<ffffffff83a2a15c>] __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:5417
       [<ffffffff83a2f536>] process_backlog+0x216/0x6c0 net/core/dev.c:6243
       [<ffffffff83a30d1b>] napi_poll net/core/dev.c:6680 [inline]
       [<ffffffff83a30d1b>] net_rx_action+0x47b/0xfb0 net/core/dev.c:6748
       [<ffffffff846002c8>] __do_softirq+0x2c8/0x99a kernel/softirq.c:317
       [<ffffffff813e656a>] invoke_softirq kernel/softirq.c:399 [inline]
       [<ffffffff813e656a>] irq_exit+0x16a/0x1a0 kernel/softirq.c:439
       [<ffffffff84405115>] exiting_irq arch/x86/include/asm/apic.h:561 [inline]
       [<ffffffff84405115>] smp_apic_timer_interrupt+0x165/0x5e0 arch/x86/kernel/apic/apic.c:1058
       [<ffffffff84401cbe>] apic_timer_interrupt+0x8e/0xa0 arch/x86/entry/entry_64.S:778
       </IRQ>
      RIP: 0010:__sanitizer_cov_trace_pc+0x26/0x50 kernel/kcov.c:102
      RSP: 0018:ffff880196033bd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff12
      RAX: ffff88019d8161c0 RBX: 00000000ffffffff RCX: ffffc90003501000
      RDX: 0000000000000002 RSI: ffffffff816236d1 RDI: 0000000000000005
      RBP: ffff880196033bd8 R08: ffff88019d8161c0 R09: 0000000000000000
      R10: 1ffff10032c067f0 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000000
       [<ffffffff816236d1>] do_futex+0x151/0x1d50 kernel/futex.c:3548
       [<ffffffff816260f0>] C_SYSC_futex kernel/futex_compat.c:201 [inline]
       [<ffffffff816260f0>] compat_SyS_futex+0x270/0x3b0 kernel/futex_compat.c:175
       [<ffffffff8101da17>] do_syscall_32_irqs_on arch/x86/entry/common.c:353 [inline]
       [<ffffffff8101da17>] do_fast_syscall_32+0x357/0xe1c arch/x86/entry/common.c:415
       [<ffffffff84401a9b>] entry_SYSENTER_compat+0x8b/0x9d arch/x86/entry/entry_64_compat.S:139
      RIP: 0023:0xf7f23c69
      RSP: 002b:00000000f5d1f12c EFLAGS: 00000282 ORIG_RAX: 00000000000000f0
      RAX: ffffffffffffffda RBX: 000000000816af88 RCX: 0000000000000080
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000816af8c
      RBP: 00000000f5d1f228 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      rcu_sched kthread starved for 10502 jiffies! g5049 c5048 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1
      rcu_sched       R  running task on cpu   1  13048     8      2 0x90000000 179099587640
      Call Trace:
       [<ffffffff8147321f>] context_switch+0x60f/0xa60 kernel/sched/core.c:3209
       [<ffffffff8100095a>] __schedule+0x5aa/0x1da0 kernel/sched/core.c:3934
       [<ffffffff810021df>] schedule+0x8f/0x1b0 kernel/sched/core.c:4011
       [<ffffffff8101116d>] schedule_timeout+0x50d/0xee0 kernel/time/timer.c:1803
       [<ffffffff815c13f1>] rcu_gp_kthread+0xda1/0x3b50 kernel/rcu/tree.c:2327
       [<ffffffff8144b318>] kthread+0x348/0x420 kernel/kthread.c:246
       [<ffffffff84400266>] ret_from_fork+0x56/0x70 arch/x86/entry/entry_64.S:393
      
      Fixes: ba35f858 (“ipvlan: Defer multicast / broadcast processing to a work-queue”)
      Signed-off-by: default avatarMahesh Bandewar <maheshb@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79a958d8
    • Hangbin Liu's avatar
      ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface · 3d95a5e3
      Hangbin Liu authored
      [ Upstream commit 60380488 ]
      
      Rafał found an issue that for non-Ethernet interface, if we down and up
      frequently, the memory will be consumed slowly.
      
      The reason is we add allnodes/allrouters addressed in multicast list in
      ipv6_add_dev(). When link down, we call ipv6_mc_down(), store all multicast
      addresses via mld_add_delrec(). But when link up, we don't call ipv6_mc_up()
      for non-Ethernet interface to remove the addresses. This makes idev->mc_tomb
      getting bigger and bigger. The call stack looks like:
      
      addrconf_notify(NETDEV_REGISTER)
      	ipv6_add_dev
      		ipv6_dev_mc_inc(ff01::1)
      		ipv6_dev_mc_inc(ff02::1)
      		ipv6_dev_mc_inc(ff02::2)
      
      addrconf_notify(NETDEV_UP)
      	addrconf_dev_config
      		/* Alas, we support only Ethernet autoconfiguration. */
      		return;
      
      addrconf_notify(NETDEV_DOWN)
      	addrconf_ifdown
      		ipv6_mc_down
      			igmp6_group_dropped(ff02::2)
      				mld_add_delrec(ff02::2)
      			igmp6_group_dropped(ff02::1)
      			igmp6_group_dropped(ff01::1)
      
      After investigating, I can't found a rule to disable multicast on
      non-Ethernet interface. In RFC2460, the link could be Ethernet, PPP, ATM,
      tunnels, etc. In IPv4, it doesn't check the dev type when calls ip_mc_up()
      in inetdev_event(). Even for IPv6, we don't check the dev type and call
      ipv6_add_dev(), ipv6_dev_mc_inc() after register device.
      
      So I think it's OK to fix this memory consumer by calling ipv6_mc_up() for
      non-Ethernet interface.
      
      v2: Also check IFF_MULTICAST flag to make sure the interface supports
          multicast
      Reported-by: default avatarRafał Miłecki <zajec5@gmail.com>
      Tested-by: default avatarRafał Miłecki <zajec5@gmail.com>
      Fixes: 74235a25 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels")
      Fixes: 1666d49e ("mld: do not remove mld souce list info when set link down")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d95a5e3
    • Dmitry Yakunin's avatar
      inet_diag: return classid for all socket types · 24dd755f
      Dmitry Yakunin authored
      [ Upstream commit 83f73c5b ]
      
      In commit 1ec17dbd ("inet_diag: fix reporting cgroup classid and
      fallback to priority") croup classid reporting was fixed. But this works
      only for TCP sockets because for other socket types icsk parameter can
      be NULL and classid code path is skipped. This change moves classid
      handling to inet_diag_msg_attrs_fill() function.
      
      Also inet_diag_msg_attrs_size() helper was added and addends in
      nlmsg_new() were reordered to save order from inet_sk_diag_fill().
      
      Fixes: 1ec17dbd ("inet_diag: fix reporting cgroup classid and fallback to priority")
      Signed-off-by: default avatarDmitry Yakunin <zeil@yandex-team.ru>
      Reviewed-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24dd755f
    • Eric Dumazet's avatar
      gre: fix uninit-value in __iptunnel_pull_header · 33f0d95c
      Eric Dumazet authored
      [ Upstream commit 17c25caf ]
      
      syzbot found an interesting case of the kernel reading
      an uninit-value [1]
      
      Problem is in the handling of ETH_P_WCCP in gre_parse_header()
      
      We look at the byte following GRE options to eventually decide
      if the options are four bytes longer.
      
      Use skb_header_pointer() to not pull bytes if we found
      that no more bytes were needed.
      
      All callers of gre_parse_header() are properly using pskb_may_pull()
      anyway before proceeding to next header.
      
      [1]
      BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2303 [inline]
      BUG: KMSAN: uninit-value in __iptunnel_pull_header+0x30c/0xbd0 net/ipv4/ip_tunnel_core.c:94
      CPU: 1 PID: 11784 Comm: syz-executor940 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
       pskb_may_pull include/linux/skbuff.h:2303 [inline]
       __iptunnel_pull_header+0x30c/0xbd0 net/ipv4/ip_tunnel_core.c:94
       iptunnel_pull_header include/net/ip_tunnels.h:411 [inline]
       gre_rcv+0x15e/0x19c0 net/ipv6/ip6_gre.c:606
       ip6_protocol_deliver_rcu+0x181b/0x22c0 net/ipv6/ip6_input.c:432
       ip6_input_finish net/ipv6/ip6_input.c:473 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip6_input net/ipv6/ip6_input.c:482 [inline]
       ip6_mc_input+0xdf2/0x1460 net/ipv6/ip6_input.c:576
       dst_input include/net/dst.h:442 [inline]
       ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ipv6_rcv+0x683/0x710 net/ipv6/ip6_input.c:306
       __netif_receive_skb_one_core net/core/dev.c:5198 [inline]
       __netif_receive_skb net/core/dev.c:5312 [inline]
       netif_receive_skb_internal net/core/dev.c:5402 [inline]
       netif_receive_skb+0x66b/0xf20 net/core/dev.c:5461
       tun_rx_batched include/linux/skbuff.h:4321 [inline]
       tun_get_user+0x6aef/0x6f60 drivers/net/tun.c:1997
       tun_chr_write_iter+0x1f2/0x360 drivers/net/tun.c:2026
       call_write_iter include/linux/fs.h:1901 [inline]
       new_sync_write fs/read_write.c:483 [inline]
       __vfs_write+0xa5a/0xca0 fs/read_write.c:496
       vfs_write+0x44a/0x8f0 fs/read_write.c:558
       ksys_write+0x267/0x450 fs/read_write.c:611
       __do_sys_write fs/read_write.c:623 [inline]
       __se_sys_write fs/read_write.c:620 [inline]
       __ia32_sys_write+0xdb/0x120 fs/read_write.c:620
       do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline]
       do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410
       entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139
      RIP: 0023:0xf7f62d99
      Code: 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 89 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
      RSP: 002b:00000000fffedb2c EFLAGS: 00000217 ORIG_RAX: 0000000000000004
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020002580
      RDX: 0000000000000fca RSI: 0000000000000036 RDI: 0000000000000004
      RBP: 0000000000008914 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
       kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
       kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82
       slab_alloc_node mm/slub.c:2793 [inline]
       __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4401
       __kmalloc_reserve net/core/skbuff.c:142 [inline]
       __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:210
       alloc_skb include/linux/skbuff.h:1051 [inline]
       alloc_skb_with_frags+0x18c/0xa70 net/core/skbuff.c:5766
       sock_alloc_send_pskb+0xada/0xc60 net/core/sock.c:2242
       tun_alloc_skb drivers/net/tun.c:1529 [inline]
       tun_get_user+0x10ae/0x6f60 drivers/net/tun.c:1843
       tun_chr_write_iter+0x1f2/0x360 drivers/net/tun.c:2026
       call_write_iter include/linux/fs.h:1901 [inline]
       new_sync_write fs/read_write.c:483 [inline]
       __vfs_write+0xa5a/0xca0 fs/read_write.c:496
       vfs_write+0x44a/0x8f0 fs/read_write.c:558
       ksys_write+0x267/0x450 fs/read_write.c:611
       __do_sys_write fs/read_write.c:623 [inline]
       __se_sys_write fs/read_write.c:620 [inline]
       __ia32_sys_write+0xdb/0x120 fs/read_write.c:620
       do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline]
       do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410
       entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139
      
      Fixes: 95f5c64c ("gre: Move utility functions to common headers")
      Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      33f0d95c
    • Dmitry Yakunin's avatar
      cgroup, netclassid: periodically release file_lock on classid updating · d6d11db2
      Dmitry Yakunin authored
      [ Upstream commit 018d26fc ]
      
      In our production environment we have faced with problem that updating
      classid in cgroup with heavy tasks cause long freeze of the file tables
      in this tasks. By heavy tasks we understand tasks with many threads and
      opened sockets (e.g. balancers). This freeze leads to an increase number
      of client timeouts.
      
      This patch implements following logic to fix this issue:
      аfter iterating 1000 file descriptors file table lock will be released
      thus providing a time gap for socket creation/deletion.
      
      Now update is non atomic and socket may be skipped using calls:
      
      dup2(oldfd, newfd);
      close(oldfd);
      
      But this case is not typical. Moreover before this patch skip is possible
      too by hiding socket fd in unix socket buffer.
      
      New sockets will be allocated with updated classid because cgroup state
      is updated before start of the file descriptors iteration.
      
      So in common cases this patch has no side effects.
      Signed-off-by: default avatarDmitry Yakunin <zeil@yandex-team.ru>
      Reviewed-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6d11db2
    • Florian Fainelli's avatar
      net: phy: Avoid multiple suspends · ba389f36
      Florian Fainelli authored
      commit 503ba7c6 upstream.
      
      It is currently possible for a PHY device to be suspended as part of a
      network device driver's suspend call while it is still being attached to
      that net_device, either via phy_suspend() or implicitly via phy_stop().
      
      Later on, when the MDIO bus controller get suspended, we would attempt
      to suspend again the PHY because it is still attached to a network
      device.
      
      This is both a waste of time and creates an opportunity for improper
      clock/power management bugs to creep in.
      
      Fixes: 803dd9c7 ("net: phy: avoid suspending twice a PHY")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba389f36
    • David S. Miller's avatar
      phy: Revert toggling reset changes. · 458c058c
      David S. Miller authored
      commit 7b566f70 upstream.
      
      This reverts:
      
      ef1b5bf5 ("net: phy: Fix not to call phy_resume() if PHY is not attached")
      8c85f4b8 ("net: phy: micrel: add toggling phy reset if PHY is not  attached")
      
      Andrew Lunn informs me that there are alternative efforts
      underway to fix this more properly.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [just take the ef1b5bf5 revert - gregkh]
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      458c058c
  2. 16 Mar, 2020 1 commit