1. 21 Dec, 2019 8 commits
    • Aaron Conole's avatar
      openvswitch: support asymmetric conntrack · 6f99afcc
      Aaron Conole authored
      [ Upstream commit 5d50aa83 ]
      
      The openvswitch module shares a common conntrack and NAT infrastructure
      exposed via netfilter.  It's possible that a packet needs both SNAT and
      DNAT manipulation, due to e.g. tuple collision.  Netfilter can support
      this because it runs through the NAT table twice - once on ingress and
      again after egress.  The openvswitch module doesn't have such capability.
      
      Like netfilter hook infrastructure, we should run through NAT twice to
      keep the symmetry.
      
      Fixes: 05752523 ("openvswitch: Interface with NAT.")
      Signed-off-by: default avatarAaron Conole <aconole@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f99afcc
    • Mian Yousaf Kaukab's avatar
      net: thunderx: start phy before starting autonegotiation · 13156081
      Mian Yousaf Kaukab authored
      [ Upstream commit a350d2e7 ]
      
      Since commit 2b3e88ea ("net: phy: improve phy state checking")
      phy_start_aneg() expects phy state to be >= PHY_UP. Call phy_start()
      before calling phy_start_aneg() during probe so that autonegotiation
      is initiated.
      
      As phy_start() takes care of calling phy_start_aneg(), drop the explicit
      call to phy_start_aneg().
      
      Network fails without this patch on Octeon TX.
      
      Fixes: 2b3e88ea ("net: phy: improve phy state checking")
      Signed-off-by: default avatarMian Yousaf Kaukab <ykaukab@suse.de>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      13156081
    • Dust Li's avatar
      net: sched: fix dump qlen for sch_mq/sch_mqprio with NOLOCK subqueues · 0c5a4dd6
      Dust Li authored
      [ Upstream commit 2f23cd42 ]
      
      sch->q.len hasn't been set if the subqueue is a NOLOCK qdisc
       in mq_dump() and mqprio_dump().
      
      Fixes: ce679e8d ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio")
      Signed-off-by: default avatarDust Li <dust.li@linux.alibaba.com>
      Signed-off-by: default avatarTony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c5a4dd6
    • Grygorii Strashko's avatar
      net: ethernet: ti: cpsw: fix extra rx interrupt · 64334e4f
      Grygorii Strashko authored
      [ Upstream commit 51302f77 ]
      
      Now RX interrupt is triggered twice every time, because in
      cpsw_rx_interrupt() it is asked first and then disabled. So there will be
      pending interrupt always, when RX interrupt is enabled again in NAPI
      handler.
      
      Fix it by first disabling IRQ and then do ask.
      
      Fixes: 870915fe ("drivers: net: cpsw: remove disable_irq/enable_irq as irq can be masked from cpsw itself")
      Signed-off-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      64334e4f
    • Alexander Lobakin's avatar
      net: dsa: fix flow dissection on Tx path · a7d80e75
      Alexander Lobakin authored
      [ Upstream commit 8bef0af0 ]
      
      Commit 43e66528 ("net-next: dsa: fix flow dissection") added an
      ability to override protocol and network offset during flow dissection
      for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
      in order to fix skb hashing for RPS on Rx path.
      
      However, skb_hash() and added part of code can be invoked not only on
      Rx, but also on Tx path if we have a multi-queued device and:
       - kernel is running on UP system or
       - XPS is not configured.
      
      The call stack in this two cases will be like: dev_queue_xmit() ->
      __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
      skb_tx_hash() -> skb_get_hash().
      
      The problem is that skbs queued for Tx have both network offset and
      correct protocol already set up even after inserting a CPU tag by DSA
      tagger, so calling tag_ops->flow_dissect() on this path actually only
      breaks flow dissection and hashing.
      
      This can be observed by adding debug prints just before and right after
      tag_ops->flow_dissect() call to the related block of code:
      
      Before the patch:
      
      Rx path (RPS):
      
      [   19.240001] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   19.244271] tag_ops->flow_dissect()
      [   19.247811] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
      
      [   19.215435] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   19.219746] tag_ops->flow_dissect()
      [   19.223241] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
      
      [   18.654057] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   18.658332] tag_ops->flow_dissect()
      [   18.661826] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
      
      Tx path (UP system):
      
      [   18.759560] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
      [   18.763933] tag_ops->flow_dissect()
      [   18.767485] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      [   22.800020] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
      [   22.804392] tag_ops->flow_dissect()
      [   22.807921] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      [   16.898342] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
      [   16.902705] tag_ops->flow_dissect()
      [   16.906227] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      After:
      
      Rx path (RPS):
      
      [   16.520993] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   16.525260] tag_ops->flow_dissect()
      [   16.528808] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
      
      [   15.484807] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   15.490417] tag_ops->flow_dissect()
      [   15.495223] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
      
      [   17.134621] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   17.138895] tag_ops->flow_dissect()
      [   17.142388] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
      
      Tx path (UP system):
      
      [   15.499558] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
      
      [   20.664689] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
      
      [   18.565782] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
      
      In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
      to prevent code from calling tag_ops->flow_dissect() on Tx.
      I also decided to initialize 'offset' variable so tagger callbacks can
      now safely leave it untouched without provoking a chaos.
      
      Fixes: 43e66528 ("net-next: dsa: fix flow dissection")
      Signed-off-by: default avatarAlexander Lobakin <alobakin@dlink.ru>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7d80e75
    • Nikolay Aleksandrov's avatar
      net: bridge: deny dev_set_mac_address() when unregistering · bb168ebe
      Nikolay Aleksandrov authored
      [ Upstream commit c4b4c421 ]
      
      We have an interesting memory leak in the bridge when it is being
      unregistered and is a slave to a master device which would change the
      mac of its slaves on unregister (e.g. bond, team). This is a very
      unusual setup but we do end up leaking 1 fdb entry because
      dev_set_mac_address() would cause the bridge to insert the new mac address
      into its table after all fdbs are flushed, i.e. after dellink() on the
      bridge has finished and we call NETDEV_UNREGISTER the bond/team would
      release it and will call dev_set_mac_address() to restore its original
      address and that in turn will add an fdb in the bridge.
      One fix is to check for the bridge dev's reg_state in its
      ndo_set_mac_address callback and return an error if the bridge is not in
      NETREG_REGISTERED.
      
      Easy steps to reproduce:
       1. add bond in mode != A/B
       2. add any slave to the bond
       3. add bridge dev as a slave to the bond
       4. destroy the bridge device
      
      Trace:
       unreferenced object 0xffff888035c4d080 (size 128):
         comm "ip", pid 4068, jiffies 4296209429 (age 1413.753s)
         hex dump (first 32 bytes):
           41 1d c9 36 80 88 ff ff 00 00 00 00 00 00 00 00  A..6............
           d2 19 c9 5e 3f d7 00 00 00 00 00 00 00 00 00 00  ...^?...........
         backtrace:
           [<00000000ddb525dc>] kmem_cache_alloc+0x155/0x26f
           [<00000000633ff1e0>] fdb_create+0x21/0x486 [bridge]
           [<0000000092b17e9c>] fdb_insert+0x91/0xdc [bridge]
           [<00000000f2a0f0ff>] br_fdb_change_mac_address+0xb3/0x175 [bridge]
           [<000000001de02dbd>] br_stp_change_bridge_id+0xf/0xff [bridge]
           [<00000000ac0e32b1>] br_set_mac_address+0x76/0x99 [bridge]
           [<000000006846a77f>] dev_set_mac_address+0x63/0x9b
           [<00000000d30738fc>] __bond_release_one+0x3f6/0x455 [bonding]
           [<00000000fc7ec01d>] bond_netdev_event+0x2f2/0x400 [bonding]
           [<00000000305d7795>] notifier_call_chain+0x38/0x56
           [<0000000028885d4a>] call_netdevice_notifiers+0x1e/0x23
           [<000000008279477b>] rollback_registered_many+0x353/0x6a4
           [<0000000018ef753a>] unregister_netdevice_many+0x17/0x6f
           [<00000000ba854b7a>] rtnl_delete_link+0x3c/0x43
           [<00000000adf8618d>] rtnl_dellink+0x1dc/0x20a
           [<000000009b6395fd>] rtnetlink_rcv_msg+0x23d/0x268
      
      Fixes: 43598813 ("bridge: add local MAC address to forwarding table (v2)")
      Reported-by: syzbot+2add91c08eb181fea1bf@syzkaller.appspotmail.com
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb168ebe
    • Vladyslav Tarasiuk's avatar
      mqprio: Fix out-of-bounds access in mqprio_dump · 588fac83
      Vladyslav Tarasiuk authored
      [ Upstream commit 9f104c77 ]
      
      When user runs a command like
      tc qdisc add dev eth1 root mqprio
      KASAN stack-out-of-bounds warning is emitted.
      Currently, NLA_ALIGN macro used in mqprio_dump provides too large
      buffer size as argument for nla_put and memcpy down the call stack.
      The flow looks like this:
      1. nla_put expects exact object size as an argument;
      2. Later it provides this size to memcpy;
      3. To calculate correct padding for SKB, nla_put applies NLA_ALIGN
         macro itself.
      
      Therefore, NLA_ALIGN should not be applied to the nla_put parameter.
      Otherwise it will lead to out-of-bounds memory access in memcpy.
      
      Fixes: 4e8b86c0 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
      Signed-off-by: default avatarVladyslav Tarasiuk <vladyslavt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      588fac83
    • Eric Dumazet's avatar
      inet: protect against too small mtu values. · d80d67cd
      Eric Dumazet authored
      [ Upstream commit 501a90c9 ]
      
      syzbot was once again able to crash a host by setting a very small mtu
      on loopback device.
      
      Let's make inetdev_valid_mtu() available in include/net/ip.h,
      and use it in ip_setup_cork(), so that we protect both ip_append_page()
      and __ip_append_data()
      
      Also add a READ_ONCE() when the device mtu is read.
      
      Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(),
      even if other code paths might write over this field.
      
      Add a big comment in include/linux/netdevice.h about dev->mtu
      needing READ_ONCE()/WRITE_ONCE() annotations.
      
      Hopefully we will add the missing ones in followup patches.
      
      [1]
      
      refcount_t: saturated; leaking memory.
      WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x197/0x210 lib/dump_stack.c:118
       panic+0x2e3/0x75c kernel/panic.c:221
       __warn.cold+0x2f/0x3e kernel/panic.c:582
       report_bug+0x289/0x300 lib/bug.c:195
       fixup_bug arch/x86/kernel/traps.c:174 [inline]
       fixup_bug arch/x86/kernel/traps.c:169 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
       do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
       invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
      RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89
      RSP: 0018:ffff88809689f550 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c
      RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1
      R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001
      R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40
       refcount_add include/linux/refcount.h:193 [inline]
       skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999
       sock_wmalloc+0xf1/0x120 net/core/sock.c:2096
       ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383
       udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276
       inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821
       kernel_sendpage+0x92/0xf0 net/socket.c:3794
       sock_sendpage+0x8b/0xc0 net/socket.c:936
       pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458
       splice_from_pipe_feed fs/splice.c:512 [inline]
       __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636
       splice_from_pipe+0x108/0x170 fs/splice.c:671
       generic_splice_sendpage+0x3c/0x50 fs/splice.c:842
       do_splice_from fs/splice.c:861 [inline]
       direct_splice_actor+0x123/0x190 fs/splice.c:1035
       splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990
       do_splice_direct+0x1da/0x2a0 fs/splice.c:1078
       do_sendfile+0x597/0xd00 fs/read_write.c:1464
       __do_sys_sendfile64 fs/read_write.c:1525 [inline]
       __se_sys_sendfile64 fs/read_write.c:1511 [inline]
       __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x441409
      Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409
      RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005
      RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010
      R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180
      R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Fixes: 1470ddf7 ("inet: Remove explicit write references to sk/inet in ip_append_data")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d80d67cd
  2. 17 Dec, 2019 32 commits