1. 12 Oct, 2017 40 commits
    • Nikolay Aleksandrov's avatar
      net: rtnetlink: fix info leak in RTM_GETSTATS call · 95206ea3
      Nikolay Aleksandrov authored
      
      [ Upstream commit ce024f42 ]
      
      When RTM_GETSTATS was added the fields of its header struct were not all
      initialized when returning the result thus leaking 4 bytes of information
      to user-space per rtnl_fill_statsinfo call, so initialize them now. Thanks
      to Alexander Potapenko for the detailed report and bisection.
      Reported-by: default avatarAlexander Potapenko <glider@google.com>
      Fixes: 10c9ead9 ("rtnetlink: add new RTM_GETSTATS message to dump link stats")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95206ea3
    • Parthasarathy Bhuvaragan's avatar
      tipc: use only positive error codes in messages · 58b1b840
      Parthasarathy Bhuvaragan authored
      
      [ Upstream commit aad06212 ]
      
      In commit e3a77561 ("tipc: split up function tipc_msg_eval()"),
      we have updated the function tipc_msg_lookup_dest() to set the error
      codes to negative values at destination lookup failures. Thus when
      the function sets the error code to -TIPC_ERR_NO_NAME, its inserted
      into the 4 bit error field of the message header as 0xf instead of
      TIPC_ERR_NO_NAME (1). The value 0xf is an unknown error code.
      
      In this commit, we set only positive error code.
      
      Fixes: e3a77561 ("tipc: split up function tipc_msg_eval()")
      Signed-off-by: default avatarParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      58b1b840
    • Xin Long's avatar
      ip6_tunnel: update mtu properly for ARPHRD_ETHER tunnel device in tx path · 09788d46
      Xin Long authored
      
      [ Upstream commit d41bb33b ]
      
      Now when updating mtu in tx path, it doesn't consider ARPHRD_ETHER tunnel
      device, like ip6gre_tap tunnel, for which it should also subtract ether
      header to get the correct mtu.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09788d46
    • Xin Long's avatar
      ip6_gre: ip6gre_tap device should keep dst · ab4da56f
      Xin Long authored
      
      [ Upstream commit 2d40557c ]
      
      The patch 'ip_gre: ipgre_tap device should keep dst' fixed
      a issue that ipgre_tap mtu couldn't be updated in tx path.
      
      The same fix is needed for ip6gre_tap as well.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab4da56f
    • Jason A. Donenfeld's avatar
      netlink: do not proceed if dump's start() errs · b4a11925
      Jason A. Donenfeld authored
      
      [ Upstream commit fef0035c ]
      
      Drivers that use the start method for netlink dumping rely on dumpit not
      being called if start fails. For example, ila_xlat.c allocates memory
      and assigns it to cb->args[0] in its start() function. It might fail to
      do that and return -ENOMEM instead. However, even when returning an
      error, dumpit will be called, which, in the example above, quickly
      dereferences the memory in cb->args[0], which will OOPS the kernel. This
      is but one example of how this goes wrong.
      
      Since start() has always been a function with an int return type, it
      therefore makes sense to use it properly, rather than ignoring it. This
      patch thus returns early and does not call dumpit() when start() fails.
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Reviewed-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b4a11925
    • Christoph Paasch's avatar
      net: Set sk_prot_creator when cloning sockets to the right proto · cf2eaf16
      Christoph Paasch authored
      
      [ Upstream commit 9d538fa6 ]
      
      sk->sk_prot and sk->sk_prot_creator can differ when the app uses
      IPV6_ADDRFORM (transforming an IPv6-socket to an IPv4-one).
      Which is why sk_prot_creator is there to make sure that sk_prot_free()
      does the kmem_cache_free() on the right kmem_cache slab.
      
      Now, if such a socket gets transformed back to a listening socket (using
      connect() with AF_UNSPEC) we will allocate an IPv4 tcp_sock through
      sk_clone_lock() when a new connection comes in. But sk_prot_creator will
      still point to the IPv6 kmem_cache (as everything got copied in
      sk_clone_lock()). When freeing, we will thus put this
      memory back into the IPv6 kmem_cache although it was allocated in the
      IPv4 cache. I have seen memory corruption happening because of this.
      
      With slub-debugging and MEMCG_KMEM enabled this gives the warning
      	"cache_from_obj: Wrong slab cache. TCPv6 but object is from TCP"
      
      A C-program to trigger this:
      
      void main(void)
      {
              int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP);
              int new_fd, newest_fd, client_fd;
              struct sockaddr_in6 bind_addr;
              struct sockaddr_in bind_addr4, client_addr1, client_addr2;
              struct sockaddr unsp;
              int val;
      
              memset(&bind_addr, 0, sizeof(bind_addr));
              bind_addr.sin6_family = AF_INET6;
              bind_addr.sin6_port = ntohs(42424);
      
              memset(&client_addr1, 0, sizeof(client_addr1));
              client_addr1.sin_family = AF_INET;
              client_addr1.sin_port = ntohs(42424);
              client_addr1.sin_addr.s_addr = inet_addr("127.0.0.1");
      
              memset(&client_addr2, 0, sizeof(client_addr2));
              client_addr2.sin_family = AF_INET;
              client_addr2.sin_port = ntohs(42421);
              client_addr2.sin_addr.s_addr = inet_addr("127.0.0.1");
      
              memset(&unsp, 0, sizeof(unsp));
              unsp.sa_family = AF_UNSPEC;
      
              bind(fd, (struct sockaddr *)&bind_addr, sizeof(bind_addr));
      
              listen(fd, 5);
      
              client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
              connect(client_fd, (struct sockaddr *)&client_addr1, sizeof(client_addr1));
              new_fd = accept(fd, NULL, NULL);
              close(fd);
      
              val = AF_INET;
              setsockopt(new_fd, SOL_IPV6, IPV6_ADDRFORM, &val, sizeof(val));
      
              connect(new_fd, &unsp, sizeof(unsp));
      
              memset(&bind_addr4, 0, sizeof(bind_addr4));
              bind_addr4.sin_family = AF_INET;
              bind_addr4.sin_port = ntohs(42421);
              bind(new_fd, (struct sockaddr *)&bind_addr4, sizeof(bind_addr4));
      
              listen(new_fd, 5);
      
              client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
              connect(client_fd, (struct sockaddr *)&client_addr2, sizeof(client_addr2));
      
              newest_fd = accept(new_fd, NULL, NULL);
              close(new_fd);
      
              close(client_fd);
              close(new_fd);
      }
      
      As far as I can see, this bug has been there since the beginning of the
      git-days.
      Signed-off-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf2eaf16
    • Willem de Bruijn's avatar
      packet: only test po->has_vnet_hdr once in packet_snd · 24ee394a
      Willem de Bruijn authored
      
      [ Upstream commit da7c9561 ]
      
      Packet socket option po->has_vnet_hdr can be updated concurrently with
      other operations if no ring is attached.
      
      Do not test the option twice in packet_snd, as the value may change in
      between calls. A race on setsockopt disable may cause a packet > mtu
      to be sent without having GSO options set.
      
      Fixes: bfd5f4a3 ("packet: Add GSO/csum offload support.")
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24ee394a
    • Willem de Bruijn's avatar
      packet: in packet_do_bind, test fanout with bind_lock held · 0f22167d
      Willem de Bruijn authored
      
      [ Upstream commit 4971613c ]
      
      Once a socket has po->fanout set, it remains a member of the group
      until it is destroyed. The prot_hook must be constant and identical
      across sockets in the group.
      
      If fanout_add races with packet_do_bind between the test of po->fanout
      and taking the lock, the bind call may make type or dev inconsistent
      with that of the fanout group.
      
      Hold po->bind_lock when testing po->fanout to avoid this race.
      
      I had to introduce artificial delay (local_bh_enable) to actually
      observe the race.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f22167d
    • Florian Fainelli's avatar
      net: dsa: Fix network device registration order · 6eab1f82
      Florian Fainelli authored
      
      [ Upstream commit e804441c ]
      
      We cannot be registering the network device first, then setting its
      carrier off and finally connecting it to a PHY, doing that leaves a
      window during which the carrier is at best inconsistent, and at worse
      the device is not usable without a down/up sequence since the network
      device is visible to user space with possibly no PHY device attached.
      
      Re-order steps so that they make logical sense. This fixes some devices
      where the port was not usable after e.g: an unbind then bind of the
      driver.
      
      Fixes: 0071f56e ("dsa: Register netdev before phy")
      Fixes: 91da11f8 ("net: Distributed Switch Architecture protocol support")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6eab1f82
    • Alexander Potapenko's avatar
      tun: bail out from tun_get_user() if the skb is empty · b8990d2e
      Alexander Potapenko authored
      
      [ Upstream commit 2580c4c1 ]
      
      KMSAN (https://github.com/google/kmsan) reported accessing uninitialized
      skb->data[0] in the case the skb is empty (i.e. skb->len is 0):
      
      ================================================
      BUG: KMSAN: use of uninitialized memory in tun_get_user+0x19ba/0x3770
      CPU: 0 PID: 3051 Comm: probe Not tainted 4.13.0+ #3140
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
      ...
       __msan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:477
       tun_get_user+0x19ba/0x3770 drivers/net/tun.c:1301
       tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
       call_write_iter ./include/linux/fs.h:1743
       new_sync_write fs/read_write.c:457
       __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
       vfs_write+0x3e4/0x770 fs/read_write.c:518
       SYSC_write+0x12f/0x2b0 fs/read_write.c:565
       SyS_write+0x55/0x80 fs/read_write.c:557
       do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
       entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:245
      ...
      origin:
      ...
       kmsan_poison_shadow+0x6e/0xc0 mm/kmsan/kmsan.c:211
       slab_alloc_node mm/slub.c:2732
       __kmalloc_node_track_caller+0x351/0x370 mm/slub.c:4351
       __kmalloc_reserve net/core/skbuff.c:138
       __alloc_skb+0x26a/0x810 net/core/skbuff.c:231
       alloc_skb ./include/linux/skbuff.h:903
       alloc_skb_with_frags+0x1d7/0xc80 net/core/skbuff.c:4756
       sock_alloc_send_pskb+0xabf/0xfe0 net/core/sock.c:2037
       tun_alloc_skb drivers/net/tun.c:1144
       tun_get_user+0x9a8/0x3770 drivers/net/tun.c:1274
       tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
       call_write_iter ./include/linux/fs.h:1743
       new_sync_write fs/read_write.c:457
       __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
       vfs_write+0x3e4/0x770 fs/read_write.c:518
       SYSC_write+0x12f/0x2b0 fs/read_write.c:565
       SyS_write+0x55/0x80 fs/read_write.c:557
       do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
       return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:245
      ================================================
      
      Make sure tun_get_user() doesn't touch skb->data[0] unless there is
      actual data.
      
      C reproducer below:
      ==========================
          // autogenerated by syzkaller (http://github.com/google/syzkaller)
      
          #define _GNU_SOURCE
      
          #include <fcntl.h>
          #include <linux/if_tun.h>
          #include <netinet/ip.h>
          #include <net/if.h>
          #include <string.h>
          #include <sys/ioctl.h>
      
          int main()
          {
            int sock = socket(PF_INET, SOCK_STREAM, IPPROTO_IP);
            int tun_fd = open("/dev/net/tun", O_RDWR);
            struct ifreq req;
            memset(&req, 0, sizeof(struct ifreq));
            strcpy((char*)&req.ifr_name, "gre0");
            req.ifr_flags = IFF_UP | IFF_MULTICAST;
            ioctl(tun_fd, TUNSETIFF, &req);
            ioctl(sock, SIOCSIFFLAGS, "gre0");
            write(tun_fd, "hi", 0);
            return 0;
          }
      ==========================
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8990d2e
    • Sabrina Dubroca's avatar
      l2tp: fix race condition in l2tp_tunnel_delete · b4a9b12d
      Sabrina Dubroca authored
      
      [ Upstream commit 62b982ee ]
      
      If we try to delete the same tunnel twice, the first delete operation
      does a lookup (l2tp_tunnel_get), finds the tunnel, calls
      l2tp_tunnel_delete, which queues it for deletion by
      l2tp_tunnel_del_work.
      
      The second delete operation also finds the tunnel and calls
      l2tp_tunnel_delete. If the workqueue has already fired and started
      running l2tp_tunnel_del_work, then l2tp_tunnel_delete will queue the
      same tunnel a second time, and try to free the socket again.
      
      Add a dead flag to prevent firing the workqueue twice. Then we can
      remove the check of queue_work's result that was meant to prevent that
      race but doesn't.
      
      Reproducer:
      
          ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000 local 192.168.0.2 remote 192.168.0.1 encap udp udp_sport 5000 udp_dport 6000
          ip l2tp add session name l2tp1 tunnel_id 3000 session_id 1000 peer_session_id 2000
          ip link set l2tp1 up
          ip l2tp del tunnel tunnel_id 3000
          ip l2tp del tunnel tunnel_id 3000
      
      Fixes: f8ccac0e ("l2tp: put tunnel socket release on a workqueue")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Acked-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b4a9b12d
    • Ridge Kennedy's avatar
      l2tp: Avoid schedule while atomic in exit_net · e5941137
      Ridge Kennedy authored
      
      [ Upstream commit 12d656af ]
      
      While destroying a network namespace that contains a L2TP tunnel a
      "BUG: scheduling while atomic" can be observed.
      
      Enabling lockdep shows that this is happening because l2tp_exit_net()
      is calling l2tp_tunnel_closeall() (via l2tp_tunnel_delete()) from
      within an RCU critical section.
      
      l2tp_exit_net() takes rcu_read_lock_bh()
        << list_for_each_entry_rcu() >>
        l2tp_tunnel_delete()
          l2tp_tunnel_closeall()
            __l2tp_session_unhash()
              synchronize_rcu() << Illegal inside RCU critical section >>
      
      BUG: sleeping function called from invalid context
      in_atomic(): 1, irqs_disabled(): 0, pid: 86, name: kworker/u16:2
      INFO: lockdep is turned off.
      CPU: 2 PID: 86 Comm: kworker/u16:2 Tainted: G        W  O    4.4.6-at1 #2
      Hardware name: Xen HVM domU, BIOS 4.6.1-xs125300 05/09/2016
      Workqueue: netns cleanup_net
       0000000000000000 ffff880202417b90 ffffffff812b0013 ffff880202410ac0
       ffffffff81870de8 ffff880202417bb8 ffffffff8107aee8 ffffffff81870de8
       0000000000000c51 0000000000000000 ffff880202417be0 ffffffff8107b024
      Call Trace:
       [<ffffffff812b0013>] dump_stack+0x85/0xc2
       [<ffffffff8107aee8>] ___might_sleep+0x148/0x240
       [<ffffffff8107b024>] __might_sleep+0x44/0x80
       [<ffffffff810b21bd>] synchronize_sched+0x2d/0xe0
       [<ffffffff8109be6d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff8105c7bb>] ? __local_bh_enable_ip+0x6b/0xc0
       [<ffffffff816a1b00>] ? _raw_spin_unlock_bh+0x30/0x40
       [<ffffffff81667482>] __l2tp_session_unhash+0x172/0x220
       [<ffffffff81667397>] ? __l2tp_session_unhash+0x87/0x220
       [<ffffffff8166888b>] l2tp_tunnel_closeall+0x9b/0x140
       [<ffffffff81668c74>] l2tp_tunnel_delete+0x14/0x60
       [<ffffffff81668dd0>] l2tp_exit_net+0x110/0x270
       [<ffffffff81668d5c>] ? l2tp_exit_net+0x9c/0x270
       [<ffffffff815001c3>] ops_exit_list.isra.6+0x33/0x60
       [<ffffffff81501166>] cleanup_net+0x1b6/0x280
       ...
      
      This bug can easily be reproduced with a few steps:
      
       $ sudo unshare -n bash  # Create a shell in a new namespace
       # ip link set lo up
       # ip addr add 127.0.0.1 dev lo
       # ip l2tp add tunnel remote 127.0.0.1 local 127.0.0.1 tunnel_id 1 \
          peer_tunnel_id 1 udp_sport 50000 udp_dport 50000
       # ip l2tp add session name foo tunnel_id 1 session_id 1 \
          peer_session_id 1
       # ip link set foo up
       # exit  # Exit the shell, in turn exiting the namespace
       $ dmesg
       ...
       [942121.089216] BUG: scheduling while atomic: kworker/u16:3/13872/0x00000200
       ...
      
      To fix this, move the call to l2tp_tunnel_closeall() out of the RCU
      critical section, and instead call it from l2tp_tunnel_del_work(), which
      is running from the l2tp_wq workqueue.
      
      Fixes: 2b551c6e ("l2tp: close sessions before initiating tunnel delete")
      Signed-off-by: default avatarRidge Kennedy <ridge.kennedy@alliedtelesis.co.nz>
      Acked-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e5941137
    • Alexey Kodanev's avatar
      vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit · 6689f835
      Alexey Kodanev authored
      
      [ Upstream commit 36f6ee22 ]
      
      When running LTP IPsec tests, KASan might report:
      
      BUG: KASAN: use-after-free in vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
      Read of size 4 at addr ffff880dc6ad1980 by task swapper/0/0
      ...
      Call Trace:
        <IRQ>
        dump_stack+0x63/0x89
        print_address_description+0x7c/0x290
        kasan_report+0x28d/0x370
        ? vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
        __asan_report_load4_noabort+0x19/0x20
        vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
        ? vti_init_net+0x190/0x190 [ip_vti]
        ? save_stack_trace+0x1b/0x20
        ? save_stack+0x46/0xd0
        dev_hard_start_xmit+0x147/0x510
        ? icmp_echo.part.24+0x1f0/0x210
        __dev_queue_xmit+0x1394/0x1c60
      ...
      Freed by task 0:
        save_stack_trace+0x1b/0x20
        save_stack+0x46/0xd0
        kasan_slab_free+0x70/0xc0
        kmem_cache_free+0x81/0x1e0
        kfree_skbmem+0xb1/0xe0
        kfree_skb+0x75/0x170
        kfree_skb_list+0x3e/0x60
        __dev_queue_xmit+0x1298/0x1c60
        dev_queue_xmit+0x10/0x20
        neigh_resolve_output+0x3a8/0x740
        ip_finish_output2+0x5c0/0xe70
        ip_finish_output+0x4ba/0x680
        ip_output+0x1c1/0x3a0
        xfrm_output_resume+0xc65/0x13d0
        xfrm_output+0x1e4/0x380
        xfrm4_output_finish+0x5c/0x70
      
      Can be fixed if we get skb->len before dst_output().
      
      Fixes: b9959fd3 ("vti: switch to new ip tunnel code")
      Fixes: 22e1b23d ("vti6: Support inter address family tunneling.")
      Signed-off-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6689f835
    • Timur Tabi's avatar
      net: qcom/emac: specify the correct size when mapping a DMA buffer · 852bdea5
      Timur Tabi authored
      
      [ Upstream commit a93ad944 ]
      
      When mapping the RX DMA buffers, the driver was accidentally specifying
      zero for the buffer length.  Under normal circumstances, SWIOTLB does not
      need to allocate a bounce buffer, so the address is just mapped without
      checking the size field.  This is why the error was not detected earlier.
      
      Fixes: b9b17deb ("net: emac: emac gigabit ethernet controller driver")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTimur Tabi <timur@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      852bdea5
    • Konstantin Khlebnikov's avatar
      net_sched: always reset qdisc backlog in qdisc_reset() · 5600c758
      Konstantin Khlebnikov authored
      
      [ Upstream commit c8e18129 ]
      
      SKB stored in qdisc->gso_skb also counted into backlog.
      
      Some qdiscs don't reset backlog to zero in ->reset(),
      for example sfq just dequeue and free all queued skb.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Fixes: 2ccccf5f ("net_sched: update hierarchical backlog too")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5600c758
    • Meng Xu's avatar
      isdn/i4l: fetch the ppp_write buffer in one shot · 93eef217
      Meng Xu authored
      
      [ Upstream commit 02388bf8 ]
      
      In isdn_ppp_write(), the header (i.e., protobuf) of the buffer is
      fetched twice from userspace. The first fetch is used to peek at the
      protocol of the message and reset the huptimer if necessary; while the
      second fetch copies in the whole buffer. However, given that buf resides
      in userspace memory, a user process can race to change its memory content
      across fetches. By doing so, we can either avoid resetting the huptimer
      for any type of packets (by first setting proto to PPP_LCP and later
      change to the actual type) or force resetting the huptimer for LCP
      packets.
      
      This patch changes this double-fetch behavior into two single fetches
      decided by condition (lp->isdn_device < 0 || lp->isdn_channel <0).
      A more detailed discussion can be found at
      https://marc.info/?l=linux-kernel&m=150586376926123&w=2Signed-off-by: default avatarMeng Xu <mengxu.gatech@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      93eef217
    • Yonghong Song's avatar
      bpf: one perf event close won't free bpf program attached by another perf event · 0dee549f
      Yonghong Song authored
      
      [ Upstream commit ec9dd352 ]
      
      This patch fixes a bug exhibited by the following scenario:
        1. fd1 = perf_event_open with attr.config = ID1
        2. attach bpf program prog1 to fd1
        3. fd2 = perf_event_open with attr.config = ID1
           <this will be successful>
        4. user program closes fd2 and prog1 is detached from the tracepoint.
        5. user program with fd1 does not work properly as tracepoint
           no output any more.
      
      The issue happens at step 4. Multiple perf_event_open can be called
      successfully, but only one bpf prog pointer in the tp_event. In the
      current logic, any fd release for the same tp_event will free
      the tp_event->prog.
      
      The fix is to free tp_event->prog only when the closing fd
      corresponds to the one which registered the program.
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0dee549f
    • Willem de Bruijn's avatar
      packet: hold bind lock when rebinding to fanout hook · 6f7cdd4a
      Willem de Bruijn authored
      
      [ Upstream commit 008ba2a1 ]
      
      Packet socket bind operations must hold the po->bind_lock. This keeps
      po->running consistent with whether the socket is actually on a ptype
      list to receive packets.
      
      fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
      binds the fanout object to receive through packet_rcv_fanout.
      
      Make it hold the po->bind_lock when testing po->running and rebinding.
      Else, it can race with other rebind operations, such as that in
      packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
      can result in a socket being added to a fanout group twice, causing
      use-after-free KASAN bug reports, among others.
      
      Reported independently by both trinity and syzkaller.
      Verified that the syzkaller reproducer passes after this patch.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Reported-by: default avatarnixioaming <nixiaoming@huawei.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f7cdd4a
    • Christian Lamparter's avatar
      net: emac: Fix napi poll list corruption · 6eac2cd2
      Christian Lamparter authored
      
      [ Upstream commit f5595606 ]
      
      This patch is pretty much a carbon copy of
      commit 3079c652 ("caif: Fix napi poll list corruption")
      with "caif" replaced by "emac".
      
      The commit d75b1ade ("net: less interrupt masking in NAPI")
      breaks emac.
      
      It is now required that if the entire budget is consumed when poll
      returns, the napi poll_list must remain empty.  However, like some
      other drivers emac tries to do a last-ditch check and if there is
      more work it will call napi_reschedule and then immediately process
      some of this new work.  Should the entire budget be consumed while
      processing such new work then we will violate the new caller
      contract.
      
      This patch fixes this by not touching any work when we reschedule
      in emac.
      Signed-off-by: default avatarChristian Lamparter <chunkeey@googlemail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6eac2cd2
    • Eric Dumazet's avatar
      tcp: fastopen: fix on syn-data transmit failure · b463521d
      Eric Dumazet authored
      
      [ Upstream commit b5b7db8d ]
      
      Our recent change exposed a bug in TCP Fastopen Client that syzkaller
      found right away [1]
      
      When we prepare skb with SYN+DATA, we attempt to transmit it,
      and we update socket state as if the transmit was a success.
      
      In socket RTX queue we have two skbs, one with the SYN alone,
      and a second one containing the DATA.
      
      When (malicious) ACK comes in, we now complain that second one had no
      skb_mstamp.
      
      The proper fix is to make sure that if the transmit failed, we do not
      pretend we sent the DATA skb, and make it our send_head.
      
      When 3WHS completes, we can now send the DATA right away, without having
      to wait for a timeout.
      
      [1]
      WARNING: CPU: 0 PID: 100189 at net/ipv4/tcp_input.c:3117 tcp_clean_rtx_queue+0x2057/0x2ab0 net/ipv4/tcp_input.c:3117()
      
       WARN_ON_ONCE(last_ackt == 0);
      
      Modules linked in:
      CPU: 0 PID: 100189 Comm: syz-executor1 Not tainted
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       0000000000000000 ffff8800b35cb1d8 ffffffff81cad00d 0000000000000000
       ffffffff828a4347 ffff88009f86c080 ffffffff8316eb20 0000000000000d7f
       ffff8800b35cb220 ffffffff812c33c2 ffff8800baad2440 00000009d46575c0
      Call Trace:
       [<ffffffff81cad00d>] __dump_stack
       [<ffffffff81cad00d>] dump_stack+0xc1/0x124
       [<ffffffff812c33c2>] warn_slowpath_common+0xe2/0x150
       [<ffffffff812c361e>] warn_slowpath_null+0x2e/0x40
       [<ffffffff828a4347>] tcp_clean_rtx_queue+0x2057/0x2ab0 n
       [<ffffffff828ae6fd>] tcp_ack+0x151d/0x3930
       [<ffffffff828baa09>] tcp_rcv_state_process+0x1c69/0x4fd0
       [<ffffffff828efb7f>] tcp_v4_do_rcv+0x54f/0x7c0
       [<ffffffff8258aacb>] sk_backlog_rcv
       [<ffffffff8258aacb>] __release_sock+0x12b/0x3a0
       [<ffffffff8258ad9e>] release_sock+0x5e/0x1c0
       [<ffffffff8294a785>] inet_wait_for_connect
       [<ffffffff8294a785>] __inet_stream_connect+0x545/0xc50
       [<ffffffff82886f08>] tcp_sendmsg_fastopen
       [<ffffffff82886f08>] tcp_sendmsg+0x2298/0x35a0
       [<ffffffff82952515>] inet_sendmsg+0xe5/0x520
       [<ffffffff8257152f>] sock_sendmsg_nosec
       [<ffffffff8257152f>] sock_sendmsg+0xcf/0x110
      
      Fixes: 8c72c65b ("tcp: update skb->skb_mstamp more carefully")
      Fixes: 783237e8 ("net-tcp: Fast Open client - sending SYN-data")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b463521d
    • Davide Caratti's avatar
      net/sched: cls_matchall: fix crash when used with classful qdisc · b13bc543
      Davide Caratti authored
      
      [ Upstream commit 3ff4cbec ]
      
      this script, edited from Linux Advanced Routing and Traffic Control guide
      
      tc q a dev en0 root handle 1: htb default a
      tc c a dev en0 parent 1:  classid 1:1 htb rate 6mbit burst 15k
      tc c a dev en0 parent 1:1 classid 1:a htb rate 5mbit ceil 6mbit burst 15k
      tc c a dev en0 parent 1:1 classid 1:b htb rate 1mbit ceil 6mbit burst 15k
      tc f a dev en0 parent 1:0 prio 1 $clsname $clsargs classid 1:b
      ping $address -c1
      tc -s c s dev en0
      
      classifies traffic to 1:b or 1:a, depending on whether the packet matches
      or not the pattern $clsargs of filter $clsname. However, when $clsname is
      'matchall', a systematic crash can be observed in htb_classify(). HTB and
      classful qdiscs don't assign initial value to struct tcf_result, but then
      they expect it to contain valid values after filters have been run. Thus,
      current 'matchall' ignores the TCA_MATCHALL_CLASSID attribute, configured
      by user, and makes HTB (and classful qdiscs) dereference random pointers.
      
      By assigning head->res to *res in mall_classify(), before the actions are
      invoked, we fix this crash and enable TCA_MATCHALL_CLASSID functionality,
      that had no effect on 'matchall' classifier since its first introduction.
      
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1460213Reported-by: default avatarJiri Benc <jbenc@redhat.com>
      Fixes: b87f7936 ("net/sched: introduce Match-all classifier")
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarYotam Gigi <yotamg@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b13bc543
    • Xin Long's avatar
      ip6_tunnel: do not allow loading ip6_tunnel if ipv6 is disabled in cmdline · 13c8bd7a
      Xin Long authored
      
      [ Upstream commit 8c22dab0 ]
      
      If ipv6 has been disabled from cmdline since kernel started, it makes
      no sense to allow users to create any ip6 tunnel. Otherwise, it could
      some potential problem.
      
      Jianlin found a kernel crash caused by this in ip6_gre when he set
      ipv6.disable=1 in grub:
      
      [  209.588865] Unable to handle kernel paging request for data at address 0x00000080
      [  209.588872] Faulting instruction address: 0xc000000000a3aa6c
      [  209.588879] Oops: Kernel access of bad area, sig: 11 [#1]
      [  209.589062] NIP [c000000000a3aa6c] fib_rules_lookup+0x4c/0x260
      [  209.589071] LR [c000000000b9ad90] fib6_rule_lookup+0x50/0xb0
      [  209.589076] Call Trace:
      [  209.589097] fib6_rule_lookup+0x50/0xb0
      [  209.589106] rt6_lookup+0xc4/0x110
      [  209.589116] ip6gre_tnl_link_config+0x214/0x2f0 [ip6_gre]
      [  209.589125] ip6gre_newlink+0x138/0x3a0 [ip6_gre]
      [  209.589134] rtnl_newlink+0x798/0xb80
      [  209.589142] rtnetlink_rcv_msg+0xec/0x390
      [  209.589151] netlink_rcv_skb+0x138/0x150
      [  209.589159] rtnetlink_rcv+0x48/0x70
      [  209.589169] netlink_unicast+0x538/0x640
      [  209.589175] netlink_sendmsg+0x40c/0x480
      [  209.589184] ___sys_sendmsg+0x384/0x4e0
      [  209.589194] SyS_sendmsg+0xd4/0x140
      [  209.589201] SyS_socketcall+0x3e0/0x4f0
      [  209.589209] system_call+0x38/0xe0
      
      This patch is to return -EOPNOTSUPP in ip6_tunnel_init if ipv6 has been
      disabled from cmdline.
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      13c8bd7a
    • Fahad Kunnathadi's avatar
      net: phy: Fix mask value write on gmii2rgmii converter speed register · fc2fe7a0
      Fahad Kunnathadi authored
      
      [ Upstream commit f2654a47 ]
      
      To clear Speed Selection in MDIO control register(0x10),
      ie, clear bits 6 and 13 to zero while keeping other bits same.
      Before AND operation,The Mask value has to be perform with bitwise NOT
      operation (ie, ~ operator)
      
      This patch clears current speed selection before writing the
      new speed settings to gmii2rgmii converter
      
      Fixes: f411a616 ("net: phy: Add gmiitorgmii converter support")
      Signed-off-by: default avatarFahad Kunnathadi <fahad.kunnathadi@dexceldesigns.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc2fe7a0
    • Xin Long's avatar
      ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header · e814bae3
      Xin Long authored
      
      [ Upstream commit 76cc0d32 ]
      
      Now in ip6gre_header before packing the ipv6 header, it skb_push t->hlen
      which only includes encap_hlen + tun_hlen. It means greh and inner header
      would be over written by ipv6 stuff and ipv6h might have no chance to set
      up.
      
      Jianlin found this issue when using remote any on ip6_gre, the packets he
      captured on gre dev are truncated:
      
      22:50:26.210866 Out ethertype IPv6 (0x86dd), length 120: truncated-ip6 -\
      8128 bytes missing!(flowlabel 0x92f40, hlim 0, next-header Options (0)  \
      payload length: 8192) ::1:2000:0 > ::1:0:86dd: HBH [trunc] ip-proto-128 \
      8184
      
      It should also skb_push ipv6hdr so that ipv6h points to the right position
      to set ipv6 stuff up.
      
      This patch is to skb_push hlen + sizeof(*ipv6h) and also fix some indents
      in ip6gre_header.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e814bae3
    • Subash Abhinov Kasiviswanathan's avatar
      udpv6: Fix the checksum computation when HW checksum does not apply · f0a5af78
      Subash Abhinov Kasiviswanathan authored
      
      [ Upstream commit 63ecc3d9 ]
      
      While trying an ESP transport mode encryption for UDPv6 packets of
      datagram size 1436 with MTU 1500, checksum error was observed in
      the secondary fragment.
      
      This error occurs due to the UDP payload checksum being missed out
      when computing the full checksum for these packets in
      udp6_hwcsum_outgoing().
      
      Fixes: d39d938c ("ipv6: Introduce udpv6_send_skb()")
      Signed-off-by: default avatarSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0a5af78
    • Eric Dumazet's avatar
      tcp: fix data delivery rate · 85908cca
      Eric Dumazet authored
      
      [ Upstream commit fc225799 ]
      
      Now skb->mstamp_skb is updated later, we also need to call
      tcp_rate_skb_sent() after the update is done.
      
      Fixes: 8c72c65b ("tcp: update skb->skb_mstamp more carefully")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85908cca
    • Edward Cree's avatar
      bpf/verifier: reject BPF_ALU64|BPF_END · e159492b
      Edward Cree authored
      
      [ Upstream commit e67b8a68 ]
      
      Neither ___bpf_prog_run nor the JITs accept it.
      Also adds a new test case.
      
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e159492b
    • Eric Dumazet's avatar
      tcp: update skb->skb_mstamp more carefully · 186a9c5e
      Eric Dumazet authored
      
      [ Upstream commit 8c72c65b ]
      
      liujian reported a problem in TCP_USER_TIMEOUT processing with a patch
      in tcp_probe_timer() :
            https://www.spinics.net/lists/netdev/msg454496.html
      
      After investigations, the root cause of the problem is that we update
      skb->skb_mstamp of skbs in write queue, even if the attempt to send a
      clone or copy of it failed. One reason being a routing problem.
      
      This patch prevents this, solving liujian issue.
      
      It also removes a potential RTT miscalculation, since
      __tcp_retransmit_skb() is not OR-ing TCP_SKB_CB(skb)->sacked with
      TCPCB_EVER_RETRANS if a failure happens, but skb->skb_mstamp has
      been changed.
      
      A future ACK would then lead to a very small RTT sample and min_rtt
      would then be lowered to this too small value.
      
      Tested:
      
      # cat user_timeout.pkt
      --local_ip=192.168.102.64
      
          0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
         +0 bind(3, ..., ...) = 0
         +0 listen(3, 1) = 0
      
         +0 `ifconfig tun0 192.168.102.64/16; ip ro add 192.0.2.1 dev tun0`
      
         +0 < S 0:0(0) win 0 <mss 1460>
         +0 > S. 0:0(0) ack 1 <mss 1460>
      
        +.1 < . 1:1(0) ack 1 win 65530
         +0 accept(3, ..., ...) = 4
      
         +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
         +0 write(4, ..., 24) = 24
         +0 > P. 1:25(24) ack 1 win 29200
         +.1 < . 1:1(0) ack 25 win 65530
      
      //change the ipaddress
         +1 `ifconfig tun0 192.168.0.10/16`
      
         +1 write(4, ..., 24) = 24
         +1 write(4, ..., 24) = 24
         +1 write(4, ..., 24) = 24
         +1 write(4, ..., 24) = 24
      
         +0 `ifconfig tun0 192.168.102.64/16`
         +0 < . 1:2(1) ack 25 win 65530
         +0 `ifconfig tun0 192.168.0.10/16`
      
         +3 write(4, ..., 24) = -1
      
      # ./packetdrill user_timeout.pkt
      Signed-off-by: default avatarEric Dumazet <edumazet@googl.com>
      Reported-by: default avatarliujian <liujian56@huawei.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      186a9c5e
    • Dan Carpenter's avatar
      sctp: potential read out of bounds in sctp_ulpevent_type_enabled() · b70bb9bb
      Dan Carpenter authored
      
      [ Upstream commit fa5f7b51 ]
      
      This code causes a static checker warning because Smatch doesn't trust
      anything that comes from skb->data.  I've reviewed this code and I do
      think skb->data can be controlled by the user here.
      
      The sctp_event_subscribe struct has 13 __u8 fields and we want to see
      if ours is non-zero.  sn_type can be any value in the 0-USHRT_MAX range.
      We're subtracting SCTP_SN_TYPE_BASE which is 1 << 15 so we could read
      either before the start of the struct or after the end.
      
      This is a very old bug and it's surprising that it would go undetected
      for so long but my theory is that it just doesn't have a big impact so
      it would be hard to notice.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b70bb9bb
    • Jiri Pirko's avatar
      net: sched: fix use-after-free in tcf_action_destroy and tcf_del_walker · f86d3b1a
      Jiri Pirko authored
      
      [ Upstream commit 255cd50f ]
      
      Recent commit d7fb60b9 ("net_sched: get rid of tcfa_rcu") removed
      freeing in call_rcu, which changed already existing hard-to-hit
      race condition into 100% hit:
      
      [  598.599825] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      [  598.607782] IP: tcf_action_destroy+0xc0/0x140
      
      Or:
      
      [   40.858924] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      [   40.862840] IP: tcf_generic_walker+0x534/0x820
      
      Fix this by storing the ops and use them directly for module_put call.
      
      Fixes: a85a970a ("net_sched: move tc_action into tcf_common")
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f86d3b1a
    • Yuval Mintz's avatar
      mlxsw: spectrum: Prevent mirred-related crash on removal · f860ca54
      Yuval Mintz authored
      
      [ Upstream commit 6399ebcc ]
      
      When removing the offloading of mirred actions under
      matchall classifiers, mlxsw would find the destination port
      associated with the offloaded action and utilize it for undoing
      the configuration.
      
      Depending on the order by which ports are removed, it's possible that
      the destination port would get removed before the source port.
      In such a scenario, when actions would be flushed for the source port
      mlxsw would perform an illegal dereference as the destination port is
      no longer listed.
      
      Since the only item necessary for undoing the configuration on the
      destination side is the port-id and that in turn is already maintained
      by mlxsw on the source-port, simply stop trying to access the
      destination port and use the port-id directly instead.
      
      Fixes: 763b4b70 ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
      Signed-off-by: default avatarYuval Mintz <yuvalm@mellanox.com>
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f860ca54
    • Takashi Iwai's avatar
      ALSA: usx2y: Suppress kernel warning at page allocation failures · 065af12f
      Takashi Iwai authored
      commit 7682e399 upstream.
      
      The usx2y driver allocates the stream read/write buffers in continuous
      pages depending on the stream setup, and this may spew the kernel
      warning messages with a stack trace like:
        WARNING: CPU: 1 PID: 1846 at mm/page_alloc.c:3883
        __alloc_pages_slowpath+0x1ef2/0x2d70
        Modules linked in:
        CPU: 1 PID: 1846 Comm: kworker/1:2 Not tainted
        ....
      
      It may confuse user as if it were any serious error, although this is
      no fatal error and the driver handles the error case gracefully.
      Since the driver has already some sanity check of the given size (128
      and 256 pages), it can't pass any crazy value.  So it's merely page
      fragmentation.
      
      This patch adds __GFP_NOWARN to each caller for suppressing such
      kernel warnings.  The original issue was spotted by syzkaller.
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      065af12f
    • Takashi Sakamoto's avatar
      Revert "ALSA: echoaudio: purge contradictions between dimension matrix members... · 40e21932
      Takashi Sakamoto authored
      Revert "ALSA: echoaudio: purge contradictions between dimension matrix members and total number of members"
      
      commit 51db452d upstream.
      
      This reverts commit 275353bb to fix a regression which can abort
      'alsactl' program in alsa-utils due to assertion in alsa-lib.
      
      alsactl: control.c:2513: snd_ctl_elem_value_get_integer: Assertion `idx < sizeof(obj->value.integer.value) / sizeof(obj->value.integer.value[0])' failed.
      
      alsactl: control.c:2976: snd_ctl_elem_value_get_integer: Assertion `idx < ARRAY_SIZE(obj->value.integer.value)' failed.
      
      This commit is a band-aid. In a point of usage of ALSA control interface,
      the drivers still bring an issue that they prevent userspace applications
      to have a consistent way to parse each levels of the dimension information
      via ALSA control interface.
      
      Let me investigate this issue. Current implementation of the drivers
      have three control element sets with dimension information:
       * 'Monitor Mixer Volume' (type: integer)
       * 'VMixer Volume' (type: integer)
       * 'VU-meters' (type: boolean)
      
      Although the number of elements named as 'Monitor Mixer Volume' differs
      depending on drivers in this group, it can be calculated by macros
      defined by each driver (= (BX_NUM - BX_ANALOG_IN) * BX_ANALOG_IN). Each
      of the elements has one member for value and has dimension information
      with 2 levels (= BX_ANALOG_IN * (BX_NUM - BX_ANALOG_IN)). For these
      elements, userspace applications are expected to handle the dimension
      information so that all of the elements construct a matrix where the
      number of rows and columns are represented by the dimension information.
      
      The same way is applied to elements named as 'VMixer Volume'. The number
      of these elements can also be calculated by macros defined by each
      drivers (= PX_ANALOG_IN * BX_ANALOG_IN). Each of the element has one
      member for value and has dimension information with 2 levels
      (= BX_ANALOG_IN * PX_ANALOG_IN). All of the elements construct a matrix
      with the dimension information.
      
      An element named as 'VU-meters' gets a different way in a point of
      dimension information. The element includes 96 members for value. The
      element has dimension information with 3 levels (= 3 or 2 * 16 * 2). For
      this element, userspace applications are expected to handle the dimension
      information so that all of the members for value construct a matrix
      where the number of rows and columns are represented by the dimension
      information. This is different from the way for the former.
      
      As a summary, the drivers were not designed to produce a consistent way to
      parse the dimension information. This makes it hard for general userspace
      applications such as amixer to parse the information by a consistent way,
      and actually no userspace applications except for 'echomixer' utilize the
      dimension information. Additionally, no drivers excluding this group use
      the information.
      
      The reverted commit was written based on the latter way. A commit
      860c1994 ('ALSA: control: add dimension validator for userspace
      elements') is written based on the latter way, too. The patch should be
      reconsider too in the same time to re-define a consistent way to parse the
      dimension information.
      Reported-by: default avatarMark Hills <mark@xwax.org>
      Reported-by: default avatarS. Christian Collins <s.chriscollins@gmail.com>
      Fixes: 275353bb ('ALSA: echoaudio: purge contradictions between dimension matrix members and total number of members')
      Signed-off-by: default avatarTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      40e21932
    • Guneshwor Singh's avatar
      ALSA: compress: Remove unused variable · 984b6c96
      Guneshwor Singh authored
      commit a931b9ce upstream.
      
      Commit 04c5d5a4 ("ALSA: compress: Embed struct device") removed
      the statement that used 'str' but didn't remove the variable itself.
      So remove it.
      
      [Adding stable to Cc since pr_debug() may refer to the uninitialized
       buffer -- tiwai]
      
      Fixes: 04c5d5a4 ("ALSA: compress: Embed struct device")
      Signed-off-by: default avatarGuneshwor Singh <guneshwor.o.singh@intel.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      984b6c96
    • Casey Schaufler's avatar
      lsm: fix smack_inode_removexattr and xattr_getsecurity memleak · 88c195d6
      Casey Schaufler authored
      commit 57e7ba04 upstream.
      
      security_inode_getsecurity() provides the text string value
      of a security attribute. It does not provide a "secctx".
      The code in xattr_getsecurity() that calls security_inode_getsecurity()
      and then calls security_release_secctx() happened to work because
      SElinux and Smack treat the attribute and the secctx the same way.
      It fails for cap_inode_getsecurity(), because that module has no
      secctx that ever needs releasing. It turns out that Smack is the
      one that's doing things wrong by not allocating memory when instructed
      to do so by the "alloc" parameter.
      
      The fix is simple enough. Change the security_release_secctx() to
      kfree() because it isn't a secctx being returned by
      security_inode_getsecurity(). Change Smack to allocate the string when
      told to do so.
      
      Note: this also fixes memory leaks for LSMs which implement
      inode_getsecurity but not release_secctx, such as capabilities.
      Signed-off-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Reported-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      88c195d6
    • Sergey Senozhatsky's avatar
      lib/ratelimit.c: use deferred printk() version · 1c089129
      Sergey Senozhatsky authored
      commit 656d61ce upstream.
      
      printk_ratelimit() invokes ___ratelimit() which may invoke a normal
      printk() (pr_warn() in this particular case) to warn about suppressed
      output.  Given that printk_ratelimit() may be called from anywhere, that
      pr_warn() is dangerous - it may end up deadlocking the system.  Fix
      ___ratelimit() by using deferred printk().
      
      Sasha reported the following lockdep error:
      
       : Unregister pv shared memory for cpu 8
       : select_fallback_rq: 3 callbacks suppressed
       : process 8583 (trinity-c78) no longer affine to cpu8
       :
       : ======================================================
       : WARNING: possible circular locking dependency detected
       : 4.14.0-rc2-next-20170927+ #252 Not tainted
       : ------------------------------------------------------
       : migration/8/62 is trying to acquire lock:
       : (&port_lock_key){-.-.}, at: serial8250_console_write()
       :
       : but task is already holding lock:
       : (&rq->lock){-.-.}, at: sched_cpu_dying()
       :
       : which lock already depends on the new lock.
       :
       :
       : the existing dependency chain (in reverse order) is:
       :
       : -> #3 (&rq->lock){-.-.}:
       : __lock_acquire()
       : lock_acquire()
       : _raw_spin_lock()
       : task_fork_fair()
       : sched_fork()
       : copy_process.part.31()
       : _do_fork()
       : kernel_thread()
       : rest_init()
       : start_kernel()
       : x86_64_start_reservations()
       : x86_64_start_kernel()
       : verify_cpu()
       :
       : -> #2 (&p->pi_lock){-.-.}:
       : __lock_acquire()
       : lock_acquire()
       : _raw_spin_lock_irqsave()
       : try_to_wake_up()
       : default_wake_function()
       : woken_wake_function()
       : __wake_up_common()
       : __wake_up_common_lock()
       : __wake_up()
       : tty_wakeup()
       : tty_port_default_wakeup()
       : tty_port_tty_wakeup()
       : uart_write_wakeup()
       : serial8250_tx_chars()
       : serial8250_handle_irq.part.25()
       : serial8250_default_handle_irq()
       : serial8250_interrupt()
       : __handle_irq_event_percpu()
       : handle_irq_event_percpu()
       : handle_irq_event()
       : handle_level_irq()
       : handle_irq()
       : do_IRQ()
       : ret_from_intr()
       : native_safe_halt()
       : default_idle()
       : arch_cpu_idle()
       : default_idle_call()
       : do_idle()
       : cpu_startup_entry()
       : rest_init()
       : start_kernel()
       : x86_64_start_reservations()
       : x86_64_start_kernel()
       : verify_cpu()
       :
       : -> #1 (&tty->write_wait){-.-.}:
       : __lock_acquire()
       : lock_acquire()
       : _raw_spin_lock_irqsave()
       : __wake_up_common_lock()
       : __wake_up()
       : tty_wakeup()
       : tty_port_default_wakeup()
       : tty_port_tty_wakeup()
       : uart_write_wakeup()
       : serial8250_tx_chars()
       : serial8250_handle_irq.part.25()
       : serial8250_default_handle_irq()
       : serial8250_interrupt()
       : __handle_irq_event_percpu()
       : handle_irq_event_percpu()
       : handle_irq_event()
       : handle_level_irq()
       : handle_irq()
       : do_IRQ()
       : ret_from_intr()
       : native_safe_halt()
       : default_idle()
       : arch_cpu_idle()
       : default_idle_call()
       : do_idle()
       : cpu_startup_entry()
       : rest_init()
       : start_kernel()
       : x86_64_start_reservations()
       : x86_64_start_kernel()
       : verify_cpu()
       :
       : -> #0 (&port_lock_key){-.-.}:
       : check_prev_add()
       : __lock_acquire()
       : lock_acquire()
       : _raw_spin_lock_irqsave()
       : serial8250_console_write()
       : univ8250_console_write()
       : console_unlock()
       : vprintk_emit()
       : vprintk_default()
       : vprintk_func()
       : printk()
       : ___ratelimit()
       : __printk_ratelimit()
       : select_fallback_rq()
       : sched_cpu_dying()
       : cpuhp_invoke_callback()
       : take_cpu_down()
       : multi_cpu_stop()
       : cpu_stopper_thread()
       : smpboot_thread_fn()
       : kthread()
       : ret_from_fork()
       :
       : other info that might help us debug this:
       :
       : Chain exists of:
       :   &port_lock_key --> &p->pi_lock --> &rq->lock
       :
       :  Possible unsafe locking scenario:
       :
       :        CPU0                    CPU1
       :        ----                    ----
       :   lock(&rq->lock);
       :                                lock(&p->pi_lock);
       :                                lock(&rq->lock);
       :   lock(&port_lock_key);
       :
       :  *** DEADLOCK ***
       :
       : 4 locks held by migration/8/62:
       : #0: (&p->pi_lock){-.-.}, at: sched_cpu_dying()
       : #1: (&rq->lock){-.-.}, at: sched_cpu_dying()
       : #2: (printk_ratelimit_state.lock){....}, at: ___ratelimit()
       : #3: (console_lock){+.+.}, at: vprintk_emit()
       :
       : stack backtrace:
       : CPU: 8 PID: 62 Comm: migration/8 Not tainted 4.14.0-rc2-next-20170927+ #252
       : Call Trace:
       : dump_stack()
       : print_circular_bug()
       : check_prev_add()
       : ? add_lock_to_list.isra.26()
       : ? check_usage()
       : ? kvm_clock_read()
       : ? kvm_sched_clock_read()
       : ? sched_clock()
       : ? check_preemption_disabled()
       : __lock_acquire()
       : ? __lock_acquire()
       : ? add_lock_to_list.isra.26()
       : ? debug_check_no_locks_freed()
       : ? memcpy()
       : lock_acquire()
       : ? serial8250_console_write()
       : _raw_spin_lock_irqsave()
       : ? serial8250_console_write()
       : serial8250_console_write()
       : ? serial8250_start_tx()
       : ? lock_acquire()
       : ? memcpy()
       : univ8250_console_write()
       : console_unlock()
       : ? __down_trylock_console_sem()
       : vprintk_emit()
       : vprintk_default()
       : vprintk_func()
       : printk()
       : ? show_regs_print_info()
       : ? lock_acquire()
       : ___ratelimit()
       : __printk_ratelimit()
       : select_fallback_rq()
       : sched_cpu_dying()
       : ? sched_cpu_starting()
       : ? rcutree_dying_cpu()
       : ? sched_cpu_starting()
       : cpuhp_invoke_callback()
       : ? cpu_disable_common()
       : take_cpu_down()
       : ? trace_hardirqs_off_caller()
       : ? cpuhp_invoke_callback()
       : multi_cpu_stop()
       : ? __this_cpu_preempt_check()
       : ? cpu_stop_queue_work()
       : cpu_stopper_thread()
       : ? cpu_stop_create()
       : smpboot_thread_fn()
       : ? sort_range()
       : ? schedule()
       : ? __kthread_parkme()
       : kthread()
       : ? sort_range()
       : ? kthread_create_on_node()
       : ret_from_fork()
       : process 9121 (trinity-c78) no longer affine to cpu8
       : smpboot: CPU 8 is now offline
      
      Link: http://lkml.kernel.org/r/20170928120405.18273-1-sergey.senozhatsky@gmail.com
      Fixes: 6b1d174b ("ratelimit: extend to print suppressed messages on release")
      Signed-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reported-by: default avatarSasha Levin <levinsasha928@gmail.com>
      Reviewed-by: default avatarPetr Mladek <pmladek@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c089129
    • Michal Hocko's avatar
      mm, oom_reaper: skip mm structs with mmu notifiers · 2b819707
      Michal Hocko authored
      commit 4d4bbd85 upstream.
      
      Andrea has noticed that the oom_reaper doesn't invalidate the range via
      mmu notifiers (mmu_notifier_invalidate_range_start/end) and that can
      corrupt the memory of the kvm guest for example.
      
      tlb_flush_mmu_tlbonly already invokes mmu notifiers but that is not
      sufficient as per Andrea:
      
       "mmu_notifier_invalidate_range cannot be used in replacement of
        mmu_notifier_invalidate_range_start/end. For KVM
        mmu_notifier_invalidate_range is a noop and rightfully so. A MMU
        notifier implementation has to implement either ->invalidate_range
        method or the invalidate_range_start/end methods, not both. And if you
        implement invalidate_range_start/end like KVM is forced to do, calling
        mmu_notifier_invalidate_range in common code is a noop for KVM.
      
        For those MMU notifiers that can get away only implementing
        ->invalidate_range, the ->invalidate_range is implicitly called by
        mmu_notifier_invalidate_range_end(). And only those secondary MMUs
        that share the same pagetable with the primary MMU (like AMD iommuv2)
        can get away only implementing ->invalidate_range"
      
      As the callback is allowed to sleep and the implementation is out of
      hand of the MM it is safer to simply bail out if there is an mmu
      notifier registered.  In order to not fail too early make the
      mm_has_notifiers check under the oom_lock and have a little nap before
      failing to give the current oom victim some more time to exit.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170913113427.2291-1-mhocko@kernel.org
      Fixes: aac45363 ("mm, oom: introduce oom reaper")
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b819707
    • Stefan Wahren's avatar
      staging: vchiq_2835_arm: Fix NULL ptr dereference in free_pagelist · 8a056a11
      Stefan Wahren authored
      commit 974d4d03 upstream.
      
      This fixes a NULL pointer dereference on RPi 2 with multi_v7_defconfig.
      The function page_address() could return NULL with enabled CONFIG_HIGHMEM.
      So fix this by using kmap() instead.
      Signed-off-by: default avatarStefan Wahren <stefan.wahren@i2se.com>
      Fixes: 71bad7f0 ("staging: add bcm2708 vchiq driver")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a056a11
    • Andrey Konovalov's avatar
      uwb: ensure that endpoint is interrupt · 8928c5b2
      Andrey Konovalov authored
      commit 70e743e4 upstream.
      
      hwarc_neep_init() assumes that endpoint 0 is interrupt, but there's no
      check for that, which results in a WARNING in USB core code, when a bad
      USB descriptor is provided from a device:
      
      usb 1-1: BOGUS urb xfer, pipe 1 != type 3
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 3 at drivers/usb/core/urb.c:449 usb_submit_urb+0xf8a/0x11d0
      Modules linked in:
      CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.13.0+ #111
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Workqueue: usb_hub_wq hub_event
      task: ffff88006bdc1a00 task.stack: ffff88006bde8000
      RIP: 0010:usb_submit_urb+0xf8a/0x11d0 drivers/usb/core/urb.c:448
      RSP: 0018:ffff88006bdee3c0 EFLAGS: 00010282
      RAX: 0000000000000029 RBX: ffff8800672a7200 RCX: 0000000000000000
      RDX: 0000000000000029 RSI: ffff88006c815c78 RDI: ffffed000d7bdc6a
      RBP: ffff88006bdee4c0 R08: fffffbfff0fe00ff R09: fffffbfff0fe00ff
      R10: 0000000000000018 R11: fffffbfff0fe00fe R12: 1ffff1000d7bdc7f
      R13: 0000000000000003 R14: 0000000000000001 R15: ffff88006b02cc90
      FS:  0000000000000000(0000) GS:ffff88006c800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe4daddf000 CR3: 000000006add6000 CR4: 00000000000006f0
      Call Trace:
       hwarc_neep_init+0x4ce/0x9c0 drivers/uwb/hwa-rc.c:710
       uwb_rc_add+0x2fb/0x730 drivers/uwb/lc-rc.c:361
       hwarc_probe+0x34e/0x9b0 drivers/uwb/hwa-rc.c:858
       usb_probe_interface+0x351/0x8d0 drivers/usb/core/driver.c:361
       really_probe drivers/base/dd.c:385
       driver_probe_device+0x610/0xa00 drivers/base/dd.c:529
       __device_attach_driver+0x230/0x290 drivers/base/dd.c:625
       bus_for_each_drv+0x15e/0x210 drivers/base/bus.c:463
       __device_attach+0x269/0x3c0 drivers/base/dd.c:682
       device_initial_probe+0x1f/0x30 drivers/base/dd.c:729
       bus_probe_device+0x1da/0x280 drivers/base/bus.c:523
       device_add+0xcf9/0x1640 drivers/base/core.c:1703
       usb_set_configuration+0x1064/0x1890 drivers/usb/core/message.c:1932
       generic_probe+0x73/0xe0 drivers/usb/core/generic.c:174
       usb_probe_device+0xaf/0xe0 drivers/usb/core/driver.c:266
       really_probe drivers/base/dd.c:385
       driver_probe_device+0x610/0xa00 drivers/base/dd.c:529
       __device_attach_driver+0x230/0x290 drivers/base/dd.c:625
       bus_for_each_drv+0x15e/0x210 drivers/base/bus.c:463
       __device_attach+0x269/0x3c0 drivers/base/dd.c:682
       device_initial_probe+0x1f/0x30 drivers/base/dd.c:729
       bus_probe_device+0x1da/0x280 drivers/base/bus.c:523
       device_add+0xcf9/0x1640 drivers/base/core.c:1703
       usb_new_device+0x7b8/0x1020 drivers/usb/core/hub.c:2457
       hub_port_connect drivers/usb/core/hub.c:4890
       hub_port_connect_change drivers/usb/core/hub.c:4996
       port_event drivers/usb/core/hub.c:5102
       hub_event+0x23c8/0x37c0 drivers/usb/core/hub.c:5182
       process_one_work+0x9fb/0x1570 kernel/workqueue.c:2097
       worker_thread+0x1e4/0x1350 kernel/workqueue.c:2231
       kthread+0x324/0x3f0 kernel/kthread.c:231
       ret_from_fork+0x25/0x30 arch/x86/entry/entry_64.S:425
      Code: 48 8b 85 30 ff ff ff 48 8d b8 98 00 00 00 e8 8e 93 07 ff 45 89
      e8 44 89 f1 4c 89 fa 48 89 c6 48 c7 c7 a0 e5 55 86 e8 20 08 8f fd <0f>
      ff e9 9b f7 ff ff e8 4a 04 d6 fd e9 80 f7 ff ff e8 60 11 a6
      ---[ end trace 55d741234124cfc3 ]---
      
      Check that endpoint is interrupt.
      
      Found by syzkaller.
      Signed-off-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8928c5b2
    • Andrey Konovalov's avatar
      uwb: properly check kthread_run return value · 8ff7adb9
      Andrey Konovalov authored
      commit bbf26183 upstream.
      
      uwbd_start() calls kthread_run() and checks that the return value is
      not NULL. But the return value is not NULL in case kthread_run() fails,
      it takes the form of ERR_PTR(-EINTR).
      
      Use IS_ERR() instead.
      
      Also add a check to uwbd_stop().
      Signed-off-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ff7adb9