1. 12 Oct, 2017 40 commits
    • driver core: platform: Don't read past the end of "driver_override" buffer · a97ca4f7
      Nicolai Stange authored
      commit bf563b01 upstream.
      
      When the driver_override parameter is 4095 or 4094 bytes long, the
      printing code accesses invalid memory, because printing it needs
      count+1 bytes.
      
      Reject driver_override values of these lengths in driver_override_store().
      
      This is in close analogy to commit 4efe874a ("PCI: Don't read past the
      end of sysfs "driver_override" buffer") from Sasha Levin.
      
      Fixes: 3d713e0e ("driver core: platform: add device binding path 'driver_override'")
      Signed-off-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • percpu: make this_cpu_generic_read() atomic w.r.t. interrupts · fc3c6722
      Mark Rutland authored
      commit e88d62cd upstream.
      
      As raw_cpu_generic_read() is a plain read from a raw_cpu_ptr() address,
      it's possible (albeit unlikely) that the compiler will split the access
      across multiple instructions.
      
      In this_cpu_generic_read() we disable preemption but not interrupts
      before calling raw_cpu_generic_read(). Thus, an interrupt could be taken
      in the middle of the split load instructions. If a this_cpu_write() or
      RMW this_cpu_*() op is made to the same variable in the interrupt
      handling path, this_cpu_read() will return a torn value.
      
      For native word types, we can avoid tearing using READ_ONCE(), but this
      won't work in all cases (e.g. 64-bit types on most 32-bit platforms).
      This patch reworks this_cpu_generic_read() to use READ_ONCE() where
      possible, otherwise falling back to disabling interrupts.
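      The tearing hazard can be simulated in user space. This is an
      illustration only: it models a 64-bit per-CPU variable that the compiler
      loads as two 32-bit halves (as may happen on a 32-bit platform), and
      simulated_irq() is a hypothetical stand-in for an interrupt handler doing
      a this_cpu_write(). The "interrupts off" variant simply defers that
      handler until the whole load has finished.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for a 64-bit per-CPU variable. */
static uint64_t pcpu_var = 0x00000000FFFFFFFFull;

/* Hypothetical interrupt handler writing the same variable. */
static void simulated_irq(void)
{
    pcpu_var = 0x0000000100000000ull;
}

/* Split load with interrupts enabled: the handler may run between halves. */
static uint64_t split_read(void (*irq)(void))
{
    uint32_t lo = (uint32_t)pcpu_var;          /* first load instruction */
    if (irq)
        irq();                                 /* interrupt taken mid-read */
    uint32_t hi = (uint32_t)(pcpu_var >> 32);  /* second load instruction */
    return ((uint64_t)hi << 32) | lo;
}

/* Same split load with interrupts "disabled": the pending handler only runs
 * after both halves are read, so the result is never torn. */
static uint64_t split_read_irqs_off(void (*irq)(void))
{
    uint32_t lo = (uint32_t)pcpu_var;
    uint32_t hi = (uint32_t)(pcpu_var >> 32);
    if (irq)
        irq();                                 /* runs at "irq restore" time */
    return ((uint64_t)hi << 32) | lo;
}
```

      With the interrupt landing mid-read, split_read() returns
      0x00000001FFFFFFFF, which is neither the old nor the new value.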
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pranith Kumar <bobby.prani@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/tm: Fix illegal TM state in signal handler · 6a988259
      Gustavo Romero authored
      commit 044215d1 upstream.
      
      Currently, on return from a signal handler through the
      restore_tm_sigcontexts() code path (e.g. for a signal caught because a
      `trap` instruction was executed in the middle of an HTM block, or for a
      deliberately constructed sigframe), an illegal TM state (such as TS=10
      TM=0, i.e. "T0") can be set in SRR1. When `rfid` implicitly sets the MSR
      from SRR1 on return to userspace, this causes a TM Bad Thing exception.
      
      That illegal state can be set (a) by a malicious user that disables
      the TM bit by tweaking the bits in uc_mcontext before returning from
      the signal handler or (b) by a sufficient number of context switches
      occurring such that the load_tm counter overflows and TM is disabled
      whilst in the signal handler.
      
      This commit fixes the illegal TM state by ensuring that the TM bit is
      always enabled before we return from restore_tm_sigcontexts(). A small
      comment correction is made as well.
      
      Fixes: 5d176f75 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
      Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/64s: Use emergency stack for kernel TM Bad Thing program checks · afebf5ef
      Cyril Bur authored
      commit 265e60a1 upstream.
      
      When using transactional memory (TM), the CPU can be in one of six
      states as far as TM is concerned, encoded in the Machine State
      Register (MSR). Certain state transitions are illegal and if attempted
      trigger a "TM Bad Thing" type program check exception.
      
      If we ever hit one of these exceptions, it is treated as a bug, i.e. we
      oops and kill the process and/or panic, depending on configuration.
      
      One case where we can trigger a TM Bad Thing is when returning to
      userspace after a system call or interrupt, using RFID. When this
      happens, the CPU first restores the user register state, in particular
      r1 (the stack pointer), and then attempts to update the MSR. However,
      the MSR update is not allowed, and so we take the program check with
      the user register state but the kernel MSR.
      
      This tricks the exception entry code into thinking we have a bad
      kernel stack pointer, because the MSR says we're coming from the
      kernel, but r1 is pointing to userspace.
      
      To avoid this we instead always switch to the emergency stack if we
      take a TM Bad Thing from the kernel. That way none of the user
      register values are used, other than for printing in the oops message.
      
      This is the fix for CVE-2017-1000255.
      
      Fixes: 5d176f75 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      [mpe: Rewrite change log & comments, tweak asm slightly]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • socket, bpf: fix possible use after free · 02f7e410
      Eric Dumazet authored
      
      [ Upstream commit eefca20e ]
      
      Starting with Linux 4.4, the TCP three-way handshake (3WHS) no longer
      takes the listener lock.
      
      Since then, we might hit a use-after-free in sk_filter_charge() if the
      filter we got in the memcpy() of the listener content happens to be
      replaced by a thread changing the listener's BPF filter.
      
      To fix this, we need to make sure the filter refcount is not already
      zero before incrementing it again.
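      The "increment only if not already zero" rule can be sketched with C11
      atomics. This is a hedged illustration of the pattern (the kernel
      provides atomic_inc_not_zero() for it); struct filter and
      filter_get_not_zero() are hypothetical names, not the kernel's sk_filter
      code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal stand-in for a refcounted filter. */
struct filter {
    atomic_int refcnt;
};

/* Take a reference only while the object is still live. A plain increment
 * could revive a filter whose last reference was just dropped by the
 * thread replacing it, leading to a use after free. */
static bool filter_get_not_zero(struct filter *f)
{
    int old = atomic_load(&f->refcnt);

    do {
        if (old == 0)
            return false;   /* already being freed: caller must not use it */
    } while (!atomic_compare_exchange_weak(&f->refcnt, &old, old + 1));
    return true;
}
```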
      
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: rtnetlink: fix info leak in RTM_GETSTATS call · 95206ea3
      Nikolay Aleksandrov authored
      
      [ Upstream commit ce024f42 ]
      
      When RTM_GETSTATS was added, not all fields of its header struct were
      initialized when returning the result, leaking 4 bytes of information
      to user space per rtnl_fill_statsinfo() call. Initialize them now. Thanks
      to Alexander Potapenko for the detailed report and bisection.
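      A minimal sketch of the fix's idea: every header field, padding
      included, is written before the header reaches user space. The struct
      layout and fill_stats_hdr() are assumptions for illustration, not the
      real RTM_GETSTATS header.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reply header with explicit padding fields. */
struct stats_hdr {
    uint8_t  family;
    uint8_t  pad1;
    uint16_t pad2;
    uint32_t ifindex;
    uint32_t filter_mask;
};

/* Set every field, including the pads; leaving pad1/pad2 untouched would
 * copy whatever the buffer previously held out to user space. */
static void fill_stats_hdr(struct stats_hdr *h, uint8_t family,
                           uint32_t ifindex, uint32_t mask)
{
    h->family = family;
    h->pad1 = 0;
    h->pad2 = 0;
    h->ifindex = ifindex;
    h->filter_mask = mask;
}
```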
      Reported-by: Alexander Potapenko <glider@google.com>
      Fixes: 10c9ead9 ("rtnetlink: add new RTM_GETSTATS message to dump link stats")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tipc: use only positive error codes in messages · 58b1b840
      Parthasarathy Bhuvaragan authored
      
      [ Upstream commit aad06212 ]
      
      In commit e3a77561 ("tipc: split up function tipc_msg_eval()"),
      we updated the function tipc_msg_lookup_dest() to set the error
      codes to negative values on destination lookup failures. Thus, when
      the function sets the error code to -TIPC_ERR_NO_NAME, it is inserted
      into the 4-bit error field of the message header as 0xf instead of
      TIPC_ERR_NO_NAME (1). The value 0xf is an unknown error code.
      
      In this commit, we set only positive error codes.
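      The arithmetic behind the bogus 0xf is easy to check: storing -1 in a
      4-bit field keeps only the low four bits of its two's-complement
      representation, which are all ones. In this sketch, set_errcode() and
      get_errcode() are hypothetical helpers, and the field position (bit 25
      of a 32-bit header word) is an assumption for illustration.

```c
#include <stdint.h>

#define TIPC_ERR_NO_NAME 1   /* value given in the commit message */
#define ERR_SHIFT 25         /* assumed position of the 4-bit field */

/* Store err in the 4-bit error field; only the low 4 bits survive. */
static uint32_t set_errcode(uint32_t word, int err)
{
    word &= ~(0xFu << ERR_SHIFT);
    word |= ((uint32_t)err & 0xFu) << ERR_SHIFT;
    return word;
}

static unsigned get_errcode(uint32_t word)
{
    return (word >> ERR_SHIFT) & 0xFu;
}
```

      Encoding -TIPC_ERR_NO_NAME reads back as 0xf, while the positive code
      reads back as 1.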
      
      Fixes: e3a77561 ("tipc: split up function tipc_msg_eval()")
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip6_tunnel: update mtu properly for ARPHRD_ETHER tunnel device in tx path · 09788d46
      Xin Long authored
      
      [ Upstream commit d41bb33b ]
      
      When updating the MTU in the tx path, the code does not consider
      ARPHRD_ETHER tunnel devices such as ip6gre_tap, for which the Ethernet
      header must also be subtracted to get the correct MTU.
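      A small worked sketch of the MTU arithmetic. tunnel_tx_mtu() is a
      hypothetical helper, and the 48-byte encapsulation overhead used below
      is only an illustrative number; the point is the extra ETH_HLEN (14
      bytes) for Ethernet-type tunnel devices.

```c
#ifndef ETH_HLEN
#define ETH_HLEN 14   /* bytes in an Ethernet header */
#endif

/* MTU to use on the tunnel device, given the path MTU of the underlying
 * route and the tunnel's own encapsulation overhead. */
static int tunnel_tx_mtu(int dst_mtu, int encap_overhead, int is_ether_dev)
{
    int mtu = dst_mtu - encap_overhead;

    /* ARPHRD_ETHER devices (e.g. ip6gre_tap) also carry the inner Ethernet
     * header inside the tunnel, so it comes off the MTU as well. */
    if (is_ether_dev)
        mtu -= ETH_HLEN;
    return mtu;
}
```

      With a 1500-byte path MTU and 48 bytes of overhead, a non-Ethernet
      tunnel gets 1452 bytes, while an ARPHRD_ETHER device gets 1438.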
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip6_gre: ip6gre_tap device should keep dst · ab4da56f
      Xin Long authored
      
      [ Upstream commit 2d40557c ]
      
      The patch 'ip_gre: ipgre_tap device should keep dst' fixed
      an issue where the ipgre_tap MTU couldn't be updated in the tx path.
      
      The same fix is needed for ip6gre_tap as well.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netlink: do not proceed if dump's start() errs · b4a11925
      Jason A. Donenfeld authored
      
      [ Upstream commit fef0035c ]
      
      Drivers that use the start method for netlink dumping rely on dumpit not
      being called if start fails. For example, ila_xlat.c allocates memory
      and assigns it to cb->args[0] in its start() function. It might fail to
      do that and return -ENOMEM instead. However, even when returning an
      error, dumpit will be called, which, in the example above, quickly
      dereferences the memory in cb->args[0], which will OOPS the kernel. This
      is but one example of how this goes wrong.
      
      Since start() has always been a function with an int return type, it
      therefore makes sense to use it properly, rather than ignoring it. This
      patch thus returns early and does not call dumpit() when start() fails.
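      The control flow of the fix can be sketched as follows. struct dump_cb,
      run_dump() and the two sample callbacks are hypothetical simplifications
      (args0 plays the role of cb->args[0]), not the netlink API.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified dump control block. */
struct dump_cb {
    int (*start)(struct dump_cb *cb);
    int (*dumpit)(struct dump_cb *cb);
    void *args0;                 /* state that start() is expected to set up */
};

/* Run a dump: if start() fails, return its error without calling dumpit(). */
static int run_dump(struct dump_cb *cb)
{
    if (cb->start) {
        int err = cb->start(cb);
        if (err)
            return err;          /* bail out early: args0 may be unset */
    }
    return cb->dumpit(cb);
}

/* Sample callbacks for trying it out. */
static int dumpit_calls;

static int failing_start(struct dump_cb *cb)
{
    cb->args0 = NULL;            /* allocation "failed" */
    return -ENOMEM;
}

static int counting_dumpit(struct dump_cb *cb)
{
    (void)cb;
    dumpit_calls++;
    return 0;
}
```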
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: Set sk_prot_creator when cloning sockets to the right proto · cf2eaf16
      Christoph Paasch authored
      
      [ Upstream commit 9d538fa6 ]
      
      sk->sk_prot and sk->sk_prot_creator can differ when the app uses
      IPV6_ADDRFORM (transforming an IPv6 socket into an IPv4 one).
      That is why sk_prot_creator exists: it makes sure that sk_prot_free()
      does the kmem_cache_free() on the right kmem_cache slab.
      
      Now, if such a socket gets transformed back to a listening socket (using
      connect() with AF_UNSPEC) we will allocate an IPv4 tcp_sock through
      sk_clone_lock() when a new connection comes in. But sk_prot_creator will
      still point to the IPv6 kmem_cache (as everything got copied in
      sk_clone_lock()). When freeing, we will thus put this
      memory back into the IPv6 kmem_cache although it was allocated in the
      IPv4 cache. I have seen memory corruption happening because of this.
      
      With SLUB debugging and MEMCG_KMEM enabled, this gives the warning:
      	"cache_from_obj: Wrong slab cache. TCPv6 but object is from TCP"
      
      A C-program to trigger this:
      
      #include <arpa/inet.h>
      #include <netinet/in.h>
      #include <string.h>
      #include <sys/socket.h>
      #include <unistd.h>
      
      int main(void)
      {
              int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP);
              int new_fd, newest_fd, client_fd;
              struct sockaddr_in6 bind_addr;
              struct sockaddr_in bind_addr4, client_addr1, client_addr2;
              struct sockaddr unsp;
              int val;
      
              memset(&bind_addr, 0, sizeof(bind_addr));
              bind_addr.sin6_family = AF_INET6;
              bind_addr.sin6_port = htons(42424);
      
              memset(&client_addr1, 0, sizeof(client_addr1));
              client_addr1.sin_family = AF_INET;
              client_addr1.sin_port = htons(42424);
              client_addr1.sin_addr.s_addr = inet_addr("127.0.0.1");
      
              memset(&client_addr2, 0, sizeof(client_addr2));
              client_addr2.sin_family = AF_INET;
              client_addr2.sin_port = htons(42421);
              client_addr2.sin_addr.s_addr = inet_addr("127.0.0.1");
      
              memset(&unsp, 0, sizeof(unsp));
              unsp.sa_family = AF_UNSPEC;
      
              bind(fd, (struct sockaddr *)&bind_addr, sizeof(bind_addr));
      
              listen(fd, 5);
      
              client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
              connect(client_fd, (struct sockaddr *)&client_addr1, sizeof(client_addr1));
              new_fd = accept(fd, NULL, NULL);
              close(fd);
      
              /* Turn the accepted IPv6 socket into an IPv4 one. */
              val = AF_INET;
              setsockopt(new_fd, SOL_IPV6, IPV6_ADDRFORM, &val, sizeof(val));
      
              /* Disconnect it, then make it a listener again. */
              connect(new_fd, &unsp, sizeof(unsp));
      
              memset(&bind_addr4, 0, sizeof(bind_addr4));
              bind_addr4.sin_family = AF_INET;
              bind_addr4.sin_port = htons(42421);
              bind(new_fd, (struct sockaddr *)&bind_addr4, sizeof(bind_addr4));
      
              listen(new_fd, 5);
      
              /* This connection is cloned from new_fd via sk_clone_lock(). */
              client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
              connect(client_fd, (struct sockaddr *)&client_addr2, sizeof(client_addr2));
      
              newest_fd = accept(new_fd, NULL, NULL);
              close(new_fd);
      
              close(client_fd);
              /* Releasing the clone frees it via the stale sk_prot_creator. */
              close(newest_fd);
              return 0;
      }
      
      As far as I can tell, this bug has existed since the beginning of git
      history.
      Signed-off-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • packet: only test po->has_vnet_hdr once in packet_snd · 24ee394a
      Willem de Bruijn authored
      
      [ Upstream commit da7c9561 ]
      
      Packet socket option po->has_vnet_hdr can be updated concurrently with
      other operations if no ring is attached.
      
      Do not test the option twice in packet_snd, as the value may change in
      between calls. A race on setsockopt disable may cause a packet > mtu
      to be sent without having GSO options set.
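      The difference between testing the option twice and sampling it once can
      be simulated deterministically. In this sketch, has_vnet_hdr stands in
      for po->has_vnet_hdr, and race_hook is a hypothetical hook that plays
      the role of a concurrent setsockopt() running between the two
      decisions in the send path.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for po->has_vnet_hdr, which setsockopt() may change at any time. */
static atomic_bool has_vnet_hdr = true;

/* Hypothetical hook modelling a concurrent setsockopt(). */
static void (*race_hook)(void);

static void disable_vnet_hdr(void)
{
    atomic_store(&has_vnet_hdr, false);
}

/* Buggy shape: the option is tested twice, so the decisions can disagree. */
static bool decisions_agree_racy(void)
{
    bool parse_vnet_hdr = atomic_load(&has_vnet_hdr);   /* first test */
    if (race_hook)
        race_hook();
    bool set_gso_opts = atomic_load(&has_vnet_hdr);     /* second test */
    return parse_vnet_hdr == set_gso_opts;
}

/* Fixed shape: test once, then reuse the local copy for every decision. */
static bool decisions_agree_fixed(void)
{
    bool vnet = atomic_load(&has_vnet_hdr);             /* single test */
    bool parse_vnet_hdr = vnet;
    if (race_hook)
        race_hook();
    bool set_gso_opts = vnet;
    return parse_vnet_hdr == set_gso_opts;
}
```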
      
      Fixes: bfd5f4a3 ("packet: Add GSO/csum offload support.")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • packet: in packet_do_bind, test fanout with bind_lock held · 0f22167d
      Willem de Bruijn authored
      
      [ Upstream commit 4971613c ]
      
      Once a socket has po->fanout set, it remains a member of the group
      until it is destroyed. The prot_hook must be constant and identical
      across sockets in the group.
      
      If fanout_add races with packet_do_bind between the test of po->fanout
      and taking the lock, the bind call may make type or dev inconsistent
      with that of the fanout group.
      
      Hold po->bind_lock when testing po->fanout to avoid this race.
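      The locking rule can be sketched with a pthread mutex standing in for
      po->bind_lock: the test of the fanout flag and the rebind happen inside
      the same critical section. struct psock and sketch_do_bind() are
      hypothetical simplifications, and returning -EINVAL for fanout members
      is an assumption of this sketch.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified packet socket state. */
struct psock {
    pthread_mutex_t bind_lock;
    bool fanout;        /* set once when joining a group, never cleared */
    int bound_type;     /* protocol the receive hook is bound to */
};

/* Test and update under one lock, so a concurrent fanout join cannot slip
 * in between the check and the rebind. */
static int sketch_do_bind(struct psock *po, int type)
{
    int err = 0;

    pthread_mutex_lock(&po->bind_lock);
    if (po->fanout)
        err = -EINVAL;  /* group members must keep an identical hook */
    else
        po->bound_type = type;
    pthread_mutex_unlock(&po->bind_lock);
    return err;
}
```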
      
      I had to introduce artificial delay (local_bh_enable) to actually
      observe the race.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: dsa: Fix network device registration order · 6eab1f82
      Florian Fainelli authored
      
      [ Upstream commit e804441c ]
      
      We must not register the network device first, then set its carrier
      off, and finally connect it to a PHY. Doing so leaves a window during
      which the carrier is at best inconsistent and, at worst, the device is
      not usable without a down/up sequence, since the network device is
      visible to user space with possibly no PHY device attached.
      
      Re-order the steps so that they make logical sense. This fixes some
      devices where the port was not usable after, e.g., an unbind then bind
      of the driver.
      
      Fixes: 0071f56e ("dsa: Register netdev before phy")
      Fixes: 91da11f8 ("net: Distributed Switch Architecture protocol support")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tun: bail out from tun_get_user() if the skb is empty · b8990d2e
      Alexander Potapenko authored
      
      [ Upstream commit 2580c4c1 ]
      
      KMSAN (https://github.com/google/kmsan) reported accessing uninitialized
      skb->data[0] in the case the skb is empty (i.e. skb->len is 0):
      
      ================================================
      BUG: KMSAN: use of uninitialized memory in tun_get_user+0x19ba/0x3770
      CPU: 0 PID: 3051 Comm: probe Not tainted 4.13.0+ #3140
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
      ...
       __msan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:477
       tun_get_user+0x19ba/0x3770 drivers/net/tun.c:1301
       tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
       call_write_iter ./include/linux/fs.h:1743
       new_sync_write fs/read_write.c:457
       __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
       vfs_write+0x3e4/0x770 fs/read_write.c:518
       SYSC_write+0x12f/0x2b0 fs/read_write.c:565
       SyS_write+0x55/0x80 fs/read_write.c:557
       do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
       entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:245
      ...
      origin:
      ...
       kmsan_poison_shadow+0x6e/0xc0 mm/kmsan/kmsan.c:211
       slab_alloc_node mm/slub.c:2732
       __kmalloc_node_track_caller+0x351/0x370 mm/slub.c:4351
       __kmalloc_reserve net/core/skbuff.c:138
       __alloc_skb+0x26a/0x810 net/core/skbuff.c:231
       alloc_skb ./include/linux/skbuff.h:903
       alloc_skb_with_frags+0x1d7/0xc80 net/core/skbuff.c:4756
       sock_alloc_send_pskb+0xabf/0xfe0 net/core/sock.c:2037
       tun_alloc_skb drivers/net/tun.c:1144
       tun_get_user+0x9a8/0x3770 drivers/net/tun.c:1274
       tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
       call_write_iter ./include/linux/fs.h:1743
       new_sync_write fs/read_write.c:457
       __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
       vfs_write+0x3e4/0x770 fs/read_write.c:518
       SYSC_write+0x12f/0x2b0 fs/read_write.c:565
       SyS_write+0x55/0x80 fs/read_write.c:557
       do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
       return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:245
      ================================================
      
      Make sure tun_get_user() doesn't touch skb->data[0] unless there is
      actual data.
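      A minimal sketch of the guard, assuming the first byte is inspected to
      guess the IP version from its high nibble; ip_version() is a
      hypothetical helper, not tun's code. An empty packet yields 0
      ("unknown") without reading data[0].

```c
#include <stddef.h>

/* Guess the IP version from the high nibble of the first byte, but only
 * when the packet actually contains a byte. */
static unsigned ip_version(const unsigned char *data, size_t len)
{
    return len ? (unsigned)(data[0] >> 4) : 0u;
}
```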
      
      C reproducer below:
      ==========================
          // autogenerated by syzkaller (http://github.com/google/syzkaller)
      
          #define _GNU_SOURCE
      
          #include <fcntl.h>
          #include <linux/if_tun.h>
          #include <netinet/ip.h>
          #include <net/if.h>
          #include <string.h>
          #include <sys/ioctl.h>
          #include <sys/socket.h>
          #include <unistd.h>
      
          int main()
          {
            int sock = socket(PF_INET, SOCK_STREAM, IPPROTO_IP);
            int tun_fd = open("/dev/net/tun", O_RDWR);
            struct ifreq req;
            memset(&req, 0, sizeof(struct ifreq));
            strcpy((char*)&req.ifr_name, "gre0");
            req.ifr_flags = IFF_UP | IFF_MULTICAST;
            ioctl(tun_fd, TUNSETIFF, &req);
            ioctl(sock, SIOCSIFFLAGS, "gre0");
            write(tun_fd, "hi", 0);
            return 0;
          }
      ==========================
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • l2tp: fix race condition in l2tp_tunnel_delete · b4a9b12d
      Sabrina Dubroca authored
      
      [ Upstream commit 62b982ee ]
      
      If we try to delete the same tunnel twice, the first delete operation
      does a lookup (l2tp_tunnel_get), finds the tunnel, calls
      l2tp_tunnel_delete, which queues it for deletion by
      l2tp_tunnel_del_work.
      
      The second delete operation also finds the tunnel and calls
      l2tp_tunnel_delete. If the workqueue has already fired and started
      running l2tp_tunnel_del_work, then l2tp_tunnel_delete will queue the
      same tunnel a second time, and try to free the socket again.
      
      Add a dead flag to prevent firing the workqueue twice. Then we can
      remove the check of queue_work's result that was meant to prevent that
      race but doesn't.
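      The dead-flag idea can be sketched with an atomic exchange: only the
      caller that flips the flag from false to true queues the deletion work.
      The kernel would typically use a bit operation such as
      test_and_set_bit() for this; struct tunnel, tunnel_delete() and the
      del_work_queued counter are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified tunnel with a "dead" flag and a count of queued deletions. */
struct tunnel {
    atomic_bool dead;
    int del_work_queued;
};

/* The first caller marks the tunnel dead and queues the deletion; any
 * later caller sees the flag already set and does nothing. */
static void tunnel_delete(struct tunnel *t)
{
    if (!atomic_exchange(&t->dead, true))
        t->del_work_queued++;   /* stands in for queueing the del work */
}
```

      Deleting the same tunnel twice therefore queues the work only once.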
      
      Reproducer:
      
          ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000 local 192.168.0.2 remote 192.168.0.1 encap udp udp_sport 5000 udp_dport 6000
          ip l2tp add session name l2tp1 tunnel_id 3000 session_id 1000 peer_session_id 2000
          ip link set l2tp1 up
          ip l2tp del tunnel tunnel_id 3000
          ip l2tp del tunnel tunnel_id 3000
      
      Fixes: f8ccac0e ("l2tp: put tunnel socket release on a workqueue")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Acked-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • l2tp: Avoid schedule while atomic in exit_net · e5941137
      Ridge Kennedy authored
      
      [ Upstream commit 12d656af ]
      
      While destroying a network namespace that contains an L2TP tunnel, a
      "BUG: scheduling while atomic" can be observed.
      
      Enabling lockdep shows that this is happening because l2tp_exit_net()
      is calling l2tp_tunnel_closeall() (via l2tp_tunnel_delete()) from
      within an RCU critical section.
      
      l2tp_exit_net() takes rcu_read_lock_bh()
        << list_for_each_entry_rcu() >>
        l2tp_tunnel_delete()
          l2tp_tunnel_closeall()
            __l2tp_session_unhash()
              synchronize_rcu() << Illegal inside RCU critical section >>
      
      BUG: sleeping function called from invalid context
      in_atomic(): 1, irqs_disabled(): 0, pid: 86, name: kworker/u16:2
      INFO: lockdep is turned off.
      CPU: 2 PID: 86 Comm: kworker/u16:2 Tainted: G        W  O    4.4.6-at1 #2
      Hardware name: Xen HVM domU, BIOS 4.6.1-xs125300 05/09/2016
      Workqueue: netns cleanup_net
       0000000000000000 ffff880202417b90 ffffffff812b0013 ffff880202410ac0
       ffffffff81870de8 ffff880202417bb8 ffffffff8107aee8 ffffffff81870de8
       0000000000000c51 0000000000000000 ffff880202417be0 ffffffff8107b024
      Call Trace:
       [<ffffffff812b0013>] dump_stack+0x85/0xc2
       [<ffffffff8107aee8>] ___might_sleep+0x148/0x240
       [<ffffffff8107b024>] __might_sleep+0x44/0x80
       [<ffffffff810b21bd>] synchronize_sched+0x2d/0xe0
       [<ffffffff8109be6d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff8105c7bb>] ? __local_bh_enable_ip+0x6b/0xc0
       [<ffffffff816a1b00>] ? _raw_spin_unlock_bh+0x30/0x40
       [<ffffffff81667482>] __l2tp_session_unhash+0x172/0x220
       [<ffffffff81667397>] ? __l2tp_session_unhash+0x87/0x220
       [<ffffffff8166888b>] l2tp_tunnel_closeall+0x9b/0x140
       [<ffffffff81668c74>] l2tp_tunnel_delete+0x14/0x60
       [<ffffffff81668dd0>] l2tp_exit_net+0x110/0x270
       [<ffffffff81668d5c>] ? l2tp_exit_net+0x9c/0x270
       [<ffffffff815001c3>] ops_exit_list.isra.6+0x33/0x60
       [<ffffffff81501166>] cleanup_net+0x1b6/0x280
       ...
      
      This bug can easily be reproduced with a few steps:
      
       $ sudo unshare -n bash  # Create a shell in a new namespace
       # ip link set lo up
       # ip addr add 127.0.0.1 dev lo
       # ip l2tp add tunnel remote 127.0.0.1 local 127.0.0.1 tunnel_id 1 \
          peer_tunnel_id 1 udp_sport 50000 udp_dport 50000
       # ip l2tp add session name foo tunnel_id 1 session_id 1 \
          peer_session_id 1
       # ip link set foo up
       # exit  # Exit the shell, in turn exiting the namespace
       $ dmesg
       ...
       [942121.089216] BUG: scheduling while atomic: kworker/u16:3/13872/0x00000200
       ...
      
      To fix this, move the call to l2tp_tunnel_closeall() out of the RCU
      critical section, and instead call it from l2tp_tunnel_del_work(), which
      is running from the l2tp_wq workqueue.
      
      Fixes: 2b551c6e ("l2tp: close sessions before initiating tunnel delete")
      Signed-off-by: Ridge Kennedy <ridge.kennedy@alliedtelesis.co.nz>
      Acked-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit · 6689f835
      Alexey Kodanev authored
      
      [ Upstream commit 36f6ee22 ]
      
      When running LTP IPsec tests, KASan might report:
      
      BUG: KASAN: use-after-free in vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
      Read of size 4 at addr ffff880dc6ad1980 by task swapper/0/0
      ...
      Call Trace:
        <IRQ>
        dump_stack+0x63/0x89
        print_address_description+0x7c/0x290
        kasan_report+0x28d/0x370
        ? vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
        __asan_report_load4_noabort+0x19/0x20
        vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
        ? vti_init_net+0x190/0x190 [ip_vti]
        ? save_stack_trace+0x1b/0x20
        ? save_stack+0x46/0xd0
        dev_hard_start_xmit+0x147/0x510
        ? icmp_echo.part.24+0x1f0/0x210
        __dev_queue_xmit+0x1394/0x1c60
      ...
      Freed by task 0:
        save_stack_trace+0x1b/0x20
        save_stack+0x46/0xd0
        kasan_slab_free+0x70/0xc0
        kmem_cache_free+0x81/0x1e0
        kfree_skbmem+0xb1/0xe0
        kfree_skb+0x75/0x170
        kfree_skb_list+0x3e/0x60
        __dev_queue_xmit+0x1298/0x1c60
        dev_queue_xmit+0x10/0x20
        neigh_resolve_output+0x3a8/0x740
        ip_finish_output2+0x5c0/0xe70
        ip_finish_output+0x4ba/0x680
        ip_output+0x1c1/0x3a0
        xfrm_output_resume+0xc65/0x13d0
        xfrm_output+0x1e4/0x380
        xfrm4_output_finish+0x5c/0x70
      
      This can be fixed by reading skb->len before calling dst_output(),
      which may free the skb.
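      The ordering rule can be sketched in plain C: record the length while
      the buffer is still ours, then hand it off and never touch it again.
      struct pkt, sketch_output() (which frees the packet, as the "Freed by"
      trace above shows dst_output() can) and xmit_and_count() are
      hypothetical stand-ins, not the vti code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified packet buffer standing in for an skb. */
struct pkt {
    size_t len;
    unsigned char data[64];
};

/* Stand-in for dst_output(): consumes the packet and frees it. */
static int sketch_output(struct pkt *p)
{
    free(p);
    return 0;
}

/* Save the length before the hand-off and use only the saved copy
 * afterwards; reading p->len after sketch_output() would be a use after free. */
static size_t xmit_and_count(struct pkt *p, size_t *tx_bytes)
{
    size_t len = p->len;
    if (sketch_output(p) == 0)
        *tx_bytes += len;
    return len;
}
```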
      
      Fixes: b9959fd3 ("vti: switch to new ip tunnel code")
      Fixes: 22e1b23d ("vti6: Support inter address family tunneling.")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: qcom/emac: specify the correct size when mapping a DMA buffer · 852bdea5
      Timur Tabi authored
      
      [ Upstream commit a93ad944 ]
      
      When mapping the RX DMA buffers, the driver was accidentally specifying
      zero for the buffer length.  Under normal circumstances, SWIOTLB does not
      need to allocate a bounce buffer, so the address is just mapped without
      checking the size field.  This is why the error was not detected earlier.
      
      Fixes: b9b17deb ("net: emac: emac gigabit ethernet controller driver")
      Cc: stable@vger.kernel.org
      Signed-off-by: Timur Tabi <timur@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net_sched: always reset qdisc backlog in qdisc_reset() · 5600c758
      Konstantin Khlebnikov authored
      
      [ Upstream commit c8e18129 ]
      
      An skb stored in qdisc->gso_skb is also counted in the backlog.
      
      Some qdiscs don't reset the backlog to zero in ->reset(); for example,
      sfq just dequeues and frees all queued skbs.
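      A sketch of why the generic reset path should clear the counters
      itself: a per-qdisc reset that only drops packets leaves a stale byte
      backlog behind. struct qstats_sketch, sfq_like_reset() and
      generic_qdisc_reset() are hypothetical simplifications, not the
      net_sched code.

```c
/* Simplified qdisc counters. */
struct qstats_sketch {
    unsigned qlen;      /* packets queued */
    unsigned backlog;   /* bytes queued, including a held gso_skb */
};

/* Per-qdisc reset that, like sfq in the commit, drops the queued packets
 * but leaves the byte counter alone. */
static void sfq_like_reset(struct qstats_sketch *q)
{
    q->qlen = 0;
}

/* Generic reset: run the qdisc's own ->reset(), then clear the counters
 * unconditionally so no qdisc can leave a stale backlog behind. */
static void generic_qdisc_reset(struct qstats_sketch *q,
                                void (*reset)(struct qstats_sketch *))
{
    if (reset)
        reset(q);
    q->qlen = 0;
    q->backlog = 0;
}
```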
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Fixes: 2ccccf5f ("net_sched: update hierarchical backlog too")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • isdn/i4l: fetch the ppp_write buffer in one shot · 93eef217
      Meng Xu authored
      
      [ Upstream commit 02388bf8 ]
      
      In isdn_ppp_write(), the header (i.e., protobuf) of the buffer is
      fetched twice from userspace. The first fetch is used to peek at the
      protocol of the message and reset the huptimer if necessary, while the
      second fetch copies in the whole buffer. However, since buf resides
      in userspace memory, a user process can race to change its contents
      between the fetches. By doing so, it can either avoid resetting the
      huptimer for any type of packet (by first setting proto to PPP_LCP and
      later changing it to the actual type) or force resetting the huptimer
      for LCP packets.
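      One way to picture the remedy is "fetch once, then read everything from
      the private copy". This sketch is hedged: memcpy() stands in for
      copy_from_user(), the offset of the 2-byte protocol field is a
      hypothetical layout, and it does not model how the patch chooses
      between its fetch paths.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PPP_LCP 0xc021u  /* PPP LCP protocol number */

/* Copy the whole user buffer into private memory exactly once, then read
 * the 16-bit big-endian protocol field at offset `off` from that copy.
 * Later changes to ubuf cannot affect kbuf. Returns 0 if the field does
 * not fit in the buffer. */
static unsigned fetch_once_proto(unsigned char *kbuf, const unsigned char *ubuf,
                                 size_t count, size_t off)
{
    memcpy(kbuf, ubuf, count);          /* the single fetch */
    if (off + 2 > count)
        return 0;
    return ((unsigned)kbuf[off] << 8) | kbuf[off + 1];
}
```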
      
      This patch replaces the double fetch with a single fetch on each of two
      paths, chosen by the condition
      (lp->isdn_device < 0 || lp->isdn_channel < 0).
      A more detailed discussion can be found at
      https://marc.info/?l=linux-kernel&m=150586376926123&w=2
      Signed-off-by: Meng Xu <mengxu.gatech@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      93eef217
    • bpf: one perf event close won't free bpf program attached by another perf event · 0dee549f
      Yonghong Song authored
      
      [ Upstream commit ec9dd352 ]
      
      This patch fixes a bug exhibited by the following scenario:
        1. fd1 = perf_event_open with attr.config = ID1
        2. attach bpf program prog1 to fd1
        3. fd2 = perf_event_open with attr.config = ID1
           <this will be successful>
        4. user program closes fd2 and prog1 is detached from the tracepoint.
        5. the user program using fd1 no longer works properly, as the
           tracepoint produces no more output.
      
      The issue happens at step 4. perf_event_open can be called successfully
      multiple times, but there is only one bpf prog pointer in the tp_event.
      In the current logic, releasing any fd for the same tp_event frees
      the tp_event->prog.
      
      The fix is to free tp_event->prog only when the closing fd
      corresponds to the one which registered the program.
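
      A minimal userspace model of the ownership check, assuming the
      tracepoint keeps one program slot plus a pointer to the event that
      attached it. All struct and function names below are hypothetical and
      do not reproduce the upstream code.

```c
#include <assert.h>
#include <stdlib.h>

struct prog_model { int refcnt; };

struct event_model;

/* Shared per-tracepoint state: one program slot, plus who attached it. */
struct tp_event_model {
	struct prog_model *prog;
	struct event_model *prog_owner;
};

/* Each opened fd gets its own event, but they share the tp_event. */
struct event_model { struct tp_event_model *tp; };

static void prog_put(struct prog_model *p)
{
	if (--p->refcnt == 0)
		free(p);
}

static void attach_prog(struct event_model *ev, struct prog_model *p)
{
	ev->tp->prog = p;
	ev->tp->prog_owner = ev;
}

/* Release path: only the event that attached the program detaches it. */
static void release_event(struct event_model *ev)
{
	struct tp_event_model *tp = ev->tp;

	if (tp->prog && tp->prog_owner == ev) {
		struct prog_model *p = tp->prog;

		tp->prog = NULL;
		tp->prog_owner = NULL;
		prog_put(p);
	}
}
```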
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • packet: hold bind lock when rebinding to fanout hook · 6f7cdd4a
      Willem de Bruijn authored
      
      [ Upstream commit 008ba2a1 ]
      
      Packet socket bind operations must hold the po->bind_lock. This keeps
      po->running consistent with whether the socket is actually on a ptype
      list to receive packets.
      
      fanout_add unbinds the socket's own packet_rcv/tpacket_rcv hook, then
      binds the fanout object to receive through packet_rcv_fanout.
      
      Make it hold the po->bind_lock when testing po->running and rebinding.
      Otherwise, it can race with other rebind operations, such as the one in
      packet_set_ring that switches from packet_rcv to tpacket_rcv. Concurrent
      updates can result in a socket being added to a fanout group twice,
      causing use-after-free KASAN bug reports, among others.
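
      As a sketch of the locking rule, assuming a C11 atomic_flag spinlock
      stands in for po->bind_lock and that a second fanout join should fail
      rather than add the socket twice. struct psock, psock_init() and
      fanout_add_model() are hypothetical; only the shape (test running and
      rebind inside one critical section) mirrors the description.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for po->bind_lock: a tiny C11 spinlock, not the kernel's lock. */
struct spin { atomic_flag f; };

static void spin_lock(struct spin *s)
{
	while (atomic_flag_test_and_set_explicit(&s->f, memory_order_acquire))
		;  /* busy-wait until the holder releases the flag */
}

static void spin_unlock(struct spin *s)
{
	atomic_flag_clear_explicit(&s->f, memory_order_release);
}

enum rcv_hook { HOOK_NONE, HOOK_PACKET_RCV, HOOK_TPACKET_RCV, HOOK_FANOUT };

struct psock {
	struct spin bind_lock;
	bool running;        /* true iff the socket is on a ptype list */
	enum rcv_hook hook;
	int fanout_adds;     /* how many times we joined the fanout group */
};

static void psock_init(struct psock *po, bool running, enum rcv_hook hook)
{
	atomic_flag_clear(&po->bind_lock.f);
	po->running = running;
	po->hook = hook;
	po->fanout_adds = 0;
}

/* Test `running` and swap the hook in one critical section, so a concurrent
 * rebind (e.g. a ring setup) cannot interleave between the two steps. */
static int fanout_add_model(struct psock *po)
{
	int err = -1;

	spin_lock(&po->bind_lock);
	if (po->running && po->hook != HOOK_FANOUT) {
		po->hook = HOOK_FANOUT;  /* unbind own hook, bind the fanout hook */
		po->fanout_adds++;
		err = 0;
	}
	spin_unlock(&po->bind_lock);
	return err;
}
```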
      
      Reported independently by both trinity and syzkaller.
      Verified that the syzkaller reproducer passes after this patch.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Reported-by: nixioaming <nixiaoming@huawei.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: emac: Fix napi poll list corruption · 6eac2cd2
      Christian Lamparter authored
      
      [ Upstream commit f5595606 ]
      
      This patch is pretty much a carbon copy of
      commit 3079c652 ("caif: Fix napi poll list corruption")
      with "caif" replaced by "emac".
      
      Commit d75b1ade ("net: less interrupt masking in NAPI") breaks
      emac.
      
      It is now required that if the entire budget is consumed when poll
      returns, the napi poll_list must remain empty. However, like some
      other drivers, emac tries to do a last-ditch check, and if there is
      more work, it calls napi_reschedule and then immediately processes
      some of this new work. Should the entire budget be consumed while
      processing such new work, we violate the new caller contract.
      
      This patch fixes this by not processing any new work when we
      reschedule in emac.
      
      Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: fastopen: fix on syn-data transmit failure · b463521d
      Eric Dumazet authored
      
      [ Upstream commit b5b7db8d ]
      
      Our recent change exposed a bug in the TCP Fast Open client that
      syzkaller found right away [1].
      
      When we prepare an skb with SYN+DATA, we attempt to transmit it, and
      we update the socket state as if the transmit had succeeded.
      
      In the socket RTX queue we have two skbs, one with the SYN alone,
      and a second one containing the DATA.
      
      When a (malicious) ACK comes in, we now complain that the second one
      has no skb_mstamp.
      
      The proper fix is to make sure that, if the transmit failed, we do not
      pretend we sent the DATA skb, and instead make it our send_head.
      
      When the 3WHS completes, we can now send the DATA right away, without
      having to wait for a timeout.
      
      [1]
      WARNING: CPU: 0 PID: 100189 at net/ipv4/tcp_input.c:3117 tcp_clean_rtx_queue+0x2057/0x2ab0 net/ipv4/tcp_input.c:3117()
      
       WARN_ON_ONCE(last_ackt == 0);
      
      Modules linked in:
      CPU: 0 PID: 100189 Comm: syz-executor1 Not tainted
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       0000000000000000 ffff8800b35cb1d8 ffffffff81cad00d 0000000000000000
       ffffffff828a4347 ffff88009f86c080 ffffffff8316eb20 0000000000000d7f
       ffff8800b35cb220 ffffffff812c33c2 ffff8800baad2440 00000009d46575c0
      Call Trace:
       [<ffffffff81cad00d>] __dump_stack
       [<ffffffff81cad00d>] dump_stack+0xc1/0x124
       [<ffffffff812c33c2>] warn_slowpath_common+0xe2/0x150
       [<ffffffff812c361e>] warn_slowpath_null+0x2e/0x40
       [<ffffffff828a4347>] tcp_clean_rtx_queue+0x2057/0x2ab0 n
       [<ffffffff828ae6fd>] tcp_ack+0x151d/0x3930
       [<ffffffff828baa09>] tcp_rcv_state_process+0x1c69/0x4fd0
       [<ffffffff828efb7f>] tcp_v4_do_rcv+0x54f/0x7c0
       [<ffffffff8258aacb>] sk_backlog_rcv
       [<ffffffff8258aacb>] __release_sock+0x12b/0x3a0
       [<ffffffff8258ad9e>] release_sock+0x5e/0x1c0
       [<ffffffff8294a785>] inet_wait_for_connect
       [<ffffffff8294a785>] __inet_stream_connect+0x545/0xc50
       [<ffffffff82886f08>] tcp_sendmsg_fastopen
       [<ffffffff82886f08>] tcp_sendmsg+0x2298/0x35a0
       [<ffffffff82952515>] inet_sendmsg+0xe5/0x520
       [<ffffffff8257152f>] sock_sendmsg_nosec
       [<ffffffff8257152f>] sock_sendmsg+0xcf/0x110
      
      Fixes: 8c72c65b ("tcp: update skb->skb_mstamp more carefully")
      Fixes: 783237e8 ("net-tcp: Fast Open client - sending SYN-data")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/sched: cls_matchall: fix crash when used with classful qdisc · b13bc543
      Davide Caratti authored
      
      [ Upstream commit 3ff4cbec ]
      
      This script, adapted from the Linux Advanced Routing and Traffic Control guide,
      
      tc q a dev en0 root handle 1: htb default a
      tc c a dev en0 parent 1:  classid 1:1 htb rate 6mbit burst 15k
      tc c a dev en0 parent 1:1 classid 1:a htb rate 5mbit ceil 6mbit burst 15k
      tc c a dev en0 parent 1:1 classid 1:b htb rate 1mbit ceil 6mbit burst 15k
      tc f a dev en0 parent 1:0 prio 1 $clsname $clsargs classid 1:b
      ping $address -c1
      tc -s c s dev en0
      
      classifies traffic into 1:b or 1:a, depending on whether or not the
      packet matches the pattern $clsargs of filter $clsname. However, when
      $clsname is 'matchall', a systematic crash can be observed in
      htb_classify(). HTB and other classful qdiscs don't assign an initial
      value to struct tcf_result, but they expect it to contain valid values
      after the filters have run. The current 'matchall' ignores the
      user-configured TCA_MATCHALL_CLASSID attribute and never fills in the
      result, so HTB (and other classful qdiscs) dereference random pointers.
      
      By assigning head->res to *res in mall_classify(), before the actions
      are invoked, we fix this crash and enable the TCA_MATCHALL_CLASSID
      functionality, which has had no effect on the 'matchall' classifier
      since it was first introduced.
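
      A userspace sketch of the fix, assuming simplified stand-ins for struct
      tcf_result and the matchall filter head (the *_model types and
      run_actions() are hypothetical). The essential step is copying the
      configured result into *res before running the actions, so a caller
      that never initialized *res still reads defined values.

```c
#include <assert.h>

/* Simplified stand-in for struct tcf_result. */
struct tcf_result_model {
	unsigned int classid;
	const void *cl;  /* the classful qdisc would dereference this */
};

/* Hypothetical filter head holding the user-configured result
 * (TCA_MATCHALL_CLASSID, already resolved). */
struct mall_head_model {
	struct tcf_result_model res;
};

/* Placeholder for running the filter's actions; always "ok" here. */
static int run_actions(void)
{
	return 0;
}

/* Fill *res first, then run the actions. */
static int mall_classify_model(const struct mall_head_model *head,
			       struct tcf_result_model *res)
{
	*res = head->res;
	return run_actions();
}
```

      With classid 1:b encoded as 0x1000b (major in the upper 16 bits), the
      qdisc sees exactly the class the user configured.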
      
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1460213
      Reported-by: Jiri Benc <jbenc@redhat.com>
      Fixes: b87f7936 ("net/sched: introduce Match-all classifier")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Yotam Gigi <yotamg@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip6_tunnel: do not allow loading ip6_tunnel if ipv6 is disabled in cmdline · 13c8bd7a
      Xin Long authored
      
      [ Upstream commit 8c22dab0 ]
      
      If ipv6 has been disabled on the kernel command line since boot, it
      makes no sense to allow users to create any ip6 tunnel. Otherwise, it
      could cause potential problems.
      
      Jianlin found a kernel crash caused by this in ip6_gre when he set
      ipv6.disable=1 in grub:
      
      [  209.588865] Unable to handle kernel paging request for data at address 0x00000080
      [  209.588872] Faulting instruction address: 0xc000000000a3aa6c
      [  209.588879] Oops: Kernel access of bad area, sig: 11 [#1]
      [  209.589062] NIP [c000000000a3aa6c] fib_rules_lookup+0x4c/0x260
      [  209.589071] LR [c000000000b9ad90] fib6_rule_lookup+0x50/0xb0
      [  209.589076] Call Trace:
      [  209.589097] fib6_rule_lookup+0x50/0xb0
      [  209.589106] rt6_lookup+0xc4/0x110
      [  209.589116] ip6gre_tnl_link_config+0x214/0x2f0 [ip6_gre]
      [  209.589125] ip6gre_newlink+0x138/0x3a0 [ip6_gre]
      [  209.589134] rtnl_newlink+0x798/0xb80
      [  209.589142] rtnetlink_rcv_msg+0xec/0x390
      [  209.589151] netlink_rcv_skb+0x138/0x150
      [  209.589159] rtnetlink_rcv+0x48/0x70
      [  209.589169] netlink_unicast+0x538/0x640
      [  209.589175] netlink_sendmsg+0x40c/0x480
      [  209.589184] ___sys_sendmsg+0x384/0x4e0
      [  209.589194] SyS_sendmsg+0xd4/0x140
      [  209.589201] SyS_socketcall+0x3e0/0x4f0
      [  209.589209] system_call+0x38/0xe0
      
      This patch makes ip6_tunnel_init return -EOPNOTSUPP if ipv6 has been
      disabled on the command line.
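
      A minimal sketch of the init-time guard, assuming a boolean models
      ipv6.disable=1 and ipv6_mod_enabled_model() stands in for the kernel's
      ipv6_mod_enabled() helper; registration is reduced to a hypothetical
      counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the ipv6.disable=1 boot parameter. */
static bool ipv6_disabled_on_cmdline;

/* Stand-in for the kernel's ipv6_mod_enabled() helper. */
static bool ipv6_mod_enabled_model(void)
{
	return !ipv6_disabled_on_cmdline;
}

static int tunnels_registered;

/* Refuse to load before registering anything when IPv6 is off,
 * so no ip6 tunnel can ever be created. */
static int ip6_tunnel_init_model(void)
{
	if (!ipv6_mod_enabled_model())
		return -EOPNOTSUPP;
	tunnels_registered = 1;  /* pernet ops, link ops, ... */
	return 0;
}
```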
      
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: phy: Fix mask value write on gmii2rgmii converter speed register · fc2fe7a0
      Fahad Kunnathadi authored
      
      [ Upstream commit f2654a47 ]
      
      To clear the Speed Selection bits in the MDIO control register (0x10),
      i.e., to clear bits 6 and 13 to zero while keeping the other bits
      unchanged, the mask value has to be inverted with a bitwise NOT
      (the ~ operator) before the AND operation.
      
      This patch clears the current speed selection before writing the new
      speed settings to the gmii2rgmii converter.
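
      The difference between ANDing with the mask and with its complement can
      be checked directly. BMCR_SPEED1000 (bit 6) and BMCR_SPEED100 (bit 13)
      follow the standard clause 22 control-register layout; SPEED_MASK and
      the set_speed_*() helpers are hypothetical names, and 0x1140 is only an
      illustrative register value.

```c
#include <assert.h>
#include <stdint.h>

/* Speed-selection bits of the MII control register: bit 6 and bit 13. */
#define BMCR_SPEED1000 0x0040
#define BMCR_SPEED100  0x2000
/* Hypothetical name for the combined speed field. */
#define SPEED_MASK (BMCR_SPEED1000 | BMCR_SPEED100)

/* Buggy: ANDing with the mask keeps ONLY the old speed bits. */
static uint16_t set_speed_buggy(uint16_t val, uint16_t speed)
{
	val &= SPEED_MASK;
	return (uint16_t)(val | speed);
}

/* Fixed: ANDing with ~mask clears the speed bits and keeps everything else. */
static uint16_t set_speed_fixed(uint16_t val, uint16_t speed)
{
	val &= (uint16_t)~SPEED_MASK;
	return (uint16_t)(val | speed);
}
```

      Switching 0x1140 to 100 Mb/s, the buggy variant loses the unrelated
      bits and leaves both speed bits set, while the fixed variant changes
      only the speed field.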
      
      Fixes: f411a616 ("net: phy: Add gmiitorgmii converter support")
      Signed-off-by: Fahad Kunnathadi <fahad.kunnathadi@dexceldesigns.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header · e814bae3
      Xin Long authored
      
      [ Upstream commit 76cc0d32 ]
      
      Currently, before packing the ipv6 header, ip6gre_header skb_pushes
      t->hlen, which only includes encap_hlen + tun_hlen. This means the GRE
      header (greh) and the inner header would be overwritten by the ipv6
      fields, and ipv6h might not get set up properly.
      
      Jianlin found this issue when using 'remote any' on ip6_gre; the
      packets he captured on the gre device were truncated:
      
      22:50:26.210866 Out ethertype IPv6 (0x86dd), length 120: truncated-ip6 -\
      8128 bytes missing!(flowlabel 0x92f40, hlim 0, next-header Options (0)  \
      payload length: 8192) ::1:2000:0 > ::1:0:86dd: HBH [trunc] ip-proto-128 \
      8184
      
      It should also skb_push the ipv6hdr so that ipv6h points to the right
      position for setting up the ipv6 fields.
      
      This patch skb_pushes hlen + sizeof(*ipv6h) and also fixes some
      indentation in ip6gre_header.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • udpv6: Fix the checksum computation when HW checksum does not apply · f0a5af78
      Subash Abhinov Kasiviswanathan authored
      
      [ Upstream commit 63ecc3d9 ]
      
      While trying ESP transport-mode encryption for UDPv6 packets with a
      datagram size of 1436 and an MTU of 1500, a checksum error was
      observed in the second fragment.
      
      This error occurs because the UDP payload checksum is left out when
      computing the full checksum for these packets in
      udp6_hwcsum_outgoing().
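
      A sketch of the software fallback, assuming even-length fragments read
      as 16-bit big-endian words; csum_add_model(), csum_partial_model() and
      csum_fold_model() mimic the kernel's csum_add(), csum_partial() and
      csum_fold() but are not those functions. The running sum over all
      fragments has to start from the first buffer's own partial sum;
      starting from zero drops that part of the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement add with end-around carry (models csum_add()). */
static uint32_t csum_add_model(uint32_t a, uint32_t b)
{
	uint32_t s = a + b;

	return s + (s < a);  /* fold the carry back in */
}

/* Partial sum of an even-length buffer taken as 16-bit big-endian words. */
static uint32_t csum_partial_model(const uint8_t *p, size_t len, uint32_t sum)
{
	for (size_t i = 0; i + 1 < len; i += 2)
		sum = csum_add_model(sum, ((uint32_t)p[i] << 8) | p[i + 1]);
	return sum;
}

/* Fold a 32-bit partial sum into the final 16-bit checksum. */
static uint16_t csum_fold_model(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

struct frag { const uint8_t *data; size_t len; };

/* Software checksum over the first buffer plus a list of fragments:
 * the accumulator starts from the first buffer's sum, not from zero. */
static uint16_t udp_sw_csum(const struct frag *first,
			    const struct frag *frags, size_t nfrags)
{
	uint32_t csum = csum_partial_model(first->data, first->len, 0);

	for (size_t i = 0; i < nfrags; i++)
		csum = csum_add_model(csum,
			csum_partial_model(frags[i].data, frags[i].len, 0));
	return csum_fold_model(csum);
}
```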
      
      Fixes: d39d938c ("ipv6: Introduce udpv6_send_skb()")
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: fix data delivery rate · 85908cca
      Eric Dumazet authored
      
      [ Upstream commit fc225799 ]
      
      Now that skb->skb_mstamp is updated later, we also need to call
      tcp_rate_skb_sent() after the update is done.
      
      Fixes: 8c72c65b ("tcp: update skb->skb_mstamp more carefully")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf/verifier: reject BPF_ALU64|BPF_END · e159492b
      Edward Cree authored
      
      [ Upstream commit e67b8a68 ]
      
      Neither ___bpf_prog_run nor the JITs accept it, so the verifier must
      reject it as well. Also adds a new test case.
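
      A minimal sketch of the opcode check, using the eBPF opcode encoding
      (class in the low 3 bits, operation in the high 4 bits) as it stood at
      the time of this fix; alu_op_allowed() is a hypothetical helper, not
      the verifier's own code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* eBPF opcode fields and values (linux/bpf_common.h, linux/bpf.h). */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_OP(code)    ((code) & 0xf0)
#define BPF_ALU   0x04
#define BPF_ALU64 0x07
#define BPF_END   0xd0

/* Byte-swap (BPF_END) was only defined for the 32-bit BPF_ALU class,
 * so the same operation under BPF_ALU64 is rejected. */
static bool alu_op_allowed(uint8_t code)
{
	if (BPF_CLASS(code) == BPF_ALU64 && BPF_OP(code) == BPF_END)
		return false;
	return true;
}
```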
      
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: update skb->skb_mstamp more carefully · 186a9c5e
      Eric Dumazet authored
      
      [ Upstream commit 8c72c65b ]
      
      liujian reported a problem in TCP_USER_TIMEOUT processing, with a
      patch for tcp_probe_timer():
            https://www.spinics.net/lists/netdev/msg454496.html
      
      After investigation, the root cause of the problem is that we update
      skb->skb_mstamp of skbs in the write queue even if the attempt to send
      a clone or copy of them failed, for example because of a routing
      problem.
      
      This patch prevents this, solving liujian's issue.
      
      It also removes a potential RTT miscalculation: __tcp_retransmit_skb()
      does not OR TCP_SKB_CB(skb)->sacked with TCPCB_EVER_RETRANS if a
      failure happens, but skb->skb_mstamp has already been changed.
      
      A future ACK would then lead to a very small RTT sample, and min_rtt
      would be lowered to this too-small value.
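
      A userspace sketch of the ordering, assuming a transmit hook that
      returns 0 on success and a negative value on failure (for example, no
      route); struct skb_model, transmit_skb_model() and the two xmit_*()
      hooks are hypothetical. The clone carries the new timestamp, but the
      write-queue skb is stamped only once the send actually succeeded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical skb model: only the send timestamp matters here. */
struct skb_model { uint64_t mstamp_us; };

/* Hypothetical transmit hook: 0 on success, negative on failure. */
typedef int (*xmit_fn)(struct skb_model *clone);

/* Stamp the original only after its clone left, so a failed attempt keeps
 * the old timestamp and cannot produce a bogus (too small) RTT sample. */
static int transmit_skb_model(struct skb_model *skb, uint64_t now_us,
			      xmit_fn xmit)
{
	struct skb_model clone = *skb;
	int err;

	clone.mstamp_us = now_us;
	err = xmit(&clone);
	if (err == 0)
		skb->mstamp_us = now_us;
	return err;
}

static int xmit_ok(struct skb_model *c) { (void)c; return 0; }
static int xmit_noroute(struct skb_model *c) { (void)c; return -1; }
```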
      
      Tested:
      
      # cat user_timeout.pkt
      --local_ip=192.168.102.64
      
          0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
         +0 bind(3, ..., ...) = 0
         +0 listen(3, 1) = 0
      
         +0 `ifconfig tun0 192.168.102.64/16; ip ro add 192.0.2.1 dev tun0`
      
         +0 < S 0:0(0) win 0 <mss 1460>
         +0 > S. 0:0(0) ack 1 <mss 1460>
      
        +.1 < . 1:1(0) ack 1 win 65530
         +0 accept(3, ..., ...) = 4
      
         +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
         +0 write(4, ..., 24) = 24
         +0 > P. 1:25(24) ack 1 win 29200
         +.1 < . 1:1(0) ack 25 win 65530
      
      // change the IP address
         +1 `ifconfig tun0 192.168.0.10/16`
      
         +1 write(4, ..., 24) = 24
         +1 write(4, ..., 24) = 24
         +1 write(4, ..., 24) = 24
         +1 write(4, ..., 24) = 24
      
         +0 `ifconfig tun0 192.168.102.64/16`
         +0 < . 1:2(1) ack 25 win 65530
         +0 `ifconfig tun0 192.168.0.10/16`
      
         +3 write(4, ..., 24) = -1
      
      # ./packetdrill user_timeout.pkt
      
      Signed-off-by: Eric Dumazet <edumazet@googl.com>
      Reported-by: liujian <liujian56@huawei.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sctp: potential read out of bounds in sctp_ulpevent_type_enabled() · b70bb9bb
      Dan Carpenter authored
      
      [ Upstream commit fa5f7b51 ]
      
      This code causes a static checker warning because Smatch doesn't trust
      anything that comes from skb->data.  I've reviewed this code and I do
      think skb->data can be controlled by the user here.
      
      The sctp_event_subscribe struct has 13 __u8 fields, and we want to see
      if ours is non-zero. sn_type can be any value in the 0-USHRT_MAX range.
      We're subtracting SCTP_SN_TYPE_BASE, which is 1 << 15, so we could read
      either before the start of the struct or after the end.
      
      This is a very old bug, and it's surprising that it went undetected
      for so long, but my theory is that it just doesn't have a big impact,
      so it would be hard to notice.
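
      A bounds-checked version of the lookup, as a userspace sketch;
      struct event_subscribe_model is a hypothetical stand-in with the 13
      one-byte flags mentioned above, and the explicit offset < 0 test is one
      way to reject types below SCTP_SN_TYPE_BASE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCTP_SN_TYPE_BASE (1 << 15)

/* Stand-in for struct sctp_event_subscribe: one flag byte per event type. */
struct event_subscribe_model { uint8_t flags[13]; };

/* sn_type comes from packet data, so check both ends of the range
 * before using it as an index into the struct. */
static int ulpevent_type_enabled(uint16_t sn_type,
				 const struct event_subscribe_model *mask)
{
	int offset = (int)sn_type - SCTP_SN_TYPE_BASE;
	const uint8_t *amask = (const uint8_t *)mask;

	if (offset < 0 || (size_t)offset >= sizeof(*mask))
		return 0;
	return amask[offset];
}
```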
      
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: sched: fix use-after-free in tcf_action_destroy and tcf_del_walker · f86d3b1a
      Jiri Pirko authored
      
      [ Upstream commit 255cd50f ]
      
      Recent commit d7fb60b9 ("net_sched: get rid of tcfa_rcu") removed
      freeing in call_rcu, which turned an already existing, hard-to-hit
      race condition into one that is hit 100% of the time:
      
      [  598.599825] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      [  598.607782] IP: tcf_action_destroy+0xc0/0x140
      
      Or:
      
      [   40.858924] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      [   40.862840] IP: tcf_generic_walker+0x534/0x820
      
      Fix this by storing the ops and using them directly for the module_put() call.
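
      A minimal sketch of the fixed ordering, assuming simplified action,
      ops and module types (all hypothetical): everything still needed from
      the action is read before the action is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the module, the action ops and the action. */
struct module_model { int refcnt; };
struct action_ops_model { struct module_model *owner; };
struct action_model { const struct action_ops_model *ops; };

static void module_put_model(struct module_model *m)
{
	m->refcnt--;
}

/* Save the ops pointer first; after free(a), a->ops must not be read. */
static void action_destroy_model(struct action_model *a)
{
	const struct action_ops_model *ops = a->ops;

	free(a);
	module_put_model(ops->owner);
}
```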
      
      Fixes: a85a970a ("net_sched: move tc_action into tcf_common")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mlxsw: spectrum: Prevent mirred-related crash on removal · f860ca54
      Yuval Mintz authored
      
      [ Upstream commit 6399ebcc ]
      
      When removing the offloading of mirred actions under matchall
      classifiers, mlxsw would find the destination port associated with
      the offloaded action and use it to undo the configuration.
      
      Depending on the order in which ports are removed, the destination
      port may be removed before the source port. In such a scenario, when
      the actions are flushed for the source port, mlxsw would perform an
      illegal dereference, as the destination port is no longer listed.
      
      Since the only item needed to undo the configuration on the
      destination side is the port-id, which mlxsw already maintains on the
      source port, simply stop trying to access the destination port and
      use the port-id directly instead.
      
      Fixes: 763b4b70 ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
      Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: usx2y: Suppress kernel warning at page allocation failures · 065af12f
      Takashi Iwai authored
      commit 7682e399 upstream.
      
      The usx2y driver allocates the stream read/write buffers in contiguous
      pages depending on the stream setup, and this may spew kernel warning
      messages with a stack trace like:
        WARNING: CPU: 1 PID: 1846 at mm/page_alloc.c:3883
        __alloc_pages_slowpath+0x1ef2/0x2d70
        Modules linked in:
        CPU: 1 PID: 1846 Comm: kworker/1:2 Not tainted
        ....
      
      This may mislead users into thinking a serious error occurred, although
      it is not fatal and the driver handles the error case gracefully.
      Since the driver already sanity-checks the given size (128 and 256
      pages), it can't pass an unreasonable value, so the failure is merely
      due to page fragmentation.
      
      This patch adds __GFP_NOWARN to each caller to suppress such kernel
      warnings. The original issue was spotted by syzkaller.
      
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "ALSA: echoaudio: purge contradictions between dimension matrix members... · 40e21932
      Takashi Sakamoto authored
      Revert "ALSA: echoaudio: purge contradictions between dimension matrix members and total number of members"
      
      commit 51db452d upstream.
      
      This reverts commit 275353bb to fix a regression that can abort the
      'alsactl' program in alsa-utils due to an assertion in alsa-lib.
      
      alsactl: control.c:2513: snd_ctl_elem_value_get_integer: Assertion `idx < sizeof(obj->value.integer.value) / sizeof(obj->value.integer.value[0])' failed.
      
      alsactl: control.c:2976: snd_ctl_elem_value_get_integer: Assertion `idx < ARRAY_SIZE(obj->value.integer.value)' failed.
      
      This commit is a band-aid. From the point of view of ALSA control
      interface usage, the drivers still have an issue: they prevent
      userspace applications from parsing each level of the dimension
      information in a consistent way via the ALSA control interface.
      
      Let me examine this issue. The current implementation of the drivers
      has three control element sets with dimension information:
       * 'Monitor Mixer Volume' (type: integer)
       * 'VMixer Volume' (type: integer)
       * 'VU-meters' (type: boolean)
      
      Although the number of elements named 'Monitor Mixer Volume' differs
      among the drivers in this group, it can be calculated from macros
      defined by each driver (= (BX_NUM - BX_ANALOG_IN) * BX_ANALOG_IN). Each
      of the elements has one value member and dimension information with 2
      levels (= BX_ANALOG_IN * (BX_NUM - BX_ANALOG_IN)). For these elements,
      userspace applications are expected to interpret the dimension
      information so that all of the elements together form a matrix whose
      numbers of rows and columns are given by the dimension information.
      
      The same approach applies to elements named 'VMixer Volume'. The
      number of these elements can also be calculated from macros defined by
      each driver (= PX_ANALOG_IN * BX_ANALOG_IN). Each of the elements has
      one value member and dimension information with 2 levels
      (= BX_ANALOG_IN * PX_ANALOG_IN). All of the elements together form a
      matrix described by the dimension information.
      
      The element named 'VU-meters' handles dimension information
      differently. It includes 96 value members and has dimension
      information with 3 levels (= 3 or 2 * 16 * 2). For this element,
      userspace applications are expected to interpret the dimension
      information so that all of the value members within the single element
      form a matrix whose numbers of rows and columns are given by the
      dimension information. This differs from the former approach.
      
      In summary, the drivers were not designed to provide a consistent way
      to parse the dimension information. This makes it hard for general
      userspace applications such as amixer to parse the information in a
      consistent way, and in fact no userspace application except
      'echomixer' uses the dimension information. Additionally, no drivers
      outside this group use the information.
      
      The reverted commit was written based on the latter approach. Commit
      860c1994 ('ALSA: control: add dimension validator for userspace
      elements') is also based on the latter approach. That patch should be
      reconsidered at the same time, to re-define a consistent way to parse
      the dimension information.
      
      Reported-by: Mark Hills <mark@xwax.org>
      Reported-by: S. Christian Collins <s.chriscollins@gmail.com>
      Fixes: 275353bb ('ALSA: echoaudio: purge contradictions between dimension matrix members and total number of members')
      Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: compress: Remove unused variable · 984b6c96
      Guneshwor Singh authored
      commit a931b9ce upstream.
      
      Commit 04c5d5a4 ("ALSA: compress: Embed struct device") removed
      the statement that used 'str' but didn't remove the variable itself.
      So remove it.
      
      [Adding stable to Cc since pr_debug() may refer to the uninitialized
       buffer -- tiwai]
      
      Fixes: 04c5d5a4 ("ALSA: compress: Embed struct device")
      Signed-off-by: Guneshwor Singh <guneshwor.o.singh@intel.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • lsm: fix smack_inode_removexattr and xattr_getsecurity memleak · 88c195d6
      Casey Schaufler authored
      commit 57e7ba04 upstream.
      
      security_inode_getsecurity() provides the text string value
      of a security attribute. It does not provide a "secctx".
      The code in xattr_getsecurity() that calls security_inode_getsecurity()
      and then calls security_release_secctx() happened to work because
      SELinux and Smack treat the attribute and the secctx the same way.
      It fails for cap_inode_getsecurity(), because that module has no
      secctx that ever needs releasing. It turns out that Smack is the
      one that's doing things wrong by not allocating memory when instructed
      to do so by the "alloc" parameter.
      
      The fix is simple enough. Change the security_release_secctx() call to
      kfree(), because what security_inode_getsecurity() returns isn't a
      secctx. Change Smack to allocate the string when told to do so.
      
      Note: this also fixes memory leaks for LSMs which implement
      inode_getsecurity but not release_secctx, such as capabilities.
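
      A userspace sketch of the corrected contract, assuming a
      getsecurity-style hook that must allocate a copy when alloc is true,
      and a caller that releases it with plain free() (kfree() in the
      kernel). getsecurity_model(), read_label() and the "_" label value are
      hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Label kept in storage owned by the (hypothetical) security module. */
static const char stored_label[] = "_";

/* When alloc is true, hand back a fresh copy that the caller owns. */
static int getsecurity_model(void **buffer, bool alloc)
{
	size_t len = sizeof(stored_label) - 1;

	if (alloc) {
		char *copy = malloc(len + 1);

		if (!copy)
			return -1;
		memcpy(copy, stored_label, len + 1);
		*buffer = copy;
	}
	return (int)len;
}

/* Caller side: the result is plain allocated memory, so it is released
 * with free(), not with a secctx-release hook. */
static int read_label(char *out, size_t outlen)
{
	void *buf = NULL;
	int len = getsecurity_model(&buf, true);

	if (len < 0)
		return len;
	if ((size_t)len >= outlen) {
		free(buf);
		return -1;
	}
	memcpy(out, buf, (size_t)len);
	out[len] = '\0';
	free(buf);
	return len;
}
```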
      
      Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
      Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>