1. 11 Mar, 2018 1 commit
    • Daniel Borkmann's avatar
      bpf: fix mlock precharge on arraymaps · d9fd73c6
      Daniel Borkmann authored
      [ upstream commit 9c2d63b8 ]
      
      syzkaller recently triggered OOM during percpu map allocation;
      while there is work in progress by Dennis Zhou to add __GFP_NORETRY
      semantics for percpu allocator under pressure, there seems also a
      missing bpf_map_precharge_memlock() check in array map allocation.
      
      Given today the actual bpf_map_charge_memlock() happens after the
      find_and_alloc_map() in syscall path, the bpf_map_precharge_memlock()
      is there to bail out early before we go and do the map setup work
      when we find that we hit the limits anyway. Therefore add this for
      array map as well.
      
      Fixes: 6c905981 ("bpf: pre-allocate hash map elements")
      Fixes: a10423b8 ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
      Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Dennis Zhou <dennisszhou@gmail.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9fd73c6
  2. 09 Mar, 2018 39 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.14.25 · 8773f9bf
      Greg Kroah-Hartman authored
      8773f9bf
    • Sagi Grimberg's avatar
      nvme-rdma: don't suppress send completions · df11c226
      Sagi Grimberg authored
      commit b4b591c8 upstream.
      
      The entire completions suppress mechanism is currently broken because the
      HCA might retry a send operation (due to dropped ack) after the nvme
      transaction has completed.
      
      In order to handle this, we signal all send completions and introduce a
      separate done handler for async events as they will be handled differently
      (as they don't include in-capsule data by definition).
      Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      df11c226
    • NeilBrown's avatar
      md: only allow remove_and_add_spares when no sync_thread running. · 9474d8fa
      NeilBrown authored
      commit 39772f0a upstream.
      
      The locking protocols in md assume that a device will
      never be removed from an array during resync/recovery/reshape.
      When that isn't happening, rcu or reconfig_mutex is needed
      to protect an rdev pointer while taking a refcount.  When
      it is happening, that protection isn't needed.
      
      Unfortunately there are cases were remove_and_add_spares() is
      called when recovery might be happening: is state_store(),
      slot_store() and hot_remove_disk().
      In each case, this is just an optimization, to try to expedite
      removal from the personality so the device can be removed from
      the array.  If resync etc is happening, we just have to wait
      for md_check_recover to find a suitable time to call
      remove_and_add_spares().
      
      This optimization and not essential so it doesn't
      matter if it fails.
      So change remove_and_add_spares() to abort early if
      resync/recovery/reshape is happening, unless it is called
      from md_check_recovery() as part of a newly started recovery.
      The parameter "this" is only NULL when called from
      md_check_recovery() so when it is NULL, there is no need to abort.
      
      As this can result in a NULL dereference, the fix is suitable
      for -stable.
      
      cc: yuyufen <yuyufen@huawei.com>
      Cc: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
      Fixes: 8430e7e0 ("md: disconnect device from personality before trying to remove it.")
      Cc: stable@ver.kernel.org (v4.8+)
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarShaohua Li <sh.li@alibaba-inc.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9474d8fa
    • Adam Ford's avatar
      ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux · 4df591f7
      Adam Ford authored
      commit 74402055 upstream.
      
      The pinmuxing was missing for I2C1 which was causing intermittent issues
      with the PMIC which is connected to I2C1.  The bootloader did not quite
      configure the I2C1 either, so when running at 2.6MHz, it was generating
      errors at time.
      
      This correctly sets the I2C1 pinmuxing so it can operate at 2.6MHz
      
      Fixes: 687c2767 ("ARM: dts: Add minimal support for LogicPD Torpedo
      DM3730 devkit")
      Signed-off-by: default avatarAdam Ford <aford173@gmail.com>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4df591f7
    • Adam Ford's avatar
      ARM: dts: LogicPD SOM-LV: Fix I2C1 pinmux · 2b844657
      Adam Ford authored
      commit 84c7efd6 upstream.
      
      The pinmuxing was missing for I2C1 which was causing intermittent issues
      with the PMIC which is connected to I2C1.  The bootloader did not quite
      configure the I2C1 either, so when running at 2.6MHz, it was generating
      errors at times.
      
      This correctly sets the I2C1 pinmuxing so it can operate at 2.6MHz
      
      Fixes: ab8dd3ae ("ARM: DTS: Add minimal Support for Logic PD DM3730
      SOM-LV")
      Signed-off-by: default avatarAdam Ford <aford173@gmail.com>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b844657
    • Kai Heng Feng's avatar
      ACPI / bus: Parse tables as term_list for Dell XPS 9570 and Precision M5530 · b2190cc3
      Kai Heng Feng authored
      commit 36904703 upstream.
      
      The i2c touchpad on Dell XPS 9570 and Precision M5530 doesn't work out
      of box.
      
      The touchpad relies on its _INI method to update its _HID value from
      XXXX0000 to SYNA2393.
      
      Also, the _STA relies on value of I2CN to report correct status.
      
      Set acpi_gbl_parse_table_as_term_list so the value of I2CN can be
      correctly set up, and _INI can get run. The ACPI table in this machine
      is designed to get parsed this way.
      
      Also, change the quirk table to a more generic name.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=198515Signed-off-by: default avatarKai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2190cc3
    • Eric Biggers's avatar
      KVM/x86: remove WARN_ON() for when vm_munmap() fails · b95f8ca8
      Eric Biggers authored
      commit 103c763c upstream.
      
      On x86, special KVM memslots such as the TSS region have anonymous
      memory mappings created on behalf of userspace, and these mappings are
      removed when the VM is destroyed.
      
      It is however possible for removing these mappings via vm_munmap() to
      fail.  This can most easily happen if the thread receives SIGKILL while
      it's waiting to acquire ->mmap_sem.   This triggers the 'WARN_ON(r < 0)'
      in __x86_set_memory_region().  syzkaller was able to hit this, using
      'exit()' to send the SIGKILL.  Note that while the vm_munmap() failure
      results in the mapping not being removed immediately, it is not leaked
      forever but rather will be freed when the process exits.
      
      It's not really possible to handle this failure properly, so almost
      every other caller of vm_munmap() doesn't check the return value.  It's
      a limitation of having the kernel manage these mappings rather than
      userspace.
      
      So just remove the WARN_ON() so that users can't spam the kernel log
      with this warning.
      
      Fixes: f0d648bd ("KVM: x86: map/unmap private slots in __x86_set_memory_region")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarJack Wang <jinpu.wang@profitbricks.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b95f8ca8
    • Tianyu Lan's avatar
      KVM/x86: Fix wrong macro references of X86_CR0_PG_BIT and X86_CR4_PAE_BIT in kvm_valid_sregs() · 61546237
      Tianyu Lan authored
      commit 37b95951 upstream.
      
      kvm_valid_sregs() should use X86_CR0_PG and X86_CR4_PAE to check bit
      status rather than X86_CR0_PG_BIT and X86_CR4_PAE_BIT. This patch is
      to fix it.
      
      Fixes: f2981033(KVM/x86: Check input paging mode when cs.l is set)
      Reported-by: default avatarJeremi Piotrowski <jeremi.piotrowski@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarTianyu Lan <Tianyu.Lan@microsoft.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarJack Wang <jinpu.wang@profitbricks.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      61546237
    • Ard Biesheuvel's avatar
      PCI/ASPM: Deal with missing root ports in link state handling · db98acd6
      Ard Biesheuvel authored
      commit ee8bdfb6 upstream.
      
      Even though it is unconventional, some PCIe host implementations omit the
      root ports entirely, and simply consist of a host bridge (which is not
      modeled as a device in the PCI hierarchy) and a link.
      
      When the downstream device is an endpoint, our current code does not seem
      to mind this unusual configuration. However, when PCIe switches are
      involved, the ASPM code assumes that any downstream switch port has a
      parent, and blindly dereferences the bus->parent->self field of the pci_dev
      struct to chain the downstream link state to the link state of the root
      port. Given that the root port is missing, the link is not modeled at all,
      and nor is the link state, and attempting to access it results in a NULL
      pointer dereference and a crash.
      
      Avoid this by allowing the link state chain to terminate at the downstream
      port if no root port exists.
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      db98acd6
    • Radim Krčmář's avatar
      KVM: x86: fix vcpu initialization with userspace lapic · b4830f3a
      Radim Krčmář authored
      commit b7e31be3 upstream.
      
      Moving the code around broke this rare configuration.
      Use this opportunity to finally call lapic reset from vcpu reset.
      
      Reported-by: syzbot+fb7a33a4b6c35007a72b@syzkaller.appspotmail.com
      Suggested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Fixes: 0b2e9904 ("KVM: x86: move LAPIC initialization after VMCS creation")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b4830f3a
    • Paolo Bonzini's avatar
      KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely() · 1f17daea
      Paolo Bonzini authored
      commit 946fbbc1 upstream.
      
      vmx_vcpu_run() and svm_vcpu_run() are large functions, and giving
      branch hints to the compiler can actually make a substantial cycle
      difference by keeping the fast path contiguous in memory.
      
      With this optimization, the retpoline-guest/retpoline-host case is
      about 50 cycles faster.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: KarimAllah Ahmed <karahmed@amazon.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180222154318.20361-3-pbonzini@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f17daea
    • Paolo Bonzini's avatar
      KVM: x86: move LAPIC initialization after VMCS creation · 03d62460
      Paolo Bonzini authored
      commit 0b2e9904 upstream.
      
      The initial reset of the local APIC is performed before the VMCS has been
      created, but it tries to do a vmwrite:
      
       vmwrite error: reg 810 value 4a00 (err 18944)
       CPU: 54 PID: 38652 Comm: qemu-kvm Tainted: G        W I      4.16.0-0.rc2.git0.1.fc28.x86_64 #1
       Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0003.090520141303 09/05/2014
       Call Trace:
        vmx_set_rvi [kvm_intel]
        vmx_hwapic_irr_update [kvm_intel]
        kvm_lapic_reset [kvm]
        kvm_create_lapic [kvm]
        kvm_arch_vcpu_init [kvm]
        kvm_vcpu_init [kvm]
        vmx_create_vcpu [kvm_intel]
        kvm_vm_ioctl [kvm]
      
      Move it later, after the VMCS has been created.
      
      Fixes: 4191db26 ("KVM: x86: Update APICv on APIC reset")
      Cc: stable@vger.kernel.org
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      03d62460
    • Paolo Bonzini's avatar
      KVM/x86: Remove indirect MSR op calls from SPEC_CTRL · 0d62a56d
      Paolo Bonzini authored
      commit ecb586bd upstream.
      
      Having a paravirt indirect call in the IBRS restore path is not a
      good idea, since we are trying to protect from speculative execution
      of bogus indirect branch targets.  It is also slower, so use
      native_wrmsrl() on the vmentry path too.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: KarimAllah Ahmed <karahmed@amazon.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: d28b387f
      Link: http://lkml.kernel.org/r/20180222154318.20361-2-pbonzini@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d62a56d
    • Wanpeng Li's avatar
      KVM: mmu: Fix overlap between public and private memslots · 7135aaf3
      Wanpeng Li authored
      commit b28676bb upstream.
      
      Reported by syzkaller:
      
          pte_list_remove: ffff9714eb1f8078 0->BUG
          ------------[ cut here ]------------
          kernel BUG at arch/x86/kvm/mmu.c:1157!
          invalid opcode: 0000 [#1] SMP
          RIP: 0010:pte_list_remove+0x11b/0x120 [kvm]
          Call Trace:
           drop_spte+0x83/0xb0 [kvm]
           mmu_page_zap_pte+0xcc/0xe0 [kvm]
           kvm_mmu_prepare_zap_page+0x81/0x4a0 [kvm]
           kvm_mmu_invalidate_zap_all_pages+0x159/0x220 [kvm]
           kvm_arch_flush_shadow_all+0xe/0x10 [kvm]
           kvm_mmu_notifier_release+0x6c/0xa0 [kvm]
           ? kvm_mmu_notifier_release+0x5/0xa0 [kvm]
           __mmu_notifier_release+0x79/0x110
           ? __mmu_notifier_release+0x5/0x110
           exit_mmap+0x15a/0x170
           ? do_exit+0x281/0xcb0
           mmput+0x66/0x160
           do_exit+0x2c9/0xcb0
           ? __context_tracking_exit.part.5+0x4a/0x150
           do_group_exit+0x50/0xd0
           SyS_exit_group+0x14/0x20
           do_syscall_64+0x73/0x1f0
           entry_SYSCALL64_slow_path+0x25/0x25
      
      The reason is that when creates new memslot, there is no guarantee for new
      memslot not overlap with private memslots. This can be triggered by the
      following program:
      
         #include <fcntl.h>
         #include <pthread.h>
         #include <setjmp.h>
         #include <signal.h>
         #include <stddef.h>
         #include <stdint.h>
         #include <stdio.h>
         #include <stdlib.h>
         #include <string.h>
         #include <sys/ioctl.h>
         #include <sys/stat.h>
         #include <sys/syscall.h>
         #include <sys/types.h>
         #include <unistd.h>
         #include <linux/kvm.h>
      
         long r[16];
      
         int main()
         {
      	void *p = valloc(0x4000);
      
      	r[2] = open("/dev/kvm", 0);
      	r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);
      
      	uint64_t addr = 0xf000;
      	ioctl(r[3], KVM_SET_IDENTITY_MAP_ADDR, &addr);
      	r[6] = ioctl(r[3], KVM_CREATE_VCPU, 0x0ul);
      	ioctl(r[3], KVM_SET_TSS_ADDR, 0x0ul);
      	ioctl(r[6], KVM_RUN, 0);
      	ioctl(r[6], KVM_RUN, 0);
      
      	struct kvm_userspace_memory_region mr = {
      		.slot = 0,
      		.flags = KVM_MEM_LOG_DIRTY_PAGES,
      		.guest_phys_addr = 0xf000,
      		.memory_size = 0x4000,
      		.userspace_addr = (uintptr_t) p
      	};
      	ioctl(r[3], KVM_SET_USER_MEMORY_REGION, &mr);
      	return 0;
         }
      
      This patch fixes the bug by not adding a new memslot even if it
      overlaps with private memslots.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      7135aaf3
    • Wanpeng Li's avatar
      KVM: X86: Fix SMRAM accessing even if VM is shutdown · 1ebf9ab6
      Wanpeng Li authored
      commit 95e057e2 upstream.
      
      Reported by syzkaller:
      
         WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
         CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4
         RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
         Call Trace:
          vmx_handle_exit+0xbd/0xe20 [kvm_intel]
          kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm]
          kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
          do_vfs_ioctl+0xa4/0x6a0
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x25/0x9c
      
      The testcase creates a first thread to issue KVM_SMI ioctl, and then creates
      a second thread to mmap and operate on the same vCPU.  This triggers a race
      condition when running the testcase with multiple threads. Sometimes one thread
      exits with a triple fault while another thread mmaps and operates on the same
      vCPU.  Because CS=0x3000/IP=0x8000 is not mapped, accessing the SMI handler
      results in an EPT misconfig. This patch fixes it by returning RET_PF_EMULATE
      in kvm_handle_bad_page(), which will go on to cause an emulation failure and an
      exit with KVM_EXIT_INTERNAL_ERROR.
      
      Reported-by: syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4dc58@syzkaller.appspotmail.com
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ebf9ab6
    • Paolo Bonzini's avatar
      KVM: x86: extend usage of RET_MMIO_PF_* constants · f925158c
      Paolo Bonzini authored
      commit 9b8ebbdb upstream.
      
      The x86 MMU if full of code that returns 0 and 1 for retry/emulate.  Use
      the existing RET_MMIO_PF_RETRY/RET_MMIO_PF_EMULATE enum, renaming it to
      drop the MMIO part.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: Thomas Backlund <tmb@mageia.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f925158c
    • Arnd Bergmann's avatar
      ARM: kvm: fix building with gcc-8 · e0c7b2b1
      Arnd Bergmann authored
      commit 67870eb1 upstream.
      
      In banked-sr.c, we use a top-level '__asm__(".arch_extension virt")'
      statement to allow compilation of a multi-CPU kernel for ARMv6
      and older ARMv7-A that don't normally support access to the banked
      registers.
      
      This is considered to be a programming error by the gcc developers
      and will no longer work in gcc-8, where we now get a build error:
      
      /tmp/cc4Qy7GR.s:34: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_usr'
      /tmp/cc4Qy7GR.s:41: Error: Banked registers are not available with this architecture. -- `mrs r3,ELR_hyp'
      /tmp/cc4Qy7GR.s:55: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_svc'
      /tmp/cc4Qy7GR.s:62: Error: Banked registers are not available with this architecture. -- `mrs r3,LR_svc'
      /tmp/cc4Qy7GR.s:69: Error: Banked registers are not available with this architecture. -- `mrs r3,SPSR_svc'
      /tmp/cc4Qy7GR.s:76: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_abt'
      
      Passign the '-march-armv7ve' flag to gcc works, and is ok here, because
      we know the functions won't ever be called on pre-ARMv7VE machines.
      Unfortunately, older compiler versions (4.8 and earlier) do not understand
      that flag, so we still need to keep the asm around.
      
      Backporting to stable kernels (4.6+) is needed to allow those to be built
      with future compilers as well.
      
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84129
      Fixes: 33280b4c ("ARM: KVM: Add banked registers save/restore")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e0c7b2b1
    • Ulf Magnusson's avatar
      ARM: mvebu: Fix broken PL310_ERRATA_753970 selects · fc6be8bc
      Ulf Magnusson authored
      commit 8aa36a8d upstream.
      
      The MACH_ARMADA_375 and MACH_ARMADA_38X boards select ARM_ERRATA_753970,
      but it was renamed to PL310_ERRATA_753970 by commit fa0ce403 ("ARM:
      7162/1: errata: tidy up Kconfig options for PL310 errata workarounds").
      
      Fix the selects to use the new name.
      
      Discovered with the
      https://github.com/ulfalizer/Kconfiglib/blob/master/examples/list_undefined.py
      script.
      Fixes: fa0ce403 ("ARM: 7162/1: errata: tidy up Kconfig options for
      PL310 errata workarounds"
      cc: stable@vger.kernel.org
      Signed-off-by: default avatarUlf Magnusson <ulfalizer@gmail.com>
      Signed-off-by: default avatarGregory CLEMENT <gregory.clement@bootlin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc6be8bc
    • Daniel Schultz's avatar
      ARM: dts: rockchip: Remove 1.8 GHz operation point from phycore som · 4c02f016
      Daniel Schultz authored
      commit 5ce0bad4 upstream.
      
      Rockchip recommends to run the CPU cores only with operations points of
      1.6 GHz or lower.
      
      Removed the cpu0 node with too high operation points and use the default
      values instead.
      
      Fixes: 903d31e3 ("ARM: dts: rockchip: Add support for phyCORE-RK3288 SoM")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDaniel Schultz <d.schultz@phytec.de>
      Signed-off-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c02f016
    • Arnd Bergmann's avatar
      ARM: orion: fix orion_ge00_switch_board_info initialization · 8dc356e5
      Arnd Bergmann authored
      commit 8337d083 upstream.
      
      A section type mismatch warning shows up when building with LTO,
      since orion_ge00_mvmdio_bus_name was put in __initconst but not marked
      const itself:
      
      include/linux/of.h: In function 'spear_setup_of_timer':
      arch/arm/mach-spear/time.c:207:34: error: 'timer_of_match' causes a section type conflict with 'orion_ge00_mvmdio_bus_name'
       static const struct of_device_id timer_of_match[] __initconst = {
                                        ^
      arch/arm/plat-orion/common.c:475:32: note: 'orion_ge00_mvmdio_bus_name' was declared here
       static __initconst const char *orion_ge00_mvmdio_bus_name = "orion-mii";
                                      ^
      
      As pointed out by Andrew Lunn, it should in fact be 'const' but not
      '__initconst' because the string is never copied but may be accessed
      after the init sections are freed. To fix that, I get rid of the
      extra symbol and rewrite the initialization in a simpler way that
      assigns both the bus_id and modalias statically.
      
      I spotted another theoretical bug in the same place, where d->netdev[i]
      may be an out of bounds access, this can be fixed by moving the device
      assignment into the loop.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8dc356e5
    • Jan Beulich's avatar
      x86/mm: Fix {pmd,pud}_{set,clear}_flags() · b20d1086
      Jan Beulich authored
      commit 842cef91 upstream.
      
      Just like pte_{set,clear}_flags() their PMD and PUD counterparts should
      not do any address translation. This was outright wrong under Xen
      (causing a dead boot with no useful output on "suitable" systems), and
      produced needlessly more complicated code (even if just slightly) when
      paravirt was enabled.
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/5A8AF1BB02000078001A91C3@prv-mh.provo.novell.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b20d1086
    • Rasmus Villemoes's avatar
      nospec: Allow index argument to have const-qualified type · 656772cb
      Rasmus Villemoes authored
      commit b98c6a16 upstream.
      
      The last expression in a statement expression need not be a bare
      variable, quoting gcc docs
      
        The last thing in the compound statement should be an expression
        followed by a semicolon; the value of this subexpression serves as the
        value of the entire construct.
      
      and we already use that in e.g. the min/max macros which end with a
      ternary expression.
      
      This way, we can allow index to have const-qualified type, which will in
      some cases avoid the need for introducing a local copy of index of
      non-const qualified type. That, in turn, can prevent readers not
      familiar with the internals of array_index_nospec from wondering about
      the seemingly redundant extra variable, and I think that's worthwhile
      considering how confusing the whole _nospec business is.
      
      The expression _i&_mask has type unsigned long (since that is the type
      of _mask, and the BUILD_BUG_ONs guarantee that _i will get promoted to
      that), so in order not to change the type of the whole expression, add
      a cast back to typeof(_i).
      Signed-off-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arch@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/151881604837.17395.10812767547837568328.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      656772cb
    • David Hildenbrand's avatar
      KVM: s390: consider epoch index on TOD clock syncs · 81a158d2
      David Hildenbrand authored
      commit 1575767e upstream.
      
      For now, we don't take care of over/underflows. Especially underflows
      are critical:
      
      Assume the epoch is currently 0 and we get a sync request for delta=1,
      meaning the TOD is moved forward by 1 and we have to fix it up by
      subtracting 1 from the epoch. Right now, this will leave the epoch
      index untouched, resulting in epoch=-1, epoch_idx=0, which is wrong.
      
      We have to take care of over and underflows, also for the VSIE case. So
      let's factor out calculation into a separate function.
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Message-Id: <20180207114647.6220-5-david@redhat.com>
      Reviewed-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Fixes: 8fa1696e ("KVM: s390: Multiple Epoch Facility support")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      [use u8 for idx]
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81a158d2
    • David Hildenbrand's avatar
      KVM: s390: consider epoch index on hotplugged CPUs · dbab3751
      David Hildenbrand authored
      commit d16b52cb upstream.
      
      We must copy both, the epoch and the epoch_idx.
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Message-Id: <20180207114647.6220-4-david@redhat.com>
      Fixes: 8fa1696e ("KVM: s390: Multiple Epoch Facility support")
      Reviewed-by: default avatarCornelia Huck <cohuck@redhat.com>
      Reviewed-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Fixes: 8fa1696e ("KVM: s390: Multiple Epoch Facility support")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dbab3751
    • David Hildenbrand's avatar
      KVM: s390: provide only a single function for setting the tod (fix SCK) · 58a5d1ac
      David Hildenbrand authored
      commit 0e7def5f upstream.
      
      Right now, SET CLOCK called in the guest does not properly take care of
      the epoch index, as the call goes via the old kvm_s390_set_tod_clock()
      interface. So the epoch index is neither reset to 0, if required, nor
      properly set to e.g. 0xff on negative values.
      
      Fix this by providing a single kvm_s390_set_tod_clock() function. Move
      Multiple-epoch facility handling into it.
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Message-Id: <20180207114647.6220-3-david@redhat.com>
      Reviewed-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Fixes: 8fa1696e ("KVM: s390: Multiple Epoch Facility support")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      58a5d1ac
    • David Hildenbrand's avatar
      KVM: s390: take care of clock-comparator sign control · c09ea9a8
      David Hildenbrand authored
      commit 5fe01793 upstream.
      
      Missed when enabling the Multiple-epoch facility. If the facility is
      installed and the control is set, a sign based comaprison has to be
      performed.
      
      Right now we would inject wrong interrupts and ignore interrupt
      conditions. Also the sleep time is calculated in a wrong way.
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Message-Id: <20180207114647.6220-2-david@redhat.com>
      Fixes: 8fa1696e ("KVM: s390: Multiple Epoch Facility support")
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c09ea9a8
    • Anna Karbownik's avatar
      EDAC, sb_edac: Fix out of bound writes during DIMM configuration on KNL · bd3ead45
      Anna Karbownik authored
      commit bf848670 upstream.
      
      Commit
      
        3286d3eb ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4")
      
      decreased NUM_CHANNELS from 8 to 4, but this is not enough for Knights
      Landing which supports up to 6 channels.
      
      This caused out-of-bounds writes to pvt->mirror_mode and pvt->tolm
      variables which don't pay critical role on KNL code path, so the memory
      corruption wasn't causing any visible driver failures.
      
      The easiest way of fixing it is to change NUM_CHANNELS to 6. Do that.
      
      An alternative solution would be to restructure the KNL part of the
      driver to 2MC/3channel representation.
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarAnna Karbownik <anna.karbownik@intel.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: jim.m.snow@intel.com
      Cc: krzysztof.paliswiat@intel.com
      Cc: lukasz.odzioba@intel.com
      Cc: qiuxu.zhuo@intel.com
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: <stable@vger.kernel.org>
      Fixes: 3286d3eb ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4")
      Link: http://lkml.kernel.org/r/1519312693-4789-1-git-send-email-anna.karbownik@intel.com
      [ Massage commit message. ]
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd3ead45
    • Mauro Carvalho Chehab's avatar
      media: m88ds3103: don't call a non-initalized function · 1ba2b9e0
      Mauro Carvalho Chehab authored
      commit b9c97c67 upstream.
      
      If m88d3103 chip ID is not recognized, the device is not initialized.
      
      However, it returns from probe without any error, causing this OOPS:
      
      [    7.689289] Unable to handle kernel NULL pointer dereference at virtual address 00000000
      [    7.689297] pgd = 7b0bd7a7
      [    7.689302] [00000000] *pgd=00000000
      [    7.689318] Internal error: Oops: 80000005 [#1] SMP ARM
      [    7.689322] Modules linked in: dvb_usb_dvbsky(+) m88ds3103 dvb_usb_v2 dvb_core videobuf2_vmalloc videobuf2_memops videobuf2_core crc32_arm_ce videodev media
      [    7.689358] CPU: 3 PID: 197 Comm: systemd-udevd Not tainted 4.15.0-mcc+ #23
      [    7.689361] Hardware name: BCM2835
      [    7.689367] PC is at 0x0
      [    7.689382] LR is at m88ds3103_attach+0x194/0x1d0 [m88ds3103]
      [    7.689386] pc : [<00000000>]    lr : [<bf0ae1ec>]    psr: 60000013
      [    7.689391] sp : ed8e5c20  ip : ed8c1e00  fp : ed8945c0
      [    7.689395] r10: ed894000  r9 : ed894378  r8 : eda736c0
      [    7.689400] r7 : ed894070  r6 : ed8e5c44  r5 : bf0bb040  r4 : eda77600
      [    7.689405] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : eda77600
      [    7.689412] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      [    7.689417] Control: 10c5383d  Table: 2d8e806a  DAC: 00000051
      [    7.689423] Process systemd-udevd (pid: 197, stack limit = 0xe9dbfb63)
      [    7.689428] Stack: (0xed8e5c20 to 0xed8e6000)
      [    7.689439] 5c20: ed853a80 eda73640 ed894000 ed8942c0 ed853a80 bf0b9e98 ed894070 bf0b9f10
      [    7.689449] 5c40: 00000000 00000000 bf08c17c c08dfc50 00000000 00000000 00000000 00000000
      [    7.689459] 5c60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    7.689468] 5c80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    7.689479] 5ca0: 00000000 00000000 ed8945c0 ed8942c0 ed894000 ed894830 bf0b9e98 00000000
      [    7.689490] 5cc0: ed894378 bf0a3cb4 bf0bc3b0 0000533b ed920540 00000000 00000034 bf0a6434
      [    7.689500] 5ce0: ee952070 ed826600 bf0a7038 bf0a2dd8 00000001 bf0a6768 bf0a2f90 ed8943c0
      [    7.689511] 5d00: 00000000 c08eca68 ed826620 ed826620 00000000 ee952070 bf0bc034 ee952000
      [    7.689521] 5d20: ed826600 bf0bb080 ffffffed c0aa9e9c c0aa9dac ed826620 c16edf6c c168c2c8
      [    7.689531] 5d40: c16edf70 00000000 bf0bc034 0000000d 00000000 c08e268c bf0bb080 ed826600
      [    7.689541] 5d60: bf0bc034 ed826654 ed826620 bf0bc034 c164c8bc 00000000 00000001 00000000
      [    7.689553] 5d80: 00000028 c08e2948 00000000 bf0bc034 c08e2848 c08e0778 ee9f0a58 ed88bab4
      [    7.689563] 5da0: bf0bc034 ed90ba80 c168c1f0 c08e1934 bf0bb3bc c17045ac bf0bc034 c164c8bc
      [    7.689574] 5dc0: bf0bc034 bf0bb3bc ed91f564 c08e34ec bf0bc000 c164c8bc bf0bc034 c0aa8dc4
      [    7.689584] 5de0: ffffe000 00000000 bf0bf000 ed91f600 ed91f564 c03021e4 00000001 00000000
      [    7.689595] 5e00: c166e040 8040003f ed853a80 bf0bc448 00000000 c1678174 ed853a80 f0f22000
      [    7.689605] 5e20: f0f21fff 8040003f 014000c0 ed91e700 ed91e700 c16d8e68 00000001 ed91e6c0
      [    7.689615] 5e40: bf0bc400 00000001 bf0bc400 ed91f564 00000001 00000000 00000028 c03c9a24
      [    7.689625] 5e60: 00000001 c03c8c94 ed8e5f50 ed8e5f50 00000001 bf0bc400 ed91f540 c03c8cb0
      [    7.689637] 5e80: bf0bc40c 00007fff bf0bc400 c03c60b0 00000000 bf0bc448 00000028 c0e09684
      [    7.689647] 5ea0: 00000002 bf0bc530 c1234bf8 bf0bc5dc bf0bc514 c10ebbe8 ffffe000 bf000000
      [    7.689657] 5ec0: 00011538 00000000 ed8e5f48 00000000 00000000 00000000 00000000 00000000
      [    7.689666] 5ee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    7.689676] 5f00: 00000000 00000000 7fffffff 00000000 00000013 b6e55a18 0000017b c0309104
      [    7.689686] 5f20: ed8e4000 00000000 00510af0 c03c9430 7fffffff 00000000 00000003 00000000
      [    7.689697] 5f40: 00000000 f0f0f000 00011538 00000000 f0f107b0 f0f0f000 00011538 f0f1fdb8
      [    7.689707] 5f60: f0f1fbe8 f0f1b974 00004000 000041e0 bf0bc3d0 00000001 00000000 000024c4
      [    7.689717] 5f80: 0000002d 0000002e 00000019 00000000 00000010 00000000 16894000 00000000
      [    7.689727] 5fa0: 00000000 c0308f20 16894000 00000000 00000013 b6e55a18 00000000 b6e5652c
      [    7.689737] 5fc0: 16894000 00000000 00000000 0000017b 00020000 00508110 00000000 00510af0
      [    7.689748] 5fe0: bef68948 bef68938 b6e4d3d0 b6d32590 60000010 00000013 00000000 00000000
      [    7.689790] [<bf0ae1ec>] (m88ds3103_attach [m88ds3103]) from [<bf0b9f10>] (dvbsky_s960c_attach+0x78/0x280 [dvb_usb_dvbsky])
      [    7.689821] [<bf0b9f10>] (dvbsky_s960c_attach [dvb_usb_dvbsky]) from [<bf0a3cb4>] (dvb_usbv2_probe+0xa3c/0x1024 [dvb_usb_v2])
      [    7.689849] [<bf0a3cb4>] (dvb_usbv2_probe [dvb_usb_v2]) from [<c0aa9e9c>] (usb_probe_interface+0xf0/0x2a8)
      [    7.689869] [<c0aa9e9c>] (usb_probe_interface) from [<c08e268c>] (driver_probe_device+0x2f8/0x4b4)
      [    7.689881] [<c08e268c>] (driver_probe_device) from [<c08e2948>] (__driver_attach+0x100/0x11c)
      [    7.689895] [<c08e2948>] (__driver_attach) from [<c08e0778>] (bus_for_each_dev+0x4c/0x9c)
      [    7.689909] [<c08e0778>] (bus_for_each_dev) from [<c08e1934>] (bus_add_driver+0x1c0/0x264)
      [    7.689919] [<c08e1934>] (bus_add_driver) from [<c08e34ec>] (driver_register+0x78/0xf4)
      [    7.689931] [<c08e34ec>] (driver_register) from [<c0aa8dc4>] (usb_register_driver+0x70/0x134)
      [    7.689946] [<c0aa8dc4>] (usb_register_driver) from [<c03021e4>] (do_one_initcall+0x44/0x168)
      [    7.689963] [<c03021e4>] (do_one_initcall) from [<c03c9a24>] (do_init_module+0x64/0x1f4)
      [    7.689979] [<c03c9a24>] (do_init_module) from [<c03c8cb0>] (load_module+0x20a0/0x25c8)
      [    7.689993] [<c03c8cb0>] (load_module) from [<c03c9430>] (SyS_finit_module+0xb4/0xec)
      [    7.690007] [<c03c9430>] (SyS_finit_module) from [<c0308f20>] (ret_fast_syscall+0x0/0x54)
      [    7.690018] Code: bad PC value
      
      This may happen on normal circumstances, if, for some reason, the demod
      hangs and start returning an invalid chip ID:
      
      [   10.394395] m88ds3103 3-0068: Unknown device. Chip_id=00
      
      So, change the logic to cause probe to fail with -ENODEV, preventing
      the OOPS.
      
      Detected while testing DVB MMAP patches on Raspberry Pi 3 with
      DVBSky S960CI.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ba2b9e0
    • Ming Lei's avatar
      blk-mq: don't call io sched's .requeue_request when requeueing rq to ->dispatch · ccddee81
      Ming Lei authored
      commit 105976f5 upstream.
      
      __blk_mq_requeue_request() covers two cases:
      
      - one is that the requeued request is added to hctx->dispatch, such as
      blk_mq_dispatch_rq_list()
      
      - another case is that the request is requeued to io scheduler, such as
      blk_mq_requeue_request().
      
      We should call io sched's .requeue_request callback only for the 2nd
      case.
      
      Cc: Paolo Valente <paolo.valente@linaro.org>
      Cc: Omar Sandoval <osandov@fb.com>
      Fixes: bd166ef1 ("blk-mq-sched: add framework for MQ capable IO schedulers")
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Acked-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ccddee81
    • Julian Wiedmann's avatar
      s390/qeth: fix IPA command submission race · c5f32462
      Julian Wiedmann authored
      
      [ Upstream commit d22ffb5a ]
      
      If multiple IPA commands are build & sent out concurrently,
      fill_ipacmd_header() may assign a seqno value to a command that's
      different from what send_control_data() later assigns to this command's
      reply.
      This is due to other commands passing through send_control_data(),
      and incrementing card->seqno.ipa along the way.
      
      So one IPA command has no reply that's waiting for its seqno, while some
      other IPA command has multiple reply objects waiting for it.
      Only one of those waiting replies wins, and the other(s) times out and
      triggers a recovery via send_ipa_cmd().
      
      Fix this by making sure that the same seqno value is assigned to
      a command and its reply object.
      Do so immediately before submitting the command & while holding the
      irq_pending "lock", to produce nicely ascending seqnos.
      
      As a side effect, *all* IPA commands now use a reply object that's
      waiting for its actual seqno. Previously, early IPA commands that were
      submitted while the card was still DOWN used the "catch-all" IDX seqno.
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c5f32462
    • Julian Wiedmann's avatar
      s390/qeth: fix IP address lookup for L3 devices · eae17c40
      Julian Wiedmann authored
      
      [ Upstream commit c5c48c58 ]
      
      Current code ("qeth_l3_ip_from_hash()") matches a queried address object
      against objects in the IP table by IP address, Mask/Prefix Length and
      MAC address ("qeth_l3_ipaddrs_is_equal()"). But what callers actually
      require is either
      a) "is this IP address registered" (ie. match by IP address only),
      before adding a new address.
      b) or "is this address object registered" (ie. match all relevant
         attributes), before deleting an address.
      
      Right now
      1. the ADD path is too strict in its lookup, and eg. doesn't detect
      conflicts between an existing NORMAL address and a new VIPA address
      (because the NORMAL address will have mask != 0, while VIPA has
      a mask == 0),
      2. the DELETE path is not strict enough, and eg. allows del_rxip() to
      delete a VIPA address as long as the IP address matches.
      
      Fix all this by adding helpers (_addr_match_ip() and _addr_match_all())
      that do the appropriate checking.
      
      Note that the ADD path for NORMAL addresses is special, as qeth keeps
      track of how many times such an address is in use (and there is no
      immediate way of returning errors to the caller). So when a requested
      NORMAL address _fully_ matches an existing one, it's not considered a
      conflict and we merely increment the refcount.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eae17c40
    • Julian Wiedmann's avatar
      Revert "s390/qeth: fix using of ref counter for rxip addresses" · 87c4789f
      Julian Wiedmann authored
      
      [ Upstream commit 4964c66f ]
      
      This reverts commit cb816192.
      
      The issue this attempted to fix never actually occurs.
      l3_add_rxip() checks (via l3_ip_from_hash()) if the requested address
      was previously added to the card. If so, it returns -EEXIST and doesn't
      call l3_add_ip().
      As a result, the "address exists" path in l3_add_ip() is never taken
      for rxip addresses, and this patch had no effect.
      
      Fixes: cb816192 ("s390/qeth: fix using of ref counter for rxip addresses")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87c4789f
    • Julian Wiedmann's avatar
      s390/qeth: fix double-free on IP add/remove race · 56f662db
      Julian Wiedmann authored
      
      [ Upstream commit 14d066c3 ]
      
      Registering an IPv4 address with the HW takes quite a while, so we
      temporarily drop the ip_htable lock. Any concurrent add/remove of the
      same IP adjusts the IP's use count, and (on remove) is then blocked by
      addr->in_progress.
      After the register call has completed, we check the use count for
      concurrently attempted add/remove calls - and possibly straight-away
      deregister the IP again. This happens via l3_delete_ip(), which
      1) looks up the queried IP in the htable (getting a reference to the
         *same* queried object),
      2) deregisters the IP from the HW, and
      3) frees the IP object.
      
      The caller in l3_add_ip() then does a second free on the same object.
      
      For this case, skip all the extra checks and lookups in l3_delete_ip()
      and just deregister & free the IP object ourselves.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      56f662db
    • Julian Wiedmann's avatar
      s390/qeth: fix IP removal on offline cards · 02763710
      Julian Wiedmann authored
      
      [ Upstream commit 98d823ab ]
      
      If the HW is not reachable, then none of the IPs in qeth's internal
      table has been registered with the HW yet. So when deleting such an IP,
      there's no need to stage it for deregistration - just drop it from
      the table.
      
      This fixes the "add-delete-add" scenario on an offline card, where the
      the second "add" merely increments the IP's use count. But as the IP is
      still set to DISP_ADDR_DELETE from the previous "delete" step,
      l3_recover_ip() won't register it with the HW when the card goes online.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      02763710
    • Julian Wiedmann's avatar
      s390/qeth: fix overestimated count of buffer elements · fa4919e3
      Julian Wiedmann authored
      
      [ Upstream commit 12472af8 ]
      
      qeth_get_elements_for_range() doesn't know how to handle a 0-length
      range (ie. start == end), and returns 1 when it should return 0.
      Such ranges occur on TSO skbs, where the L2/L3/L4 headers (and thus all
      of the skb's linear data) are skipped when mapping the skb into regular
      buffer elements.
      
      This overestimation may cause several performance-related issues:
      1. sub-optimal IO buffer selection, where the next buffer gets selected
         even though the skb would actually still fit into the current buffer.
      2. forced linearization, if the element count for a non-linear skb
         exceeds QETH_MAX_BUFFER_ELEMENTS.
      
      Rather than modifying qeth_get_elements_for_range() and adding overhead
      to every caller, fix up those callers that are in risk of passing a
      0-length range.
      
      Fixes: 2863c613 ("qeth: refactor calculation of SBALE count")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa4919e3
    • Julian Wiedmann's avatar
      s390/qeth: fix SETIP command handling · 128c7e69
      Julian Wiedmann authored
      
      [ Upstream commit 1c5b2216 ]
      
      send_control_data() applies some special handling to SETIP v4 IPA
      commands. But current code parses *all* command types for the SETIP
      command code. Limit the command code check to IPA commands.
      
      Fixes: 5b54e16f ("qeth: do not spin for SETIP ip assist command")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      128c7e69
    • Ursula Braun's avatar
      s390/qeth: fix underestimated count of buffer elements · fcdfb9d8
      Ursula Braun authored
      
      [ Upstream commit 89271c65 ]
      
      For a memory range/skb where the last byte falls onto a page boundary
      (ie. 'end' is of the form xxx...xxx001), the PFN_UP() part of the
      calculation currently doesn't round up to the next PFN due to an
      off-by-one error.
      Thus qeth believes that the skb occupies one page less than it
      actually does, and may select a IO buffer that doesn't have enough spare
      buffer elements to fit all of the skb's data.
      HW detects this as a malformed buffer descriptor, and raises an
      exception which then triggers device recovery.
      
      Fixes: 2863c613 ("qeth: refactor calculation of SBALE count")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fcdfb9d8
    • Jason Wang's avatar
      virtio-net: disable NAPI only when enabled during XDP set · 99a78194
      Jason Wang authored
      
      [ Upstream commit 4e09ff53 ]
      
      We try to disable NAPI to prevent a single XDP TX queue being used by
      multiple cpus. But we don't check if device is up (NAPI is enabled),
      this could result stall because of infinite wait in
      napi_disable(). Fixing this by checking device state through
      netif_running() before.
      
      Fixes: 4941d472 ("virtio-net: do not reset during XDP set")
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99a78194
    • Jason Wang's avatar
      tuntap: disable preemption during XDP processing · 5134b919
      Jason Wang authored
      
      [ Upstream commit 23e43f07 ]
      
      Except for tuntap, all other drivers' XDP was implemented at NAPI
      poll() routine in a bh. This guarantees all XDP operation were done at
      the same CPU which is required by e.g BFP_MAP_TYPE_PERCPU_ARRAY. But
      for tuntap, we do it in process context and we try to protect XDP
      processing by RCU reader lock. This is insufficient since
      CONFIG_PREEMPT_RCU can preempt the RCU reader critical section which
      breaks the assumption that all XDP were processed in the same CPU.
      
      Fixing this by simply disabling preemption during XDP processing.
      
      Fixes: 761876c8 ("tap: XDP support")
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5134b919