1. 28 Nov, 2018 2 commits
    • Linus Torvalds's avatar
      Merge tag 'spi-fix-v4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 5b26f718
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "A few driver specific fixes here, nothing big or that stands out for
        anyone other than the driver users.
      
        The omap2-mcspi fix is for issues that started showing up with a
        change in defconfig in this release to make cpuidle get turned on by
        default"
      
      * tag 'spi-fix-v4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: omap2-mcspi: Add missing suspend and resume calls
        spi: mediatek: use correct mata->xfer_len when in fifo transfer
        spi: uniphier: fix incorrect property items
      5b26f718
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · d8242d22
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "Bugfixes, many of them reported by syzkaller and mostly predating the
        merge window"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb
        kvm: mmu: Fix race in emulated page table writes
        KVM: nVMX: vmcs12 revision_id is always VMCS12_REVISION even when copied from eVMCS
        KVM: nVMX: Verify eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD
        KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offset
        x86/kvm/vmx: fix old-style function declaration
        KVM: x86: fix empty-body warnings
        KVM: VMX: Update shared MSRs to be saved/restored on MSR_EFER.LMA changes
        KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall
        KVM: nVMX: Fix kernel info-leak when enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS more than once
        svm: Add mutex_lock to protect apic_access_page_done on AMD systems
        KVM: X86: Fix scan ioapic use-before-initialization
        KVM: LAPIC: Fix pv ipis use-before-initialization
        KVM: VMX: re-add ple_gap module parameter
        KVM: PPC: Book3S HV: Fix handling for interrupted H_ENTER_NESTED
      d8242d22
  2. 27 Nov, 2018 15 commits
    • Jim Mattson's avatar
      kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb · fd65d314
      Jim Mattson authored
      Previously, we only called indirect_branch_prediction_barrier on the
      logical CPU that freed a vmcb. This function should be called on all
      logical CPUs that last loaded the vmcb in question.
      
      Fixes: 15d45071 ("KVM/x86: Add IBPB support")
      Reported-by: default avatarNeel Natu <neelnatu@google.com>
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fd65d314
    • Junaid Shahid's avatar
      kvm: mmu: Fix race in emulated page table writes · 0e0fee5c
      Junaid Shahid authored
      When a guest page table is updated via an emulated write,
      kvm_mmu_pte_write() is called to update the shadow PTE using the just
      written guest PTE value. But if two emulated guest PTE writes happened
      concurrently, it is possible that the guest PTE and the shadow PTE end
      up being out of sync. Emulated writes do not mark the shadow page as
      unsync-ed, so this inconsistency will not be resolved even by a guest TLB
      flush (unless the page was marked as unsync-ed at some other point).
      
      This is fixed by re-reading the current value of the guest PTE after the
      MMU lock has been acquired instead of just using the value that was
      written prior to calling kvm_mmu_pte_write().
      Signed-off-by: default avatarJunaid Shahid <junaids@google.com>
      Reviewed-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      0e0fee5c
    • Liran Alon's avatar
      KVM: nVMX: vmcs12 revision_id is always VMCS12_REVISION even when copied from eVMCS · 52ad7eb3
      Liran Alon authored
      vmcs12 represents the per-CPU cache of L1 active vmcs12.
      
      This cache can be loaded by one of the following:
      1) Guest making a vmcs12 active by exeucting VMPTRLD
      2) Guest specifying eVMCS in VP assist page and executing
      VMLAUNCH/VMRESUME.
      
      Either way, vmcs12 should have revision_id of VMCS12_REVISION.
      Which is not equal to eVMCS revision_id which specifies used
      VersionNumber of eVMCS struct (e.g. KVM_EVMCS_VERSION).
      
      Specifically, this causes an issue in restoring a nested VM state
      because vmx_set_nested_state() verifies that vmcs12->revision_id
      is equal to VMCS12_REVISION which was not true in case vmcs12
      was populated from an eVMCS by vmx_get_nested_state() which calls
      copy_enlightened_to_vmcs12().
      Reviewed-by: default avatarDarren Kenny <darren.kenny@oracle.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Reviewed-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      52ad7eb3
    • Liran Alon's avatar
      KVM: nVMX: Verify eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD · 72aeb60c
      Liran Alon authored
      According to TLFS section 16.11.2 Enlightened VMCS, the first u32
      field of eVMCS should specify eVMCS VersionNumber.
      
      This version should be in the range of supported eVMCS versions exposed
      to guest via CPUID.0x4000000A.EAX[0:15].
      The range which KVM expose to guest in this CPUID field should be the
      same as the value returned in vmcs_version by nested_enable_evmcs().
      
      According to the above, eVMCS VMPTRLD should verify that version specified
      in given eVMCS is in the supported range. However, current code
      mistakenly verfies this field against VMCS12_REVISION.
      
      One can also see that when KVM use eVMCS, it makes sure that
      alloc_vmcs_cpu() sets allocated eVMCS revision_id to KVM_EVMCS_VERSION.
      
      Obvious fix should just change eVMCS VMPTRLD to verify first u32 field
      of eVMCS is equal to KVM_EVMCS_VERSION.
      However, it turns out that Microsoft Hyper-V fails to comply to their
      own invented interface: When Hyper-V use eVMCS, it just sets first u32
      field of eVMCS to revision_id specified in MSR_IA32_VMX_BASIC (In our
      case: VMCS12_REVISION). Instead of used eVMCS version number which is
      one of the supported versions specified in CPUID.0x4000000A.EAX[0:15].
      To overcome Hyper-V bug, we accept either a supported eVMCS version
      or VMCS12_REVISION as valid values for first u32 field of eVMCS.
      
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: default avatarNikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: default avatarMark Kanda <mark.kanda@oracle.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Reviewed-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      72aeb60c
    • Leonid Shatz's avatar
      KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offset · 326e7425
      Leonid Shatz authored
      Since commit e79f245d ("X86/KVM: Properly update 'tsc_offset' to
      represent the running guest"), vcpu->arch.tsc_offset meaning was
      changed to always reflect the tsc_offset value set on active VMCS.
      Regardless if vCPU is currently running L1 or L2.
      
      However, above mentioned commit failed to also change
      kvm_vcpu_write_tsc_offset() to set vcpu->arch.tsc_offset correctly.
      This is because vmx_write_tsc_offset() could set the tsc_offset value
      in active VMCS to given offset parameter *plus vmcs12->tsc_offset*.
      However, kvm_vcpu_write_tsc_offset() just sets vcpu->arch.tsc_offset
      to given offset parameter. Without taking into account the possible
      addition of vmcs12->tsc_offset. (Same is true for SVM case).
      
      Fix this issue by changing kvm_x86_ops->write_tsc_offset() to return
      actually set tsc_offset in active VMCS and modify
      kvm_vcpu_write_tsc_offset() to set returned value in
      vcpu->arch.tsc_offset.
      In addition, rename write_tsc_offset() callback to write_l1_tsc_offset()
      to make it clear that it is meant to set L1 TSC offset.
      
      Fixes: e79f245d ("X86/KVM: Properly update 'tsc_offset' to represent the running guest")
      Reviewed-by: default avatarLiran Alon <liran.alon@oracle.com>
      Reviewed-by: default avatarMihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: default avatarKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: default avatarLeonid Shatz <leonid.shatz@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      326e7425
    • Yi Wang's avatar
      x86/kvm/vmx: fix old-style function declaration · 1e4329ee
      Yi Wang authored
      The inline keyword which is not at the beginning of the function
      declaration may trigger the following build warnings, so let's fix it:
      
      arch/x86/kvm/vmx.c:1309:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
      arch/x86/kvm/vmx.c:5947:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
      arch/x86/kvm/vmx.c:5985:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
      arch/x86/kvm/vmx.c:6023:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
      Signed-off-by: default avatarYi Wang <wang.yi59@zte.com.cn>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1e4329ee
    • Yi Wang's avatar
      KVM: x86: fix empty-body warnings · 354cb410
      Yi Wang authored
      We get the following warnings about empty statements when building
      with 'W=1':
      
      arch/x86/kvm/lapic.c:632:53: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      arch/x86/kvm/lapic.c:1907:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      arch/x86/kvm/lapic.c:1936:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      arch/x86/kvm/lapic.c:1975:44: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      
      Rework the debug helper macro to get rid of these warnings.
      Signed-off-by: default avatarYi Wang <wang.yi59@zte.com.cn>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      354cb410
    • Liran Alon's avatar
      KVM: VMX: Update shared MSRs to be saved/restored on MSR_EFER.LMA changes · f48b4711
      Liran Alon authored
      When guest transitions from/to long-mode by modifying MSR_EFER.LMA,
      the list of shared MSRs to be saved/restored on guest<->host
      transitions is updated (See vmx_set_efer() call to setup_msrs()).
      
      On every entry to guest, vcpu_enter_guest() calls
      vmx_prepare_switch_to_guest(). This function should also take care
      of setting the shared MSRs to be saved/restored. However, the
      function does nothing in case we are already running with loaded
      guest state (vmx->loaded_cpu_state != NULL).
      
      This means that even when guest modifies MSR_EFER.LMA which results
      in updating the list of shared MSRs, it isn't being taken into account
      by vmx_prepare_switch_to_guest() because it happens while we are
      running with loaded guest state.
      
      To fix above mentioned issue, add a flag to mark that the list of
      shared MSRs has been updated and modify vmx_prepare_switch_to_guest()
      to set shared MSRs when running with host state *OR* list of shared
      MSRs has been updated.
      
      Note that this issue was mistakenly introduced by commit
      678e315e ("KVM: vmx: add dedicated utility to access guest's
      kernel_gs_base") because previously vmx_set_efer() always called
      vmx_load_host_state() which resulted in vmx_prepare_switch_to_guest() to
      set shared MSRs.
      
      Fixes: 678e315e ("KVM: vmx: add dedicated utility to access guest's kernel_gs_base")
      Reported-by: default avatarEyal Moscovici <eyal.moscovici@oracle.com>
      Reviewed-by: default avatarMihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f48b4711
    • Liran Alon's avatar
      KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall · bcbfbd8e
      Liran Alon authored
      kvm_pv_clock_pairing() allocates local var
      "struct kvm_clock_pairing clock_pairing" on stack and initializes
      all it's fields besides padding (clock_pairing.pad[]).
      
      Because clock_pairing var is written completely (including padding)
      to guest memory, failure to init struct padding results in kernel
      info-leak.
      
      Fix the issue by making sure to also init the padding with zeroes.
      
      Fixes: 55dd00a7 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall")
      Reported-by: syzbot+a8ef68d71211ba264f56@syzkaller.appspotmail.com
      Reviewed-by: default avatarMark Kanda <mark.kanda@oracle.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bcbfbd8e
    • Liran Alon's avatar
      KVM: nVMX: Fix kernel info-leak when enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS more than once · 7f9ad1df
      Liran Alon authored
      Consider the case that userspace enables KVM_CAP_HYPERV_ENLIGHTENED_VMCS twice:
      1) kvm_vcpu_ioctl_enable_cap() is called to enable
      KVM_CAP_HYPERV_ENLIGHTENED_VMCS which calls nested_enable_evmcs().
      2) nested_enable_evmcs() sets enlightened_vmcs_enabled to true and fills
      vmcs_version which is then copied to userspace.
      3) kvm_vcpu_ioctl_enable_cap() is called again to enable
      KVM_CAP_HYPERV_ENLIGHTENED_VMCS which calls nested_enable_evmcs().
      4) This time nested_enable_evmcs() just returns 0 as
      enlightened_vmcs_enabled is already true. *Without filling
      vmcs_version*.
      5) kvm_vcpu_ioctl_enable_cap() continues as usual and copies
      *uninitialized* vmcs_version to userspace which leads to kernel info-leak.
      
      Fix this issue by simply changing nested_enable_evmcs() to always fill
      vmcs_version output argument. Even when enlightened_vmcs_enabled is
      already set to true.
      
      Note that SVM's nested_enable_evmcs() should not be modified because it
      always returns a non-zero value (-ENODEV) which results in
      kvm_vcpu_ioctl_enable_cap() skipping the copy of vmcs_version to
      userspace (as it should).
      
      Fixes: 57b119da ("KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability")
      Reported-by: syzbot+cfbc368e283d381f8cef@syzkaller.appspotmail.com
      Reviewed-by: default avatarKrish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7f9ad1df
    • Wei Wang's avatar
      svm: Add mutex_lock to protect apic_access_page_done on AMD systems · 30510387
      Wei Wang authored
      There is a race condition when accessing kvm->arch.apic_access_page_done.
      Due to it, x86_set_memory_region will fail when creating the second vcpu
      for a svm guest.
      
      Add a mutex_lock to serialize the accesses to apic_access_page_done.
      This lock is also used by vmx for the same purpose.
      Signed-off-by: default avatarWei Wang <wawei@amazon.de>
      Signed-off-by: default avatarAmadeusz Juskowiak <ajusk@amazon.de>
      Signed-off-by: default avatarJulian Stecklina <jsteckli@amazon.de>
      Signed-off-by: default avatarSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Reviewed-by: default avatarJoerg Roedel <jroedel@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      30510387
    • Wanpeng Li's avatar
      KVM: X86: Fix scan ioapic use-before-initialization · e97f852f
      Wanpeng Li authored
      Reported by syzkaller:
      
       BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
       PGD 80000003ec4da067 P4D 80000003ec4da067 PUD 3f7bfa067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 7 PID: 5059 Comm: debug Tainted: G           OE     4.19.0-rc5 #16
       RIP: 0010:__lock_acquire+0x1a6/0x1990
       Call Trace:
        lock_acquire+0xdb/0x210
        _raw_spin_lock+0x38/0x70
        kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
        vcpu_enter_guest+0x167e/0x1910 [kvm]
        kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
        kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
        do_vfs_ioctl+0xa5/0x690
        ksys_ioctl+0x6d/0x80
        __x64_sys_ioctl+0x1a/0x20
        do_syscall_64+0x83/0x6e0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT6 msr
      and triggers scan ioapic logic to load synic vectors into EOI exit bitmap.
      However, irqchip is not initialized by this simple testcase, ioapic/apic
      objects should not be accessed.
      This can be triggered by the following program:
      
          #define _GNU_SOURCE
      
          #include <endian.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <sys/syscall.h>
          #include <sys/types.h>
          #include <unistd.h>
      
          uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};
      
          int main(void)
          {
          	syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
          	long res = 0;
          	memcpy((void*)0x20000040, "/dev/kvm", 9);
          	res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0);
          	if (res != -1)
          		r[0] = res;
          	res = syscall(__NR_ioctl, r[0], 0xae01, 0);
          	if (res != -1)
          		r[1] = res;
          	res = syscall(__NR_ioctl, r[1], 0xae41, 0);
          	if (res != -1)
          		r[2] = res;
          	memcpy(
          			(void*)0x20000080,
          			"\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00"
          			"\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43"
          			"\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33"
          			"\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe"
          			"\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22"
          			"\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb",
          			106);
          	syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080);
          	syscall(__NR_ioctl, r[2], 0xae80, 0);
          	return 0;
          }
      
      This patch fixes it by bailing out scan ioapic if ioapic is not initialized in
      kernel.
      Reported-by: default avatarWei Wu <ww9210@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wei Wu <ww9210@gmail.com>
      Signed-off-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e97f852f
    • Wanpeng Li's avatar
      KVM: LAPIC: Fix pv ipis use-before-initialization · 38ab012f
      Wanpeng Li authored
      Reported by syzkaller:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
       PGD 800000040410c067 P4D 800000040410c067 PUD 40410d067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 3 PID: 2567 Comm: poc Tainted: G           OE     4.19.0-rc5 #16
       RIP: 0010:kvm_pv_send_ipi+0x94/0x350 [kvm]
       Call Trace:
        kvm_emulate_hypercall+0x3cc/0x700 [kvm]
        handle_vmcall+0xe/0x10 [kvm_intel]
        vmx_handle_exit+0xc1/0x11b0 [kvm_intel]
        vcpu_enter_guest+0x9fb/0x1910 [kvm]
        kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
        kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
        do_vfs_ioctl+0xa5/0x690
        ksys_ioctl+0x6d/0x80
        __x64_sys_ioctl+0x1a/0x20
        do_syscall_64+0x83/0x6e0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The reason is that the apic map has not yet been initialized, the testcase
      triggers pv_send_ipi interface by vmcall which results in kvm->arch.apic_map
      is dereferenced. This patch fixes it by checking whether or not apic map is
      NULL and bailing out immediately if that is the case.
      
      Fixes: 4180bf1b (KVM: X86: Implement "send IPI" hypercall)
      Reported-by: default avatarWei Wu <ww9210@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wei Wu <ww9210@gmail.com>
      Signed-off-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      38ab012f
    • Luiz Capitulino's avatar
      KVM: VMX: re-add ple_gap module parameter · a87c99e6
      Luiz Capitulino authored
      Apparently, the ple_gap parameter was accidentally removed
      by commit c8e88717. Add it
      back.
      Signed-off-by: default avatarLuiz Capitulino <lcapitulino@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: c8e88717Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a87c99e6
    • Matias Bjørling's avatar
      ia64: export node_distance function · ef78e5ec
      Matias Bjørling authored
      The numa_slit variable used by node_distance is available to a
      module as long as it is linked at compile-time. However, it is
      not available to loadable modules. Leading to errors such as:
      
        ERROR: "numa_slit" [drivers/nvme/host/nvme-core.ko] undefined!
      
      The error above is caused by the nvme multipath code that makes
      use of node_distance for its path calculation. When the patch was
      added, the lightnvm subsystem would select nvme and always compile
      it in, leading to the node_distance call to always succeed.
      However, when this requirement was removed, nvme could be compiled
      in as a module, which exposed this bug.
      
      This patch extracts node_distance to a function and exports it.
      Since ACPI is depending on node_distance being a simple lookup to
      numa_slit, the previous behavior is exposed as slit_distance and its
      users updated.
      
      Fixes: f3334447 "nvme: take node locality into account when selecting a path"
      Fixes: 73569e11 "lightnvm: remove dependencies on BLK_DEV_NVME and PCI"
      Signed-off-by: default avatarMatias Bjøring <mb@lightnvm.io>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ef78e5ec
  3. 26 Nov, 2018 1 commit
  4. 25 Nov, 2018 6 commits
    • Linus Torvalds's avatar
      Linux 4.20-rc4 · 2e6e902d
      Linus Torvalds authored
      2e6e902d
    • Paolo Bonzini's avatar
      Merge tag 'kvm-ppc-fixes-4.20-1' of... · caf54f59
      Paolo Bonzini authored
      Merge tag 'kvm-ppc-fixes-4.20-1' of https://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
      
      PPC KVM fixes for 4.20
      
      This has a single 1-line patch which fixes a bug in the recently-merged
      nested HV KVM support.
      caf54f59
    • Linus Torvalds's avatar
      Merge tag 'dma-mapping-4.20-3' of git://git.infradead.org/users/hch/dma-mapping · d6d460b8
      Linus Torvalds authored
      Pull dma-mapping fixes from Christoph Hellwig:
       "Two dma-direct / swiotlb regressions fixes:
      
         - zero is a valid physical address on some arm boards, we can't use
           it as the error value
      
         - don't try to cache flush the error return value (no matter what it
           is)"
      
      * tag 'dma-mapping-4.20-3' of git://git.infradead.org/users/hch/dma-mapping:
        swiotlb: Skip cache maintenance on map error
        dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLB
      d6d460b8
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-4.20-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 17c2f540
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
      
       - Fix a NFSv4 state manager deadlock when returning a delegation
      
       - NFSv4.2 copy do not allocate memory under the lock
      
       - flexfiles: Use the correct stateid for IO in the tightly coupled case
      
      * tag 'nfs-for-4.20-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        flexfiles: use per-mirror specified stateid for IO
        NFSv4.2 copy do not allocate memory under the lock
        NFSv4: Fix a NFSv4 state manager deadlock
      17c2f540
    • Luc Van Oostenryck's avatar
      MAINTAINERS: change Sparse's maintainer · 4e962ff6
      Luc Van Oostenryck authored
      I'm taking over the maintainance of Sparse so add myself as
      maintainer and move Christopher's info to CREDITS.
      Signed-off-by: default avatarLuc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4e962ff6
    • Linus Torvalds's avatar
      Merge tag 'xarray-4.20-rc4' of git://git.infradead.org/users/willy/linux-dax · e2125dac
      Linus Torvalds authored
      Pull XArray updates from Matthew Wilcox:
       "We found some bugs in the DAX conversion to XArray (and one bug which
        predated the XArray conversion). There were a couple of bugs in some
        of the higher-level functions, which aren't actually being called in
        today's kernel, but surfaced as a result of converting existing radix
        tree & IDR users over to the XArray.
      
        Some of the other changes to how the higher-level APIs work were also
        motivated by converting various users; again, they're not in use in
        today's kernel, so changing them has a low probability of introducing
        a bug.
      
        Dan can still trigger a bug in the DAX code with hot-offline/online,
        and we're working on tracking that down"
      
      * tag 'xarray-4.20-rc4' of git://git.infradead.org/users/willy/linux-dax:
        XArray tests: Add missing locking
        dax: Avoid losing wakeup in dax_lock_mapping_entry
        dax: Fix huge page faults
        dax: Fix dax_unlock_mapping_entry for PMD pages
        dax: Reinstate RCU protection of inode
        dax: Make sure the unlocking entry isn't locked
        dax: Remove optimisation from dax_lock_mapping_entry
        XArray tests: Correct some 64-bit assumptions
        XArray: Correct xa_store_range
        XArray: Fix Documentation
        XArray: Handle NULL pointers differently for allocation
        XArray: Unify xa_store and __xa_store
        XArray: Add xa_store_bh() and xa_store_irq()
        XArray: Turn xa_erase into an exported function
        XArray: Unify xa_cmpxchg and __xa_cmpxchg
        XArray: Regularise xa_reserve
        nilfs2: Use xa_erase_irq
        XArray: Export __xa_foo to non-GPL modules
        XArray: Fix xa_for_each with a single element at 0
      e2125dac
  5. 24 Nov, 2018 10 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · e195ca6c
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
      
       - revert of the high-resolution scrolling feature, as it breaks certain
         hardware due to incompatibilities between Logitech and Microsoft
         worlds. Peter Hutterer is working on a fixed implementation. Until
         that is finished, revert by Benjamin Tissoires.
      
       - revert of incorrect strncpy->strlcpy conversion in uhid, from David
         Herrmann
      
       - fix for buggy sendfile() implementation on uhid device node, from
         Eric Biggers
      
       - a few assorted device-ID specific quirks
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        Revert "Input: Add the `REL_WHEEL_HI_RES` event code"
        Revert "HID: input: Create a utility class for counting scroll events"
        Revert "HID: logitech: Add function to enable HID++ 1.0 "scrolling acceleration""
        Revert "HID: logitech: Enable high-resolution scrolling on Logitech mice"
        Revert "HID: logitech: Use LDJ_DEVICE macro for existing Logitech mice"
        Revert "HID: logitech: fix a used uninitialized GCC warning"
        Revert "HID: input: simplify/fix high-res scroll event handling"
        HID: Add quirk for Primax PIXART OEM mice
        HID: i2c-hid: Disable runtime PM for LG touchscreen
        HID: multitouch: Add pointstick support for Cirque Touchpad
        HID: steam: remove input device when a hid client is running.
        Revert "HID: uhid: use strlcpy() instead of strncpy()"
        HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges
        HID: input: Ignore battery reported by Symbol DS4308
        HID: Add quirk for Microsoft PIXART OEM mouse
      e195ca6c
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · d146194f
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas::
      
       - Fix wrong conflict resolution around CONFIG_ARM64_SSBD
      
       - Fix sparse warning on unsigned long constant
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: cpufeature: Fix mismerge of CONFIG_ARM64_SSBD block
        arm64: sysreg: fix sparse warnings
      d146194f
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 857fa628
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Need to take mutex in ath9k_add_interface(), from Dan Carpenter.
      
       2) Fix mt76 build without CONFIG_LEDS_CLASS, from Arnd Bergmann.
      
       3) Fix socket wmem accounting in SCTP, from Xin Long.
      
       4) Fix failed resume crash in ena driver, from Arthur Kiyanovski.
      
       5) qed driver passes bytes instead of bits into second arg of
          bitmap_weight(). From Denis Bolotin.
      
       6) Fix reset deadlock in ibmvnic, from Juliet Kim.
      
       7) skb_scrube_packet() needs to scrub the fwd marks too, from Petr
          Machata.
      
       8) Make sure older TCP stacks see enough dup ACKs, and avoid doing SACK
          compression during this period, from Eric Dumazet.
      
       9) Add atomicity to SMC protocol cursor handling, from Ursula Braun.
      
      10) Don't leave dangling error pointer if bpf_prog_add() fails in
          thunderx driver, from Lorenzo Bianconi. Also, when we unmap TSO
          headers, set sq->tso_hdrs to NULL.
      
      11) Fix race condition over state variables in act_police, from Davide
          Caratti.
      
      12) Disable guest csum in the presence of XDP in virtio_net, from Jason
          Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits)
        net: gemini: Fix copy/paste error
        net: phy: mscc: fix deadlock in vsc85xx_default_config
        dt-bindings: dsa: Fix typo in "probed"
        net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
        net: amd: add missing of_node_put()
        team: no need to do team_notify_peers or team_mcast_rejoin when disabling port
        virtio-net: fail XDP set if guest csum is negotiated
        virtio-net: disable guest csum during XDP set
        net/sched: act_police: add missing spinlock initialization
        net: don't keep lonely packets forever in the gro hash
        net/ipv6: re-do dad when interface has IFF_NOARP flag change
        packet: copy user buffers before orphan or clone
        ibmvnic: Update driver queues after change in ring size support
        ibmvnic: Fix RX queue buffer cleanup
        net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
        net/dim: Update DIM start sample after each DIM iteration
        net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts
        net/smc: use after free fix in smc_wr_tx_put_slot()
        net/smc: atomic SMCD cursor handling
        net/smc: add SMC-D shutdown signal
        ...
      857fa628
    • Linus Torvalds's avatar
      Merge tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · abe72ff4
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "Dave and I have continued our work fixing corruption problems that can
        be found when running long-term burn-in exercisers on xfs. Here are
        some patches fixing most of the problems, but there will likely be
        more. :/
      
         - Numerous corruption fixes for copy on write
      
         - Numerous corruption fixes for blocksize < pagesize writes
      
         - Don't miscalculate AG reservations for small final AGs
      
         - Fix page cache truncation to work properly for reflink and extent
           shifting
      
         - Fix use-after-free when retrying failed inode/dquot buffer logging
      
         - Fix corruptions seen when using copy_file_range in directio mode"
      
      * tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: readpages doesn't zero page tail beyond EOF
        vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP
        iomap: dio data corruption and spurious errors when pipes fill
        iomap: sub-block dio needs to zeroout beyond EOF
        iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents
        xfs: delalloc -> unwritten COW fork allocation can go wrong
        xfs: flush removing page cache in xfs_reflink_remap_prep
        xfs: extent shifting doesn't fully invalidate page cache
        xfs: finobt AG reserves don't consider last AG can be a runt
        xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
        xfs: uncached buffer tracing needs to print bno
        xfs: make xfs_file_remap_range() static
        xfs: fix shared extent data corruption due to missing cow reservation
      abe72ff4
    • Andreas Fiedler's avatar
      net: gemini: Fix copy/paste error · 07093b76
      Andreas Fiedler authored
      The TX stats should be started with the tx_stats_syncp,
      there seems to be a copy/paste error in the driver.
      Signed-off-by: default avatarAndreas Fiedler <andreas.fiedler@gmx.net>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      07093b76
    • Quentin Schulz's avatar
      net: phy: mscc: fix deadlock in vsc85xx_default_config · 3fa528b7
      Quentin Schulz authored
      The vsc85xx_default_config function called in the vsc85xx_config_init
      function which is used by VSC8530, VSC8531, VSC8540 and VSC8541 PHYs
      mistakenly calls phy_read and phy_write in-between phy_select_page and
      phy_restore_page.
      
      phy_select_page and phy_restore_page actually take and release the MDIO
      bus lock and phy_write and phy_read take and release the lock to write
      or read to a PHY register.
      
      Let's fix this deadlock by using phy_modify_paged which handles
      correctly a read followed by a write in a non-standard page.
      
      Fixes: 6a0bfbbe ("net: phy: mscc: migrate to phy_select/restore_page functions")
      Signed-off-by: default avatarQuentin Schulz <quentin.schulz@bootlin.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3fa528b7
    • Fabio Estevam's avatar
      dt-bindings: dsa: Fix typo in "probed" · e7b9fb4f
      Fabio Estevam authored
      The correct form is "can be probed", so fix the typo.
      Signed-off-by: default avatarFabio Estevam <festevam@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e7b9fb4f
    • Lorenzo Bianconi's avatar
      net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue · ef2a7cf1
      Lorenzo Bianconi authored
      Reset snd_queue tso_hdrs pointer to NULL in nicvf_free_snd_queue routine
      since it is used to check if tso dma descriptor queue has been previously
      allocated. The issue can be triggered with the following reproducer:
      
      $ip link set dev enP2p1s0v0 xdpdrv obj xdp_dummy.o
      $ip link set dev enP2p1s0v0 xdpdrv off
      
      [  341.467649] WARNING: CPU: 74 PID: 2158 at mm/vmalloc.c:1511 __vunmap+0x98/0xe0
      [  341.515010] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018
      [  341.521874] pstate: 60400005 (nZCv daif +PAN -UAO)
      [  341.526654] pc : __vunmap+0x98/0xe0
      [  341.530132] lr : __vunmap+0x98/0xe0
      [  341.533609] sp : ffff00001c5db860
      [  341.536913] x29: ffff00001c5db860 x28: 0000000000020000
      [  341.542214] x27: ffff810feb5090b0 x26: ffff000017e57000
      [  341.547515] x25: 0000000000000000 x24: 00000000fbd00000
      [  341.552816] x23: 0000000000000000 x22: ffff810feb5090b0
      [  341.558117] x21: 0000000000000000 x20: 0000000000000000
      [  341.563418] x19: ffff000017e57000 x18: 0000000000000000
      [  341.568719] x17: 0000000000000000 x16: 0000000000000000
      [  341.574020] x15: 0000000000000010 x14: ffffffffffffffff
      [  341.579321] x13: ffff00008985eb27 x12: ffff00000985eb2f
      [  341.584622] x11: ffff0000096b3000 x10: ffff00001c5db510
      [  341.589923] x9 : 00000000ffffffd0 x8 : ffff0000086868e8
      [  341.595224] x7 : 3430303030303030 x6 : 00000000000006ef
      [  341.600525] x5 : 00000000003fffff x4 : 0000000000000000
      [  341.605825] x3 : 0000000000000000 x2 : ffffffffffffffff
      [  341.611126] x1 : ffff0000096b3728 x0 : 0000000000000038
      [  341.616428] Call trace:
      [  341.618866]  __vunmap+0x98/0xe0
      [  341.621997]  vunmap+0x3c/0x50
      [  341.624961]  arch_dma_free+0x68/0xa0
      [  341.628534]  dma_direct_free+0x50/0x80
      [  341.632285]  nicvf_free_resources+0x160/0x2d8 [nicvf]
      [  341.637327]  nicvf_config_data_transfer+0x174/0x5e8 [nicvf]
      [  341.642890]  nicvf_stop+0x298/0x340 [nicvf]
      [  341.647066]  __dev_close_many+0x9c/0x108
      [  341.650977]  dev_close_many+0xa4/0x158
      [  341.654720]  rollback_registered_many+0x140/0x530
      [  341.659414]  rollback_registered+0x54/0x80
      [  341.663499]  unregister_netdevice_queue+0x9c/0xe8
      [  341.668192]  unregister_netdev+0x28/0x38
      [  341.672106]  nicvf_remove+0xa4/0xa8 [nicvf]
      [  341.676280]  nicvf_shutdown+0x20/0x30 [nicvf]
      [  341.680630]  pci_device_shutdown+0x44/0x88
      [  341.684720]  device_shutdown+0x144/0x250
      [  341.688640]  kernel_restart_prepare+0x44/0x50
      [  341.692986]  kernel_restart+0x20/0x68
      [  341.696638]  __se_sys_reboot+0x210/0x238
      [  341.700550]  __arm64_sys_reboot+0x24/0x30
      [  341.704555]  el0_svc_handler+0x94/0x110
      [  341.708382]  el0_svc+0x8/0xc
      [  341.711252] ---[ end trace 3f4019c8439959c9 ]---
      [  341.715874] page:ffff7e0003ef4000 count:0 mapcount:0 mapping:0000000000000000 index:0x4
      [  341.723872] flags: 0x1fffe000000000()
      [  341.727527] raw: 001fffe000000000 ffff7e0003f1a008 ffff7e0003ef4048 0000000000000000
      [  341.735263] raw: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
      [  341.742994] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
      
      where xdp_dummy.c is a simple bpf program that forwards the incoming
      frames to the network stack (available here:
      https://github.com/altoor/xdp_walkthrough_examples/blob/master/sample_1/xdp_dummy.c)
      
      Fixes: 05c773f5 ("net: thunderx: Add basic XDP support")
      Fixes: 4863dea3 ("net: Adding support for Cavium ThunderX network controller")
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ef2a7cf1
    • Yangtao Li's avatar
      net: amd: add missing of_node_put() · c44c749d
      Yangtao Li authored
      of_find_node_by_path() acquires a reference to the node
      returned by it and that reference needs to be dropped by its caller.
      This place doesn't do that, so fix it.
      Signed-off-by: default avatarYangtao Li <tiny.windzz@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c44c749d
    • Hangbin Liu's avatar
      team: no need to do team_notify_peers or team_mcast_rejoin when disabling port · 5ed9dc99
      Hangbin Liu authored
      team_notify_peers() will send ARP and NA to notify peers. team_mcast_rejoin()
      will send multicast join group message to notify peers. We should do this when
      enabling/changed to a new port. But it doesn't make sense to do it when a port
      is disabled.
      
      On the other hand, when we set mcast_rejoin_count to 2, and do a failover,
      team_port_disable() will increase mcast_rejoin.count_pending to 2 and then
      team_port_enable() will increase mcast_rejoin.count_pending to 4. We will send
      4 mcast rejoin messages at latest, which will make user confused. The same
      with notify_peers.count.
      
      Fix it by deleting team_notify_peers() and team_mcast_rejoin() in
      team_port_disable().
      Reported-by: default avatarLiang Li <liali@redhat.com>
      Fixes: fc423ff0 ("team: add peer notification")
      Fixes: 492b200e ("team: add support for sending multicast rejoins")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5ed9dc99
  6. 23 Nov, 2018 6 commits
    • Jason Wang's avatar
      virtio-net: fail XDP set if guest csum is negotiated · 18ba58e1
      Jason Wang authored
      We don't support partial csumed packet since its metadata will be lost
      or incorrect during XDP processing. So fail the XDP set if guest_csum
      feature is negotiated.
      
      Fixes: f600b690 ("virtio_net: Add XDP support")
      Reported-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pavel Popa <pashinho1990@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      18ba58e1
    • Jason Wang's avatar
      virtio-net: disable guest csum during XDP set · e59ff2c4
      Jason Wang authored
      We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we
      can receive partial csumed packets with metadata kept in the
      vnet_hdr. This may have several side effects:
      
      - It could be overridden by header adjustment, thus is might be not
        correct after XDP processing.
      - There's no way to pass such metadata information through
        XDP_REDIRECT to another driver.
      - XDP does not support checksum offload right now.
      
      So simply disable guest csum if possible in this the case of XDP.
      
      Fixes: 3f93522f ("virtio-net: switch off offloads on demand if possible on XDP set")
      Reported-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pavel Popa <pashinho1990@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e59ff2c4
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-4.20-rc4' of https://github.com/ceph/ceph-client · 7c98a426
      Linus Torvalds authored
      Pullk ceph fix from Ilya Dryomov:
       "A messenger fix, marked for stable"
      
      * tag 'ceph-for-4.20-rc4' of https://github.com/ceph/ceph-client:
        libceph: fall back to sendmsg for slab pages
      7c98a426
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20181123' of git://git.kernel.dk/linux-block · 3381918f
      Linus Torvalds authored
      Pull block fix from Jens Axboe:
       "Just a single fix for this week, fixing an issue with nvme-fc"
      
      * tag 'for-linus-20181123' of git://git.kernel.dk/linux-block:
        nvme-fc: resolve io failures during connect
      3381918f
    • Davide Caratti's avatar
      net/sched: act_police: add missing spinlock initialization · 484afd1b
      Davide Caratti authored
      commit f2cbd485 ("net/sched: act_police: fix race condition on state
      variables") introduces a new spinlock, but forgets its initialization.
      Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police'
      action is newly created, to avoid the following lockdep splat:
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       <...>
       Call Trace:
        dump_stack+0x85/0xcb
        register_lock_class+0x581/0x590
        __lock_acquire+0xd4/0x1330
        ? tcf_police_init+0x2fa/0x650 [act_police]
        ? lock_acquire+0x9e/0x1a0
        lock_acquire+0x9e/0x1a0
        ? tcf_police_init+0x2fa/0x650 [act_police]
        ? tcf_police_init+0x55a/0x650 [act_police]
        _raw_spin_lock_bh+0x34/0x40
        ? tcf_police_init+0x2fa/0x650 [act_police]
        tcf_police_init+0x2fa/0x650 [act_police]
        tcf_action_init_1+0x384/0x4c0
        tcf_action_init+0xf6/0x160
        tcf_action_add+0x73/0x170
        tc_ctl_action+0x122/0x160
        rtnetlink_rcv_msg+0x2a4/0x490
        ? netlink_deliver_tap+0x99/0x400
        ? validate_linkmsg+0x370/0x370
        netlink_rcv_skb+0x4d/0x130
        netlink_unicast+0x196/0x230
        netlink_sendmsg+0x2e5/0x3e0
        sock_sendmsg+0x36/0x40
        ___sys_sendmsg+0x280/0x2f0
        ? _raw_spin_unlock+0x24/0x30
        ? handle_pte_fault+0xafe/0xf30
        ? find_held_lock+0x2d/0x90
        ? syscall_trace_enter+0x1df/0x360
        ? __sys_sendmsg+0x5e/0xa0
        __sys_sendmsg+0x5e/0xa0
        do_syscall_64+0x60/0x210
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7f1841c7cf10
       Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
       RSP: 002b:00007ffcf9df4d68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1841c7cf10
       RDX: 0000000000000000 RSI: 00007ffcf9df4dc0 RDI: 0000000000000003
       RBP: 000000005bf56105 R08: 0000000000000002 R09: 00007ffcf9df8edc
       R10: 00007ffcf9df47e0 R11: 0000000000000246 R12: 0000000000671be0
       R13: 00007ffcf9df4e84 R14: 0000000000000008 R15: 0000000000000000
      
      Fixes: f2cbd485 ("net/sched: act_police: fix race condition on state variables")
      Reported-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      484afd1b
    • Paolo Abeni's avatar
      net: don't keep lonely packets forever in the gro hash · 605108ac
      Paolo Abeni authored
      Eric noted that with UDP GRO and NAPI timeout, we could keep a single
      UDP packet inside the GRO hash forever, if the related NAPI instance
      calls napi_gro_complete() at an higher frequency than the NAPI timeout.
      Willem noted that even TCP packets could be trapped there, till the
      next retransmission.
      This patch tries to address the issue, flushing the old packets -
      those with a NAPI_GRO_CB age before the current jiffy - before scheduling
      the NAPI timeout. The rationale is that such a timeout should be
      well below a jiffy and we are not flushing packets eligible for sane GRO.
      
      v1  -> v2:
       - clarified the commit message and comment
      
      RFC -> v1:
       - added 'Fixes tags', cleaned-up the wording.
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Fixes: 3b47d303 ("net: gro: add a per device gro flush timer")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      605108ac