1. 06 Apr, 2019 2 commits
    • Linus Torvalds's avatar
      Merge tag 'for-5.1/dm-fixes' of... · 4f1cbe07
      Linus Torvalds authored
      Merge tag 'for-5.1/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - Two queue_limits stacking fixes: disable discards if underlying
         driver does. And propagate BDI_CAP_STABLE_WRITES to fix sporadic
         checksum errors.
      
       - Fix that reverts a DM core limit that wasn't needed given that
         dm-crypt was already updated to impose an equivalent limit.
      
       - Fix dm-init to properly establish 'const' for __initconst array.
      
       - Fix deadlock in DM integrity target that occurs when overlapping IO
         is being issued to it. And two smaller fixes to the DM integrity
         target.
      
      * tag 'for-5.1/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm integrity: fix deadlock with overlapping I/O
        dm: disable DISCARD if the underlying storage no longer supports it
        dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errors
        dm: revert 8f50e358 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
        dm init: fix const confusion for dm_allowed_targets array
        dm integrity: make dm_integrity_init and dm_integrity_exit static
        dm integrity: change memcmp to strncmp in dm_integrity_ctr
      4f1cbe07
    • Linus Torvalds's avatar
      Merge tag 'vfio-v5.1-rc4' of git://github.com/awilliam/linux-vfio · 3e28fb0f
      Linus Torvalds authored
      Pull VFIO fixes from Alex Williamson:
      
       - Fix clang printk format errors (Louis Taylor)
      
       - Declare structure static to fix sparse warning (Wang Hai)
      
       - Limit user DMA mappings per container (CVE-2019-3882) (Alex
         Williamson)
      
      * tag 'vfio-v5.1-rc4' of git://github.com/awilliam/linux-vfio:
        vfio/type1: Limit DMA mappings per container
        vfio/spapr_tce: Make symbol 'tce_iommu_driver_ops' static
        vfio/pci: use correct format characters
      3e28fb0f
  2. 05 Apr, 2019 25 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · bc5725f9
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "x86 fixes for overflows and other nastiness"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: nVMX: fix x2APIC VTPR read intercept
        KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887)
        KVM: SVM: prevent DBG_DECRYPT and DBG_ENCRYPT overflow
        kvm: svm: fix potential get_num_contig_pages overflow
      bc5725f9
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 2f9e10ac
      Linus Torvalds authored
      Pull arm64 fix from Catalin Marinas:
       "Fix unwind_frame() in the context of pseudo NMI"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: fix wrong check of on_sdei_stack in nmi context
      2f9e10ac
    • Linus Torvalds's avatar
      Merge tag 'trace-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 970b766c
      Linus Torvalds authored
      Pull syscall-get-arguments cleanup and fixes from Steven Rostedt:
       "Andy Lutomirski approached me to tell me that the
        syscall_get_arguments() implementation in x86 was horrible and gcc
        certainly gets it wrong.
      
        He said that since the tracepoints only pass in 0 and 6 for i and n
        repectively, it should be optimized for that case. Inspecting the
        kernel, I discovered that all users pass in 0 for i and only one file
        passing in something other than 6 for the number of arguments. That
        code happens to be my own code used for the special syscall tracing.
      
        That can easily be converted to just using 0 and 6 as well, and only
        copying what is needed. Which is probably the faster path anyway for
        that case.
      
        Along the way, a couple of real fixes came from this as the
        syscall_get_arguments() function was incorrect for csky and riscv.
      
        x86 has been optimized to for the new interface that removes the
        variable number of arguments, but the other architectures could still
        use some loving and take more advantage of the simpler interface"
      
      * tag 'trace-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        syscalls: Remove start and number from syscall_set_arguments() args
        syscalls: Remove start and number from syscall_get_arguments() args
        csky: Fix syscall_get_arguments() and syscall_set_arguments()
        riscv: Fix syscall_get_arguments() and syscall_set_arguments()
        tracing/syscalls: Pass in hardcoded 6 into syscall_get_arguments()
        ptrace: Remove maxargs from task_current_syscall()
      970b766c
    • Mikulas Patocka's avatar
      dm integrity: fix deadlock with overlapping I/O · 4ed319c6
      Mikulas Patocka authored
      dm-integrity will deadlock if overlapping I/O is issued to it, the bug
      was introduced by commit 724376a0 ("dm integrity: implement fair
      range locks").  Users rarely use overlapping I/O so this bug went
      undetected until now.
      
      Fix this bug by correcting, likely cut-n-paste, typos in
      ranges_overlap() and also remove a flawed ranges_overlap() check in
      remove_range_unlocked().  This condition could leave unprocessed bios
      hanging on wait_list forever.
      
      Cc: stable@vger.kernel.org # v4.19+
      Fixes: 724376a0 ("dm integrity: implement fair range locks")
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      4ed319c6
    • Marc Orr's avatar
      KVM: x86: nVMX: fix x2APIC VTPR read intercept · c73f4c99
      Marc Orr authored
      Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the
      SDM, when "virtualize x2APIC mode" is 1 and "APIC-register
      virtualization" is 0, a RDMSR of 808H should return the VTPR from the
      virtual APIC page.
      
      However, for nested, KVM currently fails to disable the read intercept
      for this MSR. This means that a RDMSR exit takes precedence over
      "virtualize x2APIC mode", and KVM passes through L1's TPR to L2,
      instead of sourcing the value from L2's virtual APIC page.
      
      This patch fixes the issue by disabling the read intercept, in VMCS02,
      for the VTPR when "APIC-register virtualization" is 0.
      
      The issue described above and fix prescribed here, were verified with
      a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC
      mode w/ nested".
      Signed-off-by: default avatarMarc Orr <marcorr@google.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Fixes: c992384b ("KVM: vmx: speed up MSR bitmap merge")
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c73f4c99
    • Marc Orr's avatar
      KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887) · acff7847
      Marc Orr authored
      The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the
      x2APIC MSR intercepts with the "virtualize x2APIC mode" MSR. As a
      result, we discovered the potential for a buggy or malicious L1 to get
      access to L0's x2APIC MSRs, via an L2, as follows.
      
      1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl
      variable, in nested_vmx_prepare_msr_bitmap() to become true.
      2. L1 disables "virtualize x2APIC mode" in VMCS12.
      3. L1 enables "APIC-register virtualization" in VMCS12.
      
      Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then
      set "virtualize x2APIC mode" to 0 in VMCS02. Oops.
      
      This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR
      intercepts with VMCS12's "virtualize x2APIC mode" control.
      
      The scenario outlined above and fix prescribed here, were verified with
      a related patch in kvm-unit-tests titled "Add leak scenario to
      virt_x2apic_mode_test".
      
      Note, it looks like this issue may have been introduced inadvertently
      during a merge---see 15303ba5.
      Signed-off-by: default avatarMarc Orr <marcorr@google.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      acff7847
    • David Rientjes's avatar
      KVM: SVM: prevent DBG_DECRYPT and DBG_ENCRYPT overflow · b86bc285
      David Rientjes authored
      This ensures that the address and length provided to DBG_DECRYPT and
      DBG_ENCRYPT do not cause an overflow.
      
      At the same time, pass the actual number of pages pinned in memory to
      sev_unpin_memory() as a cleanup.
      Reported-by: default avatarCfir Cohen <cfir@google.com>
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b86bc285
    • David Rientjes's avatar
      kvm: svm: fix potential get_num_contig_pages overflow · ede885ec
      David Rientjes authored
      get_num_contig_pages() could potentially overflow int so make its type
      consistent with its usage.
      Reported-by: default avatarCfir Cohen <cfir@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ede885ec
    • Linus Torvalds's avatar
      Merge tag 'mm-compaction-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux · 7f46774c
      Linus Torvalds authored
      Pull mm/compaction fixes from Mel Gorman:
       "The merge window for 5.1 introduced a number of compaction-related
        patches. with intermittent reports of corruption and functional
        issues. The bugs are due to sloopy checking of zone boundaries and a
        corner case where invalid indexes are used to access the free lists.
      
        Reports are not common but at least two users and 0-day have tripped
        over them. There is a chance that one of the syzbot reports are
        related but it has not been confirmed properly.
      
        The normal submission path is with Andrew but there have been some
        delays and I consider them urgent enough that they should be picked up
        before RC4 to avoid duplicate reports.
      
        All of these have been successfully tested on older RC windows. This
        will make this branch look like a rebase but in fact, they've simply
        been lifted again from Andrew's tree and placed on a fresh branch.
        I've no reason to believe that this has invalidated the testing given
        the lack of change in compaction and the nature of the fixes"
      
      * tag 'mm-compaction-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux:
        mm/compaction.c: abort search if isolation fails
        mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints
      7f46774c
    • Greg Kroah-Hartman's avatar
      tty: mark Siemens R3964 line discipline as BROKEN · c7084edc
      Greg Kroah-Hartman authored
      The n_r3964 line discipline driver was written in a different time, when
      SMP machines were rare, and users were trusted to do the right thing.
      Since then, the world has moved on but not this code, it has stayed
      rooted in the past with its lovely hand-crafted list structures and
      loads of "interesting" race conditions all over the place.
      
      After attempting to clean up most of the issues, I just gave up and am
      now marking the driver as BROKEN so that hopefully someone who has this
      hardware will show up out of the woodwork (I know you are out there!)
      and will help with debugging a raft of changes that I had laying around
      for the code, but was too afraid to commit as odds are they would break
      things.
      
      Many thanks to Jann and Linus for pointing out the initial problems in
      this codebase, as well as many reviews of my attempts to fix the issues.
      It was a case of whack-a-mole, and as you can see, the mole won.
      Reported-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c7084edc
    • Steven Rostedt (VMware)'s avatar
      syscalls: Remove start and number from syscall_set_arguments() args · 32d92586
      Steven Rostedt (VMware) authored
      After removing the start and count arguments of syscall_get_arguments() it
      seems reasonable to remove them from syscall_set_arguments(). Note, as of
      today, there are no users of syscall_set_arguments(). But we are told that
      there will be soon. But for now, at least make it consistent with
      syscall_get_arguments().
      
      Link: http://lkml.kernel.org/r/20190327222014.GA32540@altlinux.org
      
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: x86@kernel.org
      Cc: linux-snps-arc@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: nios2-dev@lists.rocketboards.org
      Cc: openrisc@lists.librecores.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-um@lists.infradead.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: linux-arch@vger.kernel.org
      Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes
      Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86
      Reviewed-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      32d92586
    • Steven Rostedt (Red Hat)'s avatar
      syscalls: Remove start and number from syscall_get_arguments() args · b35f549d
      Steven Rostedt (Red Hat) authored
      At Linux Plumbers, Andy Lutomirski approached me and pointed out that the
      function call syscall_get_arguments() implemented in x86 was horribly
      written and not optimized for the standard case of passing in 0 and 6 for
      the starting index and the number of system calls to get. When looking at
      all the users of this function, I discovered that all instances pass in only
      0 and 6 for these arguments. Instead of having this function handle
      different cases that are never used, simply rewrite it to return the first 6
      arguments of a system call.
      
      This should help out the performance of tracing system calls by ptrace,
      ftrace and perf.
      
      Link: http://lkml.kernel.org/r/20161107213233.754809394@goodmis.org
      
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: x86@kernel.org
      Cc: linux-snps-arc@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: nios2-dev@lists.rocketboards.org
      Cc: openrisc@lists.librecores.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-um@lists.infradead.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: linux-arch@vger.kernel.org
      Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts
      Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes
      Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86
      Reviewed-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Reported-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      b35f549d
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2019-04-05' of git://anongit.freedesktop.org/drm/drm · ea2cec24
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Pretty quiet week, just some amdgpu and i915 fixes.
      
        i915:
         - deadlock fix
         - gvt fixes
      
        amdgpu:
         - PCIE dpm feature fix
         - Powerplay fixes"
      
      * tag 'drm-fixes-2019-04-05' of git://anongit.freedesktop.org/drm/drm:
        drm/i915/gvt: Fix kerneldoc typo for intel_vgpu_emulate_hotplug
        drm/i915/gvt: Correct the calculation of plane size
        drm/amdgpu: remove unnecessary rlc reset function on gfx9
        drm/i915: Always backoff after a drm_modeset_lock() deadlock
        drm/i915/gvt: do not let pin count of shadow mm go negative
        drm/i915/gvt: do not deliver a workload if its creation fails
        drm/amd/display: VBIOS can't be light up HDMI when restart system
        drm/amd/powerplay: fix possible hang with 3+ 4K monitors
        drm/amd/powerplay: correct data type to avoid overflow
        drm/amd/powerplay: add ECC feature bit
        drm/amd/amdgpu: fix PCIe dpm feature issue (v3)
      ea2cec24
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 0548740e
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Several hash table refcount fixes in batman-adv, from Sven
          Eckelmann.
      
       2) Use after free in bpf_evict_inode(), from Daniel Borkmann.
      
       3) Fix mdio bus registration in ixgbe, from Ivan Vecera.
      
       4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni.
      
       5) ila rhashtable corruption fix from Herbert Xu.
      
       6) Don't allow upper-devices to be added to vrf devices, from Sabrina
          Dubroca.
      
       7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork.
      
       8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype,
          from Alexander Lobakin.
      
       9) Missing IDR checks in mlx5 driver, from Aditya Pakki.
      
      10) Fix false connection termination in ktls, from Jakub Kicinski.
      
      11) Work around some ASPM issues with r8169 by disabling rx interrupt
          coalescing on certain chips. From Heiner Kallweit.
      
      12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo
          Abeni.
      
      13) Fully initialize sockaddr_in structures in SCTP, from Xin Long.
      
      14) Various BPF flow dissector fixes from Stanislav Fomichev.
      
      15) Divide by zero in act_sample, from Davide Caratti.
      
      16) Fix bridging multicast regression introduced by rhashtable
          conversion, from Nikolay Aleksandrov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
        ibmvnic: Fix completion structure initialization
        ipv6: sit: reset ip header pointer in ipip6_rcv
        net: bridge: always clear mcast matching struct on reports and leaves
        libcxgb: fix incorrect ppmax calculation
        vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x
        sch_cake: Make sure we can write the IP header before changing DSCP bits
        sch_cake: Use tc_skb_protocol() helper for getting packet protocol
        tcp: Ensure DCTCP reacts to losses
        net/sched: act_sample: fix divide by zero in the traffic path
        net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
        net: hns: Fix sparse: some warnings in HNS drivers
        net: hns: Fix WARNING when remove HNS driver with SMMU enabled
        net: hns: fix ICMP6 neighbor solicitation messages discard problem
        net: hns: Fix probabilistic memory overwrite when HNS driver initialized
        net: hns: Use NAPI_POLL_WEIGHT for hns driver
        net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
        flow_dissector: rst'ify documentation
        ipv6: Fix dangling pointer when ipv6 fragment
        net-gro: Fix GRO flush when receiving a GSO packet.
        flow_dissector: document BPF flow dissector environment
        ...
      0548740e
    • Thomas Falcon's avatar
      ibmvnic: Fix completion structure initialization · bbd669a8
      Thomas Falcon authored
      Fix device initialization completion handling for vNIC adapters.
      Initialize the completion structure on probe and reinitialize when needed.
      This also fixes a race condition during kdump where the driver can attempt
      to access the completion struct before it is initialized:
      
      Unable to handle kernel paging request for data at address 0x00000000
      Faulting instruction address: 0xc0000000081acbe0
      Oops: Kernel access of bad area, sig: 11 [#1]
      LE SMP NR_CPUS=2048 NUMA pSeries
      Modules linked in: ibmvnic(+) ibmveth sunrpc overlay squashfs loop
      CPU: 19 PID: 301 Comm: systemd-udevd Not tainted 4.18.0-64.el8.ppc64le #1
      NIP:  c0000000081acbe0 LR: c0000000081ad964 CTR: c0000000081ad900
      REGS: c000000027f3f990 TRAP: 0300   Not tainted  (4.18.0-64.el8.ppc64le)
      MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28228288  XER: 00000006
      CFAR: c000000008008934 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1
      GPR00: c0000000081ad964 c000000027f3fc10 c0000000095b5800 c0000000221b4e58
      GPR04: 0000000000000003 0000000000000001 000049a086918581 00000000000000d4
      GPR08: 0000000000000007 0000000000000000 ffffffffffffffe8 d0000000014dde28
      GPR12: c0000000081ad900 c000000009a00c00 0000000000000001 0000000000000100
      GPR16: 0000000000000038 0000000000000007 c0000000095e2230 0000000000000006
      GPR20: 0000000000400140 0000000000000001 c00000000910c880 0000000000000000
      GPR24: 0000000000000000 0000000000000006 0000000000000000 0000000000000003
      GPR28: 0000000000000001 0000000000000001 c0000000221b4e60 c0000000221b4e58
      NIP [c0000000081acbe0] __wake_up_locked+0x50/0x100
      LR [c0000000081ad964] complete+0x64/0xa0
      Call Trace:
      [c000000027f3fc10] [c000000027f3fc60] 0xc000000027f3fc60 (unreliable)
      [c000000027f3fc60] [c0000000081ad964] complete+0x64/0xa0
      [c000000027f3fca0] [d0000000014dad58] ibmvnic_handle_crq+0xce0/0x1160 [ibmvnic]
      [c000000027f3fd50] [d0000000014db270] ibmvnic_tasklet+0x98/0x130 [ibmvnic]
      [c000000027f3fda0] [c00000000813f334] tasklet_action_common.isra.3+0xc4/0x1a0
      [c000000027f3fe00] [c000000008cd13f4] __do_softirq+0x164/0x400
      [c000000027f3fef0] [c00000000813ed64] irq_exit+0x184/0x1c0
      [c000000027f3ff20] [c0000000080188e8] __do_irq+0xb8/0x210
      [c000000027f3ff90] [c00000000802d0a4] call_do_irq+0x14/0x24
      [c000000026a5b010] [c000000008018adc] do_IRQ+0x9c/0x130
      [c000000026a5b060] [c000000008008ce4] hardware_interrupt_common+0x114/0x120
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bbd669a8
    • Lorenzo Bianconi's avatar
      ipv6: sit: reset ip header pointer in ipip6_rcv · bb9bd814
      Lorenzo Bianconi authored
      ipip6 tunnels run iptunnel_pull_header on received skbs. This can
      determine the following use-after-free accessing iph pointer since
      the packet will be 'uncloned' running pskb_expand_head if it is a
      cloned gso skb (e.g if the packet has been sent though a veth device)
      
      [  706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit]
      [  706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/=
      [  706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016
      [  706.771839] Call trace:
      [  706.801159]  dump_backtrace+0x0/0x2f8
      [  706.845079]  show_stack+0x24/0x30
      [  706.884833]  dump_stack+0xe0/0x11c
      [  706.925629]  print_address_description+0x68/0x260
      [  706.982070]  kasan_report+0x178/0x340
      [  707.025995]  __asan_report_load1_noabort+0x30/0x40
      [  707.083481]  ipip6_rcv+0x1678/0x16e0 [sit]
      [  707.132623]  tunnel64_rcv+0xd4/0x200 [tunnel4]
      [  707.185940]  ip_local_deliver_finish+0x3b8/0x988
      [  707.241338]  ip_local_deliver+0x144/0x470
      [  707.289436]  ip_rcv_finish+0x43c/0x14b0
      [  707.335447]  ip_rcv+0x628/0x1138
      [  707.374151]  __netif_receive_skb_core+0x1670/0x2600
      [  707.432680]  __netif_receive_skb+0x28/0x190
      [  707.482859]  process_backlog+0x1d0/0x610
      [  707.529913]  net_rx_action+0x37c/0xf68
      [  707.574882]  __do_softirq+0x288/0x1018
      [  707.619852]  run_ksoftirqd+0x70/0xa8
      [  707.662734]  smpboot_thread_fn+0x3a4/0x9e8
      [  707.711875]  kthread+0x2c8/0x350
      [  707.750583]  ret_from_fork+0x10/0x18
      
      [  707.811302] Allocated by task 16982:
      [  707.854182]  kasan_kmalloc.part.1+0x40/0x108
      [  707.905405]  kasan_kmalloc+0xb4/0xc8
      [  707.948291]  kasan_slab_alloc+0x14/0x20
      [  707.994309]  __kmalloc_node_track_caller+0x158/0x5e0
      [  708.053902]  __kmalloc_reserve.isra.8+0x54/0xe0
      [  708.108280]  __alloc_skb+0xd8/0x400
      [  708.150139]  sk_stream_alloc_skb+0xa4/0x638
      [  708.200346]  tcp_sendmsg_locked+0x818/0x2b90
      [  708.251581]  tcp_sendmsg+0x40/0x60
      [  708.292376]  inet_sendmsg+0xf0/0x520
      [  708.335259]  sock_sendmsg+0xac/0xf8
      [  708.377096]  sock_write_iter+0x1c0/0x2c0
      [  708.424154]  new_sync_write+0x358/0x4a8
      [  708.470162]  __vfs_write+0xc4/0xf8
      [  708.510950]  vfs_write+0x12c/0x3d0
      [  708.551739]  ksys_write+0xcc/0x178
      [  708.592533]  __arm64_sys_write+0x70/0xa0
      [  708.639593]  el0_svc_handler+0x13c/0x298
      [  708.686646]  el0_svc+0x8/0xc
      
      [  708.739019] Freed by task 17:
      [  708.774597]  __kasan_slab_free+0x114/0x228
      [  708.823736]  kasan_slab_free+0x10/0x18
      [  708.868703]  kfree+0x100/0x3d8
      [  708.905320]  skb_free_head+0x7c/0x98
      [  708.948204]  skb_release_data+0x320/0x490
      [  708.996301]  pskb_expand_head+0x60c/0x970
      [  709.044399]  __iptunnel_pull_header+0x3b8/0x5d0
      [  709.098770]  ipip6_rcv+0x41c/0x16e0 [sit]
      [  709.146873]  tunnel64_rcv+0xd4/0x200 [tunnel4]
      [  709.200195]  ip_local_deliver_finish+0x3b8/0x988
      [  709.255596]  ip_local_deliver+0x144/0x470
      [  709.303692]  ip_rcv_finish+0x43c/0x14b0
      [  709.349705]  ip_rcv+0x628/0x1138
      [  709.388413]  __netif_receive_skb_core+0x1670/0x2600
      [  709.446943]  __netif_receive_skb+0x28/0x190
      [  709.497120]  process_backlog+0x1d0/0x610
      [  709.544169]  net_rx_action+0x37c/0xf68
      [  709.589131]  __do_softirq+0x288/0x1018
      
      [  709.651938] The buggy address belongs to the object at ffffe01b6bd85580
                      which belongs to the cache kmalloc-1024 of size 1024
      [  709.804356] The buggy address is located 117 bytes inside of
                      1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980)
      [  709.946340] The buggy address belongs to the page:
      [  710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0
      [  710.099914] flags: 0xfffff8000000100(slab)
      [  710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600
      [  710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000
      [  710.334966] page dumped because: kasan: bad access detected
      
      Fix it resetting iph pointer after iptunnel_pull_header
      
      Fixes: a09a4c8d ("tunnels: Remove encapsulation offloads on decap")
      Tested-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bb9bd814
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-5.1-rc4' of... · 8e22ba96
      Linus Torvalds authored
      Merge tag 'riscv-for-linus-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
      
      Pull RISC-V fixes from Palmer Dabbelt:
       "I dropped the ball a bit here: these patches should all probably have
        been part of rc2, but I wanted to get around to properly testing them
        in the various configurations (qemu32, qeum64, unleashed) first.
      
        Unfortunately I've been traveling and didn't have time to actually do
        that, but since these fix concrete bugs and pass my old set of tests I
        don't want to delay the fixes any longer.
      
        There are four independent fixes here:
      
         - A fix for the rv32 port that corrects the 64-bit user accesor's
           fixup label address.
      
         - A fix for a regression introduced during the merge window that
           broke medlow configurations at run time. This patch also includes a
           fix that disables ftrace for the same set of functions, which was
           found by inspection at the same time.
      
         - A modification of the memory map to avoid overlapping the FIXMAP
           and VMALLOC regions on systems with small memory maps.
      
         - A fix to the module handling code to use the correct syntax for
           probing Kconfig entries.
      
        These have passed my standard test flow, but I didn't have time to
        expand that testing like I said I would"
      
      * tag 'riscv-for-linus-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
        RISC-V: Use IS_ENABLED(CONFIG_CMODEL_MEDLOW)
        RISC-V: Fix FIXMAP_TOP to avoid overlap with VMALLOC area
        RISC-V: Always compile mm/init.c with cmodel=medany and notrace
        riscv: fix accessing 8-byte variable from RV32
      8e22ba96
    • Nikolay Aleksandrov's avatar
      net: bridge: always clear mcast matching struct on reports and leaves · 1515a63f
      Nikolay Aleksandrov authored
      We need to be careful and always zero the whole br_ip struct when it is
      used for matching since the rhashtable change. This patch fixes all the
      places which didn't properly clear it which in turn might've caused
      mismatches.
      
      Thanks for the great bug report with reproducing steps and bisection.
      
      Steps to reproduce (from the bug report):
      ip link add br0 type bridge mcast_querier 1
      ip link set br0 up
      
      ip link add v2 type veth peer name v3
      ip link set v2 master br0
      ip link set v2 up
      ip link set v3 up
      ip addr add 3.0.0.2/24 dev v3
      
      ip netns add test
      ip link add v1 type veth peer name v1 netns test
      ip link set v1 master br0
      ip link set v1 up
      ip -n test link set v1 up
      ip -n test addr add 3.0.0.1/24 dev v1
      
      # Multicast receiver
      ip netns exec test socat
      UDP4-RECVFROM:5588,ip-add-membership=224.224.224.224:3.0.0.1,fork -
      
      # Multicast sender
      echo hello | nc -u -s 3.0.0.2 224.224.224.224 5588
      
      Reported-by: liam.mcbirnie@boeing.com
      Fixes: 19e3a9c9 ("net: bridge: convert multicast to generic rhashtable")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1515a63f
    • Linus Torvalds's avatar
      Merge tag 'pm-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 20ad5494
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix up the intel_pstate driver after recent changes to prevent
        it from printing pointless messages and update the turbostat utility
        (mostly fixes and new hardware support).
      
        Specifics:
      
         - Make intel_pstate only load on Intel processors and prevent it from
           printing pointless failure messages (Borislav Petkov).
      
         - Update the turbostat utility:
            * Assorted fixes (Ben Hutchings, Len Brown, Prarit Bhargava).
            * Support for AMD Fam 17h (Zen) RAPL and package power (Calvin
              Walton).
            * Support for Intel Icelake and for systems with more than one die
              per package (Len Brown).
            * Cleanups (Len Brown)"
      
      * tag 'pm-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq/intel_pstate: Load only on Intel hardware
        tools/power turbostat: update version number
        tools/power turbostat: Warn on bad ACPI LPIT data
        tools/power turbostat: Add checks for failure of fgets() and fscanf()
        tools/power turbostat: Also read package power on AMD F17h (Zen)
        tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
        tools/power turbostat: Do not display an error on systems without a cpufreq driver
        tools/power turbostat: Add Die column
        tools/power turbostat: Add Icelake support
        tools/power turbostat: Cleanup CNL-specific code
        tools/power turbostat: Cleanup CC3-skip code
        tools/power turbostat: Restore ability to execute in topology-order
      20ad5494
    • Linus Torvalds's avatar
      Merge tag 'acpi-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · b512f712
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "Prevent stale GPE events from triggering spurious system wakeups from
        suspend-to-idle (Furquan Shaikh)"
      
      * tag 'acpi-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPICA: Clear status of GPEs before enabling them
      b512f712
    • Dave Airlie's avatar
      Merge tag 'drm-intel-fixes-2019-04-04' of... · 23b5f422
      Dave Airlie authored
      Merge tag 'drm-intel-fixes-2019-04-04' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
      
      Only one fix for DSC (backoff after drm_modeset_lock deadlock)
      and GVT's fixes including vGPU display plane size calculation,
      shadow mm pin count, error recovery path for workload create
      and one kerneldoc fix.
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      
      From: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190404161116.GA14522@intel.com
      23b5f422
    • Linus Torvalds's avatar
      Merge tag 'mfd-fixes-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · 9db6ce4e
      Linus Torvalds authored
      Pull mfd fixes from Lee Jones:
      
       - Fix failed reads due to enabled IRQs when suspended; twl-core
      
       - Fix driver registration when using DT; sprd-sc27xx-spi
      
       - Fix `make allyesconfig` on x86_64; SUN6I_PRCM
      
      * tag 'mfd-fixes-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
        mfd: sun6i-prcm: Allow to compile with COMPILE_TEST
        mfd: sc27xx: Use SoC compatible string for PMIC devices
        mfd: twl-core: Disable IRQ while suspended
      9db6ce4e
    • Dave Airlie's avatar
      Merge branch 'drm-fixes-5.1' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · 2ded1881
      Dave Airlie authored
      Fixes for 5.1:
      - Fix for pcie dpm
      - Powerplay fixes for vega20
      - Fix vbios display on reboot if driver display state is retained
      - Gfx9 resume robustness fix
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      From: Alex Deucher <alexdeucher@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190404042939.3386-1-alexander.deucher@amd.com
      2ded1881
    • Varun Prakash's avatar
      libcxgb: fix incorrect ppmax calculation · cc5a726c
      Varun Prakash authored
      BITS_TO_LONGS() uses DIV_ROUND_UP() because of
      this ppmax value can be greater than available
      per cpu page pods.
      
      This patch removes BITS_TO_LONGS() to fix this
      issue.
      Signed-off-by: default avatarVarun Prakash <varun@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc5a726c
    • Chris Leech's avatar
      vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x · 0a89eb92
      Chris Leech authored
      Way back in 3c9c36bc the
      ndo_fcoe_get_wwn pointer was switched from depending on CONFIG_FCOE to
      CONFIG_LIBFCOE in order to allow building FCoE support into the bnx2x
      driver and used by bnx2fc without including the generic software fcoe
      module.
      
      But, FCoE is generally used over an 802.1q VLAN, and the implementation
      of ndo_fcoe_get_wwn in the 8021q module was not similarly changed.  The
      result is that if CONFIG_FCOE is disabled, then bnz2fc cannot make a
      call to ndo_fcoe_get_wwn through the 8021q interface to the underlying
      bnx2x interface.  The bnx2fc driver then falls back to a potentially
      different mapping of Ethernet MAC to Fibre Channel WWN, creating an
      incompatibility with the fabric and target configurations when compared
      to the WWNs used by pre-boot firmware and differently-configured
      kernels.
      
      So make the conditional inclusion of FCoE code in 8021q match the
      conditional inclusion in netdevice.h
      Signed-off-by: default avatarChris Leech <cleech@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0a89eb92
  3. 04 Apr, 2019 13 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 5ba57801
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-04-04
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Batch of fixes to the existing BPF flow dissector API to support
         calling BPF programs from the eth_get_headlen context (support for
         latter is planned to be added in bpf-next), from Stanislav.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5ba57801
    • Rafael J. Wysocki's avatar
      Merge branch 'acpica' into acpi · b59fb7ef
      Rafael J. Wysocki authored
      * acpica:
        ACPICA: Clear status of GPEs before enabling them
      b59fb7ef
    • Rafael J. Wysocki's avatar
      Merge branch 'pm-tools' · 58b0cf8e
      Rafael J. Wysocki authored
      * pm-tools:
        tools/power turbostat: update version number
        tools/power turbostat: Warn on bad ACPI LPIT data
        tools/power turbostat: Add checks for failure of fgets() and fscanf()
        tools/power turbostat: Also read package power on AMD F17h (Zen)
        tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
        tools/power turbostat: Do not display an error on systems without a cpufreq driver
        tools/power turbostat: Add Die column
        tools/power turbostat: Add Icelake support
        tools/power turbostat: Cleanup CNL-specific code
        tools/power turbostat: Cleanup CC3-skip code
        tools/power turbostat: Restore ability to execute in topology-order
      58b0cf8e
    • Mike Snitzer's avatar
      dm: disable DISCARD if the underlying storage no longer supports it · bcb44433
      Mike Snitzer authored
      Storage devices which report supporting discard commands like
      WRITE_SAME_16 with unmap, but reject discard commands sent to the
      storage device.  This is a clear storage firmware bug but it doesn't
      change the fact that should a program cause discards to be sent to a
      multipath device layered on this buggy storage, all paths can end up
      failed at the same time from the discards, causing possible I/O loss.
      
      The first discard to a path will fail with Illegal Request, Invalid
      field in cdb, e.g.:
       kernel: sd 8:0:8:19: [sdfn] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
       kernel: sd 8:0:8:19: [sdfn] tag#0 Sense Key : Illegal Request [current]
       kernel: sd 8:0:8:19: [sdfn] tag#0 Add. Sense: Invalid field in cdb
       kernel: sd 8:0:8:19: [sdfn] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 a0 08 00 00 00 80 00 00 00
       kernel: blk_update_request: critical target error, dev sdfn, sector 10487808
      
      The SCSI layer converts this to the BLK_STS_TARGET error number, the sd
      device disables its support for discard on this path, and because of the
      BLK_STS_TARGET error multipath fails the discard without failing any
      path or retrying down a different path.  But subsequent discards can
      cause path failures.  Any discards sent to the path which already failed
      a discard ends up failing with EIO from blk_cloned_rq_check_limits with
      an "over max size limit" error since the discard limit was set to 0 by
      the sd driver for the path.  As the error is EIO, this now fails the
      path and multipath tries to send the discard down the next path.  This
      cycle continues as discards are sent until all paths fail.
      
      Fix this by training DM core to disable DISCARD if the underlying
      storage already did so.
      
      Also, fix branching in dm_done() and clone_endio() to reflect the
      mutually exclussive nature of the IO operations in question.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarDavid Jeffery <djeffery@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      bcb44433
    • David S. Miller's avatar
      Merge branch 'sch_cake-fixes' · 3baf5c2d
      David S. Miller authored
      Toke Høiland-Jørgensen says:
      
      ====================
      sched: A few small fixes for sch_cake
      
      Kevin noticed a few issues with the way CAKE reads the skb protocol and the IP
      diffserv fields. This series fixes those two issues, and should probably go to
      in 4.19 as well. However, the previous refactoring patch means they don't apply
      as-is; I can send a follow-up directly to stable if that's OK with you?
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3baf5c2d
    • Toke Høiland-Jørgensen's avatar
      sch_cake: Make sure we can write the IP header before changing DSCP bits · c87b4ecd
      Toke Høiland-Jørgensen authored
      There is not actually any guarantee that the IP headers are valid before we
      access the DSCP bits of the packets. Fix this using the same approach taken
      in sch_dsmark.
      Reported-by: default avatarKevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c87b4ecd
    • Toke Høiland-Jørgensen's avatar
      sch_cake: Use tc_skb_protocol() helper for getting packet protocol · b2100cc5
      Toke Høiland-Jørgensen authored
      We shouldn't be using skb->protocol directly as that will miss cases with
      hardware-accelerated VLAN tags. Use the helper instead to get the right
      protocol number.
      Reported-by: default avatarKevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b2100cc5
    • Koen De Schepper's avatar
      tcp: Ensure DCTCP reacts to losses · aecfde23
      Koen De Schepper authored
      RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
      loss episodes in the same way as conventional TCP".
      
      Currently, Linux DCTCP performs no cwnd reduction when losses
      are encountered. Optionally, the dctcp_clamp_alpha_on_loss resets
      alpha to its maximal value if a RTO happens. This behavior
      is sub-optimal for at least two reasons: i) it ignores losses
      triggering fast retransmissions; and ii) it causes unnecessary large
      cwnd reduction in the future if the loss was isolated as it resets
      the historical term of DCTCP's alpha EWMA to its maximal value (i.e.,
      denoting a total congestion). The second reason has an especially
      noticeable effect when using DCTCP in high BDP environments, where
      alpha normally stays at low values.
      
      This patch replace the clamping of alpha by setting ssthresh to
      half of cwnd for both fast retransmissions and RTOs, at most once
      per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
      has been removed.
      
      The table below shows experimental results where we measured the
      drop probability of a PIE AQM (not applying ECN marks) at a
      bottleneck in the presence of a single TCP flow with either the
      alpha-clamping option enabled or the cwnd halving proposed by this
      patch. Results using reno or cubic are given for comparison.
      
                                |  Link   |   RTT    |    Drop
                       TCP CC   |  speed  | base+AQM | probability
              ==================|=========|==========|============
                          CUBIC |  40Mbps |  7+20ms  |    0.21%
                           RENO |         |          |    0.19%
              DCTCP-CLAMP-ALPHA |         |          |   25.80%
               DCTCP-HALVE-CWND |         |          |    0.22%
              ------------------|---------|----------|------------
                          CUBIC | 100Mbps |  7+20ms  |    0.03%
                           RENO |         |          |    0.02%
              DCTCP-CLAMP-ALPHA |         |          |   23.30%
               DCTCP-HALVE-CWND |         |          |    0.04%
              ------------------|---------|----------|------------
                          CUBIC | 800Mbps |   1+1ms  |    0.04%
                           RENO |         |          |    0.05%
              DCTCP-CLAMP-ALPHA |         |          |   18.70%
               DCTCP-HALVE-CWND |         |          |    0.06%
      
      We see that, without halving its cwnd for all source of losses,
      DCTCP drives the AQM to large drop probabilities in order to keep
      the queue length under control (i.e., it repeatedly faces RTOs).
      Instead, if DCTCP reacts to all source of losses, it can then be
      controlled by the AQM using similar drop levels than cubic or reno.
      Signed-off-by: default avatarKoen De Schepper <koen.de_schepper@nokia-bell-labs.com>
      Signed-off-by: default avatarOlivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
      Cc: Bob Briscoe <research@bobbriscoe.net>
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <borkmann@iogearbox.net>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Andrew Shewmaker <agshew@gmail.com>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Acked-by: default avatarFlorian Westphal <fw@strlen.de>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aecfde23
    • Davide Caratti's avatar
      net/sched: act_sample: fix divide by zero in the traffic path · fae27081
      Davide Caratti authored
      the control path of 'sample' action does not validate the value of 'rate'
      provided by the user, but then it uses it as divisor in the traffic path.
      Validate it in tcf_sample_init(), and return -EINVAL with a proper extack
      message in case that value is zero, to fix a splat with the script below:
      
       # tc f a dev test0 egress matchall action sample rate 0 group 1 index 2
       # tc -s a s action sample
       total acts 1
      
               action order 0: sample rate 1/0 group 1 pipe
                index 2 ref 1 bind 1 installed 19 sec used 19 sec
               Action statistics:
               Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
               backlog 0b 0p requeues 0
       # ping 192.0.2.1 -I test0 -c1 -q
      
       divide error: 0000 [#1] SMP PTI
       CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample]
       Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01
       RSP: 0018:ffffae320190ba30 EFLAGS: 00010246
       RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49
       RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0
       RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000
       R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80
       R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000
       FS:  00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0
       Call Trace:
        tcf_action_exec+0x7c/0x1c0
        tcf_classify+0x57/0x160
        __dev_queue_xmit+0x3dc/0xd10
        ip_finish_output2+0x257/0x6d0
        ip_output+0x75/0x280
        ip_send_skb+0x15/0x40
        raw_sendmsg+0xae3/0x1410
        sock_sendmsg+0x36/0x40
        __sys_sendto+0x10e/0x140
        __x64_sys_sendto+0x24/0x30
        do_syscall_64+0x60/0x210
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [...]
        Kernel panic - not syncing: Fatal exception in interrupt
      
      Add a TDC selftest to document that 'rate' is now being validated.
      Reported-by: default avatarMatteo Croce <mcroce@redhat.com>
      Fixes: 5c5670fa ("net/sched: Introduce sample tc action")
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarYotam Gigi <yotam.gi@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fae27081
    • Lorenzo Bianconi's avatar
      net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop · 2ec1ed2a
      Lorenzo Bianconi authored
      When a bpf program is uploaded, the driver computes the number of
      xdp tx queues resulting in the allocation of additional qsets.
      Starting from commit '2ecbe4f4 ("net: thunderx: replace global
      nicvf_rx_mode_wq work queue for all VFs to private for each of them")'
      the driver runs link state polling for each VF resulting in the
      following NULL pointer dereference:
      
      [   56.169256] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
      [   56.178032] Mem abort info:
      [   56.180834]   ESR = 0x96000005
      [   56.183877]   Exception class = DABT (current EL), IL = 32 bits
      [   56.189792]   SET = 0, FnV = 0
      [   56.192834]   EA = 0, S1PTW = 0
      [   56.195963] Data abort info:
      [   56.198831]   ISV = 0, ISS = 0x00000005
      [   56.202662]   CM = 0, WnR = 0
      [   56.205619] user pgtable: 64k pages, 48-bit VAs, pgdp = 0000000021f0c7a0
      [   56.212315] [0000000000000020] pgd=0000000000000000, pud=0000000000000000
      [   56.219094] Internal error: Oops: 96000005 [#1] SMP
      [   56.260459] CPU: 39 PID: 2034 Comm: ip Not tainted 5.1.0-rc3+ #3
      [   56.266452] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS T49 02/02/2018
      [   56.273315] pstate: 80000005 (Nzcv daif -PAN -UAO)
      [   56.278098] pc : __ll_sc___cmpxchg_case_acq_64+0x4/0x20
      [   56.283312] lr : mutex_lock+0x2c/0x50
      [   56.286962] sp : ffff0000219af1b0
      [   56.290264] x29: ffff0000219af1b0 x28: ffff800f64de49a0
      [   56.295565] x27: 0000000000000000 x26: 0000000000000015
      [   56.300865] x25: 0000000000000000 x24: 0000000000000000
      [   56.306165] x23: 0000000000000000 x22: ffff000011117000
      [   56.311465] x21: ffff800f64dfc080 x20: 0000000000000020
      [   56.316766] x19: 0000000000000020 x18: 0000000000000001
      [   56.322066] x17: 0000000000000000 x16: ffff800f2e077080
      [   56.327367] x15: 0000000000000004 x14: 0000000000000000
      [   56.332667] x13: ffff000010964438 x12: 0000000000000002
      [   56.337967] x11: 0000000000000000 x10: 0000000000000c70
      [   56.343268] x9 : ffff0000219af120 x8 : ffff800f2e077d50
      [   56.348568] x7 : 0000000000000027 x6 : 000000062a9d6a84
      [   56.353869] x5 : 0000000000000000 x4 : ffff800f2e077480
      [   56.359169] x3 : 0000000000000008 x2 : ffff800f2e077080
      [   56.364469] x1 : 0000000000000000 x0 : 0000000000000020
      [   56.369770] Process ip (pid: 2034, stack limit = 0x00000000c862da3a)
      [   56.376110] Call trace:
      [   56.378546]  __ll_sc___cmpxchg_case_acq_64+0x4/0x20
      [   56.383414]  drain_workqueue+0x34/0x198
      [   56.387247]  nicvf_open+0x48/0x9e8 [nicvf]
      [   56.391334]  nicvf_open+0x898/0x9e8 [nicvf]
      [   56.395507]  nicvf_xdp+0x1bc/0x238 [nicvf]
      [   56.399595]  dev_xdp_install+0x68/0x90
      [   56.403333]  dev_change_xdp_fd+0xc8/0x240
      [   56.407333]  do_setlink+0x8e0/0xbe8
      [   56.410810]  __rtnl_newlink+0x5b8/0x6d8
      [   56.414634]  rtnl_newlink+0x54/0x80
      [   56.418112]  rtnetlink_rcv_msg+0x22c/0x2f8
      [   56.422199]  netlink_rcv_skb+0x60/0x120
      [   56.426023]  rtnetlink_rcv+0x28/0x38
      [   56.429587]  netlink_unicast+0x1c8/0x258
      [   56.433498]  netlink_sendmsg+0x1b4/0x350
      [   56.437410]  sock_sendmsg+0x4c/0x68
      [   56.440887]  ___sys_sendmsg+0x240/0x280
      [   56.444711]  __sys_sendmsg+0x68/0xb0
      [   56.448275]  __arm64_sys_sendmsg+0x2c/0x38
      [   56.452361]  el0_svc_handler+0x9c/0x128
      [   56.456186]  el0_svc+0x8/0xc
      [   56.459056] Code: 35ffff91 2a1003e0 d65f03c0 f9800011 (c85ffc10)
      [   56.465166] ---[ end trace 4a57fdc27b0a572c ]---
      [   56.469772] Kernel panic - not syncing: Fatal exception
      
      Fix it by checking nicvf_rx_mode_wq pointer in nicvf_open and nicvf_stop
      
      Fixes: 2ecbe4f4 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them")
      Fixes: 2c632ad8 ("net: thunderx: move link state polling function to VF")
      Reported-by: default avatarMatteo Croce <mcroce@redhat.com>
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Tested-by: default avatarMatteo Croce <mcroce@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2ec1ed2a
    • David S. Miller's avatar
      Merge branch 'net-hns-bugfixes-for-HNS-Driver' · 47b62cd8
      David S. Miller authored
      Yonglong Liu says:
      
      ====================
      net: hns: bugfixes for HNS Driver
      
      This patchset fix some bugs that were found in the test of
      various scenarios, or identify by KASAN/sparse.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      47b62cd8
    • Yonglong Liu's avatar
      net: hns: Fix sparse: some warnings in HNS drivers · 15400663
      Yonglong Liu authored
      There are some sparse warnings in the HNS drivers:
      
      warning: incorrect type in assignment (different address spaces)
          expected void [noderef] <asn:2> *io_base
          got void *vaddr
      warning: cast removes address space '<asn:2>' of expression
      [...]
      
      Add __iomem and change all the u8 __iomem to void __iomem to
      fix these kind of  warnings.
      
      warning: incorrect type in argument 1 (different address spaces)
          expected void [noderef] <asn:2> *base
          got unsigned char [usertype] *base_addr
      warning: cast to restricted __le16
      warning: incorrect type in assignment (different base types)
          expected unsigned int [usertype] tbl_tcam_data_high
          got restricted __le32 [usertype]
      warning: cast to restricted __le32
      [...]
      
      These variables used u32/u16 as their type, and finally as a
      parameter of writel(), writel() will do the cpu_to_le32 coversion
      so remove the little endian covert code to fix these kind of warnings.
      Signed-off-by: default avatarYonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      15400663
    • Yonglong Liu's avatar
      net: hns: Fix WARNING when remove HNS driver with SMMU enabled · 8601a99d
      Yonglong Liu authored
      When enable SMMU, remove HNS driver will cause a WARNING:
      
      [  141.924177] WARNING: CPU: 36 PID: 2708 at drivers/iommu/dma-iommu.c:443 __iommu_dma_unmap+0xc0/0xc8
      [  141.954673] Modules linked in: hns_enet_drv(-)
      [  141.963615] CPU: 36 PID: 2708 Comm: rmmod Tainted: G        W         5.0.0-rc1-28723-gb729c57de95c-dirty #32
      [  141.983593] Hardware name: Huawei D05/D05, BIOS Hisilicon D05 UEFI Nemo 1.8 RC0 08/31/2017
      [  142.000244] pstate: 60000005 (nZCv daif -PAN -UAO)
      [  142.009886] pc : __iommu_dma_unmap+0xc0/0xc8
      [  142.018476] lr : __iommu_dma_unmap+0xc0/0xc8
      [  142.027066] sp : ffff000013533b90
      [  142.033728] x29: ffff000013533b90 x28: ffff8013e6983600
      [  142.044420] x27: 0000000000000000 x26: 0000000000000000
      [  142.055113] x25: 0000000056000000 x24: 0000000000000015
      [  142.065806] x23: 0000000000000028 x22: ffff8013e66eee68
      [  142.076499] x21: ffff8013db919800 x20: 0000ffffefbff000
      [  142.087192] x19: 0000000000001000 x18: 0000000000000007
      [  142.097885] x17: 000000000000000e x16: 0000000000000001
      [  142.108578] x15: 0000000000000019 x14: 363139343a70616d
      [  142.119270] x13: 6e75656761705f67 x12: 0000000000000000
      [  142.129963] x11: 00000000ffffffff x10: 0000000000000006
      [  142.140656] x9 : 1346c1aa88093500 x8 : ffff0000114de4e0
      [  142.151349] x7 : 6662666578303d72 x6 : ffff0000105ffec8
      [  142.162042] x5 : 0000000000000000 x4 : 0000000000000000
      [  142.172734] x3 : 00000000ffffffff x2 : ffff0000114de500
      [  142.183427] x1 : 0000000000000000 x0 : 0000000000000035
      [  142.194120] Call trace:
      [  142.199030]  __iommu_dma_unmap+0xc0/0xc8
      [  142.206920]  iommu_dma_unmap_page+0x20/0x28
      [  142.215335]  __iommu_unmap_page+0x40/0x60
      [  142.223399]  hnae_unmap_buffer+0x110/0x134
      [  142.231639]  hnae_free_desc+0x6c/0x10c
      [  142.239177]  hnae_fini_ring+0x14/0x34
      [  142.246540]  hnae_fini_queue+0x2c/0x40
      [  142.254080]  hnae_put_handle+0x38/0xcc
      [  142.261619]  hns_nic_dev_remove+0x54/0xfc [hns_enet_drv]
      [  142.272312]  platform_drv_remove+0x24/0x64
      [  142.280552]  device_release_driver_internal+0x17c/0x20c
      [  142.291070]  driver_detach+0x4c/0x90
      [  142.298259]  bus_remove_driver+0x5c/0xd8
      [  142.306148]  driver_unregister+0x2c/0x54
      [  142.314037]  platform_driver_unregister+0x10/0x18
      [  142.323505]  hns_nic_dev_driver_exit+0x14/0xf0c [hns_enet_drv]
      [  142.335248]  __arm64_sys_delete_module+0x214/0x25c
      [  142.344891]  el0_svc_common+0xb0/0x10c
      [  142.352430]  el0_svc_handler+0x24/0x80
      [  142.359968]  el0_svc+0x8/0x7c0
      [  142.366104] ---[ end trace 60ad1cd58e63c407 ]---
      
      The tx ring buffer map when xmit and unmap when xmit done. So in
      hnae_init_ring() did not map tx ring buffer, but in hnae_fini_ring()
      have a unmap operation for tx ring buffer, which is already unmapped
      when xmit done, than cause this WARNING.
      
      The hnae_alloc_buffers() is called in hnae_init_ring(),
      so the hnae_free_buffers() should be in hnae_fini_ring(), not in
      hnae_free_desc().
      
      In hnae_fini_ring(), adds a check is_rx_ring() as in hnae_init_ring().
      When the ring buffer is tx ring, adds a piece of code to ensure that
      the tx ring is unmap.
      Signed-off-by: default avatarYonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: default avatarPeng Li <lipeng321@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8601a99d