1. 31 Jan, 2016 40 commits
    • HID: wacom: Tie cached HID_DG_CONTACTCOUNT indices to report ID · 4fb5b524
      Jason Gerecke authored
      commit 499522c8 upstream.
      
      The cached indices 'cc_index' and 'cc_value_index' introduced in 1b5d514
      are only valid for a single report ID. If a touchscreen has multiple
      reports with a HID_DG_CONTACTCOUNT usage, it's possible that the values
      will not be correct for the report we're handling, resulting in an
      incorrect value for 'num_expected'. This has been observed with the Cintiq
      Companion 2.
      
      To address this, we store the ID of the report those indices are valid
      for in a new 'cc_report' variable. Before using them to get the expected
      contact count, we first check if the ID of the report we're processing
      matches 'cc_report'. If it doesn't, we update the indices to point to
      the HID_DG_CONTACTCOUNT usage of the current report (if it has one).
      Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
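The caching rule described in the commit above can be sketched as follows. This is a minimal illustration, not the driver's code: `struct report`, `struct touch_state` and `expected_contacts()` are hypothetical names, and only one cached index is modelled.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the driver's data structures. */
struct report {
    int id;          /* report ID */
    int cc_field;    /* index of the contact-count field, or -1 if absent */
    int values[4];   /* decoded values, one per field */
};

struct touch_state {
    int cc_report;   /* report ID the cached index is valid for (0 = none) */
    int cc_index;    /* cached field index, or -1 if that report has none */
};

/* Return the expected contact count for this report, or 0 if it has none. */
static int expected_contacts(struct touch_state *st, const struct report *r)
{
    assert(st && r);

    /* Revalidate the cache whenever a different report ID arrives. */
    if (st->cc_report != r->id) {
        st->cc_report = r->id;
        st->cc_index = r->cc_field;
    }
    if (st->cc_index < 0)
        return 0;
    return r->values[st->cc_index];
}
```

Without the `cc_report` comparison, an index cached for one report would be reused for a report whose contact count lives in a different field.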
    • parisc iommu: fix panic due to trying to allocate too large region · 3016b876
      Mikulas Patocka authored
      commit e46e31a3 upstream.
      
      When using the Promise TX2+ SATA controller on PA-RISC, the system often
      crashes with a kernel panic; for example, just writing data with the dd
      utility will make it crash.
      
      Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources
      
      CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
      Backtrace:
       [<000000004021497c>] show_stack+0x14/0x20
       [<0000000040410bf0>] dump_stack+0x88/0x100
       [<000000004023978c>] panic+0x124/0x360
       [<0000000040452c18>] sba_alloc_range+0x698/0x6a0
       [<0000000040453150>] sba_map_sg+0x260/0x5b8
       [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
       [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
       [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
       [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
       [<000000004049da34>] scsi_request_fn+0x6e4/0x970
       [<00000000403e95a8>] __blk_run_queue+0x40/0x60
       [<00000000403e9d8c>] blk_run_queue+0x3c/0x68
       [<000000004049a534>] scsi_run_queue+0x2a4/0x360
       [<000000004049be68>] scsi_end_request+0x1a8/0x238
       [<000000004049de84>] scsi_io_completion+0xfc/0x688
       [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0
      
      The cause of the crash is not exhaustion of the IOMMU space; there are
      plenty of free pages. The function sba_alloc_range is called with size
      0x11000, thus the pages_needed variable is 0x11. The function
      sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
      0x10 (because dma_get_seg_boundary(dev) returns 0xffff).
      
      The function sba_search_bitmap attempts to allocate 17 pages that must not
      cross a 16-page boundary. It can't satisfy this requirement
      (iommu_is_span_boundary always returns true) and fails even if there are
      many free entries in the IOMMU space.
      
      How did it happen that we try to allocate 17 pages that don't cross a
      16-page boundary? The cause is in the function iommu_coalesce_chunks. This
      function tries to coalesce adjacent entries in the scatterlist. The
      function does several checks to see if it may coalesce one entry with the
      next; one of those checks is this:
      
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      
      When it finishes coalescing adjacent entries, it allocates the mapping:
      
      	sg_dma_len(contig_sg) = dma_len;
      	dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      	sg_dma_address(contig_sg) =
      		PIDE_FLAG
      		| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
      		| dma_offset;
      
      It is possible that (startsg->length + dma_len > max_seg_size) is false
      (we are just near the 0x10000 max_seg_size boundary), so the function
      decides to coalesce this entry with the next entry. When the coalescing
      succeeds, the function performs
      	dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
      iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
      to allocate 17 pages for a device that must not cross a 16-page boundary.
      
      To fix the bug, we must make sure that dma_len after addition of
      dma_offset and alignment doesn't cross the segment boundary. I.e. change
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      to
      	if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
      		break;
      
      This patch makes this change (it precalculates max_seg_boundary at the
      beginning of the function iommu_coalesce_chunks). I also added a check
      that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
      not needed for Promise TX2+ SATA, but it may be needed for other devices
      that have dma_get_seg_boundary lower than dma_get_max_seg_size).
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
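The arithmetic in the commit above can be worked through in a small standalone sketch. It assumes a 4 KiB I/O page (consistent with size 0x11000 giving 0x11 pages); the function names are illustrative and are not taken from sba_iommu.c.

```c
#include <assert.h>
#include <stdbool.h>

#define IOVP_SIZE 0x1000UL   /* assumed 4 KiB I/O page for this sketch */

/* Round x up to a multiple of a; a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Old test: ignores dma_offset and the rounding to whole I/O pages. */
static bool old_check_stops(unsigned long dma_len, unsigned long sg_len,
                            unsigned long max_seg_size)
{
    return sg_len + dma_len > max_seg_size;
}

/* New test: compares the length that will actually be allocated. */
static bool new_check_stops(unsigned long dma_len, unsigned long dma_offset,
                            unsigned long sg_len, unsigned long max_seg_size)
{
    return align_up(dma_len + dma_offset + sg_len, IOVP_SIZE) > max_seg_size;
}

/* Number of I/O pages requested once the coalesced chunk is mapped. */
static unsigned long pages_needed(unsigned long dma_len,
                                  unsigned long dma_offset)
{
    return align_up(dma_len + dma_offset, IOVP_SIZE) / IOVP_SIZE;
}
```

With dma_len = 0xF000, a next entry of length 0x1000, dma_offset = 0x200 and max_seg_size = 0x10000, the old test allows coalescing and the resulting mapping needs 17 pages, while the new test stops before that happens.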
    • iommu/vt-d: Fix ATSR handling for Root-Complex integrated endpoints · 24347f2e
      David Woodhouse authored
      commit d14053b3 upstream.
      
      The VT-d specification says that "Software must enable ATS on endpoint
      devices behind a Root Port only if the Root Port is reported as
      supporting ATS transactions."
      
      We walk up the tree to find a Root Port, but for integrated devices we
      don't find one — we get to the host bridge. In that case we *should*
      allow ATS. Currently we don't, which means that we are incorrectly
      failing to use ATS for the integrated graphics. Fix that.
      
      We should never break out of this loop "naturally" with bus==NULL,
      since we'll always find bridge==NULL in that case (and now return 1).
      
      So remove the check for (!bridge) after the loop, since it can never
      happen. If it did, it would be worthy of a BUG_ON(!bridge). But since
      it'll oops anyway in that case, that'll do just as well.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • iommu/arm-smmu: Fix error checking for ASID and VMID allocation · 2f736853
      Will Deacon authored
      commit c0733a2c upstream.
      
      The bitmap allocator returns an int, which is one of the standard
      negative values on failure. Rather than assigning this straight to a
      u16 (like we do for the ASID and VMID callers), which means that we
      won't detect failure correctly, use an int for the purposes of error
      checking.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
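The failure mode described above (a negative error code stored straight into a 16-bit unsigned variable) can be reproduced in plain C. The allocator and both callers below are hypothetical stand-ins written for this sketch, not the SMMU driver's functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical allocator: returns a free index, or -ENOSPC when full. */
static int alloc_index(unsigned int *used_mask, int nbits)
{
    assert(used_mask && nbits > 0 && nbits <= 32);
    for (int i = 0; i < nbits; i++) {
        if (!(*used_mask & (1u << i))) {
            *used_mask |= 1u << i;
            return i;
        }
    }
    return -ENOSPC;
}

/* Buggy shape: the result is narrowed to 16 bits before it is tested. */
static int get_id_buggy(unsigned int *used_mask, int nbits, uint16_t *id)
{
    *id = (uint16_t)alloc_index(used_mask, nbits);
    int ret = *id;              /* widening an unsigned value is never negative */
    return ret < 0 ? ret : 0;   /* so the failure goes unnoticed */
}

/* Fixed shape: keep the result in an int until the error check is done. */
static int get_id_fixed(unsigned int *used_mask, int nbits, uint16_t *id)
{
    int ret = alloc_index(used_mask, nbits);

    if (ret < 0)
        return ret;
    *id = (uint16_t)ret;
    return 0;
}
```

In the buggy variant a full bitmap yields "success" with a bogus ID; the fixed variant reports the error and leaves the ID untouched.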
    • arm64: kernel: enforce pmuserenr_el0 initialization and restore · d2d39a3b
      Lorenzo Pieralisi authored
      commit 60792ad3 upstream.
      
      The pmuserenr_el0 register value is architecturally UNKNOWN on reset.
      Current kernel code resets that register value iff the core pmu device is
      correctly probed in the kernel. On platforms with missing DT pmu nodes (or
      disabled perf events in the kernel), the pmu is not probed, therefore the
      pmuserenr_el0 register is not reset in the kernel, which means that its
      value retains the reset value that is architecturally UNKNOWN (the system
      may run with e.g. pmuserenr_el0 == 0x1, which means that PMU counter access
      is available at EL0, which must be disallowed).
      
      This patch adds code that resets pmuserenr_el0 on cold boot and restores
      it on core resume from shutdown, so that the pmuserenr_el0 setup is
      always enforced in the kernel.
      
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: mm: ensure that the zero page is visible to the page table walker · fcad0638
      Will Deacon authored
      commit 32d63978 upstream.
      
      In paging_init, we allocate the zero page, memset it to zero and then
      point TTBR0 to it in order to avoid speculative fetches through the
      identity mapping.
      
      In order to guarantee that the freshly zeroed page is indeed visible to
      the page table walker, we need to execute a dsb instruction prior to
      writing the TTBR.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: Clear out any singlestep state on a ptrace detach operation · a8c5c526
      John Blackwood authored
      commit 5db4fd8c upstream.
      
      Make sure to clear out any ptrace singlestep state when a ptrace(2)
      PTRACE_DETACH call is made on arm64 systems.
      
      Otherwise, the previously ptraced task will die off with a SIGTRAP
      signal if the debugger just previously singlestepped the ptraced task.
      Signed-off-by: John Blackwood <john.blackwood@ccur.com>
      [will: added comment to justify why this is in the arch code]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM/arm64: KVM: correct PTE uncachedness check · d3065fb4
      Ard Biesheuvel authored
      commit 0de58f85 upstream.
      
      Commit e6fab544 ("ARM/arm64: KVM: test properly for a PTE's
      uncachedness") modified the logic to test whether a HYP or stage-2
      mapping needs flushing, from [incorrectly] interpreting the page table
      attributes to [incorrectly] checking whether the PFN that backs the
      mapping is covered by host system RAM. The PFN number is part of the
      output of the translation, not the input, so we have to use pte_pfn()
      on the contents of the PTE, not __phys_to_pfn() on the HYP virtual
      address or stage-2 intermediate physical address.
      
      Fixes: e6fab544 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
      Tested-by: Pavel Fedin <p.fedin@samsung.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: fix building without CONFIG_UID16 · 838a9160
      Arnd Bergmann authored
      commit fbc416ff upstream.
      
      As reported by Michal Simek, building an ARM64 kernel with CONFIG_UID16
      disabled currently fails because the system call table still needs to
      reference the individual function entry points that are provided by
      kernel/sys_ni.c in this case, and the declarations are hidden inside
      of #ifdef CONFIG_UID16:
      
      arch/arm64/include/asm/unistd32.h:57:8: error: 'sys_lchown16' undeclared here (not in a function)
       __SYSCALL(__NR_lchown, sys_lchown16)
      
      I believe this problem only exists on ARM64, because older architectures
      tend to not need declarations when their system call table is built
      in assembly code, while newer architectures tend to not need UID16
      support. ARM64 only uses these system calls for compatibility with
      32-bit ARM binaries.
      
      This changes the CONFIG_UID16 check into CONFIG_HAVE_UID16, which is
      set unconditionally on ARM64 with CONFIG_COMPAT, so we see the
      declarations whenever we need them, but otherwise the behavior is
      unchanged.
      
      Fixes: af1839eb ("Kconfig: clean up the long arch list for the UID16 config option")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: KVM: Fix AArch32 to AArch64 register mapping · 5dc8a8c7
      Marc Zyngier authored
      commit c0f09634 upstream.
      
      When running a 32bit guest under a 64bit hypervisor, the ARMv8
      architecture defines a mapping of the 32bit registers in the 64bit
      space. This includes banked registers that are being demultiplexed
      over the 64bit ones.
      
      On exceptions caused by an operation involving a 32bit register, the
      HW exposes the register number in the ESR_EL2 register. It was so
      far understood that SW had to distinguish between AArch32 and AArch64
      accesses (based on the current AArch32 mode and register number).
      
      It turns out that I misinterpreted the ARM ARM, and the clue is in
      D1.20.1: "For some exceptions, the exception syndrome given in the
      ESR_ELx identifies one or more register numbers from the issued
      instruction that generated the exception. Where the exception is
      taken from an Exception level using AArch32 these register numbers
      give the AArch64 view of the register."
      
      Which means that the HW is already giving us the translated version,
      and that we shouldn't try to interpret it at all (for example, doing
      an MMIO operation from the IRQ mode using the LR register leads to
      very unexpected behaviours).
      
      The fix is thus not to perform a call to vcpu_reg32() at all from
      vcpu_reg(), and use whatever register number is supplied directly.
      The only case we need to find out about the mapping is when we
      actively generate a register access, which only occurs when injecting
      a fault in a guest.
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM/arm64: KVM: test properly for a PTE's uncachedness · b8691782
      Ard Biesheuvel authored
      commit e6fab544 upstream.
      
      The open coded tests for checking whether a PTE maps a page as
      uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern,
      which is not guaranteed to work since the type of a mapping is
      not a set of mutually exclusive bits.
      
      For HYP mappings, the type is an index into the MAIR table (i.e., the
      index itself does not contain any information whatsoever about the
      type of the mapping), and for stage-2 mappings it is a bit field where
      normal memory and device types are defined as follows:
      
          #define MT_S2_NORMAL            0xf
          #define MT_S2_DEVICE_nGnRE      0x1
      
      I.e., masking *and* comparing with the latter matches on the former,
      and we have been getting lucky merely because the S2 device mappings
      also have the PTE_UXN bit set, or we would misidentify memory mappings
      as device mappings.
      
      Since the unmap_range() code path (which contains one instance of the
      flawed test) is used both for HYP mappings and stage-2 mappings, and
      considering the difference between the two, it is non-trivial to fix
      this by rewriting the tests in place, as it would involve passing
      down the type of mapping through all the functions.
      
      However, since HYP mappings and stage-2 mappings both deal with host
      physical addresses, we can simply check whether the mapping is backed
      by memory that is managed by the host kernel, and only perform the
      D-cache maintenance if this is the case.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Tested-by: Pavel Fedin <p.fedin@samsung.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
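The flaw in the mask-and-compare pattern can be shown with the two attribute values quoted in the commit above. The sketch assumes the attribute field sits at bits [5:2] of the descriptor; the helper names are hypothetical. It only demonstrates the misidentification and is not the fix the commit applies (which checks the backing memory instead).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MT_S2_NORMAL        0xf
#define MT_S2_DEVICE_nGnRE  0x1

/* Assumed field position for this sketch: attribute in bits [5:2]. */
#define S2_MEMATTR(t)       ((uint64_t)(t) << 2)
#define S2_MEMATTR_MASK     S2_MEMATTR(0xf)

/* Build a descriptor that carries only the memory attribute field. */
static uint64_t make_s2_pte(unsigned int attr)
{
    assert(attr <= 0xf);
    return S2_MEMATTR(attr);
}

/* Flawed pattern: mask with the constant itself, then compare with it. */
static bool looks_device_flawed(uint64_t pte)
{
    return (pte & S2_MEMATTR(MT_S2_DEVICE_nGnRE)) ==
           S2_MEMATTR(MT_S2_DEVICE_nGnRE);
}

/* Whole-field comparison: extract all four attribute bits first. */
static bool is_device_by_field(uint64_t pte)
{
    return (pte & S2_MEMATTR_MASK) == S2_MEMATTR(MT_S2_DEVICE_nGnRE);
}
```

Because 0xf contains the 0x1 bit, the flawed test reports a normal-memory descriptor as a device mapping; the whole-field comparison does not.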
    • arm64: kernel: pause/unpause function graph tracer in cpu_suspend() · bf8be9b3
      Lorenzo Pieralisi authored
      commit de818bd4 upstream.
      
      The function graph tracer adds instrumentation that is required to trace
      both entry and exit of a function. In particular the function graph
      tracer updates the "return address" of a function in order to insert
      a trace callback on function exit.
      
      Kernel power management functions like cpu_suspend() are called
      upon power down entry with functions called "finishers" that are in turn
      called to trigger the power down sequence but they may not return to the
      kernel through the normal return path.
      
      When the core resumes from low-power it returns to the cpu_suspend()
      function through the cpu_resume path, which leaves the trace stack frame
      set-up by the function tracer in an inconsistent state upon return to the
      kernel when tracing is enabled.
      
      This patch fixes the issue by pausing/resuming the function graph
      tracer on the thread executing cpu_suspend() (i.e. the function call that
      subsequently triggers the "suspend finishers"), so that the function graph
      tracer state is kept consistent across functions that enter power down
      states and never return by effectively disabling graph tracer while they
      are executing.
      
      Fixes: 819e50e2 ("arm64: Add ftrace support")
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reported-by: Catalin Marinas <catalin.marinas@arm.com>
      Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: cmpxchg_dbl: fix return value type · 2b656a61
      Lorenzo Pieralisi authored
      commit 57a65667 upstream.
      
      The current arm64 __cmpxchg_double{_mb} implementations carry out the
      compare exchange by first comparing the old values passed in to the
      values read from the pointer provided and by stashing the cumulative
      bitwise difference in a 64-bit register.
      
      By comparing the register content against 0, it is possible to detect if
      the values read differ from the old values passed in, so that the compare
      exchange detects whether it has to bail out or carry on completing the
      operation with the exchange.
      
      Given the current implementation, to detect the cmpxchg operation
      status, the __cmpxchg_double{_mb} functions should return the 64-bit
      stashed bitwise difference so that the caller can detect cmpxchg failure
      by comparing the return value content against 0. The current implementation
      declares the return value as an int, which means that the 64-bit
      value stashing the bitwise difference is truncated before being
      returned to the __cmpxchg_double{_mb} callers, which means that
      any bitwise difference present in the top 32 bits goes undetected,
      triggering false positives and subsequent kernel failures.
      
      This patch fixes the issue by declaring the arm64 __cmpxchg_double{_mb}
      return values as a long, so that the bitwise difference is
      properly propagated on failure, restoring the expected behaviour.
      
      Fixes: e9a4b795 ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU")
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
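The truncation described above can be illustrated outside the kernel. The sketch below models only the "cumulative bitwise difference" and its return type; the function names are hypothetical, and `int64_t` stands in for the 64-bit `long` that the patch uses on arm64.

```c
#include <assert.h>
#include <stdint.h>

/* Cumulative bitwise difference between the expected and observed pairs. */
static uint64_t pair_diff(const uint64_t old[2], const uint64_t cur[2])
{
    assert(old && cur);
    return (old[0] ^ cur[0]) | (old[1] ^ cur[1]);
}

/* Buggy shape: a 32-bit return type drops the upper half of the difference. */
static int status_truncated(const uint64_t old[2], const uint64_t cur[2])
{
    return (int)(uint32_t)pair_diff(old, cur);
}

/* Fixed shape: a 64-bit return type keeps every differing bit. */
static int64_t status_full(const uint64_t old[2], const uint64_t cur[2])
{
    return (int64_t)pair_diff(old, cur);
}
```

A caller treats 0 as "the values matched". When the only differing bit is in the top 32 bits, the truncated status reads 0 even though the comparison failed, which is the false positive the commit describes.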
    • arm64: bpf: fix mod-by-zero case · 3e34dfef
      Zi Shen Lim authored
      commit 14e589ff upstream.
      
      Turns out in the case of modulo by zero in a BPF program:
      	A = A % X;  (X == 0)
      the expected behavior is to terminate with return value 0.
      
      The bug in JIT is exposed by a new test case [1].
      
      [1] https://lkml.org/lkml/2015/11/4/499
      Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
      Reported-by: Yang Shi <yang.shi@linaro.org>
      Reported-by: Xi Wang <xi.wang@gmail.com>
      CC: Alexei Starovoitov <ast@plumgrid.com>
      Fixes: e54bcde3 ("arm64: eBPF JIT compiler")
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: bpf: fix div-by-zero case · e905b253
      Zi Shen Lim authored
      commit 251599e1 upstream.
      
      In the case of division by zero in a BPF program:
      	A = A / X;  (X == 0)
      the expected behavior is to terminate with return value 0.
      
      This is confirmed by the test case introduced in commit 86bf1721
      ("test_bpf: add tests checking that JIT/interpreter sets A and X to 0.").
      Reported-by: Yang Shi <yang.shi@linaro.org>
      Tested-by: Yang Shi <yang.shi@linaro.org>
      CC: Xi Wang <xi.wang@gmail.com>
      CC: Alexei Starovoitov <ast@plumgrid.com>
      CC: linux-arm-kernel@lists.infradead.org
      CC: linux-kernel@vger.kernel.org
      Fixes: e54bcde3 ("arm64: eBPF JIT compiler")
      Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
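The expected behaviour for division by zero (and, per the mod-by-zero entry, for modulo by zero) can be sketched with a toy interpreter. The opcode set and `run()` are made up for this illustration and have nothing to do with the real BPF instruction encoding or the arm64 JIT.

```c
#include <assert.h>
#include <stdint.h>

/* Toy opcodes for this sketch only. */
enum op { OP_DIV, OP_MOD, OP_RET };

struct insn {
    enum op op;
};

/* Run a tiny program over accumulator A and register X; returns A at exit. */
static uint32_t run(const struct insn *prog, int len, uint32_t a, uint32_t x)
{
    assert(prog && len > 0);
    for (int i = 0; i < len; i++) {
        switch (prog[i].op) {
        case OP_DIV:
            if (x == 0)
                return 0;   /* terminate the program with return value 0 */
            a /= x;
            break;
        case OP_MOD:
            if (x == 0)
                return 0;   /* same rule for modulo */
            a %= x;
            break;
        case OP_RET:
            return a;
        }
    }
    return a;
}
```

The zero check has to come before the arithmetic instruction is executed; a JIT has to emit the equivalent test and early exit in generated code.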
    • recordmcount: arm64: Replace the ignored mcount call into nop · 18f93664
      Li Bin authored
      commit 2ee8a74f upstream.
      
      Currently, recordmcount only records functions that are in the
      following sections:
      .text/.ref.text/.sched.text/.spinlock.text/.irqentry.text/
      .kprobes.text/.text.unlikely
      
      For functions that are not in these sections, the mcount call
      stays in place and is not replaced when the kernel boots. This
      adds performance overhead, for example in do_mem_abort (in the
      .exception.text section). This patch makes recordmcount replace
      the mcount call with a nop in this case.
      
      Link: http://lkml.kernel.org/r/1446019445-14421-1-git-send-email-huawei.libin@huawei.com
      Link: http://lkml.kernel.org/r/1446193864-24593-4-git-send-email-huawei.libin@huawei.com
      
      Cc: <lkp@intel.com>
      Cc: <catalin.marinas@arm.com>
      Cc: <takahiro.akashi@linaro.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Li Bin <huawei.libin@huawei.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/module: Handle R_PPC64_ENTRY relocations · 96c81d0a
      Ulrich Weigand authored
      commit a61674bd upstream.
      
      GCC 6 will include changes to generated code with -mcmodel=large,
      which is used to build kernel modules on powerpc64le.  This was
      necessary because the large model is supposed to allow arbitrary
      sizes and locations of the code and data sections, but the ELFv2
      global entry point prolog still made the unconditional assumption
      that the TOC associated with any particular function can be found
      within 2 GB of the function entry point:
      
      func:
      	addis r2,r12,(.TOC.-func)@ha
      	addi  r2,r2,(.TOC.-func)@l
      	.localentry func, .-func
      
      To remove this assumption, GCC will now generate instead this global
      entry point prolog sequence when using -mcmodel=large:
      
      	.quad .TOC.-func
      func:
      	.reloc ., R_PPC64_ENTRY
      	ld    r2, -8(r12)
      	add   r2, r2, r12
      	.localentry func, .-func
      
      The new .reloc triggers an optimization in the linker that will
      replace this new prolog with the original code (see above) if the
      linker determines that the distance between .TOC. and func is in
      range after all.
      
      Since this new relocation is now present in module object files,
      the kernel module loader is required to handle them too.  This
      patch adds support for the new relocation and implements the
      same optimization done by the GNU linker.
      Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scripts/recordmcount.pl: support data in text section on powerpc · 1a1021b6
      Ulrich Weigand authored
      commit 2e50c4be upstream.
      
      If a text section starts out with a data blob before the first
      function start label, disassembly parsing done in recordmcount.pl
      gets confused on powerpc, leading to creation of corrupted module
      objects.
      
      This was not a problem so far since the compiler would never create
      such text sections.  However, this has changed with a recent change
      in GCC 6 to support distances of > 2GB between a function and its
      associated TOC in the ELFv2 ABI, exposing this problem.
      
      There is already code in recordmcount.pl to handle such data blobs
      on the sparc64 platform.  This patch uses the same method to handle
      those on powerpc as well.
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered · e9e5728f
      Boqun Feng authored
      commit 81d7a329 upstream.
      
      According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_
      versions all need to be fully ordered, however they are now just
      RELEASE+ACQUIRE, which are not fully ordered.
      
      So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
      PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
      __{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
      of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
      b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics")
      
      This patch depends on patch "powerpc: Make value-returning atomics fully
      ordered" for PPC_ATOMIC_ENTRY_BARRIER definition.
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc: Make value-returning atomics fully ordered · 9f849cf3
      Boqun Feng authored
      commit 49e9cf3f upstream.
      
      According to memory-barriers.txt:
      
      > Any atomic operation that modifies some state in memory and returns
      > information about the state (old or new) implies an SMP-conditional
      > general memory barrier (smp_mb()) on each side of the actual
      > operation ...
      
      This means these operations should be fully ordered. However on PPC,
      PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
      which is currently "lwsync" if SMP=y. The leading "lwsync" cannot
      guarantee fully ordered atomics, according to Paul McKenney:
      
      https://lkml.org/lkml/2015/10/14/970
      
      To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
      the fully-ordered semantics.
      
      This also makes futex atomics fully ordered, which can avoid possible
      memory ordering problems if userspace code relies on futex system call
      for fully ordered semantics.
      
      Fixes: b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics")
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/powernv: pr_warn_once on unsupported OPAL_MSG type · 17aa8ac5
      Stewart Smith authored
      commit 98da62b7 upstream.
      
      When running on newer OPAL firmware that supports sending extra
      OPAL_MSG types, we would print a warning on *every* message received.
      
      This could be a problem for kernels that don't support OPAL_MSG_OCC
      on machines that are running real close to thermal limits and the
      OCC is throttling the chip. For a kernel that is paying attention to
      the message queue, we could get these notifications quite often.
      
      Conceivably, future message types could also come fairly often,
      and printing that we didn't understand them 10,000 times provides
      no further information than printing them once.
      Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
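The "warn once" idea from the commit above can be sketched in userspace C with a static flag. This is an approximation: the kernel's pr_warn_once is a macro with one flag per call site, whereas the helper here (a hypothetical name, like `handle_message()` and the `warnings_printed` counter) has a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_printed;   /* counts lines actually emitted */

/* Print the warning only the first time this function is reached. */
static void warn_unsupported_once(unsigned int msg_type)
{
    static bool warned;

    if (warned)
        return;
    warned = true;
    warnings_printed++;
    fprintf(stderr, "unsupported message type %u\n", msg_type);
}

/* Message types at or above max_known are ones this receiver cannot handle. */
static void handle_message(unsigned int msg_type, unsigned int max_known)
{
    assert(max_known > 0);
    if (msg_type >= max_known)
        warn_unsupported_once(msg_type);
}
```

Feeding the handler the same unknown type 10,000 times produces a single log line, which is the behaviour the commit argues for.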
    • powerpc/opal-irqchip: Fix deadlock introduced by "Fix double endian conversion" · a6699429
      Alistair Popple authored
      commit 036592fb upstream.
      
      Commit 25642e14 ("powerpc/opal-irqchip: Fix double endian
      conversion") fixed an endian bug by calling opal_handle_events() in
      opal_event_unmask().
      
      However this introduced a deadlock if we find an event is active
      during unmasking and call opal_handle_events() again. The bad call
      sequence is:
      
        opal_interrupt()
        -> opal_handle_events()
           -> generic_handle_irq()
              -> handle_level_irq()
                 -> raw_spin_lock(&desc->lock)
                    handle_irq_event(desc)
                    unmask_irq(desc)
                    -> opal_event_unmask()
                       -> opal_handle_events()
                          -> generic_handle_irq()
                             -> handle_level_irq()
                                -> raw_spin_lock(&desc->lock)	(BOOM)
      
      When generating multiple opal events in quick succession this would lead
      to the following stall warnings:
      
      EEH: Fenced PHB#0 detected, location: U78C9.001.WZS09XA-P1-C32
      INFO: rcu_sched detected stalls on CPUs/tasks:
      
               12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=2065
               15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=2065
               (detected by 13, t=2102 jiffies, g=1325, c=1324, q=602)
      NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [irqbalance:2696]
      INFO: rcu_sched detected stalls on CPUs/tasks:
               12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=8371
               15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=8371
               (detected by 20, t=8407 jiffies, g=1325, c=1324, q=1290)
      
      This patch corrects the problem by queuing the work if an event is
      active during unmasking, which is similar to the pre-endian fix
      behaviour.
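
      A small userspace model of the re-entrancy problem and of the deferred
      approach, under heavy simplification: a boolean stands in for the
      non-recursive desc->lock, another for the queued work item, and all
      names are hypothetical rather than the driver's own.

```c
#include <stdbool.h>

static bool desc_locked;     /* models desc->lock, which is not recursive */
static bool work_queued;     /* models a deferred work item */
static bool defer_in_unmask; /* true: queue work, false: call back directly */
static bool deadlocked;
static int events_pending;
static int events_handled;

static void handle_events(void);

/* Called with the lock held, like the unmask step in the call chain above. */
static void event_unmask(void)
{
    if (events_pending == 0)
        return;
    if (defer_in_unmask)
        work_queued = true; /* handled later, outside the lock */
    else
        handle_events();    /* re-enters while the lock is still held */
}

static void handle_events(void)
{
    if (desc_locked) {      /* a real spinlock would spin forever here */
        deadlocked = true;
        return;
    }
    desc_locked = true;
    if (events_pending > 0) {
        events_pending--;
        events_handled++;
    }
    event_unmask();
    desc_locked = false;
}

/* Returns true if the scenario hit the self-deadlock. */
static bool run_scenario(bool defer, int events)
{
    desc_locked = work_queued = deadlocked = false;
    defer_in_unmask = defer;
    events_pending = events;
    events_handled = 0;

    handle_events();
    while (work_queued) {   /* the "worker" runs after the lock is dropped */
        work_queued = false;
        handle_events();
    }
    return deadlocked;
}
```

      With two pending events, the direct call re-enters under the lock, while
      the deferred variant handles both events without re-entering.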
      
      Fixes: 25642e14 ("powerpc/opal-irqchip: Fix double endian conversion")
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a6699429
    • powerpc/opal-irqchip: Fix double endian conversion · 0f8e324f
      Alistair Popple authored
      commit 25642e14 upstream.
      
      The OPAL event calls return a mask of events that are active in big
      endian format. This is checked when unmasking the events in the
      irqchip by comparison with a cached value. The cached value was stored
      in big endian format but should've been converted to CPU endian
      first.
      
      This bug leads to OPAL event delivery being delayed or dropped on some
      systems. Symptoms may include a non-functional console.
      
      The bug is fixed by calling opal_handle_events(...) instead of
      duplicating code in opal_event_unmask(...).
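
      The endianness issue itself can be illustrated with a portable sketch.
      The helper names below are hypothetical, and the patch fixed the bug by
      reusing opal_handle_events() rather than by adding a conversion like
      this; the code only shows why a big-endian mask must be converted before
      its bits are tested.

```c
#include <stdint.h>
#include <string.h>

/* Produce the in-memory big-endian form of v, as firmware would return it. */
static uint64_t cpu_to_be64_sketch(uint64_t v)
{
    unsigned char b[8];
    uint64_t raw;

    for (int i = 0; i < 8; i++)
        b[i] = (unsigned char)(v >> (56 - 8 * i));
    memcpy(&raw, b, sizeof(raw));
    return raw;
}

/* Convert a big-endian value back to the CPU's native byte order. */
static uint64_t be64_to_cpu_sketch(uint64_t raw)
{
    unsigned char b[8];
    uint64_t v = 0;

    memcpy(b, &raw, sizeof(raw));
    for (int i = 0; i < 8; i++)
        v = (v << 8) | b[i];
    return v;
}

/* Test one event bit only after converting the mask to CPU endianness. */
static int event_is_active(uint64_t be_mask, unsigned int hwirq)
{
    uint64_t mask = be64_to_cpu_sketch(be_mask);

    return (int)((mask >> hwirq) & 1);
}
```

      On a little-endian CPU, testing the bit on the unconverted value would
      look at the wrong byte of the mask.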
      
      Fixes: 9f0fd049 ("powerpc/powernv: Add a virtual irqchip for opal events")
      Reported-by: Douglas L Lehr <dllehr@us.ibm.com>
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f8e324f
    • powerpc/tm: Check for already reclaimed tasks · 572c8361
      Michael Neuling authored
      commit 7f821fc9 upstream.
      
      Currently we can hit a scenario where we'll tm_reclaim() twice.  This
      results in a TM bad thing exception because the second reclaim occurs
      when not in suspend mode.
      
      The scenario in which this can happen is the following.  We attempt to
      deliver a signal to userspace.  To do this we need to obtain the stack
      pointer to write the signal context.  To get this stack pointer we
      must tm_reclaim() in case we need to use the checkpointed stack
      pointer (see get_tm_stackpointer()).  Normally we'd then return
      directly to userspace to deliver the signal without going through
      __switch_to().
      
      Unfortunately, if at this point we get an error (such as a bad
      userspace stack pointer), we need to exit the process.  The exit will
      result in a __switch_to().  __switch_to() will attempt to save the
      process state which results in another tm_reclaim().  This
      tm_reclaim() now causes a TM Bad Thing exception as this state has
      already been saved and the processor is no longer in TM suspend mode.
      Whee!
      
      This patch checks the state of the MSR to ensure we are TM suspended
      before we attempt the tm_reclaim().  If we've already saved the state
      away, we should no longer be in TM suspend mode.  This has the
      additional advantage of checking for a potential TM Bad Thing
      exception.
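
      A minimal sketch of the guard, assuming the MSR transaction-state field
      uses bit 33 for "suspended" and bit 34 for "transactional". Those bit
      positions and every name ending in _SKETCH are assumptions made for
      illustration, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions of the MSR transaction-state (TS) field. */
#define MSR_TS_S_SKETCH    (1ULL << 33) /* suspended */
#define MSR_TS_T_SKETCH    (1ULL << 34) /* transactional */
#define MSR_TS_MASK_SKETCH (MSR_TS_S_SKETCH | MSR_TS_T_SKETCH)

static int reclaims_done; /* counts how often a reclaim actually ran */

static bool tm_suspended(uint64_t msr)
{
    return (msr & MSR_TS_MASK_SKETCH) == MSR_TS_S_SKETCH;
}

/* Reclaim only when still suspended; returns the resulting MSR value. */
static uint64_t reclaim_if_suspended(uint64_t msr)
{
    if (!tm_suspended(msr))
        return msr;                 /* state already saved: nothing to do */
    reclaims_done++;
    return msr & ~MSR_TS_MASK_SKETCH; /* reclaim leaves TM suspend mode */
}
```

      Calling the helper twice in a row, as the signal-error path effectively
      does, performs the reclaim only once.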
      
      Found using syscall fuzzer.
      
      Fixes: fb09692e ("powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes")
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      572c8361
    • powerpc/tm: Block signal return setting invalid MSR state · d5b580ef
      Michael Neuling authored
      commit d2b9d2a5 upstream.
      
      Currently we allow both the MSR T and S bits to be set by userspace on
      a signal return.  Unfortunately this is a reserved configuration and
      will cause a TM Bad Thing exception if attempted (via rfid).
      
      This patch checks for this case in both the 32 and 64 bit signals
      code.  If both T and S are set, we mark the context as invalid.
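
      The check for the reserved combination might look roughly like this. The
      bit positions (33 for S, 34 for T) and the helper names are assumptions
      for illustration; the return convention is also hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions of the MSR transaction-state (TS) field. */
#define MSR_TS_S_SKETCH    (1ULL << 33) /* suspended */
#define MSR_TS_T_SKETCH    (1ULL << 34) /* transactional */
#define MSR_TS_MASK_SKETCH (MSR_TS_S_SKETCH | MSR_TS_T_SKETCH)

/* Both T and S set is the reserved configuration described above. */
static bool msr_tm_reserved(uint64_t msr)
{
    return (msr & MSR_TS_MASK_SKETCH) == MSR_TS_MASK_SKETCH;
}

/* Returns 0 if the user-supplied MSR is acceptable, -1 if it is invalid. */
static int validate_signal_return_msr(uint64_t user_msr)
{
    if (msr_tm_reserved(user_msr))
        return -1; /* mark the signal context as invalid */
    return 0;
}
```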
      
      Found using a syscall fuzzer.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5b580ef
    • xfrm: dst_entries_init() per-net dst_ops · 1bd631fc
      Dan Streetman authored
      [ Upstream commit a8a572a6 ]
      
      Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
      templates; their dst_entries counters will never be used.  Move the
      xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
      xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
      and dst_entries_destroy for each net namespace.
      
      The ipv4 and ipv6 xfrms each create a dst_ops template, and perform
      dst_entries_init on the templates.  The template values are copied to each
      net namespace's xfrm.xfrm*_dst_ops.  The problem there is the dst_ops
      pcpuc_entries field is a percpu counter and cannot be used correctly by
      simply copying it to another object.
      
      The result of this is a very subtle bug; changes to the dst entries
      counter from one net namespace may sometimes get applied to a different
      net namespace dst entries counter.  This is because of how the percpu
      counter works; it has a main count field as well as a pointer to the
      percpu variables.  Each net namespace maintains its own main count
      variable, but all point to one set of percpu variables.  When any net
      namespace happens to change one of the percpu variables to outside its
      small batch range, its count is moved to the net namespace's main count
      variable.  So with multiple net namespaces operating concurrently, the
      dst_ops entries counter can stray from the actual value that it should
      be; if counts are consistently moved from one net namespace to another
      (which my testing showed is likely), then one net namespace winds up
      with a negative dst_ops count while another winds up with a continually
      increasing count, eventually reaching its gc_thresh limit, which causes
      all new traffic on the net namespace to fail with -ENOBUFS.
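
      The drift can be reproduced with a toy model of a per-CPU counter. This
      is a simplified sketch, not the kernel's percpu_counter: the struct, the
      batch size of 4, and the function names are all hypothetical. Two
      "namespaces" are plain copies of one template and therefore share the
      same per-CPU storage.

```c
#define NR_CPUS_SKETCH 2
#define BATCH_SKETCH   4

struct pcpu_counter_sketch {
    long count;    /* main count, private to each copy */
    long *percpu;  /* per-CPU deltas, shared once the struct is copied */
};

/* Add to the per-CPU delta; fold it into the main count past the batch. */
static void counter_add(struct pcpu_counter_sketch *c, int cpu, long amount)
{
    long local = c->percpu[cpu] + amount;

    if (local >= BATCH_SKETCH || local <= -BATCH_SKETCH) {
        c->count += local;   /* whoever crosses the batch gets the whole delta */
        c->percpu[cpu] = 0;
    } else {
        c->percpu[cpu] = local;
    }
}

/* Both namespaces create and then destroy their entries; true totals are 0. */
static void run_shared_percpu_demo(long *a_count, long *b_count)
{
    long deltas[NR_CPUS_SKETCH] = { 0 };
    struct pcpu_counter_sketch tmpl = { 0, deltas };
    struct pcpu_counter_sketch ns_a = tmpl; /* copies share `deltas` */
    struct pcpu_counter_sketch ns_b = tmpl;

    counter_add(&ns_a, 0, 3);   /* A creates 3 entries */
    counter_add(&ns_b, 0, 1);   /* B creates 1; B absorbs A's 3 as well */
    counter_add(&ns_b, 0, -1);  /* B destroys its entry */
    counter_add(&ns_a, 0, -3);  /* A destroys 3; A absorbs B's -1 as well */

    *a_count = ns_a.count;
    *b_count = ns_b.count;
}
```

      Although both namespaces end with no live entries, this sequence leaves
      one main count at -4 and the other at 4, which is the pattern the
      message describes.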
      Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bd631fc
    • team: Replace rcu_read_lock with a mutex in team_vlan_rx_kill_vid · 0fe318bb
      Ido Schimmel authored
      [ Upstream commit 60a6531b ]
      
      We can't be within an RCU read-side critical section when deleting
      VLANs, as underlying drivers might sleep during the hardware operation.
      Therefore, replace the RCU critical section with a mutex. This is
      consistent with team_vlan_rx_add_vid.
      
      Fixes: 3d249d4c ("net: introduce ethernet teaming device")
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0fe318bb
    • net/mlx5_core: Fix trimming down IRQ number · 1940d0b0
      Doron Tsur authored
      [ Upstream commit 0b6e26ce ]
      
      With several ConnectX-4 cards installed on a server, one may receive
      irqn > 255 from the kernel API, which we mistakenly trim to 8bit.
      
      This causes EQ creation failure with the following stack trace:
      [<ffffffff812a11f4>] dump_stack+0x48/0x64
      [<ffffffff810ace21>] __setup_irq+0x3a1/0x4f0
      [<ffffffff810ad7e0>] request_threaded_irq+0x120/0x180
      [<ffffffffa0923660>] ? mlx5_eq_int+0x450/0x450 [mlx5_core]
      [<ffffffffa0922f64>] mlx5_create_map_eq+0x1e4/0x2b0 [mlx5_core]
      [<ffffffffa091de01>] alloc_comp_eqs+0xb1/0x180 [mlx5_core]
      [<ffffffffa091ea99>] mlx5_dev_init+0x5e9/0x6e0 [mlx5_core]
      [<ffffffffa091ec29>] init_one+0x99/0x1c0 [mlx5_core]
      [<ffffffff812e2afc>] local_pci_probe+0x4c/0xa0
      
      Fix this by changing the irqn type from u8 to unsigned int so that
      values > 255 are supported.
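
      The truncation is easy to reproduce outside the kernel. The structs and
      function names below are hypothetical stand-ins for the driver's EQ
      bookkeeping.

```c
#include <stdint.h>

struct eq_narrow_sketch { uint8_t irqn; };      /* field type before the fix */
struct eq_wide_sketch   { unsigned int irqn; }; /* field type after the fix */

/* Store an IRQ number in the 8-bit field and read it back. */
static unsigned int store_irqn_u8(unsigned int irqn)
{
    struct eq_narrow_sketch eq;

    eq.irqn = (uint8_t)irqn; /* silently keeps only the low 8 bits */
    return eq.irqn;
}

/* Store an IRQ number in the full-width field and read it back. */
static unsigned int store_irqn_uint(unsigned int irqn)
{
    struct eq_wide_sketch eq;

    eq.irqn = irqn;
    return eq.irqn;
}
```

      An IRQ number of 300 comes back as 44 from the 8-bit field, so the
      driver would later request the wrong interrupt.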
      
      Fixes: 61d0e73e ('net/mlx5_core: Use the the real irqn in eq->irqn')
      Reported-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Doron Tsur <doront@mellanox.com>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1940d0b0
    • batman-adv: Drop immediate orig_node free function · 3f3f7cab
      Sven Eckelmann authored
      [ Upstream commit 42eff6a6 ]
      
      It is not allowed to free the memory of an object which is part of a list
      which is protected by rcu-read-side-critical sections without making sure
      that no other context is accessing the object anymore. This usually happens
      by removing the references to this object and then waiting until the rcu
      grace period is over and no one (allowedly) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided and thus these functions must be removed and all functions
      have to use batadv_orig_node_free_ref.
      
      Fixes: 72822225 ("batman-adv: Fix rcu_barrier() miss due to double call_rcu() in TT code")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f3f7cab
    • batman-adv: Drop immediate batadv_hard_iface free function · 7c2b5cfe
      Sven Eckelmann authored
      [ Upstream commit b4d922cf ]
      
      It is not allowed to free the memory of an object which is part of a list
      which is protected by rcu-read-side-critical sections without making sure
      that no other context is accessing the object anymore. This usually happens
      by removing the references to this object and then waiting until the rcu
      grace period is over and no one (allowedly) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided and thus these functions must be removed and all functions
      have to use batadv_hardif_free_ref.
      
      Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c2b5cfe
    • batman-adv: Drop immediate neigh_ifinfo free function · 5947b88c
      Sven Eckelmann authored
      [ Upstream commit ae3e1e36 ]
      
      It is not allowed to free the memory of an object which is part of a list
      which is protected by rcu-read-side-critical sections without making sure
      that no other context is accessing the object anymore. This usually happens
      by removing the references to this object and then waiting until the rcu
      grace period is over and no one (allowedly) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided and thus these functions must be removed and all functions
      have to use batadv_neigh_ifinfo_free_ref.
      
      Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5947b88c
    • batman-adv: Drop immediate batadv_neigh_node free function · 6b283f6a
      Sven Eckelmann authored
      [ Upstream commit 2baa753c ]
      
      It is not allowed to free the memory of an object which is part of a list
      which is protected by rcu-read-side-critical sections without making sure
      that no other context is accessing the object anymore. This usually happens
      by removing the references to this object and then waiting until the rcu
      grace period is over and no one (allowedly) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided and thus these functions must be removed and all functions
      have to use batadv_neigh_node_free_ref.
      
      Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b283f6a
    • batman-adv: Drop immediate batadv_orig_ifinfo free function · b8b82bfd
      Sven Eckelmann authored
      [ Upstream commit deed9660 ]
      
      It is not allowed to free the memory of an object which is part of a list
      which is protected by rcu-read-side-critical sections without making sure
      that no other context is accessing the object anymore. This usually happens
      by removing the references to this object and then waiting until the rcu
      grace period is over and no one (allowedly) accesses it anymore.
      
      But the _now functions ignore this completely. They free the object
      directly even when a different context still tries to access it. This has
      to be avoided and thus these functions must be removed and all functions
      have to use batadv_orig_ifinfo_free_ref.
      
      Fixes: 7351a482 ("batman-adv: split out router from orig_node")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8b82bfd
    • batman-adv: Avoid recursive call_rcu for batadv_nc_node · fd937fb1
      Sven Eckelmann authored
      [ Upstream commit 44e8e7e9 ]
      
      The batadv_nc_node_free_ref function uses call_rcu to delay the free of the
      batadv_nc_node object until no (already started) rcu_read_lock is enabled
      anymore. This makes sure that no context is still trying to access the
      object which should be removed. But batadv_nc_node also contains a
      reference to orig_node which must be removed.
      
      The reference drop of orig_node was done in the call_rcu function
      batadv_nc_node_free_rcu but should actually be done in the
      batadv_nc_node_release function to avoid nested call_rcus. This is
      important because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will
      not detect the inner call_rcu as relevant for its execution. Otherwise this
      barrier will most likely be inserted in the queue before the callback of
      the first call_rcu was executed. The caller of rcu_barrier will therefore
      continue to run before the inner call_rcu callback finished.
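
      The ordering problem can be modelled with a plain FIFO of callbacks.
      This is a deliberately simplified sketch: the queue, the *_sketch
      functions, and the callback names are hypothetical, and the barrier is
      modelled as a marker callback that only waits for what was queued before
      it.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_MAX 8

typedef void (*callback_fn)(void);

static callback_fn queue[QUEUE_MAX];
static size_t q_head, q_tail;
static bool barrier_done;
static bool orig_node_released;

/* Model of call_rcu(): append a callback to the FIFO. */
static void call_rcu_sketch(callback_fn fn)
{
    if (q_tail < QUEUE_MAX)
        queue[q_tail++] = fn;
}

static void barrier_marker(void)
{
    barrier_done = true;
}

/* Model of rcu_barrier(): queue a marker and run callbacks until it fires. */
static void rcu_barrier_sketch(void)
{
    barrier_done = false;
    call_rcu_sketch(barrier_marker);
    while (!barrier_done && q_head < q_tail)
        queue[q_head++]();
}

static void orig_node_free_cb(void)
{
    orig_node_released = true;
}

/* Old scheme: the inner reference is dropped inside the outer callback. */
static void nc_node_free_cb_nested(void)
{
    call_rcu_sketch(orig_node_free_cb); /* lands behind the barrier marker */
}

/* New scheme: the outer callback only frees its own object. */
static void nc_node_free_cb_flat(void)
{
}

/* Returns whether the inner object was released when the barrier returned. */
static bool released_after_barrier(bool nested)
{
    q_head = q_tail = 0;
    orig_node_released = false;

    if (nested) {
        call_rcu_sketch(nc_node_free_cb_nested);
    } else {
        /* the release function drops the inner reference before queuing */
        call_rcu_sketch(orig_node_free_cb);
        call_rcu_sketch(nc_node_free_cb_flat);
    }
    rcu_barrier_sketch();
    return orig_node_released;
}
```

      In the nested case the barrier returns while the inner callback is still
      queued; in the flat case everything has run by then.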
      
      Fixes: d56b1705 ("batman-adv: network coding - detect coding nodes and remove these after timeout")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fd937fb1
    • batman-adv: Avoid recursive call_rcu for batadv_bla_claim · cd5a5c3b
      Sven Eckelmann authored
      [ Upstream commit 63b39927 ]
      
      The batadv_claim_free_ref function uses call_rcu to delay the free of the
      batadv_bla_claim object until no (already started) rcu_read_lock is enabled
      anymore. This makes sure that no context is still trying to access the
      object which should be removed. But batadv_bla_claim also contains a
      reference to backbone_gw which must be removed.
      
      The reference drop of backbone_gw was done in the call_rcu function
      batadv_claim_free_rcu but should actually be done in the
      batadv_claim_release function to avoid nested call_rcus. This is important
      because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will not
      detect the inner call_rcu as relevant for its execution. Otherwise this
      barrier will most likely be inserted in the queue before the callback of
      the first call_rcu was executed. The caller of rcu_barrier will therefore
      continue to run before the inner call_rcu callback finished.
      
      Fixes: 23721387 ("batman-adv: add basic bridge loop avoidance code")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Acked-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd5a5c3b
    • ppp, slip: Validate VJ compression slot parameters completely · 6b4fa561
      Ben Hutchings authored
      [ Upstream commit 4ab42d78 ]
      
      Currently slhc_init() treats out-of-range values of rslots and tslots
      as equivalent to 0, except that if tslots is too large it will
      dereference a null pointer (CVE-2015-7799).
      
      Add a range-check at the top of the function and make it return an
      ERR_PTR() on error instead of NULL.  Change the callers accordingly.
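
      A userspace imitation of the pattern, assuming a valid slot range of 0
      to 255. The range, the struct, and the helper names are assumptions for
      illustration; err_ptr(), is_err() and ptr_err() mimic the kernel's
      ERR_PTR(), IS_ERR() and PTR_ERR() but are not the kernel definitions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer carries an encoded error instead of an address. */
static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Recover the errno value from an error pointer. */
static long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

struct compressor_sketch {
    int rslots;
    int tslots;
};

/* Range-check first, so later code never sees out-of-range slot counts. */
static struct compressor_sketch *compressor_init(int rslots, int tslots)
{
    struct compressor_sketch *comp;

    if (rslots < 0 || rslots > 255 || tslots < 0 || tslots > 255)
        return err_ptr(-EINVAL);

    comp = calloc(1, sizeof(*comp));
    if (!comp)
        return err_ptr(-ENOMEM);

    comp->rslots = rslots;
    comp->tslots = tslots;
    return comp;
}
```

      Callers then test the result with is_err() instead of comparing against
      NULL, which lets them tell an invalid argument apart from an allocation
      failure.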
      
      Compile-tested only.
      Reported-by: 郭永刚 <guoyonggang@360.cn>
      References: http://article.gmane.org/gmane.comp.security.oss.general/17908
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b4fa561
    • isdn_ppp: Add checks for allocation failure in isdn_ppp_open() · e9af90c4
      Ben Hutchings authored
      [ Upstream commit 0baa57d8 ]
      
      Compile-tested only.
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9af90c4
    • bridge: fix lockdep addr_list_lock false positive splat · 58aae144
      Nikolay Aleksandrov authored
      [ Upstream commit c6894dec ]
      
      After promisc mode management was introduced a bridge device could do
      dev_set_promiscuity from its ndo_change_rx_flags() callback which in
      turn can be called after the bridge's addr_list_lock has been taken
      (e.g. by dev_uc_add). This causes a false positive lockdep splat because
      the port interfaces' addr_list_lock is taken when br_manage_promisc()
      runs after the bridge's addr list lock was already taken.

      To remove the false positive, introduce a custom bridge addr_list_lock
      class and set it on bridge init.

      A simple way to reproduce this is with the following:
      $ brctl addbr br0
      $ ip l add l br0 br0.100 type vlan id 100
      $ ip l set br0 up
      $ ip l set br0.100 up
      $ echo 1 > /sys/class/net/br0/bridge/vlan_filtering
      $ brctl addif br0 eth0

      Splat:
      [   43.684325] =============================================
      [   43.684485] [ INFO: possible recursive locking detected ]
      [   43.684636] 4.4.0-rc8+ #54 Not tainted
      [   43.684755] ---------------------------------------------
      [   43.684906] brctl/1187 is trying to acquire lock:
      [   43.685047]  (_xmit_ETHER){+.....}, at: [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
      [   43.685460]  but task is already holding lock:
      [   43.685618]  (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
      [   43.686015]  other info that might help us debug this:
      [   43.686316]  Possible unsafe locking scenario:
      
      [   43.686743]        CPU0
      [   43.686967]        ----
      [   43.687197]   lock(_xmit_ETHER);
      [   43.687544]   lock(_xmit_ETHER);
      [   43.687886] *** DEADLOCK ***
      
      [   43.688438]  May be due to missing lock nesting notation
      
      [   43.688882] 2 locks held by brctl/1187:
      [   43.689134]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81510317>] rtnl_lock+0x17/0x20
      [   43.689852]  #1:  (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
      [   43.690575] stack backtrace:
      [   43.690970] CPU: 0 PID: 1187 Comm: brctl Not tainted 4.4.0-rc8+ #54
      [   43.691270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
      [   43.691770]  ffffffff826a25c0 ffff8800369fb8e0 ffffffff81360ceb ffffffff826a25c0
      [   43.692425]  ffff8800369fb9b8 ffffffff810d0466 ffff8800369fb968 ffffffff81537139
      [   43.693071]  ffff88003a08c880 0000000000000000 00000000ffffffff 0000000002080020
      [   43.693709] Call Trace:
      [   43.693931]  [<ffffffff81360ceb>] dump_stack+0x4b/0x70
      [   43.694199]  [<ffffffff810d0466>] __lock_acquire+0x1e46/0x1e90
      [   43.694483]  [<ffffffff81537139>] ? netlink_broadcast_filtered+0x139/0x3e0
      [   43.694789]  [<ffffffff8153b5da>] ? nlmsg_notify+0x5a/0xc0
      [   43.695064]  [<ffffffff810d10f5>] lock_acquire+0xe5/0x1f0
      [   43.695340]  [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
      [   43.695623]  [<ffffffff815edea5>] _raw_spin_lock_bh+0x45/0x80
      [   43.695901]  [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
      [   43.696180]  [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
      [   43.696460]  [<ffffffff8150189c>] dev_set_promiscuity+0x3c/0x50
      [   43.696750]  [<ffffffffa0586845>] br_port_set_promisc+0x25/0x50 [bridge]
      [   43.697052]  [<ffffffffa05869aa>] br_manage_promisc+0x8a/0xe0 [bridge]
      [   43.697348]  [<ffffffffa05826ee>] br_dev_change_rx_flags+0x1e/0x20 [bridge]
      [   43.697655]  [<ffffffff81501532>] __dev_set_promiscuity+0x132/0x1f0
      [   43.697943]  [<ffffffff81501672>] __dev_set_rx_mode+0x82/0x90
      [   43.698223]  [<ffffffff815072de>] dev_uc_add+0x5e/0x80
      [   43.698498]  [<ffffffffa05b3c62>] vlan_device_event+0x542/0x650 [8021q]
      [   43.698798]  [<ffffffff8109886d>] notifier_call_chain+0x5d/0x80
      [   43.699083]  [<ffffffff810988b6>] raw_notifier_call_chain+0x16/0x20
      [   43.699374]  [<ffffffff814f456e>] call_netdevice_notifiers_info+0x6e/0x80
      [   43.699678]  [<ffffffff814f4596>] call_netdevice_notifiers+0x16/0x20
      [   43.699973]  [<ffffffffa05872be>] br_add_if+0x47e/0x4c0 [bridge]
      [   43.700259]  [<ffffffffa058801e>] add_del_if+0x6e/0x80 [bridge]
      [   43.700548]  [<ffffffffa0588b5f>] br_dev_ioctl+0xaf/0xc0 [bridge]
      [   43.700836]  [<ffffffff8151a7ac>] dev_ifsioc+0x30c/0x3c0
      [   43.701106]  [<ffffffff8151aac9>] dev_ioctl+0xf9/0x6f0
      [   43.701379]  [<ffffffff81254345>] ? mntput_no_expire+0x5/0x450
      [   43.701665]  [<ffffffff812543ee>] ? mntput_no_expire+0xae/0x450
      [   43.701947]  [<ffffffff814d7b02>] sock_do_ioctl+0x42/0x50
      [   43.702219]  [<ffffffff814d8175>] sock_ioctl+0x1e5/0x290
      [   43.702500]  [<ffffffff81242d0b>] do_vfs_ioctl+0x2cb/0x5c0
      [   43.702771]  [<ffffffff81243079>] SyS_ioctl+0x79/0x90
      [   43.703033]  [<ffffffff815eebb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
      
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Bridge list <bridge@lists.linux-foundation.org>
      CC: Andy Gospodarek <gospo@cumulusnetworks.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Fixes: 2796d0c6 ("bridge: Automatically manage port promiscuous mode.")
      Reported-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      58aae144
    • ipv6: update skb->csum when CE mark is propagated · 65530976
      Eric Dumazet authored
      [ Upstream commit 34ae6a1a ]
      
      When a tunnel decapsulates the outer header, it has to comply
      with RFC 6040 and eventually propagate CE mark into inner header.
      
      It turns out IP6_ECN_set_ce() does not correctly update skb->csum
      for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure"
      messages and stack traces.
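
      The idea of keeping a full-packet checksum in step with a header change
      can be sketched with 16-bit ones' complement arithmetic. This is an
      illustration only: the kernel works on a 32-bit running sum with its own
      helpers, and every function name here is hypothetical. The ECN field is
      taken to be the 0x30 bits of the second IPv6 header byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold carries back in, as ones' complement addition requires. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones' complement sum over an even-length buffer of big-endian words. */
static uint16_t csum16(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    return fold16(sum);
}

/* Incrementally replace one 16-bit word in an existing sum. */
static uint16_t csum_replace16(uint16_t sum, uint16_t old_word, uint16_t new_word)
{
    return fold16((uint32_t)sum + (uint16_t)~old_word + new_word);
}

/* Set the CE codepoint and return the updated packet sum. */
static uint16_t set_ce_and_update(uint8_t *hdr, uint16_t sum)
{
    uint16_t old_word = (uint16_t)((hdr[0] << 8) | hdr[1]);
    uint16_t new_word;

    hdr[1] |= 0x30; /* both ECN bits set means CE */
    new_word = (uint16_t)((hdr[0] << 8) | hdr[1]);
    return csum_replace16(sum, old_word, new_word);
}
```

      If the header byte is changed without the matching sum update, a later
      comparison against a freshly computed sum no longer agrees, which is the
      kind of mismatch behind the "hw csum failure" reports.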
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      65530976
    • net: bpf: reject invalid shifts · f2da6274
      Rabin Vincent authored
      [ Upstream commit 229394e8 ]
      
      On ARM64, a BUG() is triggered in the eBPF JIT if a filter with a
      constant shift that can't be encoded in the immediate field of the
      UBFM/SBFM instructions is passed to the JIT.  Since these shift
      amounts, which are negative or >= regsize, are invalid, reject them in
      the eBPF verifier and the classic BPF filter checker, for all
      architectures.
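
      The validity rule stated above reduces to a small range check. The
      function name and signature are hypothetical; the verifier applies an
      equivalent test to the immediate operand of constant shift instructions.

```c
#include <stdbool.h>
#include <stdint.h>

/* A constant shift is valid only if 0 <= imm < register width in bits. */
static bool shift_imm_valid(int32_t imm, unsigned int reg_bits)
{
    return imm >= 0 && (uint32_t)imm < reg_bits;
}
```

      For a 64-bit register, 63 is accepted while 64 and -1 are rejected; for
      a 32-bit operation the limit drops to 31.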
      Signed-off-by: Rabin Vincent <rabin@rab.in>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2da6274