1. 05 Jul, 2017 40 commits
    • iommu: Handle default domain attach failure · d7fcb303
      Robin Murphy authored
      commit 797a8b4d upstream.
      
      We wouldn't normally expect ops->attach_dev() to fail, but on IOMMUs
      with limited hardware resources, or generally misconfigured systems,
      it is certainly possible. We report failure correctly from the external
      iommu_attach_device() interface, but do not do so in iommu_group_add()
      when attaching to the default domain. The result of failure there is
      that the device, group and domain all get left in a broken,
      part-configured state which leads to weird errors and misbehaviour down
      the line when IOMMU API calls sort-of-but-don't-quite work.
      
      Check the return value of __iommu_attach_device() on the default domain,
      and refactor the error handling paths to cope with its failure and clean
      up correctly in such cases.
      
      Fixes: e39cb8a3 ("iommu: Make sure a device is always attached to a domain")
      Reported-by: Punit Agrawal <punit.agrawal@arm.com>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • iommu/vt-d: Don't over-free page table directories · c19bfc67
      David Dillow authored
      commit f7116e11 upstream.
      
      dma_pte_free_level() recurses down the IOMMU page tables and frees
      directory pages that are entirely contained in the given PFN range.
      Unfortunately, it incorrectly calculates the starting address covered
      by the PTE under consideration, which can lead to it clearing an entry
      that is still in use.
      
      This occurs if we have a scatterlist with an entry that has a length
      greater than 1026 MB and is aligned to 2 MB for both the IOMMU and
      physical addresses. For example, if __domain_mapping() is asked to map a
      two-entry scatterlist with 2 MB and 1028 MB segments to PFN 0xffff80000,
      it will ask dma_pte_free_pagetable() to free the PFNs from 0xffff80200
      to 0xffffc05ff, and that call will also incorrectly clear the PFNs from
      0xffff80000 to 0xffff801ff because of this issue. The current code will
      set level_pfn to 0xffff80200, and 0xffff80200-0xffffc01ff fits inside
      the range being cleared. Properly setting the level_pfn for the current
      level under consideration catches that this PTE is outside of the range
      being cleared.
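
The arithmetic in the example above can be checked with a small user-space sketch. The helper names and the 9-bits-per-level stride are assumptions made for illustration, as is the choice of level 3 for the PTE under consideration; this is not a copy of the driver code.

```c
#include <stdbool.h>
#include <stdint.h>

#define LEVEL_STRIDE 9 /* assumption: each page-table level decodes 9 PFN bits */

/* Number of PFN bits decoded below the given level (level 1 is the leaf). */
static unsigned int level_to_offset_bits(int level)
{
	return (unsigned int)(level - 1) * LEVEL_STRIDE;
}

/* Mask that rounds a PFN down to the start of the region one PTE at this level covers. */
static uint64_t level_mask(int level)
{
	return ~(uint64_t)0 << level_to_offset_bits(level);
}

/* Number of PFNs covered by a single PTE at this level. */
static uint64_t level_size(int level)
{
	return (uint64_t)1 << level_to_offset_bits(level);
}

/* Old calculation: masks with the level below, so the result is under-aligned. */
static uint64_t level_pfn_old(uint64_t pfn, int level)
{
	return pfn & level_mask(level - 1);
}

/* Fixed calculation: masks with the level under consideration. */
static uint64_t level_pfn_fixed(uint64_t pfn, int level)
{
	return pfn & level_mask(level);
}

/* True if the region starting at level_pfn lies entirely inside [start_pfn, last_pfn]. */
static bool covers_whole_pte(uint64_t level_pfn, int level,
			     uint64_t start_pfn, uint64_t last_pfn)
{
	return start_pfn <= level_pfn &&
	       last_pfn >= level_pfn + level_size(level) - 1;
}
```

With pfn = start_pfn = 0xffff80200, last_pfn = 0xffffc05ff and level 3, level_pfn_old() yields 0xffff80200 and the coverage test passes, while level_pfn_fixed() yields 0xffff80000 and the coverage test correctly fails because start_pfn lies above it.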
      
      This patch also changes the value passed into dma_pte_free_level() when
      it recurses. This only affects the first PTE of the range being cleared,
      and is handled by the existing code that ensures we start our cursor no
      lower than start_pfn.
      
      This was found when using dma_map_sg() to map large chunks of contiguous
      memory, which immediately led to faults on the first access of the
      erroneously-deleted mappings.
      
      Fixes: 3269ee0b ("intel-iommu: Fix leaks in pagetable freeing")
      Reviewed-by: Benjamin Serebrin <serebrin@google.com>
      Signed-off-by: David Dillow <dillow@google.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ocfs2: o2hb: revert hb threshold to keep compatible · d5c5e8ba
      Junxiao Bi authored
      commit 33496c3c upstream.
      
      Configfs is the interface ocfs2-tools uses to pass configuration to the
      kernel, and $configfs_dir/cluster/$clustername/heartbeat/dead_threshold
      is the attribute used to configure the heartbeat dead threshold.  The
      kernel has a default value for it, but the user can set
      O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb to override it.

      Commit 45b99773 ("ocfs2/cluster: use per-attribute show and store
      methods") changed the name of the heartbeat dead threshold attribute
      while ocfs2-tools did not, so ocfs2-tools can no longer set it and the
      default value is always used.  So revert the name change.
      
      Fixes: 45b99773 ("ocfs2/cluster: use per-attribute show and store methods")
      Link: http://lkml.kernel.org/r/1490665245-15374-1-git-send-email-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Acked-by: Joseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/mm: Fix flush_tlb_page() on Xen · 8af88a95
      Andy Lutomirski authored
      commit dbd68d8e upstream.
      
      flush_tlb_page() passes a bogus range to flush_tlb_others() and
      expects the latter to fix it up.  native_flush_tlb_others() has the
      fixup but Xen's version doesn't.  Move the fixup to
      flush_tlb_others().
      
      AFAICS the only real effect is that, without this fix, Xen would
      flush everything instead of just the one page on remote vCPUs
      when flush_tlb_page() was called.

      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: e7b52ffd ("x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range")
      Link: http://lkml.kernel.org/r/10ed0e4dfea64daef10b87fb85df1746999b4dba.1492844372.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space · 3667dafd
      Joerg Roedel authored
      commit 5ed386ec upstream.
      
      When this function fails it just sends a SIGSEGV signal to
      user-space using force_sig(). This signal is missing
      essential information about the cause, e.g. the trap_nr or
      an error code.
      
      Fix this by propagating the error to the only caller of
      mpx_handle_bd_fault(), do_bounds(), which sends the correct
      SIGSEGV signal to the process.

      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: fe3d197f ('x86, mpx: On-demand kernel allocation of bounds tables')
      Link: http://lkml.kernel.org/r/1491488362-27198-1-git-send-email-joro@8bytes.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug · b287ade8
      Baoquan He authored
      commit 8eabf42a upstream.
      
      Kernel text KASLR is separated into physical address and virtual
      address randomization. For virtual address randomization, we
      only randomize to get an offset between 16M and KERNEL_IMAGE_SIZE.
      So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR,
      not the original kernel loading address 'output'.

      The bug causes a kernel boot failure if the kernel is loaded at a
      position other than the address, 16M, which is decided at compile time.
      Kexec/kdump is one such practical case.

      To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as its initial
      value.

      Tested-by: Dave Young <dyoung@redhat.com>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 8391c73c ("x86/KASLR: Randomize virtual address separately")
      Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-bhe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tools arch: Sync arch/x86/lib/memcpy_64.S with the kernel · 15541e64
      Arnaldo Carvalho de Melo authored
      commit e883d09c upstream.
      
      Just a minor fix done in:
      
        Fixes: 26a37ab3 ("x86/mce: Fix copy/paste error in exception table entries")
      
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/n/tip-ni9jzdd5yxlail6pq8cuexw2@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: 8685/1: ensure memblock-limit is pmd-aligned · a2c222be
      Doug Berger authored
      commit 9e25ebfe upstream.
      
      The pmd containing memblock_limit is cleared by prepare_page_table(),
      which creates the opportunity for early_alloc() to allocate unmapped
      memory if memblock_limit is not pmd-aligned, causing a boot-time hang.
      
      Commit 965278dc ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
      attempted to resolve this problem, but there is a path through the
      adjust_lowmem_bounds() routine where if all memory regions start and
      end on pmd-aligned addresses the memblock_limit will be set to
      arm_lowmem_limit.
      
      Since arm_lowmem_limit can be affected by the vmalloc early parameter,
      the value of arm_lowmem_limit may not be pmd-aligned. This commit
      corrects this oversight such that memblock_limit is always rounded
      down to pmd-alignment.
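
The rounding described above can be sketched in user space as follows. The 2 MiB pmd size and both helper names are assumptions for illustration; the kernel uses its own PMD_SIZE and rounding helpers.

```c
#include <stdint.h>

/* Assumption for illustration: one pmd maps 2 MiB. */
#define EXAMPLE_PMD_SIZE ((uint64_t)2 << 20)

/* Round addr down to a multiple of align; align must be a power of two. */
static uint64_t round_down_pow2(uint64_t addr, uint64_t align)
{
	return addr & ~(align - 1);
}

/* Hypothetical helper: make a candidate memblock limit pmd-aligned. */
static uint64_t pmd_aligned_limit(uint64_t lowmem_limit)
{
	return round_down_pow2(lowmem_limit, EXAMPLE_PMD_SIZE);
}
```

For instance, a lowmem limit of 0x2f700000 (not a multiple of 2 MiB) is rounded down to 0x2f600000, while an already aligned limit such as 0x30000000 is returned unchanged.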
      
      Fixes: 965278dc ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Suggested-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation · 7661b196
      Lorenzo Pieralisi authored
      commit cb7cf772 upstream.
      
      The BAD_MADT_GICC_ENTRY() macro checks if a GICC MADT entry passes
      muster from an ACPI specification standpoint. The current macro detects
      the MADT GICC entry length through the ACPI firmware version (it changed
      from 76 to 80 bytes in the transition from the ACPI 5.1 to the ACPI 6.0
      specification), but it always (erroneously) uses the length of the
      latest ACPICA struct (i.e. struct acpi_madt_generic_interrupt, which is
      80 bytes long) to check whether the current GICC entry memory record
      exceeds the MADT table end in memory as defined by the MADT table header
      itself. This may result in false negatives depending on the ACPI
      firmware version and how the MADT entries are laid out in memory: on
      ACPI 5.1 firmware, MADT GICC entries are 76 bytes long, so adding 80 to
      a GICC entry start address in memory may well yield an address past the
      actual MADT end, triggering a false negative.
      
      Fix the BAD_MADT_GICC_ENTRY() macro by reshuffling the condition checks
      and update them to always use the firmware version specific MADT GICC
      entry length in order to carry out boundary checks.
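
A minimal user-space sketch of the two checks, assuming hypothetical function names and plain integer addresses in place of the real ACPI structures. Only the 76 and 80 byte lengths come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* GICC entry lengths quoted in the commit message. */
#define GICC_LEN_ACPI_5_1 76u
#define GICC_LEN_ACPI_6_0 80u

/* Hypothetical helper: expected GICC entry length for a firmware major revision. */
static unsigned int gicc_expected_len(unsigned int fw_major)
{
	return fw_major < 6 ? GICC_LEN_ACPI_5_1 : GICC_LEN_ACPI_6_0;
}

/* Old-style check: the bounds test always assumes an 80-byte entry. */
static bool gicc_entry_bad_old(uintptr_t entry, unsigned int entry_len,
			       uintptr_t madt_end, unsigned int fw_major)
{
	return entry == 0 ||
	       entry + GICC_LEN_ACPI_6_0 > madt_end ||
	       entry_len != gicc_expected_len(fw_major);
}

/* Fixed check: length and bounds tests both use the firmware-specific length. */
static bool gicc_entry_bad_fixed(uintptr_t entry, unsigned int entry_len,
				 uintptr_t madt_end, unsigned int fw_major)
{
	unsigned int len = gicc_expected_len(fw_major);

	return entry == 0 || entry_len != len || entry + len > madt_end;
}
```

On ACPI 5.1 firmware, a valid 76-byte entry that ends exactly at the MADT end is rejected by the old-style check and accepted by the fixed one.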
      
      Fixes: b6cfb277 ("ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro")
      Reported-by: Julien Grall <julien.grall@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Julien Grall <julien.grall@arm.com>
      Cc: Hanjun Guo <hanjun.guo@linaro.org>
      Cc: Al Stone <ahs3@redhat.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: dts: OMAP3: Fix MFG ID EEPROM · 4efe34b5
      Adam Ford authored
      commit 06e1a5cc upstream.
      
      The manufacturing information is stored in the EEPROM.  This chip
      is an AT24C64, not (nor has it ever been) a 24C02.  This patch will
      correctly address the EEPROM to read the entire contents and not just
      256 bytes (of 0xff).
      
      Fixes: 5e3447a2 ("ARM: dts: LogicPD Torpedo: Add AT24 EEPROM Support")
      Signed-off-by: Adam Ford <aford173@gmail.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer · 07bb2c7e
      Dave Gerlach authored
      commit 04abaf07 upstream.
      
      Starting from commit 5de85b9d ("PM / runtime: Re-init runtime PM
      states at probe error and driver unbind"), the pm_runtime core now
      changes the device runtime_status back to RPM_SUSPENDED after a probe
      defer. Certain OMAP devices make use of the "ti,no-idle-on-init" flag,
      which causes omap_device_enable to be called during the
      BUS_NOTIFY_ADD_DEVICE event during probe, along with
      pm_runtime_set_active.
      
      This call to pm_runtime_set_active typically will prevent a call to
      pm_runtime_get in a driver probe function from re-enabling the
      omap_device. However, in the case of a probe defer that happens before
      the driver probe function is able to run, such as a missing pinctrl
      states defer, pm_runtime_reinit will set the device as RPM_SUSPENDED and
      then once driver probe is actually able to run, pm_runtime_get will see
      the device as suspended and call through to the omap_device layer,
      attempting to enable the already enabled omap_device and causing errors
      like this:
      
      omap-gpmc 50000000.gpmc: omap_device: omap_device_enable() called from
      invalid state 1
      omap-gpmc 50000000.gpmc: use pm_runtime_put_sync_suspend() in driver?
      
      We can avoid this error by making sure the pm_runtime status of a device
      matches the omap_device state before a probe attempt. By extending the
      omap_device bus notifier to act on the BUS_NOTIFY_BIND_DRIVER event we
      can check if a device is enabled in omap_device but with a pm_runtime
      status of RPM_SUSPENDED and once again mark the device as RPM_ACTIVE to
      avoid a second incorrect call to omap_device_enable.
      
      Fixes: 5de85b9d ("PM / runtime: Re-init runtime PM states at probe
      error and driver unbind")
      Tested-by: Franklin S Cooper Jr. <fcooper@ti.com>
      Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • regulator: tps65086: Fix DT node referencing in of_parse_cb · e57aa416
      Andrew F. Davis authored
      commit 6308f178 upstream.
      
      When we check for additional DT properties in the current node, we
      use the device_node passed in with the configuration data. This
      will not point to the correct DT node; use the one passed in
      for this purpose instead.
      
      Fixes: d2a2e729 ("regulator: tps65086: Add regulator driver for the TPS65086 PMIC")
      Reported-by: Steven Kipisz <s-kipisz2@ti.com>
      Signed-off-by: Andrew F. Davis <afd@ti.com>
      Tested-by: Steven Kipisz <s-kipisz2@ti.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • regulator: tps65086: Fix expected switch DT node names · 88baad2e
      Andrew F. Davis authored
      commit 1c47f7c3 upstream.
      
      The three load switches are called SWA1, SWB1, and SWB2. The
      node names describing properties for these are expected to be
      the same, but due to a typo they are not. Fix this here.
      
      Fixes: d2a2e729 ("regulator: tps65086: Add regulator driver for the TPS65086 PMIC")
      Reported-by: Steven Kipisz <s-kipisz2@ti.com>
      Signed-off-by: Andrew F. Davis <afd@ti.com>
      Tested-by: Steven Kipisz <s-kipisz2@ti.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • spi: fix device-node leaks · 9846c679
      Johan Hovold authored
      commit 8324147f upstream.
      
      Make sure to release the device-node reference taken in
      of_register_spi_device() on errors and when deregistering the device.
      
      Fixes: 284b0189 ("spi: Add OF binding support for SPI busses")
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • spi: When no dma_chan map buffers with spi_master's parent · c52829f6
      Daniel Kurtz authored
      commit 88b0aa54 upstream.
      
      Back before commit 1dccb598 ("arm64: simplify dma_get_ops"), for
      arm64, devices for which dma_ops were not explicitly set were automatically
      configured to use swiotlb_dma_ops, since this was hard-coded as the
      global "dma_ops" in arm64_dma_init().
      
      Now that global "dma_ops" has been removed, all devices must have their
      dma_ops explicitly set by a call to arch_setup_dma_ops(), otherwise the
      device is assigned dummy_dma_ops, and thus calls to map_sg for such a
      device will fail (return 0).
      
      Mediatek SPI uses DMA but does not use a dma channel.  Support for this
      was added by commit c37f45b5 ("spi: support spi without dma channel
      to use can_dma()"), which uses the master_spi dev to DMA map buffers.
      
      The master_spi device is not a platform device, rather it is created
      in spi_alloc_device(), and therefore its dma_ops are never set.
      
      Therefore, when the mediatek SPI driver does DMA (for large SPI
      transactions > 32 bytes), SPI will use spi_map_buf()->dma_map_sg() to
      map the buffer for use in DMA.  But dma_map_sg()->dma_map_sg_attrs() returns
      0, because ops->map_sg is dummy_dma_ops->__dummy_map_sg, and hence
      spi_map_buf() returns -ENOMEM (-12).
      
      Fix this by using the real spi_master's parent device which should be a
      real physical device with DMA properties.

      Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
      Fixes: c37f45b5 ("spi: support spi without dma channel to use can_dma()")
      Cc: Leilk Liu <leilk.liu@mediatek.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting · 478273e1
      Matt Fleming authored
      commit 6e5f32f7 upstream.
      
      If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
      the pending sample window time on exit, setting the next update not
      one window into the future, but two.
      
      This situation on exiting NO_HZ is described by:
      
        this_rq->calc_load_update < jiffies < calc_load_update
      
      In this scenario, what we should be doing is:
      
        this_rq->calc_load_update = calc_load_update		     [ next window ]
      
      But what we actually do is:
      
        this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]
      
      This has the effect of delaying load average updates for potentially
      up to ~9 seconds.
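
The difference between the two orderings can be illustrated with a simplified user-space model. The function names, the 10-tick margin, and the use of plain integer comparisons (ignoring jiffies wraparound) are assumptions for illustration, not the scheduler code itself.

```c
#include <stdint.h>

/* Assumed margin, in ticks, during which a wake-up counts as "inside" the window. */
#define WINDOW_MARGIN 10u

/* Old ordering: compare against the stale per-runqueue value, then sync and bump. */
static uint64_t next_update_old(uint64_t rq_update, uint64_t now,
				uint64_t global_update, uint64_t load_freq)
{
	if (now < rq_update)
		return rq_update;	/* still before the pending window */

	rq_update = global_update;
	if (now < rq_update + WINDOW_MARGIN)
		rq_update += load_freq;	/* lands one window too far in the scenario above */
	return rq_update;
}

/* Fixed ordering: sync with the global window first, then compare. */
static uint64_t next_update_fixed(uint64_t rq_update, uint64_t now,
				  uint64_t global_update, uint64_t load_freq)
{
	rq_update = global_update;
	if (now < rq_update)
		return rq_update;	/* next window, as intended */

	if (now < rq_update + WINDOW_MARGIN)
		rq_update += load_freq;
	return rq_update;
}
```

Taking hypothetical values this_rq->calc_load_update = 1000, jiffies = 3000, calc_load_update = 6001 and LOAD_FREQ = 5001, the old ordering schedules the next update at 11002 (next+1 window), while the fixed ordering schedules it at 6001 (next window).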
      
      This can result in huge spikes in the load average values due to
      per-cpu uninterruptible task counts being out of sync when accumulated
      across all CPUs.
      
      It's safe to update the per-cpu active count if we wake between sample
      windows because any load that we left in 'calc_load_idle' will have
      been zero'd when the idle load was folded in calc_global_load().
      
      This issue is easy to reproduce before,
      
        commit 9d89c257 ("sched/fair: Rewrite runnable load and utilization average tracking")
      
      just by forking short-lived process pipelines built from ps(1) and
      grep(1) in a loop. I'm unable to reproduce the spikes after that
      commit, but the bug still seems to be present from code review.

      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Fixes: commit 5167e8d5 ("sched/nohz: Rewrite and fix load-avg computation -- again")
      Link: http://lkml.kernel.org/r/20170217120731.11868-2-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • watchdog: bcm281xx: Fix use of uninitialized spinlock. · eea0261d
      Eric Anholt authored
      commit fedf266f upstream.
      
      The bcm_kona_wdt_set_resolution_reg() call takes the spinlock, so
      initialize it earlier.  Fixes a warning at boot with lock debugging
      enabled.
      
      Fixes: 6adb730d ("watchdog: bcm281xx: Watchdog Driver")
      Signed-off-by: Eric Anholt <eric@anholt.net>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: use skb_to_full_sk in ip_route_me_harder · 4211442b
      Florian Westphal authored
      commit 29e09229 upstream.
      
      inet_sk(skb->sk) is illegal in case skb is attached to request socket.
      
      Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener")
      Reported-by: Daniel J Blueman <daniel@quora.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Tested-by: Daniel J Blueman <daniel@quora.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xfrm: Oops on error in pfkey_msg2xfrm_state() · ac273023
      Dan Carpenter authored
      commit 1e3d0c2c upstream.
      
      There are some missing error codes here so we accidentally return NULL
      instead of an error pointer.  It results in a NULL pointer dereference.
      
      Fixes: df71837d ("[LSM-IPSec]: Security association restriction.")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xfrm: NULL dereference on allocation failure · c460f2be
      Dan Carpenter authored
      commit e747f643 upstream.
      
      The default error code in pfkey_msg2xfrm_state() is -ENOBUFS.  We
      added a new call to security_xfrm_state_alloc() which sets "err" to zero
      so there are several places where we can return ERR_PTR(0) if kmalloc()
      fails.  The caller is expecting error pointers so it leads to a NULL
      dereference.
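
Why ERR_PTR(0) is dangerous can be shown with a user-space re-creation of the error-pointer convention. The lower-case helpers below are illustrative stand-ins written for this sketch, and alloc_state() is a hypothetical function that only mimics a failing allocation path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value stored in an error pointer. */
static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for pointers in the top MAX_ERRNO addresses, so NULL is not an error. */
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocation path: returns err_ptr(err) when the allocation fails. */
static void *alloc_state(bool alloc_fails, long err)
{
	static int state;

	if (alloc_fails)
		return err_ptr(err);
	return &state;
}
```

If err is still -ENOBUFS when the allocation fails, the caller's is_err() check catches the failure. If err has been reset to zero, alloc_state() returns err_ptr(0), which is NULL, and is_err() reports success, so the caller goes on to dereference a NULL pointer.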
      
      Fixes: df71837d ("[LSM-IPSec]: Security association restriction.")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY · 1e166625
      Sabrina Dubroca authored
      commit 9b3eb541 upstream.
      
      When CONFIG_XFRM_SUB_POLICY=y, xfrm_dst stores a copy of the flowi for
      that dst. Unfortunately, the code that allocates and fills this copy
      doesn't care about what type of flowi (flowi, flowi4, flowi6) gets
      passed. In multiple code paths (from raw_sendmsg, from TCP when
      replying to a FIN, in vxlan, geneve, and gre), the flowi that gets
      passed to xfrm is actually an on-stack flowi4, so we end up reading
      stuff from the stack past the end of the flowi4 struct.
      
      Since xfrm_dst->origin isn't used anywhere following commit
      ca116922 ("xfrm: Eliminate "fl" and "pol" args to
      xfrm_bundle_ok()."), just get rid of it.  xfrm_dst->partner isn't used
      either, so get rid of that too.
      
      Fixes: 9d6ec938 ("ipv4: Use flowi4 in public route lookup interfaces.")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings · 647f6052
      Ard Biesheuvel authored
      commit 029c54b0 upstream.
      
      Existing code that uses vmalloc_to_page() may assume that any address
      for which is_vmalloc_addr() returns true may be passed into
      vmalloc_to_page() to retrieve the associated struct page.
      
      This is not an unreasonable assumption to make, but on architectures
      that have CONFIG_HAVE_ARCH_HUGE_VMAP=y, it no longer holds, and we need
      to ensure that vmalloc_to_page() does not go off into the weeds trying
      to dereference huge PUDs or PMDs as table entries.
      
      Given that vmalloc() and vmap() themselves never create huge mappings or
      deal with compound pages at all, there is no correct answer in this
      case, so return NULL instead, and issue a warning.
      
      When reading /proc/kcore on arm64, you will hit an oops as soon as you
      hit the huge mappings used for the various segments that make up the
      mapping of vmlinux.  With this patch applied, you will no longer hit the
      oops, but the kcore contents will be incorrect (these regions will be
      zeroed out).
      
      We are fixing this for kcore specifically, so it avoids vread() for
      those regions.  At least one other problematic user exists, i.e.,
      /dev/kmem, but that is currently broken on arm64 for other reasons.
      
      Link: http://lkml.kernel.org/r/20170609082226.26152-1-ard.biesheuvel@linaro.org
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Laura Abbott <labbott@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: zhong jiang <zhongjiang@huawei.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [ardb: non-trivial backport to v4.9]
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ravb: Fix use-after-free on `ifconfig eth0 down` · f9f73c58
      Eugeniu Rosca authored
      
      [ Upstream commit 79514ef6 ]
      
      Commit a47b70ea ("ravb: unmap descriptors when freeing rings") has
      introduced the issue seen in [1] reproduced on H3ULCB board.
      
      Fix this by relocating the RX skb ringbuffer free operation, so that
      swiotlb page unmapping can be done first. Freeing of aligned TX buffers
      is not relevant to the issue seen in [1]. Still, reposition TX free
      calls as well, to have all kfree() operations performed consistently
      _after_ dma_unmap_*()/dma_free_*().
      
      [1] Console screenshot with the problem reproduced:
      
      salvator-x login: root
      root@salvator-x:~# ifconfig eth0 up
      Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: \
             attached PHY driver [Micrel KSZ9031 Gigabit PHY]   \
             (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=235)
      IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
      root@salvator-x:~#
      root@salvator-x:~# ifconfig eth0 down
      
      ==================================================================
      BUG: KASAN: use-after-free in swiotlb_tbl_unmap_single+0xc4/0x35c
      Write of size 1538 at addr ffff8006d884f780 by task ifconfig/1649
      
      CPU: 0 PID: 1649 Comm: ifconfig Not tainted 4.12.0-rc4-00004-g112eb072 #32
      Hardware name: Renesas H3ULCB board based on r8a7795 (DT)
      Call trace:
      [<ffff20000808f11c>] dump_backtrace+0x0/0x3a4
      [<ffff20000808f4d4>] show_stack+0x14/0x1c
      [<ffff20000865970c>] dump_stack+0xf8/0x150
      [<ffff20000831f8b0>] print_address_description+0x7c/0x330
      [<ffff200008320010>] kasan_report+0x2e0/0x2f4
      [<ffff20000831eac0>] check_memory_region+0x20/0x14c
      [<ffff20000831f054>] memcpy+0x48/0x68
      [<ffff20000869ed50>] swiotlb_tbl_unmap_single+0xc4/0x35c
      [<ffff20000869fcf4>] unmap_single+0x90/0xa4
      [<ffff20000869fd14>] swiotlb_unmap_page+0xc/0x14
      [<ffff2000080a2974>] __swiotlb_unmap_page+0xcc/0xe4
      [<ffff2000088acdb8>] ravb_ring_free+0x514/0x870
      [<ffff2000088b25dc>] ravb_close+0x288/0x36c
      [<ffff200008aaf8c4>] __dev_close_many+0x14c/0x174
      [<ffff200008aaf9b4>] __dev_close+0xc8/0x144
      [<ffff200008ac2100>] __dev_change_flags+0xd8/0x194
      [<ffff200008ac221c>] dev_change_flags+0x60/0xb0
      [<ffff200008ba2dec>] devinet_ioctl+0x484/0x9d4
      [<ffff200008ba7b78>] inet_ioctl+0x190/0x194
      [<ffff200008a78c44>] sock_do_ioctl+0x78/0xa8
      [<ffff200008a7a128>] sock_ioctl+0x110/0x3c4
      [<ffff200008365a70>] vfs_ioctl+0x90/0xa0
      [<ffff200008365dbc>] do_vfs_ioctl+0x148/0xc38
      [<ffff2000083668f0>] SyS_ioctl+0x44/0x74
      [<ffff200008083770>] el0_svc_naked+0x24/0x28
      
      The buggy address belongs to the page:
      page:ffff7e001b6213c0 count:0 mapcount:0 mapping:          (null) index:0x0
      flags: 0x4000000000000000()
      raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
      raw: 0000000000000000 ffff7e001b6213e0 0000000000000000 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8006d884f680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8006d884f700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      >ffff8006d884f780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                         ^
       ffff8006d884f800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8006d884f880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      ==================================================================
      Disabling lock debugging due to kernel taint
      root@salvator-x:~#
      
      Fixes: a47b70ea ("ravb: unmap descriptors when freeing rings")
      Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
      Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9f73c58
    • Peter Dawson's avatar
      ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets · adfe95fe
      Peter Dawson authored
      
      [ Upstream commit 0e9a7095 ]
      
      This fix addresses two problems in the way the DSCP field is formulated
       on the encapsulating header of IPv6 tunnels.
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661
      
      1) The IPv6 tunneling code was manipulating the DSCP field of the
       encapsulating packet using the 32b flowlabel. Since the flowlabel is
       only the lower 20b it was incorrect to assume that the upper 12b
       containing the DSCP and ECN fields would remain intact when formulating
       the encapsulating header. This fix handles the 'inherit' and
       'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.
      
      2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
       incorrect and resulted in the DSCP value always being set to 0.
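
The two problems above can be illustrated with a small user-space model. This is only a sketch, not the kernel code: the mask values mirror the host-order layout of the first IPv6 header word (4-bit version, 8-bit traffic class, 20-bit flow label), and `inet_ecn_encapsulate()` is a Python rendering of how the kernel helper is assumed to behave (it only copies ECN bits into the outer byte). The sample values are hypothetical.

```python
IPV6_FLOWLABEL_MASK = 0x000FFFFF  # low 20 bits: flow label
IPV6_TCLASS_MASK = 0x0FF00000     # next 8 bits: traffic class (DSCP + ECN)
INET_ECN_MASK = 0x03              # low 2 bits of the traffic class
INET_ECN_ECT_0 = 0x02
INET_ECN_CE = 0x03


def split_first_word(word):
    """Split the first 32-bit IPv6 header word (host byte order)."""
    version = (word >> 28) & 0xF
    tclass = (word & IPV6_TCLASS_MASK) >> 20
    flowlabel = word & IPV6_FLOWLABEL_MASK
    return version, tclass, flowlabel


def inet_ecn_encapsulate(outer, inner):
    """Model of the ECN helper: keep outer DSCP, take ECN from inner."""
    outer &= ~INET_ECN_MASK & 0xFF
    if (inner & INET_ECN_MASK) == INET_ECN_CE:
        outer |= INET_ECN_ECT_0
    else:
        outer |= inner & INET_ECN_MASK
    return outer


if __name__ == "__main__":
    dsfield = 0xB9  # hypothetical inner packet: DSCP 46, ECN ECT(1)
    # Passing 0 as the outer byte discards the DSCP entirely.
    print(hex(inet_ecn_encapsulate(0, dsfield)))        # 0x1
    # Passing the dsfield as the outer byte preserves the DSCP.
    print(hex(inet_ecn_encapsulate(dsfield, dsfield)))  # 0xb9
```

Masking a word with `IPV6_FLOWLABEL_MASK` alone drops the traffic class, which is why the DSCP has to be carried separately in the `dsfield` byte.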
      
      Commit 90427ef5 ("ipv6: fix flow labels when the traffic class
       is non-0") caused the regression by masking out the flowlabel
       which exposed the incorrect handling of the DSCP portion of the
       flowlabel in ip6_tunnel and ip6_gre.
      
      Fixes: 90427ef5 ("ipv6: fix flow labels when the traffic class is non-0")
      Signed-off-by: Peter Dawson <peter.a.dawson@boeing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      adfe95fe
    • Xin Long's avatar
      sctp: check af before verify address in sctp_addr_id2transport · 168bd51e
      Xin Long authored
      
      [ Upstream commit 912964ea ]
      
      Commit 6f29a130 ("sctp: sctp_addr_id2transport should verify the
      addr before looking up assoc") invoked sctp_verify_addr to verify the
      addr.
      
      However, it did not check the af variable beforehand. If a user passes
      an address with family = 0 through sockopt, sctp_get_af_specific()
      returns NULL, and accessing af->sockaddr_len dereferences a NULL pointer.
      
      Fix this by returning NULL if the af variable is NULL.
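
The guard described above can be sketched in a few lines. This is a simplified model rather than the SCTP code: the lookup table, the `AfOps` type and the function names are hypothetical stand-ins, with `None` playing the role of a NULL pointer.

```python
from collections import namedtuple

AF_INET = 2
AF_INET6 = 10

# Hypothetical per-family operations record; only the length matters here.
AfOps = namedtuple("AfOps", ["sockaddr_len"])

_AF_TABLE = {
    AF_INET: AfOps(sockaddr_len=16),
    AF_INET6: AfOps(sockaddr_len=28),
}


def get_af_specific(family):
    """Return the per-family ops, or None for an unknown family."""
    return _AF_TABLE.get(family)


def addr_id2transport(family, addr_len):
    """Look up a transport, rejecting unknown families before using af."""
    af = get_af_specific(family)
    if af is None:
        # Without this check, af.sockaddr_len below would fail on None,
        # the analogue of the NULL pointer dereference.
        return None
    if addr_len < af.sockaddr_len:
        return None
    return ("transport", family)
```

Calling `addr_id2transport(0, 16)` returns `None` instead of raising, which mirrors the behaviour the fix aims for.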
      
      Fixes: 6f29a130 ("sctp: sctp_addr_id2transport should verify the addr before looking up assoc")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      168bd51e
    • Jack Morgenstein's avatar
      net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV · 399566f8
      Jack Morgenstein authored
      
      [ Upstream commit 9577b174 ]
      
      When running SRIOV, warnings for SRQ LIMIT events flood the Hypervisor's
      message log when (correct, normally operating) apps use SRQ LIMIT events
      as a trigger to post WQEs to SRQs.
      
      Add more information to the existing debug printout for SRQ_LIMIT, and
      output the warning messages only for the SRQ CATAS ERROR event.
      
      Fixes: acba2420 ("mlx4_core: Add wrapper functions and comm channel and slave event support to EQs")
      Fixes: e0debf9c ("mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      399566f8
    • Masami Hiramatsu's avatar
      perf probe: Fix to probe on gcc generated functions in modules · b6f75b98
      Masami Hiramatsu authored
      
      [ Upstream commit 613f050d ]
      
      Fix perf probe to work on gcc-generated functions in modules. Since
      probing on a module is based on its symbol name, the probe should
      be adjusted to the actual symbol.
      
      E.g. without this fix, perf probe shows a probe definition
      on a non-existent symbol, as below.
      
        $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -F in_range*
        in_range.isra.12
        $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -D in_range
        p:probe/in_range nf_nat:in_range+0
      
      With this fix, perf probe correctly shows a probe on the
      gcc-generated symbol.
      
        $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -D in_range
        p:probe/in_range nf_nat:in_range.isra.12+0
      
      This also fixes the same problem on an online module, as below.
      
        $ perf probe -m i915 -D assert_plane
        p:probe/assert_plane i915:assert_plane.constprop.134+0
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/148411450673.9978.14905987549651656075.stgit@devbox
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b6f75b98
    • Parthasarathy Bhuvaragan's avatar
      tipc: allocate user memory with GFP_KERNEL flag · 9f8ffe4e
      Parthasarathy Bhuvaragan authored
      
      [ Upstream commit 57d5f64d ]
      
      Until now, we have always allocated memory with the GFP_ATOMIC flag.
      When the system is under memory pressure and a user tries to send,
      the send fails due to low memory. However, the user application
      can wait for free memory if we allocate it using the GFP_KERNEL flag.
      
      In this commit, we allocate memory with GFP_KERNEL for all user
      allocations.
      Reported-by: Rune Torgersen <runet@innovsys.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f8ffe4e
    • Karicheri, Muralidharan's avatar
      net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types · 18b200e0
      Karicheri, Muralidharan authored
      
      [ Upstream commit 34c55cf2 ]
      
      Currently the dp83867 driver returns an error if the phy interface type
      PHY_INTERFACE_MODE_RGMII_RXID is used to set the rx-only internal
      delay. A similar issue occurs for PHY_INTERFACE_MODE_RGMII_TXID.
      Fix this by also checking the interface type if a particular delay
      value is missing in the phy dt bindings. Also update the DT document
      accordingly.
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: Sekhar Nori <nsekhar@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      18b200e0
    • Masami Hiramatsu's avatar
      perf probe: Fix to show correct locations for events on modules · e1eac347
      Masami Hiramatsu authored
      
      [ Upstream commit d2d4edbe ]
      
      Fix to show correct locations for events on modules by relocating given
      address instead of retrying after failure.
      
      This happens when the module text size is big enough, bigger than
      sh_addr, because the original code retries with given address + sh_addr
      if it failed to find CU DIE at the given address.
      
      Any address smaller than sh_addr always fails and it retries with the
      correct address, but addresses bigger than sh_addr will get a CU DIE
      which is on the given address (not adjusted by sh_addr).
      
      In my environment (x86-64), the sh_addr of the ".text" section is 0x10030.
      Since i915 is a huge kernel module, we can see this issue as below.
      
        $ grep "[Tt] .*\[i915\]" /proc/kallsyms | sort | head -n1
        ffffffffc0270000 t i915_switcheroo_can_switch	[i915]
      
      ffffffffc0270000 + 0x10030 = ffffffffc0280030, so we'll check the
      symbols on either side of this boundary.
      
        $ grep "[Tt] .*\[i915\]" /proc/kallsyms | grep -B1 ^ffffffffc028\
        | head -n 2
        ffffffffc027ff80 t haswell_init_clock_gating	[i915]
        ffffffffc0280110 t valleyview_init_clock_gating	[i915]
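
The boundary arithmetic above can be checked with a short model. This is a sketch under stated assumptions: the addresses are the ones quoted from /proc/kallsyms in this message, while `old_resolve()` and `new_resolve()` are hypothetical simplifications of the retry-on-failure heuristic and of the unconditional relocation, not the perf source.

```python
MODULE_TEXT_BASE = 0xFFFFFFFFC0270000  # first i915 text symbol, from above
SH_ADDR = 0x10030                      # sh_addr of ".text", from above

HASWELL = 0xFFFFFFFFC027FF80     # haswell_init_clock_gating
VALLEYVIEW = 0xFFFFFFFFC0280110  # valleyview_init_clock_gating


def old_resolve(offset):
    """Old behaviour: use the offset as-is, adjust only if lookup fails.

    Simplification: a lookup is assumed to fail exactly when the offset
    is below sh_addr, as the description above states.
    """
    if offset < SH_ADDR:
        return offset + SH_ADDR  # lookup failed, retried with adjustment
    return offset                # lookup "succeeded" on the wrong address


def new_resolve(offset):
    """Fixed behaviour: always relocate by sh_addr."""
    return offset + SH_ADDR


if __name__ == "__main__":
    print(hex(MODULE_TEXT_BASE + SH_ADDR))  # 0xffffffffc0280030
    for addr in (HASWELL, VALLEYVIEW):
        off = addr - MODULE_TEXT_BASE
        print(hex(off), old_resolve(off) == new_resolve(off))
```

The Haswell symbol sits at offset 0xff80, below the boundary, so both paths agree; the Valleyview symbol at offset 0x10110 is above it, so the old path resolves an unadjusted address.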
      
      So set up probes on both functions and see what happens.
      
        $ sudo ./perf probe -m i915 -a haswell_init_clock_gating \
              -a valleyview_init_clock_gating
        Added new events:
          probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe:valleyview_init_clock_gating -aR sleep 1
      
        $ sudo ./perf probe -l
          probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
          probe:valleyview_init_clock_gating (on i915_vga_set_decode:4@gpu/drm/i915/i915_drv.c in i915)
      
      As you can see, haswell_init_clock_gating is correctly shown,
      but valleyview_init_clock_gating is not.
      
      With this patch, both events are shown correctly.
      
        $ sudo ./perf probe -l
          probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
      
      Committer notes:
      
      In my case:
      
        # perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating
        Added new events:
          probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)
      
        You can now use it in all perf tools, such as:
      
                perf record -e probe:valleyview_init_clock_gating -aR sleep 1
      
        # perf probe -l
          probe:haswell_init_clock_gating (on i915_getparam+432@gpu/drm/i915/i915_drv.c in i915)
          probe:valleyview_init_clock_gating (on __i915_printk+240@gpu/drm/i915/i915_drv.c in i915)
        #
      
        # readelf -SW /lib/modules/4.9.0+/build/vmlinux | egrep -w '.text|Name'
         [Nr] Name   Type      Address          Off    Size   ES Flg Lk Inf Al
         [ 1] .text  PROGBITS  ffffffff81000000 200000 822fd3 00  AX  0   0 4096
        #
      
        So both are b0rked, now with the fix:
      
        # perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating
        Added new events:
          probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)
      
        You can now use it in all perf tools, such as:
      
                perf record -e probe:valleyview_init_clock_gating -aR sleep 1
      
        # perf probe -l
          probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
        #
      
      Both look correct.
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/148411436777.9978.1440275861947194930.stgit@devbox
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1eac347
    • Ivan Vecera's avatar
      be2net: fix MAC addr setting on privileged BE3 VFs · cc439964
      Ivan Vecera authored
      
      [ Upstream commit 34393529 ]
      
      During interface opening, the MAC address stored in netdev->dev_addr is
      programmed into the HW, with the exception of BE3 VFs, where the initial
      MAC is programmed by the parent PF. This is fine as long as the MAC
      address is not changed while the interface is down. When it is changed
      in that state, the requested MAC is stored in netdev->dev_addr and is
      programmed into the HW later, during opening. But this step is skipped
      for all BE3 VFs, so the NIC HW does not know anything about the change
      and all traffic is filtered.
      
      This is the case of bonding if fail_over_mac == 0 where the MACs of
      the slaves are changed while they are down.
      
      The be2net behavior is too restrictive because if a BE3 VF has
      the FILTMGMT privilege then it is able to modify its MAC without
      any restriction.
      
      To solve the described problem, the driver should take care of these
      privileged BE3 VFs so that the MAC is programmed during opening. By
      contrast, unprivileged BE3 VFs should not be allowed to change their MAC
      in any case.
      
      Cc: Sathya Perla <sathya.perla@broadcom.com>
      Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Cc: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Ivan Vecera <cera@cera.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc439964
    • Ivan Vecera's avatar
      be2net: don't delete MAC on close on unprivileged BE3 VFs · 02434def
      Ivan Vecera authored
      
      [ Upstream commit 6d928ae5 ]
      
      BE3 VFs without the FILTMGMT privilege are not allowed to modify their
      MAC, VLAN table and UC/MC lists. So don't try to delete the MAC on such VFs.
      
      Cc: Sathya Perla <sathya.perla@broadcom.com>
      Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Cc: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Ivan Vecera <cera@cera.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      02434def
    • Ivan Vecera's avatar
      be2net: fix status check in be_cmd_pmac_add() · fa1dbf50
      Ivan Vecera authored
      
      [ Upstream commit fe68d8bf ]
      
      The return value from be_mcc_notify_wait() contains a base completion
      status together with an additional status. The base_status() macro needs
      to be used to access the base status.
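
A rough model shows why comparing the raw return value goes wrong. The bit layout used here (base status in the low 16 bits, additional status in the next 8 bits) and the status constant are assumptions made for illustration only; the driver headers are the authority on the real encoding.

```python
# Assumed layout, for illustration only.
BASE_STATUS_MASK = 0xFFFF
ADDL_STATUS_SHIFT = 16
ADDL_STATUS_MASK = 0xFF

# Hypothetical value standing in for an "unauthorized request" base status.
STATUS_UNAUTHORIZED = 0x3


def base_status(status):
    """Extract the base completion status from a combined status word."""
    return status & BASE_STATUS_MASK if status > 0 else 0


def addl_status(status):
    """Extract the additional status from a combined status word."""
    if status > 0:
        return (status >> ADDL_STATUS_SHIFT) & ADDL_STATUS_MASK
    return 0


if __name__ == "__main__":
    # Combined word: additional status 0x2A plus the base status.
    combined = (0x2A << ADDL_STATUS_SHIFT) | STATUS_UNAUTHORIZED
    print(combined == STATUS_UNAUTHORIZED)               # False: raw compare misses
    print(base_status(combined) == STATUS_UNAUTHORIZED)  # True
```

Whenever the additional status is non-zero, a direct comparison of the combined word against a base status code can never match.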
      
      Fixes: e3a7ae2c be2net: Changing MAC Address of a VF was broken
      Cc: Sathya Perla <sathya.perla@broadcom.com>
      Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Cc: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Ivan Vecera <cera@cera.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa1dbf50
    • Amelie Delaunay's avatar
      usb: dwc2: gadget: Fix GUSBCFG.USBTRDTIM value · 5f54c4e1
      Amelie Delaunay authored
      
      [ Upstream commit ca02954a ]
      
      USBTrdTim must be programmed to 0x5 when the phy has a UTMI+ 16-bit wide
      interface or 0x9 when it has an 8-bit wide interface.
      GUSBCFG reset value (Value After Reset: 0x1400) sets USBTrdTim to 0x5.
      In case of 8-bit UTMI+, without clearing GUSBCFG.USBTRDTIM mask, USBTrdTim
      results in 0xD (0x5 | 0x9).
      That's why we need to clear GUSBCFG.USBTRDTIM mask before setting USBTrdTim
      value, to ensure USBTrdTim is correctly set in case of 8-bit UTMI+.
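
The 0xD result follows directly from OR-ing into a field that already holds the reset value. The sketch below models the register as a plain integer; the field position (a 4-bit field starting at bit 10) is an assumption, chosen because it is consistent with the reset value 0x1400 decoding to 0x5, and the function names are hypothetical.

```python
GUSBCFG_RESET = 0x1400           # reset value quoted above
USBTRDTIM_SHIFT = 10             # assumed field position
USBTRDTIM_MASK = 0xF << USBTRDTIM_SHIFT


def get_trdtim(reg):
    """Read the USBTrdTim field out of a GUSBCFG value."""
    return (reg & USBTRDTIM_MASK) >> USBTRDTIM_SHIFT


def set_trdtim_without_clear(reg, value):
    """Buggy read-modify-write: OR the new value over the old field."""
    return reg | (value << USBTRDTIM_SHIFT)


def set_trdtim_with_clear(reg, value):
    """Fixed read-modify-write: clear the field, then set it."""
    return (reg & ~USBTRDTIM_MASK) | (value << USBTRDTIM_SHIFT)


if __name__ == "__main__":
    print(hex(get_trdtim(GUSBCFG_RESET)))                                 # 0x5
    print(hex(get_trdtim(set_trdtim_without_clear(GUSBCFG_RESET, 0x9))))  # 0xd
    print(hex(get_trdtim(set_trdtim_with_clear(GUSBCFG_RESET, 0x9))))     # 0x9
```

The 16-bit case happens to work either way because 0x5 | 0x5 is still 0x5, which is why only the 8-bit configuration exposes the problem.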
      Signed-off-by: Amelie Delaunay <amelie.delaunay@st.com>
      Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f54c4e1
    • Heiko Carstens's avatar
      s390/ctl_reg: make __ctl_load a full memory barrier · 0e9867b7
      Heiko Carstens authored
      
      [ Upstream commit e991c24d ]
      
      We have quite a lot of code that depends on the order of the
      __ctl_load inline assembly and subsequent memory accesses, e.g.
      disabling lowcore protection and then writing to lowcore.
      
      Since the __ctl_load macro does not have memory barrier semantics, nor
      any other dependencies, the compiler is, theoretically, free to shuffle
      code around. Or in other words: storing to lowcore could happen before
      lowcore protection is disabled.
      
      In order to avoid this class of potential bugs simply add a full
      memory barrier to the __ctl_load macro.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0e9867b7
    • Nikita Yushchenko's avatar
      swiotlb: ensure that page-sized mappings are page-aligned · 9d00195b
      Nikita Yushchenko authored
      
      [ Upstream commit 602d9858 ]
      
      Some drivers depend on page mappings being page-aligned.
      
      Swiotlb already enforces such alignment for mappings larger than a page;
      extend that to page-sized mappings as well.
      
      Without this fix, nvme hits BUG() in nvme_setup_prps(), because that routine
      assumes page-aligned mappings.
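
One way to picture the change is as a slot-search stride. The sketch below is a model built on assumptions, not the swiotlb source: it assumes 2 KiB bounce-buffer slots, 4 KiB pages, a page-aligned pool, and a search that only considers slot indices that are multiples of the stride. The function names are hypothetical.

```python
PAGE_SHIFT = 12                 # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT
IO_TLB_SHIFT = 11               # assumed 2 KiB slots


def stride_before(size):
    """Stride when only mappings larger than a page are page-aligned."""
    return 1 << (PAGE_SHIFT - IO_TLB_SHIFT) if size > PAGE_SIZE else 1


def stride_after(size):
    """Stride when page-sized mappings are page-aligned as well."""
    return 1 << (PAGE_SHIFT - IO_TLB_SHIFT) if size >= PAGE_SIZE else 1


def candidate_offsets(stride, nslots=4):
    """Byte offsets (from a page-aligned pool base) the search may pick."""
    return [index << IO_TLB_SHIFT for index in range(0, nslots, stride)]


if __name__ == "__main__":
    print(candidate_offsets(stride_before(PAGE_SIZE)))  # [0, 2048, 4096, 6144]
    print(candidate_offsets(stride_after(PAGE_SIZE)))   # [0, 4096]
```

Under these assumptions, a page-sized mapping could previously start at a 2048-byte offset, which is the kind of placement a routine expecting page-aligned mappings cannot handle.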
      Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d00195b
    • Dave Kleikamp's avatar
      coredump: Ensure proper size of sparse core files · 68a5dc38
      Dave Kleikamp authored
      
      [ Upstream commit 4d22c75d ]
      
      If the last section of a core file ends with an unmapped or zero page,
      the size of the file does not correspond with the last dump_skip() call.
      gdb then complains that the file is truncated, which can confuse users.
      
      After all of the vma sections are written, make sure that the file size
      is no smaller than the current file position.
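
The same effect can be reproduced from user space. The sketch below is an analogy rather than the coredump code: seeking past the end of a file without writing stands in for a trailing dump_skip(), and extending the file to the current position stands in for the fix. The helper name and its parameters are hypothetical.

```python
import os
import tempfile


def write_sparse(path, payload, hole, fix_size):
    """Write payload, skip `hole` bytes, optionally extend file to position.

    Returns (final file position, resulting file size).
    """
    with open(path, "wb") as f:
        f.write(payload)
        f.seek(hole, os.SEEK_CUR)  # move the position, write nothing
        end = f.tell()
        if fix_size:
            f.truncate(end)        # make the size at least the position
    return end, os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_sparse(os.path.join(tmp, "short"), b"abc", 4096, False))
        print(write_sparse(os.path.join(tmp, "full"), b"abc", 4096, True))
```

Without the final size adjustment the file ends where the last write ended, even though the position had moved further, which is what a reader of the file interprets as truncation.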
      
      This problem can be demonstrated with gdb's bigcore testcase on the
      sparc architecture.
      Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      68a5dc38
    • Shaohua Li's avatar
      aio: fix lock dep warning · d21816c2
      Shaohua Li authored
      
      [ Upstream commit a12f1ae6 ]
      
      lockdep reports a warning. file_start_write/file_end_write only
      acquire/release the lock for regular files, so check for regular files
      on the aio side too.
      
      [  453.532141] ------------[ cut here ]------------
      [  453.533011] WARNING: CPU: 1 PID: 1298 at ../kernel/locking/lockdep.c:3514 lock_release+0x434/0x670
      [  453.533011] DEBUG_LOCKS_WARN_ON(depth <= 0)
      [  453.533011] Modules linked in:
      [  453.533011] CPU: 1 PID: 1298 Comm: fio Not tainted 4.9.0+ #964
      [  453.533011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014
      [  453.533011]  ffff8803a24b7a70 ffffffff8196cffb ffff8803a24b7ae8 0000000000000000
      [  453.533011]  ffff8803a24b7ab8 ffffffff81091ee1 ffff8803a5dba700 00000dba00000008
      [  453.533011]  ffffed0074496f59 ffff8803a5dbaf54 ffff8803ae0f8488 fffffffffffffdef
      [  453.533011] Call Trace:
      [  453.533011]  [<ffffffff8196cffb>] dump_stack+0x67/0x9c
      [  453.533011]  [<ffffffff81091ee1>] __warn+0x111/0x130
      [  453.533011]  [<ffffffff81091f97>] warn_slowpath_fmt+0x97/0xb0
      [  453.533011]  [<ffffffff81091f00>] ? __warn+0x130/0x130
      [  453.533011]  [<ffffffff8191b789>] ? blk_finish_plug+0x29/0x60
      [  453.533011]  [<ffffffff811205d4>] lock_release+0x434/0x670
      [  453.533011]  [<ffffffff8198af94>] ? import_single_range+0xd4/0x110
      [  453.533011]  [<ffffffff81322195>] ? rw_verify_area+0x65/0x140
      [  453.533011]  [<ffffffff813aa696>] ? aio_write+0x1f6/0x280
      [  453.533011]  [<ffffffff813aa6c9>] aio_write+0x229/0x280
      [  453.533011]  [<ffffffff813aa4a0>] ? aio_complete+0x640/0x640
      [  453.533011]  [<ffffffff8111df20>] ? debug_check_no_locks_freed+0x1a0/0x1a0
      [  453.533011]  [<ffffffff8114793a>] ? debug_lockdep_rcu_enabled.part.2+0x1a/0x30
      [  453.533011]  [<ffffffff81147985>] ? debug_lockdep_rcu_enabled+0x35/0x40
      [  453.533011]  [<ffffffff812a92be>] ? __might_fault+0x7e/0xf0
      [  453.533011]  [<ffffffff813ac9bc>] do_io_submit+0x94c/0xb10
      [  453.533011]  [<ffffffff813ac2ae>] ? do_io_submit+0x23e/0xb10
      [  453.533011]  [<ffffffff813ac070>] ? SyS_io_destroy+0x270/0x270
      [  453.533011]  [<ffffffff8111d7b3>] ? mark_held_locks+0x23/0xc0
      [  453.533011]  [<ffffffff8100201a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
      [  453.533011]  [<ffffffff813acb90>] SyS_io_submit+0x10/0x20
      [  453.533011]  [<ffffffff824f96aa>] entry_SYSCALL_64_fastpath+0x18/0xad
      [  453.533011]  [<ffffffff81119190>] ? trace_hardirqs_off_caller+0xc0/0x110
      [  453.533011] ---[ end trace b2fbe664d1cc0082 ]---
      
      Cc: Dmitry Monakhov <dmonakhov@openvz.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d21816c2
    • Jiri Olsa's avatar
      perf/x86: Reject non sampling events with precise_ip · 82835fb3
      Jiri Olsa authored
      
      [ Upstream commit 18e7a45a ]
      
      As Peter suggested [1], reject non-sampling PEBS events,
      because they don't make any sense and could cause bugs
      in the NMI handler [2].
      
        [1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop
        [2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.org
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20170103142454.GA26251@krava
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      82835fb3
    • Peter Zijlstra's avatar
      perf/core: Fix sys_perf_event_open() vs. hotplug · 1c686333
      Peter Zijlstra authored
      
      [ Upstream commit 63cae12b ]
      
      There is a problem with installing an event in a task that is 'stuck' on
      an offline CPU.
      
      Blocked tasks are not disassociated from offlined CPUs; after all, a
      blocked task doesn't run and doesn't require a CPU. Only on wakeup do
      we amend the situation and place the task on an available CPU.
      
      If we hit such a task with perf_install_in_context(), we'll loop until
      either that task wakes up or the CPU comes back online. If the task
      waking depends on the event being installed, we're stuck.
      
      While looking into this issue, I also spotted another problem: if we
      hit a task with perf_install_in_context() that is in the middle of
      being migrated (that is, we observe the old CPU before sending the IPI,
      but run the IPI on the old CPU while the task is already running on
      the new CPU), things also go sideways.
      
      Rework things to rely on task_curr() -- outside of rq->lock -- which
      is rather tricky. Imagine the following scenario where we're trying to
      install the first event into our task 't':
      
      CPU0            CPU1            CPU2
      
                      (current == t)
      
      t->perf_event_ctxp[] = ctx;
      smp_mb();
      cpu = task_cpu(t);
      
                      switch(t, n);
                                      migrate(t, 2);
                                      switch(p, t);
      
                                      ctx = t->perf_event_ctxp[]; // must not be NULL
      
      smp_function_call(cpu, ..);
      
                      generic_exec_single()
                        func();
                          spin_lock(ctx->lock);
                          if (task_curr(t)) // false
      
                          add_event_to_ctx();
                          spin_unlock(ctx->lock);
      
                                      perf_event_context_sched_in();
                                        spin_lock(ctx->lock);
                                        // sees event
      
      So it's CPU0's store of t->perf_event_ctxp[] that must not go 'missing'.
      Because if CPU2's load of that variable were to observe NULL, it would
      not try to schedule the ctx and we'd have a task running without its
      counter, which would be 'bad'.
      
      As long as we observe !NULL, we'll acquire ctx->lock. If we acquire it
      first and do not see the event yet, then CPU0 must observe task_curr()
      and retry. If the install happens first, then we must see the event on
      sched-in and all is well.
      
      I think we can translate the first part (until the 'must not be NULL')
      of the scenario to a litmus test like:
      
        C C-peterz
      
        {
        }
      
        P0(int *x, int *y)
        {
                int r1;
      
                WRITE_ONCE(*x, 1);
                smp_mb();
                r1 = READ_ONCE(*y);
        }
      
        P1(int *y, int *z)
        {
                WRITE_ONCE(*y, 1);
                smp_store_release(z, 1);
        }
      
        P2(int *x, int *z)
        {
                int r1;
                int r2;
      
                r1 = smp_load_acquire(z);
                smp_mb();
                r2 = READ_ONCE(*x);
        }
      
        exists
        (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
      
      Where:
        x is perf_event_ctxp[],
        y is our task's CPU, and
        z is our task being placed on the rq of CPU2.
      
      The P0 smp_mb() is the one added by this patch, ordering the store to
      perf_event_ctxp[] from find_get_context() and the load of task_cpu()
      in task_function_call().
      
      The smp_store_release/smp_load_acquire model the RCpc locking of the
      rq->lock and the smp_mb() of P2 is the context switch switching from
      whatever CPU2 was running to our task 't'.
      
      This litmus test evaluates into:
      
        Test C-peterz Allowed
        States 7
        0:r1=0; 2:r1=0; 2:r2=0;
        0:r1=0; 2:r1=0; 2:r2=1;
        0:r1=0; 2:r1=1; 2:r2=1;
        0:r1=1; 2:r1=0; 2:r2=0;
        0:r1=1; 2:r1=0; 2:r2=1;
        0:r1=1; 2:r1=1; 2:r2=0;
        0:r1=1; 2:r1=1; 2:r2=1;
        No
        Witnesses
        Positive: 0 Negative: 7
        Condition exists (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
        Observation C-peterz Never 0 7
        Hash=e427f41d9146b2a5445101d3e2fcaa34
      
      And the strong and weak model agree.
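
As a rough cross-check of the result above, the same three-thread shape can be brute-forced under sequential consistency. This is only a sanity check and not a substitute for the herd run: sequential consistency is stronger than the kernel memory model, so the barriers and the release/acquire pair are not modelled at all, and every access is treated as a plain in-order read or write. The function name and data layout are hypothetical.

```python
def sc_outcomes():
    """Enumerate all sequentially consistent interleavings of the test.

    Returns the set of reachable (0:r1, 2:r1, 2:r2) register states.
    """
    programs = {
        0: [("W", "x", None), ("R", "y", "0:r1")],
        1: [("W", "y", None), ("W", "z", None)],
        2: [("R", "z", "2:r1"), ("R", "x", "2:r2")],
    }
    outcomes = set()

    def step(pcs, mem, regs):
        if all(pcs[t] == len(programs[t]) for t in programs):
            outcomes.add((regs["0:r1"], regs["2:r1"], regs["2:r2"]))
            return
        for t in programs:
            if pcs[t] == len(programs[t]):
                continue
            kind, var, reg = programs[t][pcs[t]]
            new_mem, new_regs, new_pcs = dict(mem), dict(regs), dict(pcs)
            if kind == "W":
                new_mem[var] = 1         # every store in the test writes 1
            else:
                new_regs[reg] = mem[var]
            new_pcs[t] += 1
            step(new_pcs, new_mem, new_regs)

    step({0: 0, 1: 0, 2: 0}, {"x": 0, "y": 0, "z": 0}, {})
    return outcomes


if __name__ == "__main__":
    states = sc_outcomes()
    print(len(states), (0, 1, 0) in states)
```

Under this simplified model the enumeration reaches seven states and never reaches 0:r1=0, 2:r1=1, 2:r2=0, which is in line with the herd output quoted above.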
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: jeremy.linton@arm.com
      Link: http://lkml.kernel.org/r/20161209135900.GU3174@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c686333