1. 05 Jul, 2017 40 commits
    • Baoquan He's avatar
      x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug · cc6b8fbc
      Baoquan He authored
      commit 8eabf42a upstream.
      
      Kernel text KASLR is separated into physical address and virtual
      address randomization. And for virtual address randomization, we
      only randomiza to get an offset between 16M and KERNEL_IMAGE_SIZE.
      So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR,
      but not the original kernel loading address 'output'.
      
      The bug will cause kernel boot failure if kernel is loaded at a different
      position than the address, 16M, which is decided at compiled time.
      Kexec/kdump is such practical case.
      
      To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as initial
      value.
      Tested-by: default avatarDave Young <dyoung@redhat.com>
      Signed-off-by: default avatarBaoquan He <bhe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 8391c73c ("x86/KASLR: Randomize virtual address separately")
      Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-bhe@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc6b8fbc
    • Thomas Gleixner's avatar
      x86/mshyperv: Remove excess #includes from mshyperv.h · 875cfdbe
      Thomas Gleixner authored
      commit 26fcd952 upstream.
      
      A recent commit included linux/slab.h in linux/irq.h. This breaks the build
      of vdso32 on a 64-bit kernel.
      
      The reason is that linux/irq.h gets included into the vdso code via
      linux/interrupt.h which is included from asm/mshyperv.h. That makes the
      32-bit vdso compile fail, because slab.h includes the pgtable headers for
      64-bit on a 64-bit build.
      
      Neither linux/clocksource.h nor linux/interrupt.h are needed in the
      mshyperv.h header file itself - it has a dependency on <linux/atomic.h>.
      
      Remove the includes and unbreak the build.
      Reported-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: devel@linuxdriverproject.org
      Fixes: dee863b5 ("hv: export current Hyper-V clocksource")
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1706231038460.2647@nanosSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      875cfdbe
    • Josh Poimboeuf's avatar
      Revert "x86/entry: Fix the end of the stack for newly forked tasks" · b3bc8114
      Josh Poimboeuf authored
      commit ebd57499 upstream.
      
      Petr Mladek reported the following warning when loading the livepatch
      sample module:
      
        WARNING: CPU: 1 PID: 3699 at arch/x86/kernel/stacktrace.c:132 save_stack_trace_tsk_reliable+0x133/0x1a0
        ...
        Call Trace:
         __schedule+0x273/0x820
         schedule+0x36/0x80
         kthreadd+0x305/0x310
         ? kthread_create_on_cpu+0x80/0x80
         ? icmp_echo.part.32+0x50/0x50
         ret_from_fork+0x2c/0x40
      
      That warning means the end of the stack is no longer recognized as such
      for newly forked tasks.  The problem was introduced with the following
      commit:
      
        ff3f7e24 ("x86/entry: Fix the end of the stack for newly forked tasks")
      
      ... which was completely misguided.  It only partially fixed the
      reported issue, and it introduced another bug in the process.  None of
      the other entry code saves the frame pointer before calling into C code,
      so it doesn't make sense for ret_from_fork to do so either.
      
      Contrary to what I originally thought, the original issue wasn't related
      to newly forked tasks.  It was actually related to ftrace.  When entry
      code calls into a function which then calls into an ftrace handler, the
      stack frame looks different than normal.
      
      The original issue will be fixed in the unwinder, in a subsequent patch.
      Reported-by: default avatarPetr Mladek <pmladek@suse.com>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: live-patching@vger.kernel.org
      Fixes: ff3f7e24 ("x86/entry: Fix the end of the stack for newly forked tasks")
      Link: http://lkml.kernel.org/r/f350760f7e82f0750c8d1dd093456eb212751caa.1495553739.git.jpoimboe@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3bc8114
    • Arnaldo Carvalho de Melo's avatar
      tools arch: Sync arch/x86/lib/memcpy_64.S with the kernel · 839fe2c4
      Arnaldo Carvalho de Melo authored
      commit e883d09c upstream.
      
      Just a minor fix done in:
      
        Fixes: 26a37ab3 ("x86/mce: Fix copy/paste error in exception table entries")
      
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/n/tip-ni9jzdd5yxlail6pq8cuexw2@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      839fe2c4
    • Christophe JAILLET's avatar
      ARM: davinci: PM: Do not free useful resources in normal path in 'davinci_pm_init' · b856d45c
      Christophe JAILLET authored
      commit 95d7c1f1 upstream.
      
      It is wrong to iounmap resources in the normal path of davinci_pm_init()
      
      The 3 ioremap'ed fields of 'pm_config' can be accessed later on in other
      functions, so we should return 'success' instead of unrolling everything.
      
      Fixes: aa9aa1ec ("ARM: davinci: PM: rework init, remove platform device")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      [nsekhar@ti.com: commit message and minor style fixes]
      Signed-off-by: default avatarSekhar Nori <nsekhar@ti.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b856d45c
    • Christophe JAILLET's avatar
      ARM: davinci: PM: Free resources in error handling path in 'davinci_pm_init' · b0ed4718
      Christophe JAILLET authored
      commit f3f6cc81 upstream.
      
      If 'sram_alloc' fails, we need to free already allocated resources.
      
      Fixes: aa9aa1ec ("ARM: davinci: PM: rework init, remove platform device")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: default avatarSekhar Nori <nsekhar@ti.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0ed4718
    • Doug Berger's avatar
      ARM: 8685/1: ensure memblock-limit is pmd-aligned · 0afbd9fd
      Doug Berger authored
      commit 9e25ebfe upstream.
      
      The pmd containing memblock_limit is cleared by prepare_page_table()
      which creates the opportunity for early_alloc() to allocate unmapped
      memory if memblock_limit is not pmd aligned causing a boot-time hang.
      
      Commit 965278dc ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
      attempted to resolve this problem, but there is a path through the
      adjust_lowmem_bounds() routine where if all memory regions start and
      end on pmd-aligned addresses the memblock_limit will be set to
      arm_lowmem_limit.
      
      Since arm_lowmem_limit can be affected by the vmalloc early parameter,
      the value of arm_lowmem_limit may not be pmd-aligned. This commit
      corrects this oversight such that memblock_limit is always rounded
      down to pmd-alignment.
      
      Fixes: 965278dc ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
      Signed-off-by: default avatarDoug Berger <opendmb@gmail.com>
      Suggested-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0afbd9fd
    • Lorenzo Pieralisi's avatar
      ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation · 16dfde48
      Lorenzo Pieralisi authored
      commit cb7cf772 upstream.
      
      The BAD_MADT_GICC_ENTRY() macro checks if a GICC MADT entry passes
      muster from an ACPI specification standpoint. Current macro detects the
      MADT GICC entry length through ACPI firmware version (it changed from 76
      to 80 bytes in the transition from ACPI 5.1 to ACPI 6.0 specification)
      but always uses (erroneously) the ACPICA (latest) struct (ie struct
      acpi_madt_generic_interrupt - that is 80-bytes long) length to check if
      the current GICC entry memory record exceeds the MADT table end in
      memory as defined by the MADT table header itself, which may result in
      false negatives depending on the ACPI firmware version and how the MADT
      entries are laid out in memory (ie on ACPI 5.1 firmware MADT GICC
      entries are 76 bytes long, so by adding 80 to a GICC entry start address
      in memory the resulting address may well be past the actual MADT end,
      triggering a false negative).
      
      Fix the BAD_MADT_GICC_ENTRY() macro by reshuffling the condition checks
      and update them to always use the firmware version specific MADT GICC
      entry length in order to carry out boundary checks.
      
      Fixes: b6cfb277 ("ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro")
      Reported-by: default avatarJulien Grall <julien.grall@arm.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Julien Grall <julien.grall@arm.com>
      Cc: Hanjun Guo <hanjun.guo@linaro.org>
      Cc: Al Stone <ahs3@redhat.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16dfde48
    • Timmy Li's avatar
      ARM64: PCI: Fix struct acpi_pci_root_ops allocation failure path · 5e1c6a5d
      Timmy Li authored
      commit 717902cc upstream.
      
      Commit 093d24a2 ("arm64: PCI: Manage controller-specific data on
      per-controller basis") added code to allocate ACPI PCI root_ops
      dynamically on a per host bridge basis but failed to update the
      corresponding memory allocation failure path in pci_acpi_scan_root()
      leading to a potential memory leakage.
      
      Fix it by adding the required kfree call.
      
      Fixes: 093d24a2 ("arm64: PCI: Manage controller-specific data on per-controller basis")
      Reviewed-by: default avatarTomasz Nowicki <tn@semihalf.com>
      Signed-off-by: default avatarTimmy Li <lixiaoping3@huawei.com>
      [lorenzo.pieralisi@arm.com: refactored code, rewrote commit log]
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      CC: Will Deacon <will.deacon@arm.com>
      CC: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e1c6a5d
    • Eric Anholt's avatar
      watchdog: bcm281xx: Fix use of uninitialized spinlock. · 96101035
      Eric Anholt authored
      commit fedf266f upstream.
      
      The bcm_kona_wdt_set_resolution_reg() call takes the spinlock, so
      initialize it earlier.  Fixes a warning at boot with lock debugging
      enabled.
      
      Fixes: 6adb730d ("watchdog: bcm281xx: Watchdog Driver")
      Signed-off-by: default avatarEric Anholt <eric@anholt.net>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWim Van Sebroeck <wim@iguana.be>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96101035
    • Dan Carpenter's avatar
      xfrm: Oops on error in pfkey_msg2xfrm_state() · 662bb18e
      Dan Carpenter authored
      commit 1e3d0c2c upstream.
      
      There are some missing error codes here so we accidentally return NULL
      instead of an error pointer.  It results in a NULL pointer dereference.
      
      Fixes: df71837d ("[LSM-IPSec]: Security association restriction.")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      662bb18e
    • Dan Carpenter's avatar
      xfrm: NULL dereference on allocation failure · f37a5bfa
      Dan Carpenter authored
      commit e747f643 upstream.
      
      The default error code in pfkey_msg2xfrm_state() is -ENOBUFS.  We
      added a new call to security_xfrm_state_alloc() which sets "err" to zero
      so there several places where we can return ERR_PTR(0) if kmalloc()
      fails.  The caller is expecting error pointers so it leads to a NULL
      dereference.
      
      Fixes: df71837d ("[LSM-IPSec]: Security association restriction.")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f37a5bfa
    • Sabrina Dubroca's avatar
      xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY · 29be0c1a
      Sabrina Dubroca authored
      commit 9b3eb541 upstream.
      
      When CONFIG_XFRM_SUB_POLICY=y, xfrm_dst stores a copy of the flowi for
      that dst. Unfortunately, the code that allocates and fills this copy
      doesn't care about what type of flowi (flowi, flowi4, flowi6) gets
      passed. In multiple code paths (from raw_sendmsg, from TCP when
      replying to a FIN, in vxlan, geneve, and gre), the flowi that gets
      passed to xfrm is actually an on-stack flowi4, so we end up reading
      stuff from the stack past the end of the flowi4 struct.
      
      Since xfrm_dst->origin isn't used anywhere following commit
      ca116922 ("xfrm: Eliminate "fl" and "pol" args to
      xfrm_bundle_ok()."), just get rid of it.  xfrm_dst->partner isn't used
      either, so get rid of that too.
      
      Fixes: 9d6ec938 ("ipv4: Use flowi4 in public route lookup interfaces.")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29be0c1a
    • Hangbin Liu's avatar
      xfrm: move xfrm_garbage_collect out of xfrm_policy_flush · e0cee9f3
      Hangbin Liu authored
      commit 138437f5 upstream.
      
      Now we will force to do garbage collection if any policy removed in
      xfrm_policy_flush(). But during xfrm_net_exit(). We call flow_cache_fini()
      first and set set fc->percpu to NULL. Then after we call xfrm_policy_fini()
      -> frxm_policy_flush() -> flow_cache_flush(), we will get NULL pointer
      dereference when check percpu_empty. The code path looks like:
      
      flow_cache_fini()
        - fc->percpu = NULL
      xfrm_policy_fini()
        - xfrm_policy_flush()
          - xfrm_garbage_collect()
            - flow_cache_flush()
              - flow_cache_percpu_empty()
      	  - fcp = per_cpu_ptr(fc->percpu, cpu)
      
      To reproduce, just add ipsec in netns and then remove the netns.
      
      v2:
      As Xin Long suggested, since only two other places need to call it. move
      xfrm_garbage_collect() outside xfrm_policy_flush().
      
      v3:
      Fix subject mismatch after v2 fix.
      
      Fixes: 35db0691 ("xfrm: do the garbage collection after flushing policy")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e0cee9f3
    • Yossi Kuperman's avatar
      xfrm6: Fix IPv6 payload_len in xfrm6_transport_finish · 4d03c617
      Yossi Kuperman authored
      commit 7c88e21a upstream.
      
      IPv6 payload length indicates the size of the payload, including any
      extension headers.
      
      In xfrm6_transport_finish, ipv6_hdr(skb)->payload_len is set to the
      payload size only, regardless of the presence of any extension headers.
      After ESP GRO transport mode decapsulation, ipv6_rcv trims the packet
      according to the wrong payload_len, thus corrupting the packet.
      
      Set payload_len to account for extension headers as well.
      
      Fixes: 7785bba2 ("esp: Add a software GRO codepath")
      Signed-off-by: default avatarYossi Kuperman <yossiku@mellanox.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d03c617
    • Juergen Gross's avatar
      xen/blkback: don't free be structure too early · c4ed418d
      Juergen Gross authored
      commit 71df1d7c upstream.
      
      The be structure must not be freed when freeing the blkif structure
      isn't done. Otherwise a use-after-free of be when unmapping the ring
      used for communicating with the frontend will occur in case of a
      late call of xenblk_disconnect() (e.g. due to an I/O still active
      when trying to disconnect).
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Tested-by: default avatarSteven Haigh <netwiz@crc.id.au>
      Acked-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c4ed418d
    • Ard Biesheuvel's avatar
      mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings · f21d9eda
      Ard Biesheuvel authored
      commit 029c54b0 upstream.
      
      Existing code that uses vmalloc_to_page() may assume that any address
      for which is_vmalloc_addr() returns true may be passed into
      vmalloc_to_page() to retrieve the associated struct page.
      
      This is not un unreasonable assumption to make, but on architectures
      that have CONFIG_HAVE_ARCH_HUGE_VMAP=y, it no longer holds, and we need
      to ensure that vmalloc_to_page() does not go off into the weeds trying
      to dereference huge PUDs or PMDs as table entries.
      
      Given that vmalloc() and vmap() themselves never create huge mappings or
      deal with compound pages at all, there is no correct answer in this
      case, so return NULL instead, and issue a warning.
      
      When reading /proc/kcore on arm64, you will hit an oops as soon as you
      hit the huge mappings used for the various segments that make up the
      mapping of vmlinux.  With this patch applied, you will no longer hit the
      oops, but the kcore contents willl be incorrect (these regions will be
      zeroed out)
      
      We are fixing this for kcore specifically, so it avoids vread() for
      those regions.  At least one other problematic user exists, i.e.,
      /dev/kmem, but that is currently broken on arm64 for other reasons.
      
      Link: http://lkml.kernel.org/r/20170609082226.26152-1-ard.biesheuvel@linaro.orgSigned-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Reviewed-by: default avatarLaura Abbott <labbott@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: zhong jiang <zhongjiang@huawei.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f21d9eda
    • Thomas Gleixner's avatar
      pinctrl/amd: Use regular interrupt instead of chained · 0ec03ce7
      Thomas Gleixner authored
      commit ba714a9c upstream.
      
      The AMD pinctrl driver uses a chained interrupt to demultiplex the GPIO
      interrupts. Kevin Vandeventer reported, that his new AMD Ryzen locks up
      hard on boot when the AMD pinctrl driver is initialized. The reason is an
      interrupt storm. It's not clear whether that's caused by hardware or
      firmware or both.
      
      Using chained interrupts on X86 is a dangerous endavour. If a system is
      misconfigured or the hardware buggy there is no safety net to catch an
      interrupt storm.
      
      Convert the driver to use a regular interrupt for the demultiplex
      handler. This allows the interrupt storm detector to catch the malfunction
      and lets the system boot up.
      
      This should be backported to stable because it's likely that more users run
      into this problem as the AMD Ryzen machines are spreading.
      
      Reported-by: Kevin Vandeventer
      Link: https://bugzilla.suse.com/show_bug.cgi?id=1034261Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ec03ce7
    • Baoquan He's avatar
      x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds() · d9efa9db
      Baoquan He authored
      commit fc5f9d5f upstream.
      
      Jeff Moyer reported that on his system with two memory regions 0~64G and
      1T~1T+192G, and kernel option "memmap=192G!1024G" added, enabling KASLR
      will make the system hang intermittently during boot. While adding 'nokaslr'
      won't.
      
      The back trace is:
      
       Oops: 0000 [#1] SMP
      
       RIP: memcpy_erms()
       [ .... ]
       Call Trace:
        pmem_rw_page()
        bdev_read_page()
        do_mpage_readpage()
        mpage_readpages()
        blkdev_readpages()
        __do_page_cache_readahead()
        force_page_cache_readahead()
        page_cache_sync_readahead()
        generic_file_read_iter()
        blkdev_read_iter()
        __vfs_read()
        vfs_read()
        SyS_read()
        entry_SYSCALL_64_fastpath()
      
      This crash happens because the for loop count calculation in sync_global_pgds()
      is not correct. When a mapping area crosses PGD entries, we should
      calculate the starting address of region which next PGD covers and assign
      it to next for loop count, but not add PGDIR_SIZE directly. The old
      code works right only if the mapping area is an exact multiple of PGDIR_SIZE,
      otherwize the end region could be skipped so that it can't be synchronized
      to all other processes from kernel PGD init_mm.pgd.
      
      In Jeff's system, emulated pmem area [1024G, 1216G) is smaller than
      PGDIR_SIZE. While 'nokaslr' works because PAGE_OFFSET is 1T aligned, it
      makes this area be mapped inside one PGD entry. With KASLR enabled,
      this area could cross two PGD entries, then the next PGD entry won't
      be synced to all other processes. That is why we saw empty PGD.
      
      Fix it.
      Reported-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: default avatarBaoquan He <bhe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jinbum Park <jinb.park7@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1493864747-8506-1-git-send-email-bhe@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9efa9db
    • Vallish Vaidyeshwara's avatar
      dm thin: do not queue freed thin mapping for next stage processing · a340661a
      Vallish Vaidyeshwara authored
      commit 00a0ea33 upstream.
      
      process_prepared_discard_passdown_pt1() should cleanup
      dm_thin_new_mapping in cases of error.
      
      dm_pool_inc_data_range() can fail trying to get a block reference:
      
      metadata operation 'dm_pool_inc_data_range' failed: error = -61
      
      When dm_pool_inc_data_range() fails, dm thin aborts current metadata
      transaction and marks pool as PM_READ_ONLY. Memory for thin mapping
      is released as well. However, current thin mapping will be queued
      onto next stage as part of queue_passdown_pt2() or passdown_endio().
      This dangling thin mapping memory when processed and accessed in
      next stage will lead to device mapper crashing.
      
      Code flow without fix:
      -> process_prepared_discard_passdown_pt1(m)
         -> dm_thin_remove_range()
         -> discard passdown
            --> passdown_endio(m) queues m onto next stage
         -> dm_pool_inc_data_range() fails, frees memory m
                  but does not remove it from next stage queue
      
      -> process_prepared_discard_passdown_pt2(m)
         -> processes freed memory m and crashes
      
      One such stack:
      
      Call Trace:
      [<ffffffffa037a46f>] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison]
      [<ffffffffa039b6dc>] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool]
      [<ffffffffa039b88b>] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool]
      [<ffffffffa0399611>] process_prepared+0x81/0xa0 [dm_thin_pool]
      [<ffffffffa039e735>] do_worker+0xc5/0x820 [dm_thin_pool]
      [<ffffffff8152bf54>] ? __schedule+0x244/0x680
      [<ffffffff81087e72>] ? pwq_activate_delayed_work+0x42/0xb0
      [<ffffffff81089f53>] process_one_work+0x153/0x3f0
      [<ffffffff8108a71b>] worker_thread+0x12b/0x4b0
      [<ffffffff8108a5f0>] ? rescuer_thread+0x350/0x350
      [<ffffffff8108fd6a>] kthread+0xca/0xe0
      [<ffffffff8108fca0>] ? kthread_park+0x60/0x60
      [<ffffffff81530b45>] ret_from_fork+0x25/0x30
      
      The fix is to first take the block ref count for discarded block and
      then do a passdown discard of this block. If block ref count fails,
      then bail out aborting current metadata transaction, mark pool as
      PM_READ_ONLY and also free current thin mapping memory (existing error
      handling code) without queueing this thin mapping onto next stage of
      processing. If block ref count succeeds, then passdown discard of this
      block. Discard callback of passdown_endio() will queue this thin mapping
      onto next stage of processing.
      
      Code flow with fix:
      -> process_prepared_discard_passdown_pt1(m)
         -> dm_thin_remove_range()
         -> dm_pool_inc_data_range()
            --> if fails, free memory m and bail out
         -> discard passdown
            --> passdown_endio(m) queues m onto next stage
      Reviewed-by: default avatarEduardo Valentin <eduval@amazon.com>
      Reviewed-by: default avatarCristian Gafton <gafton@amazon.com>
      Reviewed-by: default avatarAnchal Agarwal <anchalag@amazon.com>
      Signed-off-by: default avatarVallish Vaidyeshwara <vallish@amazon.com>
      Reviewed-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a340661a
    • Deepak Rawat's avatar
      drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr · d7822ccb
      Deepak Rawat authored
      commit 82fcee52 upstream.
      
      The hash table created during vmw_cmdbuf_res_man_create was
      never freed. This causes memory leak in context creation.
      Added the corresponding drm_ht_remove in vmw_cmdbuf_res_man_destroy.
      
      Tested for memory leak by running piglit overnight and kernel
      memory is not inflated which earlier was.
      Signed-off-by: default avatarDeepak Rawat <drawat@vmware.com>
      Reviewed-by: default avatarSinclair Yeh <syeh@vmware.com>
      Signed-off-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7822ccb
    • Kan Liang's avatar
      perf/x86/intel/uncore: Fix wrong box pointer check · bdbe8503
      Kan Liang authored
      commit 80c65fdb upstream.
      
      Should not init a NULL box. It will cause system crash.
      The issue looks like caused by a typo.
      
      This was not noticed because there is no NULL box. Also, for most
      boxes, they are enabled by default. The init code is not critical.
      
      Fixes: fff4b87e ("perf/x86/intel/uncore: Make package handling more robust")
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170629190926.2456-1-kan.liang@intel.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bdbe8503
    • Vikas Shivappa's avatar
      x86/intel_rdt: Fix memory leak on mount failure · 60c9685b
      Vikas Shivappa authored
      commit 79298acc upstream.
      
      If mount fails, the kn_info directory is not freed causing memory leak.
      
      Add the missing error handling path.
      
      Fixes: 4e978d06 ("x86/intel_rdt: Add "info" files to resctrl file system")
      Signed-off-by: default avatarVikas Shivappa <vikas.shivappa@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: ravi.v.shankar@intel.com
      Cc: tony.luck@intel.com
      Cc: fenghua.yu@intel.com
      Cc: peterz@infradead.org
      Cc: vikas.shivappa@intel.com
      Cc: andi.kleen@intel.com
      Link: http://lkml.kernel.org/r/1498503368-20173-3-git-send-email-vikas.shivappa@linux.intel.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60c9685b
    • Bartosz Golaszewski's avatar
      gpiolib: fix filtering out unwanted events · dfa1f244
      Bartosz Golaszewski authored
      commit ad537b82 upstream.
      
      GPIOEVENT_REQUEST_BOTH_EDGES is not a single flag, but a binary OR of
      GPIOEVENT_REQUEST_RISING_EDGE and GPIOEVENT_REQUEST_FALLING_EDGE.
      
      The expression 'le->eflags & GPIOEVENT_REQUEST_BOTH_EDGES' we'll get
      evaluated to true even if only one event type was requested.
      
      Fix it by checking both RISING & FALLING flags explicitly.
      
      Fixes: 61f922db ("gpio: userspace ABI for reading GPIO line events")
      Signed-off-by: default avatarBartosz Golaszewski <brgl@bgdev.pl>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dfa1f244
    • Miklos Szeredi's avatar
      ovl: copy-up: don't unlock between lookup and link · 515a95fa
      Miklos Szeredi authored
      commit e85f82ff upstream.
      
      Nothing prevents mischief on upper layer while we are busy copying up the
      data.
      
      Move the lookup right before the looked up dentry is actually used.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 01ad3eb8 ("ovl: concurrent copy up of regular files")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      515a95fa
    • Benjamin Coddington's avatar
      Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind" · 003192c3
      Benjamin Coddington authored
      commit d9f29500 upstream.
      
      This reverts commit 920b4530 which could
      call d_move() without holding the directory's i_mutex, and reverts commit
      d4ea7e3c "NFS: Fix old dentry rehash after
      move", which was a follow-up fix.
      Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Fixes: 920b4530 ("NFS: nfs_rename() handle -ERESTARTSYS dentry left behind")
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      003192c3
    • Trond Myklebust's avatar
      NFSv4.1: Fix a race in nfs4_proc_layoutget · 95b2e088
      Trond Myklebust authored
      commit bd171930 upstream.
      
      If the task calling layoutget is signalled, then it is possible for the
      calls to nfs4_sequence_free_slot() and nfs4_layoutget_prepare() to race,
      in which case we leak a slot.
      The fix is to move the call to nfs4_sequence_free_slot() into the
      nfs4_layoutget_release() so that it gets called at task teardown time.
      
      Fixes: 2e80dbe7 ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95b2e088
    • Benjamin Coddington's avatar
      NFSv4.2: Don't send mode again in post-EXCLUSIVE4_1 SETATTR with umask · f8da5dee
      Benjamin Coddington authored
      commit 501e7a46 upstream.
      
      Now that we have umask support, we shouldn't re-send the mode in a SETATTR
      following an exclusive CREATE, or we risk having the same problem fixed in
      commit 5334c5bd ("NFS: Send attributes in OPEN request for
      NFS4_CREATE_EXCLUSIVE4_1"), which is that files with S_ISGID will have that
      bit stripped away.
      Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Fixes: dff25ddb ("nfs: add support for the umask attribute")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f8da5dee
    • Hui Wang's avatar
      ALSA: hda - set input_path bitmap to zero after moving it to new place · f521e0bf
      Hui Wang authored
      commit a8f20fd2 upstream.
      
      Recently we met a problem, the codec has valid adcs and input pins,
      and they can form valid input paths, but the driver does not build
      valid controls for them like "Mic boost", "Capture Volume" and
      "Capture Switch".
      
      Through debugging, I found the driver needs to shrink the invalid
      adcs and input paths for this machine, so it will move the whole
      column bitmap value to the previous column, after moving it, the
      driver forgets to set the original column bitmap value to zero, as a
      result, the driver will invalidate the path whose index value is the
      original colume bitmap value. After executing this function, all
      valid input paths are invalidated by a mistake, there are no any
      valid input paths, so the driver won't build controls for them.
      
      Fixes: 3a65bcdc ("ALSA: hda - Fix inconsistent input_paths after ADC reduction")
      Signed-off-by: default avatarHui Wang <hui.wang@canonical.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f521e0bf
    • Takashi Iwai's avatar
      ALSA: hda - Fix endless loop of codec configure · a189e2f2
      Takashi Iwai authored
      commit d94815f9 upstream.
      
      azx_codec_configure() loops over the codecs found on the given
      controller via a linked list.  The code used to work in the past, but
      in the current version, this may lead to an endless loop when a codec
      binding returns an error.
      
      The culprit is that the snd_hda_codec_configure() unregisters the
      device upon error, and this eventually deletes the given codec object
      from the bus.  Since the list is initialized via list_del_init(), the
      next object points to the same device itself.  This behavior change
      was introduced at splitting the HD-audio code code, and forgotten to
      adapt it here.
      
      For fixing this bug, just use a *_safe() version of list iteration.
      
      Fixes: d068ebc2 ("ALSA: hda - Move some codes up to hdac_bus struct")
      Reported-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a189e2f2
    • Paul Burton's avatar
      MIPS: Fix IRQ tracing & lockdep when rescheduling · f1a8a4fe
      Paul Burton authored
      commit d8550860 upstream.
      
      When the scheduler sets TIF_NEED_RESCHED & we call into the scheduler
      from arch/mips/kernel/entry.S we disable interrupts. This is true
      regardless of whether we reach work_resched from syscall_exit_work,
      resume_userspace or by looping after calling schedule(). Although we
      disable interrupts in these paths we don't call trace_hardirqs_off()
      before calling into C code which may acquire locks, and we therefore
      leave lockdep with an inconsistent view of whether interrupts are
      disabled or not when CONFIG_PROVE_LOCKING & CONFIG_DEBUG_LOCKDEP are
      both enabled.
      
      Without tracing this interrupt state lockdep will print warnings such
      as the following once a task returns from a syscall via
      syscall_exit_partial with TIF_NEED_RESCHED set:
      
      [   49.927678] ------------[ cut here ]------------
      [   49.934445] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3687 check_flags.part.41+0x1dc/0x1e8
      [   49.946031] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
      [   49.946355] CPU: 0 PID: 1 Comm: init Not tainted 4.10.0-00439-gc9fd5d362289-dirty #197
      [   49.963505] Stack : 0000000000000000 ffffffff81bb5d6a 0000000000000006 ffffffff801ce9c4
      [   49.974431]         0000000000000000 0000000000000000 0000000000000000 000000000000004a
      [   49.985300]         ffffffff80b7e487 ffffffff80a24498 a8000000ff160000 ffffffff80ede8b8
      [   49.996194]         0000000000000001 0000000000000000 0000000000000000 0000000077c8030c
      [   50.007063]         000000007fd8a510 ffffffff801cd45c 0000000000000000 a8000000ff127c88
      [   50.017945]         0000000000000000 ffffffff801cf928 0000000000000001 ffffffff80a24498
      [   50.028827]         0000000000000000 0000000000000001 0000000000000000 0000000000000000
      [   50.039688]         0000000000000000 a8000000ff127bd0 0000000000000000 ffffffff805509bc
      [   50.050575]         00000000140084e0 0000000000000000 0000000000000000 0000000000040a00
      [   50.061448]         0000000000000000 ffffffff8010e1b0 0000000000000000 ffffffff805509bc
      [   50.072327]         ...
      [   50.076087] Call Trace:
      [   50.079869] [<ffffffff8010e1b0>] show_stack+0x80/0xa8
      [   50.086577] [<ffffffff805509bc>] dump_stack+0x10c/0x190
      [   50.093498] [<ffffffff8015dde0>] __warn+0xf0/0x108
      [   50.099889] [<ffffffff8015de34>] warn_slowpath_fmt+0x3c/0x48
      [   50.107241] [<ffffffff801c15b4>] check_flags.part.41+0x1dc/0x1e8
      [   50.114961] [<ffffffff801c239c>] lock_is_held_type+0x8c/0xb0
      [   50.122291] [<ffffffff809461b8>] __schedule+0x8c0/0x10f8
      [   50.129221] [<ffffffff80946a60>] schedule+0x30/0x98
      [   50.135659] [<ffffffff80106278>] work_resched+0x8/0x34
      [   50.142397] ---[ end trace 0cb4f6ef5b99fe21 ]---
      [   50.148405] possible reason: unannotated irqs-off.
      [   50.154600] irq event stamp: 400463
      [   50.159566] hardirqs last  enabled at (400463): [<ffffffff8094edc8>] _raw_spin_unlock_irqrestore+0x40/0xa8
      [   50.171981] hardirqs last disabled at (400462): [<ffffffff8094eb98>] _raw_spin_lock_irqsave+0x30/0xb0
      [   50.183897] softirqs last  enabled at (400450): [<ffffffff8016580c>] __do_softirq+0x4ac/0x6a8
      [   50.195015] softirqs last disabled at (400425): [<ffffffff80165e78>] irq_exit+0x110/0x128
      
      Fix this by using the TRACE_IRQS_OFF macro to call trace_hardirqs_off()
      when CONFIG_TRACE_IRQFLAGS is enabled. This is done before invoking
      schedule() following the work_resched label because:
      
       1) Interrupts are disabled regardless of the path we take to reach
          work_resched() & schedule().
      
       2) Performing the tracing here avoids the need to do it in paths which
          disable interrupts but don't call out to C code before hitting a
          path which uses the RESTORE_SOME macro that will call
          trace_hardirqs_on() or trace_hardirqs_off() as appropriate.
      
      We call trace_hardirqs_on() using the TRACE_IRQS_ON macro before calling
      syscall_trace_leave() for similar reasons, ensuring that lockdep has a
      consistent view of state after we re-enable interrupts.
      Signed-off-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/15385/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f1a8a4fe
    • Paul Burton's avatar
      MIPS: pm-cps: Drop manual cache-line alignment of ready_count · 5033e622
      Paul Burton authored
      commit 161c51cc upstream.
      
      We allocate memory for a ready_count variable per-CPU, which is accessed
      via a cached non-coherent TLB mapping to perform synchronisation between
      threads within the core using LL/SC instructions. In order to ensure
      that the variable is contained within its own data cache line we
      allocate 2 lines worth of memory & align the resulting pointer to a line
      boundary. This is however unnecessary, since kmalloc is guaranteed to
      return memory which is at least cache-line aligned (see
      ARCH_DMA_MINALIGN). Stop the redundant manual alignment.
      
      Besides cleaning up the code & avoiding needless work, this has the side
      effect of avoiding an arithmetic error found by Bryan on 64 bit systems
      due to the 32 bit size of the former dlinesz. This led the ready_count
      variable to have its upper 32b cleared erroneously for MIPS64 kernels,
      causing problems when ready_count was later used on MIPS64 via cpuidle.
      Signed-off-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Fixes: 3179d37e ("MIPS: pm-cps: add PM state entry code for CPS systems")
      Reported-by: default avatarBryan O'Donoghue <bryan.odonoghue@imgtec.com>
      Reviewed-by: default avatarBryan O'Donoghue <bryan.odonoghue@imgtec.com>
      Tested-by: default avatarBryan O'Donoghue <bryan.odonoghue@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/15383/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5033e622
    • James Hogan's avatar
      MIPS: Avoid accidental raw backtrace · dba185d7
      James Hogan authored
      commit 85423636 upstream.
      
      Since commit 81a76d71 ("MIPS: Avoid using unwind_stack() with
      usermode") show_backtrace() invokes the raw backtracer when
      cp0_status & ST0_KSU indicates user mode to fix issues on EVA kernels
      where user and kernel address spaces overlap.
      
      However this is used by show_stack() which creates its own pt_regs on
      the stack and leaves cp0_status uninitialised in most of the code paths.
      This results in the non deterministic use of the raw back tracer
      depending on the previous stack content.
      
      show_stack() deals exclusively with kernel mode stacks anyway, so
      explicitly initialise regs.cp0_status to KSU_KERNEL (i.e. 0) to ensure
      we get a useful backtrace.
      
      Fixes: 81a76d71 ("MIPS: Avoid using unwind_stack() with usermode")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16656/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dba185d7
    • Karl Beldan's avatar
      MIPS: head: Reorder instructions missing a delay slot · 9de0e07d
      Karl Beldan authored
      commit 25d8b92e upstream.
      
      In this sequence the 'move' is assumed in the delay slot of the 'beq',
      but head.S is in reorder mode and the former gets pushed one 'nop'
      farther by the assembler.
      
      The corrected behavior made booting with an UHI supplied dtb erratic.
      
      Fixes: 15f37e15 ("MIPS: store the appended dtb address in a variable")
      Signed-off-by: default avatarKarl Beldan <karl.beldan+oss@gmail.com>
      Reviewed-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Jonas Gorski <jogo@openwrt.org>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/16614/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9de0e07d
    • Juergen Gross's avatar
      xen/blkback: don't use xen_blkif_get() in xen-blkback kthread · c806e018
      Juergen Gross authored
      commit a24fa22c upstream.
      
      There is no need to use xen_blkif_get()/xen_blkif_put() in the kthread
      of xen-blkback. Thread stopping is synchronous and using the blkif
      reference counting in the kthread will avoid to ever let the reference
      count drop to zero at the end of an I/O running concurrent to
      disconnecting and multiple rings.
      
      Setting ring->xenblkd to NULL after stopping the kthread isn't needed
      as the kthread does this already.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Tested-by: default avatarSteven Haigh <netwiz@crc.id.au>
      Acked-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c806e018
    • Kinglong Mee's avatar
      NFSv4.x/callback: Create the callback service through svc_create_pooled · 887e338c
      Kinglong Mee authored
      commit df807fff upstream.
      
      As the comments for svc_set_num_threads() said,
      " Destroying threads relies on the service threads filling in
      rqstp->rq_task, which only the nfs ones do.  Assumes the serv
      has been created using svc_create_pooled()."
      
      If creating service through svc_create(), the svc_pool_map_put()
      will be called in svc_destroy(), but the pool map isn't used.
      So that, the reference of pool map will be drop, the next using
      of pool map will get a zero npools.
      
      [  137.992130] divide error: 0000 [#1] SMP
      [  137.992148] Modules linked in: nfsd(E) nfsv4 nfs fscache fuse tun bridge stp llc ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event vmw_balloon coretemp crct10dif_pclmul crc32_pclmul ppdev ghash_clmulni_intel intel_rapl_perf joydev snd_ens1371 gameport snd_ac97_codec ac97_bus snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore parport_pc parport nfit acpi_cpufreq tpm_tis tpm_tis_core tpm vmw_vmci i2c_piix4 shpchp auth_rpcgss nfs_acl lockd(E) grace sunrpc(E) xfs libcrc32c vmwgfx drm_kms_helper ttm crc32c_intel drm e1000 mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd]
      [  137.992336] CPU: 0 PID: 4514 Comm: rpc.nfsd Tainted: G            E   4.11.0-rc8+ #536
      [  137.992777] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
      [  137.993757] task: ffff955984101d00 task.stack: ffff9873c2604000
      [  137.994231] RIP: 0010:svc_pool_for_cpu+0x2b/0x80 [sunrpc]
      [  137.994768] RSP: 0018:ffff9873c2607c18 EFLAGS: 00010246
      [  137.995227] RAX: 0000000000000000 RBX: ffff95598376f000 RCX: 0000000000000002
      [  137.995673] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9559944aec00
      [  137.996156] RBP: ffff9873c2607c18 R08: ffff9559944aec28 R09: 0000000000000000
      [  137.996609] R10: 0000000001080002 R11: 0000000000000000 R12: ffff95598376f010
      [  137.997063] R13: ffff95598376f018 R14: ffff9559944aec28 R15: ffff9559944aec00
      [  137.997584] FS:  00007f755529eb40(0000) GS:ffff9559bb600000(0000) knlGS:0000000000000000
      [  137.998048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  137.998548] CR2: 000055f3aecd9660 CR3: 0000000084290000 CR4: 00000000001406f0
      [  137.999052] Call Trace:
      [  137.999517]  svc_xprt_do_enqueue+0xef/0x260 [sunrpc]
      [  138.000028]  svc_xprt_received+0x47/0x90 [sunrpc]
      [  138.000487]  svc_add_new_perm_xprt+0x76/0x90 [sunrpc]
      [  138.000981]  svc_addsock+0x14b/0x200 [sunrpc]
      [  138.001424]  ? recalc_sigpending+0x1b/0x50
      [  138.001860]  ? __getnstimeofday64+0x41/0xd0
      [  138.002346]  ? do_gettimeofday+0x29/0x90
      [  138.002779]  write_ports+0x255/0x2c0 [nfsd]
      [  138.003202]  ? _copy_from_user+0x4e/0x80
      [  138.003676]  ? write_recoverydir+0x100/0x100 [nfsd]
      [  138.004098]  nfsctl_transaction_write+0x48/0x80 [nfsd]
      [  138.004544]  __vfs_write+0x37/0x160
      [  138.004982]  ? selinux_file_permission+0xd7/0x110
      [  138.005401]  ? security_file_permission+0x3b/0xc0
      [  138.005865]  vfs_write+0xb5/0x1a0
      [  138.006267]  SyS_write+0x55/0xc0
      [  138.006654]  entry_SYSCALL_64_fastpath+0x1a/0xa9
      [  138.007071] RIP: 0033:0x7f7554b9dc30
      [  138.007437] RSP: 002b:00007ffc9f92c788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  138.007807] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f7554b9dc30
      [  138.008168] RDX: 0000000000000002 RSI: 00005640cd536640 RDI: 0000000000000003
      [  138.008573] RBP: 00007ffc9f92c780 R08: 0000000000000001 R09: 0000000000000002
      [  138.008918] R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000000004
      [  138.009254] R13: 00005640cdbf77a0 R14: 00005640cdbf7720 R15: 00007ffc9f92c238
      [  138.009610] Code: 0f 1f 44 00 00 48 8b 87 98 00 00 00 55 48 89 e5 48 83 78 08 00 74 10 8b 05 07 42 02 00 83 f8 01 74 40 83 f8 02 74 19 31 c0 31 d2 <f7> b7 88 00 00 00 5d 89 d0 48 c1 e0 07 48 03 87 90 00 00 00 c3
      [  138.010664] RIP: svc_pool_for_cpu+0x2b/0x80 [sunrpc] RSP: ffff9873c2607c18
      [  138.011061] ---[ end trace b3468224cafa7d11 ]---
      Signed-off-by: default avatarKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      887e338c
    • Eric Leblond's avatar
      netfilter: synproxy: fix conntrackd interaction · a3042f8a
      Eric Leblond authored
      commit 87e94dbc upstream.
      
      This patch fixes the creation of connection tracking entry from
      netlink when synproxy is used. It was missing the addition of
      the synproxy extension.
      
      This was causing kernel crashes when a conntrack entry created by
      conntrackd was used after the switch of traffic from active node
      to the passive node.
      Signed-off-by: default avatarEric Leblond <eric@regit.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a3042f8a
    • Serhey Popovych's avatar
      rtnetlink: add IFLA_GROUP to ifla_policy · f19613af
      Serhey Popovych authored
      
      [ Upstream commit db833d40 ]
      
      Network interface groups support added while ago, however
      there is no IFLA_GROUP attribute description in policy
      and netlink message size calculations until now.
      
      Add IFLA_GROUP attribute to the policy.
      
      Fixes: cbda10fa ("net_device: add support for network device groups")
      Signed-off-by: default avatarSerhey Popovych <serhe.popovych@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f19613af
    • Serhey Popovych's avatar
      ipv6: Do not leak throw route references · 4ade61f4
      Serhey Popovych authored
      
      [ Upstream commit 07f61557 ]
      
      While commit 73ba57bf ("ipv6: fix backtracking for throw routes")
      does good job on error propagation to the fib_rules_lookup()
      in fib rules core framework that also corrects throw routes
      handling, it does not solve route reference leakage problem
      happened when we return -EAGAIN to the fib_rules_lookup()
      and leave routing table entry referenced in arg->result.
      
      If rule with matched throw route isn't last matched in the
      list we overwrite arg->result losing reference on throw
      route stored previously forever.
      
      We also partially revert commit ab997ad4 ("ipv6: fix the
      incorrect return value of throw route") since we never return
      routing table entry with dst.error == -EAGAIN when
      CONFIG_IPV6_MULTIPLE_TABLES is on. Also there is no point
      to check for RTF_REJECT flag since it is always set throw
      route.
      
      Fixes: 73ba57bf ("ipv6: fix backtracking for throw routes")
      Signed-off-by: default avatarSerhey Popovych <serhe.popovych@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ade61f4
    • Gao Feng's avatar
      net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev · 16c4d1be
      Gao Feng authored
      
      [ Upstream commit 9745e362 ]
      
      The register_vlan_device would invoke free_netdev directly, when
      register_vlan_dev failed. It would trigger the BUG_ON in free_netdev
      if the dev was already registered. In this case, the netdev would be
      freed in netdev_run_todo later.
      
      So add one condition check now. Only when dev is not registered, then
      free it directly.
      
      The following is the part coredump when netdev_upper_dev_link failed
      in register_vlan_dev. I removed the lines which are too long.
      
      [  411.237457] ------------[ cut here ]------------
      [  411.237458] kernel BUG at net/core/dev.c:7998!
      [  411.237484] invalid opcode: 0000 [#1] SMP
      [  411.237705]  [last unloaded: 8021q]
      [  411.237718] CPU: 1 PID: 12845 Comm: vconfig Tainted: G            E   4.12.0-rc5+ #6
      [  411.237737] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
      [  411.237764] task: ffff9cbeb6685580 task.stack: ffffa7d2807d8000
      [  411.237782] RIP: 0010:free_netdev+0x116/0x120
      [  411.237794] RSP: 0018:ffffa7d2807dbdb0 EFLAGS: 00010297
      [  411.237808] RAX: 0000000000000002 RBX: ffff9cbeb6ba8fd8 RCX: 0000000000001878
      [  411.237826] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000000
      [  411.237844] RBP: ffffa7d2807dbdc8 R08: 0002986100029841 R09: 0002982100029801
      [  411.237861] R10: 0004000100029980 R11: 0004000100029980 R12: ffff9cbeb6ba9000
      [  411.238761] R13: ffff9cbeb6ba9060 R14: ffff9cbe60f1a000 R15: ffff9cbeb6ba9000
      [  411.239518] FS:  00007fb690d81700(0000) GS:ffff9cbebb640000(0000) knlGS:0000000000000000
      [  411.239949] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  411.240454] CR2: 00007f7115624000 CR3: 0000000077cdf000 CR4: 00000000003406e0
      [  411.240936] Call Trace:
      [  411.241462]  vlan_ioctl_handler+0x3f1/0x400 [8021q]
      [  411.241910]  sock_ioctl+0x18b/0x2c0
      [  411.242394]  do_vfs_ioctl+0xa1/0x5d0
      [  411.242853]  ? sock_alloc_file+0xa6/0x130
      [  411.243465]  SyS_ioctl+0x79/0x90
      [  411.243900]  entry_SYSCALL_64_fastpath+0x1e/0xa9
      [  411.244425] RIP: 0033:0x7fb69089a357
      [  411.244863] RSP: 002b:00007ffcd04e0fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
      [  411.245445] RAX: ffffffffffffffda RBX: 00007ffcd04e2884 RCX: 00007fb69089a357
      [  411.245903] RDX: 00007ffcd04e0fd0 RSI: 0000000000008983 RDI: 0000000000000003
      [  411.246527] RBP: 00007ffcd04e0fd0 R08: 0000000000000000 R09: 1999999999999999
      [  411.246976] R10: 000000000000053f R11: 0000000000000202 R12: 0000000000000004
      [  411.247414] R13: 00007ffcd04e1128 R14: 00007ffcd04e2888 R15: 0000000000000001
      [  411.249129] RIP: free_netdev+0x116/0x120 RSP: ffffa7d2807dbdb0
      Signed-off-by: default avatarGao Feng <gfree.wind@vip.163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16c4d1be