1. 27 Dec, 2017 1 commit
  2. 19 Dec, 2017 1 commit
    • Josh Poimboeuf's avatar
      x86/stacktrace: Make zombie stack traces reliable · 6454b3bd
      Josh Poimboeuf authored
      Commit:
      
        1959a601 ("x86/dumpstack: Pin the target stack when dumping it")
      
      changed the behavior of stack traces for zombies.  Before that commit,
      /proc/<pid>/stack reported the last execution path of the zombie before
      it died:
      
        [<ffffffff8105b877>] do_exit+0x6f7/0xa80
        [<ffffffff8105bc79>] do_group_exit+0x39/0xa0
        [<ffffffff8105bcf0>] __wake_up_parent+0x0/0x30
        [<ffffffff8152dd09>] system_call_fastpath+0x16/0x1b
        [<00007fd128f9c4f9>] 0x7fd128f9c4f9
        [<ffffffffffffffff>] 0xffffffffffffffff
      
      After the commit, it just reports an empty stack trace.
      
      The new behavior is actually probably more correct.  If the stack
      refcount has gone down to zero, then the task has already gone through
      do_exit() and isn't going to run anymore.  The stack could be freed at
      any time and is basically gone, so reporting an empty stack makes sense.
      
      However, save_stack_trace_tsk_reliable() treats such a missing stack
      condition as an error.  That can cause livepatch transition stalls if
      there are any unreaped zombies.  Instead, just treat it as a reliable,
      empty stack.
      Reported-and-tested-by: default avatarMiroslav Benes <mbenes@suse.cz>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Fixes: af085d90 ("stacktrace/x86: add function for detecting reliable stack traces")
      Link: http://lkml.kernel.org/r/e4b09e630e99d0c1080528f0821fc9d9dbaeea82.1513631620.git.jpoimboe@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6454b3bd
  3. 18 Dec, 2017 1 commit
    • Tom Lendacky's avatar
      x86/mm: Unbreak modules that use the DMA API · 9d5f38ba
      Tom Lendacky authored
      Commit d8aa7eea ("x86/mm: Add Secure Encrypted Virtualization (SEV)
      support") changed sme_active() from an inline function that referenced
      sme_me_mask to a non-inlined function in order to make the sev_enabled
      variable a static variable.  This function was marked EXPORT_SYMBOL_GPL
      because at the time the patch was submitted, sme_me_mask was marked
      EXPORT_SYMBOL_GPL.
      
      Commit 87df2617 ("x86/mm: Unbreak modules that rely on external
      PAGE_KERNEL availability") changed sme_me_mask variable from
      EXPORT_SYMBOL_GPL to EXPORT_SYMBOL, allowing external modules the ability
      to build with CONFIG_AMD_MEM_ENCRYPT=y.  Now, however, with sev_active()
      no longer an inline function and marked as EXPORT_SYMBOL_GPL, external
      modules that use the DMA API are once again broken in 4.15. Since the DMA
      API is meant to be used by external modules, this needs to be changed.
      
      Change the sme_active() and sev_active() functions from EXPORT_SYMBOL_GPL
      to EXPORT_SYMBOL.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Link: https://lkml.kernel.org/r/20171215162011.14125.7113.stgit@tlendack-t1.amdoffice.net
      9d5f38ba
  4. 16 Dec, 2017 1 commit
    • Matthew Wilcox's avatar
      x86/build: Make isoimage work on Debian · 5f0e3fe6
      Matthew Wilcox authored
      Debian does not ship a 'mkisofs' symlink to genisoimage.  All modern
      distros ship genisoimage, so just use that directly.  That requires
      renaming the 'genisoimage' function.  Also neaten up the 'for' loop
      while I'm in here.
      Signed-off-by: default avatarMatthew Wilcox <mawilcox@microsoft.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5f0e3fe6
  5. 15 Dec, 2017 6 commits
  6. 11 Dec, 2017 3 commits
  7. 07 Dec, 2017 3 commits
  8. 06 Dec, 2017 6 commits
  9. 28 Nov, 2017 1 commit
  10. 27 Nov, 2017 1 commit
  11. 25 Nov, 2017 2 commits
    • Nadav Amit's avatar
      x86/tlb: Disable interrupts when changing CR4 · 9d0b6232
      Nadav Amit authored
      CR4 modifications are implemented as RMW operations which update a shadow
      variable and write the result to CR4. The RMW operation is protected by
      preemption disable, but there is no enforcement or debugging mechanism.
      
      CR4 modifications happen also in interrupt context via
      __native_flush_tlb_global(). This implementation does not affect a
      interrupted thread context CR4 operation, because the CR4 toggle restores
      the original content and does not modify the shadow variable.
      
      So the current situation seems to be safe, but a recent patch tried to add
      an actual RMW operation in interrupt context, which will cause subtle
      corruptions.
      
      To prevent that and make the CR4 handling future proof:
      
       - Add a lockdep assertion to __cr4_set() which will catch interrupt
         enabled invocations
      
       - Disable interrupts in the cr4 manipulator inlines
      
       - Rename cr4_toggle_bits() to cr4_toggle_bits_irqsoff(). This is called
         from __switch_to_xtra() where interrupts are already disabled and
         performance matters.
      
      All other call sites are not performance critical, so the extra overhead of
      an additional local_irq_save/restore() pair is not a problem. If new call
      sites care about performance then the necessary _irqsoff() variants can be
      added.
      
      [ tglx: Condensed the patch by moving the irq protection inside the
        	manipulator functions. Updated changelog ]
      Signed-off-by: default avatarNadav Amit <namit@vmware.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Luck <tony.luck@intel.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: nadav.amit@gmail.com
      Cc: linux-edac@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171125032907.2241-3-namit@vmware.com
      9d0b6232
    • Nadav Amit's avatar
      x86/tlb: Refactor CR4 setting and shadow write · 0c3292ca
      Nadav Amit authored
      Refactor the write to CR4 and its shadow value. This is done in
      preparation for the addition of an assertion to check that IRQs are
      disabled during CR4 update.
      
      No functional change.
      Signed-off-by: default avatarNadav Amit <namit@vmware.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: nadav.amit@gmail.com
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: linux-edac@vger.kernel.org
      Link: https://lkml.kernel.org/r/20171125032907.2241-2-namit@vmware.com
      0c3292ca
  12. 24 Nov, 2017 1 commit
    • Masami Hiramatsu's avatar
      x86/decoder: Add new TEST instruction pattern · 12a78d43
      Masami Hiramatsu authored
      The kbuild test robot reported this build warning:
      
        Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c
      
        Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx)
        Warning: objdump says 3 bytes, but insn_get_length() says 2
        Warning: decoded and checked 1569014 instructions with 1 warnings
      
      This sequence seems to be a new instruction not in the opcode map in the Intel SDM.
      
      The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8.
      Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of
      the ModR/M Byte (bits 2,1,0 in parenthesis)"
      
      In that table, opcodes listed by the index REG bits as:
      
        000         001       010 011  100        101        110         111
       TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX
      
      So, it seems TEST Ib is assigned to 001.
      
      Add the new pattern.
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      12a78d43
  13. 23 Nov, 2017 4 commits
  14. 22 Nov, 2017 2 commits
    • Andrey Ryabinin's avatar
      x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow · f68d62a5
      Andrey Ryabinin authored
      [ Note, this commit is a cherry-picked version of:
      
          d17a1d97: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
      
        ... for easier x86 entry code testing and back-porting. ]
      
      The KASAN shadow is currently mapped using vmemmap_populate() since that
      provides a semi-convenient way to map pages into init_top_pgt.  However,
      since that no longer zeroes the mapped pages, it is not suitable for
      KASAN, which requires zeroed shadow memory.
      
      Add kasan_populate_shadow() interface and use it instead of
      vmemmap_populate().  Besides, this allows us to take advantage of
      gigantic pages and use them to populate the shadow, which should save us
      some memory wasted on page tables and reduce TLB pressure.
      
      Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.comSigned-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f68d62a5
    • Andy Lutomirski's avatar
      x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing · 548c3050
      Andy Lutomirski authored
      When I added entry_SYSCALL_64_after_hwframe(), I left TRACE_IRQS_OFF
      before it.  This means that users of entry_SYSCALL_64_after_hwframe()
      were responsible for invoking TRACE_IRQS_OFF, and the one and only
      user (Xen, added in the same commit) got it wrong.
      
      I think this would manifest as a warning if a Xen PV guest with
      CONFIG_DEBUG_LOCKDEP=y were used with context tracking.  (The
      context tracking bit is to cause lockdep to get invoked before we
      turn IRQs back on.)  I haven't tested that for real yet because I
      can't get a kernel configured like that to boot at all on Xen PV.
      
      Move TRACE_IRQS_OFF below the label.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 8a9949bc ("x86/xen/64: Rearrange the SYSCALL entries")
      Link: http://lkml.kernel.org/r/9150aac013b7b95d62c2336751d5b6e91d2722aa.1511325444.git.luto@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      548c3050
  15. 21 Nov, 2017 5 commits
  16. 17 Nov, 2017 2 commits
    • Prarit Bhargava's avatar
      x86/smpboot: Fix __max_logical_packages estimate · b4c0a732
      Prarit Bhargava authored
      A system booted with a small number of cores enabled per package
      panics because the estimate of __max_logical_packages is too low.
      
      This occurs when the total number of active cores across all packages is
      less than the maximum core count for a single package. e.g.:
      
        On a 4 package system with 20 cores/package where only 4 cores are
        enabled on each package, the value of __max_logical_packages is
        calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4.
      
      Calculate __max_logical_packages after the cpu enumeration has completed.
      Use the boot cpu's data to extrapolate the number of packages.
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: He Chen <he.chen@linux.intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Piotr Luc <piotr.luc@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: https://lkml.kernel.org/r/20171114124257.22013-4-prarit@redhat.com
      b4c0a732
    • Andi Kleen's avatar
      x86/topology: Avoid wasting 128k for package id array · 30bb9811
      Andi Kleen authored
      Analyzing large early boot allocations unveiled the logical package id
      storage as a prominent memory waste. Since commit 1f12e32f
      ("x86/topology: Create logical package id") every 64-bit system allocates a
      128k array to convert logical package ids.
      
      This happens because the array is sized for MAX_LOCAL_APIC which is always
      32k on 64bit systems, and it needs 4 bytes for each entry.
      
      This is fairly wasteful, especially for the common case of having only one
      socket, which uses exactly 4 byte out of 128K.
      
      There is no user of the package id map which is performance critical, so
      the lookup is not required to be O(1). Store the logical processor id in
      cpu_data and use a loop based lookup.
      
      To keep the mapping stable accross cpu hotplug operations, add a flag to
      cpu_data which is set when the CPU is brought up the first time. When the
      flag is set, then cpu_data is not reinitialized by copying boot_cpu_data on
      subsequent bringups.
      
      [ tglx: Rename the flag to 'initialized', use proper pointers instead of
        	repeated cpu_data(x) evaluation and massage changelog. ]
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: He Chen <he.chen@linux.intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Piotr Luc <piotr.luc@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: https://lkml.kernel.org/r/20171114124257.22013-3-prarit@redhat.com
      30bb9811