1. 12 Sep, 2018 1 commit
    • x86/pti/64: Remove the SYSCALL64 entry trampoline · bf904d27
      Andy Lutomirski authored
      The SYSCALL64 trampoline has a couple of nice properties:
      
       - The usual sequence of SWAPGS followed by two GS-relative accesses to
         set up RSP is somewhat slow because the GS-relative accesses need
         to wait for SWAPGS to finish.  The trampoline approach allows
         RIP-relative accesses to set up RSP, which avoids the stall.
      
       - The trampoline avoids any percpu access before CR3 is set up,
         which means that no percpu memory needs to be mapped in the user
         page tables.  This prevents using Meltdown to read any percpu memory
         outside the cpu_entry_area and prevents using timing leaks
         to directly locate the percpu areas.
      
      The downsides of using a trampoline may outweigh the upsides, however.
      It adds an extra non-contiguous I$ cache line to system calls, and it
      forces an indirect jump to transfer control back to the normal kernel
      text after CR3 is set up.  The latter is because x86 lacks a 64-bit
      direct jump instruction that could jump from the trampoline to the entry
      text.  With retpolines enabled, the indirect jump is extremely slow.
      
      Change the code to map the percpu TSS into the user page tables to allow
      the non-trampoline SYSCALL64 path to work under PTI.  This does not add a
      new direct information leak, since the TSS is readable by Meltdown from the
      cpu_entry_area alias regardless.  It does allow a timing attack to locate
      the percpu area, but KASLR is more or less a lost cause against local
      attack on CPUs vulnerable to Meltdown regardless.  As far as I'm concerned,
      on current hardware, KASLR is only useful to mitigate remote attacks that
      try to attack the kernel without first gaining RCE against a vulnerable
      user process.
      
      On Skylake, with CONFIG_RETPOLINE=y and KPTI on, this reduces syscall
      overhead from ~237ns to ~228ns.
      
      There is a possible alternative approach: Move the trampoline within 2G of
      the entry text and make a separate copy for each CPU.  This would allow a
      direct jump to rejoin the normal entry path. There are pros and cons for
      this approach:
      
       + It avoids a pipeline stall
      
       - It executes from an extra page and reads from another extra page during
         the syscall. The latter is because it needs to use a relative
         addressing mode to find sp1 -- it's the same *cacheline*, but accessed
         using an alias, so it's an extra TLB entry.
      
       - Slightly more memory. This would be one page per CPU for a simple
         implementation and 64-ish bytes per CPU or one page per node for a more
         complex implementation.
      
       - More code complexity.
      
      The current approach is chosen for simplicity and because the alternative
      does not provide a benefit significant enough to justify its costs.
      
      [ tglx: Added the alternative discussion to the changelog ]
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/8c7c6e483612c3e4e10ca89495dc160b1aa66878.1536015544.git.luto@kernel.org
  2. 08 Sep, 2018 2 commits
  3. 06 Sep, 2018 4 commits
    • Merge tag 'apparmor-pr-2018-09-06' of... · db44bf4b
      Linus Torvalds authored
      Merge tag 'apparmor-pr-2018-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
      
      Pull apparmor fix from John Johansen:
       "A fix for an issue syzbot discovered last week:
      
         - Fix for bad debug check when converting secids to secctx"
      
      * tag 'apparmor-pr-2018-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
        apparmor: fix bad debug check in apparmor_secid_to_secctx()
    • Merge tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · be65e259
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "This fixes two annoying bugs:
      
         - The first one is a side effect caused by using SRCU for rcuidle
           tracepoints. It seems that perf was depending on the rcuidle
           tracepoints to make RCU watch when it wasn't.
      
           The real fix will be to have perf use SRCU instead of depending on
           RCU watching, but that can't be done until SRCU is safe to use in
           NMI context (Paul's working on that).
      
         - The second fix is for a bug that's been making my tests fail
           randomly for some time. I hadn't had time to track it down until
           now. It has to do with stressing NMIs (via perf)
           while enabling or disabling ftrace function handling with lockdep
           enabled.
      
           If an interrupt happens and, just as it returns, it sets lockdep
           back to "interrupts enabled", but an NMI is triggered before it
           actually returns, and if this happens while printk_nmi_enter has a
           breakpoint attached to it (because ftrace is converting it to or
           from nop to call fentry), the breakpoint trap also calls into
           lockdep. Because interrupts were disabled when the NMI went off
           (it interrupted an interrupt handler), lockdep keeps its state as
           interrupts disabled, even after the interrupt handler returns to
           code where interrupts are enabled.
      
           This causes lockdep_assert_irqs_enabled() to trigger a false
           positive"
      
      * tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        printk/tracing: Do not trace printk_nmi_enter()
        tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints
    • Merge tag 'for-4.19-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 5404525b
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - fix for improper fsync after hardlink
      
       - fix for a corruption during file deduplication
      
       - use after free fixes
      
       - RCU warning fix
      
       - fix for buffered write to nodatacow file
      
      * tag 'for-4.19-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: Fix suspicious RCU usage warning in btrfs_debug_in_rcu
        btrfs: use after free in btrfs_quota_enable
        btrfs: btrfs_shrink_device should call commit transaction at the end
        btrfs: fix qgroup_free wrong num_bytes in btrfs_subvolume_reserve_metadata
        Btrfs: fix data corruption when deduplicating between different files
        Btrfs: sync log after logging new name
        Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space
    • printk/tracing: Do not trace printk_nmi_enter() · d1c392c9
      Steven Rostedt (VMware) authored
      I hit the following splat in my tests:
      
      ------------[ cut here ]------------
      IRQs not enabled as expected
      WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
      Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
      Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
      EIP: tick_nohz_idle_enter+0x44/0x8c
      Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
      75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0
      e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
      EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
      ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
      DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
      CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
      Call Trace:
       do_idle+0x33/0x202
       cpu_startup_entry+0x61/0x63
       start_secondary+0x18e/0x1ed
       startup_32_smp+0x164/0x168
      irq event stamp: 18773830
      hardirqs last  enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10
      hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10
      softirqs last  enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf
      softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b
      ---[ end trace b7c64aa79e17954a ]---
      
      After a bit of debugging, I found what was happening. This would trigger
      when running "perf" with a high NMI interrupt rate, while enabling and
      disabling the function tracer. Ftrace uses breakpoints to convert the nops at
      the start of functions to calls to the function trampolines. The breakpoint
      traps disable interrupts and this makes calls into lockdep via the
      trace_hardirqs_off_thunk in the entry.S code. What happens is the following:
      
        do_idle {
      
          [interrupts enabled]
      
          <interrupt> [interrupts disabled]
      	TRACE_IRQS_OFF [lockdep says irqs off]
      	[...]
      	TRACE_IRQS_IRET
      	    test if pt_regs say return to interrupts enabled [yes]
      	    TRACE_IRQS_ON [lockdep says irqs are on]
      
      	    <nmi>
      		nmi_enter() {
      		    printk_nmi_enter() [traced by ftrace]
      		    [ hit ftrace breakpoint ]
      		    <breakpoint exception>
      			TRACE_IRQS_OFF [lockdep says irqs off]
      			[...]
      			TRACE_IRQS_IRET [return from breakpoint]
      			   test if pt_regs say interrupts enabled [no]
      			   [iret back to interrupt]
      	   [iret back to code]
      
          tick_nohz_idle_enter() {
      
      	lockdep_assert_irqs_enabled() [lockdep says no!]
      
      Although interrupts are indeed enabled, lockdep thinks they are not, and since
      we now do asserts via lockdep, it gives a false warning. The issue here is
      that printk_nmi_enter() is called before lockdep_off(), which disables
      lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
      printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
      confused.
      
      Cc: stable@vger.kernel.org
      Fixes: 42a0bb3f ("printk/nmi: generic solution for safe printk in NMI")
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  4. 05 Sep, 2018 6 commits
    • Merge tag 'gpio-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · b36fdc68
      Linus Torvalds authored
      Pull GPIO fixes from Linus Walleij:
       "Some GPIO fixes. The ACPI issue is probably the most annoying one for
        users that gets fixed this time.

         - Atomic contexts, cansleep* calls and such fastpath/slowpath
           things.
      
         - Defer ACPI event handler registration to late_initcall() so IRQs do
           not fire in our face before other drivers have a chance to register
           handlers.
      
         - Race condition if a consumer requests a GPIO after
           gpiochip_add_data_with_key() but before of_gpiochip_add()
      
         - Probe errorpath in the dwapb driver"
      
      * tag 'gpio-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio: Fix crash due to registration race
        gpio: dwapb: Fix error handling in dwapb_gpio_probe()
        gpiolib-acpi: Register GpioInt ACPI event handlers from a late_initcall
        gpiolib: acpi: Switch to cansleep version of GPIO library call
        gpio: adp5588: Fix sleep-in-atomic-context bug
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · f4697d9a
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "A set of very minor fixes and a couple of reverts to fix a major
        problem (the attempt to change the busy count causes a hang when
        attempting to change the drive cache type)"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: aacraid: fix a signedness bug
        Revert "scsi: core: avoid host-wide host_busy counter for scsi_mq"
        Revert "scsi: core: fix scsi_host_queue_ready"
        scsi: libata: Add missing newline at end of file
        scsi: target: iscsi: cxgbit: use pr_debug() instead of pr_info()
        scsi: hpsa: limit transfer length to 1MB, not 512kB
        scsi: lpfc: Correct MDS diag and nvmet configuration
        scsi: lpfc: Default fdmi_on to on
        scsi: csiostor: fix incorrect port capabilities
        scsi: csiostor: add a check for NULL pointer after kmalloc()
        scsi: documentation: add scsi_mod.use_blk_mq to scsi-parameters
        scsi: core: Update SCSI_MQ_DEFAULT help text to match default
    • Merge tag 'nds32-for-linus-4.19-tag1' of... · d0c1db1d
      Linus Torvalds authored
      Merge tag 'nds32-for-linus-4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux
      
      Pull nds32 updates from Greentime Hu:
       "This contains bug fixes, build error fixes and ftrace support for
        nds32"
      
      * tag 'nds32-for-linus-4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
        nds32: linker script: GCOV kernel may refers data in __exit
        nds32: fix build error because of wrong semicolon
        nds32: Fix a kernel panic issue because of wrong frame pointer access.
        nds32: Only print one page of stack when die to prevent printing too much information.
        nds32: Add macro definition for offset of lp register on stack
        nds32: Remove the deprecated ABI implementation
        nds32/stack: Get real return address by using ftrace_graph_ret_addr
        nds32/ftrace: Support dynamic function graph tracer
        nds32/ftrace: Support dynamic function tracer
        nds32/ftrace: Add RECORD_MCOUNT support
        nds32/ftrace: Support static function graph tracer
        nds32/ftrace: Support static function tracer
        nds32: Extract the checking and getting pointer to a macro
        nds32: Clean up the coding style
        nds32: Fix get_user/put_user macro expand pointer problem
        nds32: Fix empty call trace
        nds32: add NULL entry to the end of_device_id array
        nds32: fix logic for module
    • tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints · 865e63b0
      Steven Rostedt (VMware) authored
      Borislav reported the following splat:
      
       =============================
       WARNING: suspicious RCU usage
       4.19.0-rc1+ #1 Not tainted
       -----------------------------
       ./include/linux/rcupdate.h:631 rcu_read_lock() used illegally while idle!
       other info that might help us debug this:
      
       RCU used illegally from idle CPU!
       rcu_scheduler_active = 2, debug_locks = 1
       RCU used illegally from extended quiescent state!
       1 lock held by swapper/0/0:
        #0: 000000004557ee0e (rcu_read_lock){....}, at: perf_event_output_forward+0x0/0x130
      
       stack backtrace:
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc1+ #1
       Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
       Call Trace:
        dump_stack+0x85/0xcb
        perf_event_output_forward+0xf6/0x130
        __perf_event_overflow+0x52/0xe0
        perf_swevent_overflow+0x91/0xb0
        perf_tp_event+0x11a/0x350
        ? find_held_lock+0x2d/0x90
        ? __lock_acquire+0x2ce/0x1350
        ? __lock_acquire+0x2ce/0x1350
        ? retint_kernel+0x2d/0x2d
        ? find_held_lock+0x2d/0x90
        ? tick_nohz_get_sleep_length+0x83/0xb0
        ? perf_trace_cpu+0xbb/0xd0
        ? perf_trace_buf_alloc+0x5a/0xa0
        perf_trace_cpu+0xbb/0xd0
        cpuidle_enter_state+0x185/0x340
        do_idle+0x1eb/0x260
        cpu_startup_entry+0x5f/0x70
        start_kernel+0x49b/0x4a6
        secondary_startup_64+0xa4/0xb0
      
      This is due to the tracepoints moving to SRCU usage which does not require
      RCU to be "watching". But perf uses these tracepoints with RCU and expects
      it to be watching. Hence, we still need to add in the rcu_irq_enter/exit_irqson()
      calls for "rcuidle" tracepoints. This is a temporary fix until we have SRCU
      working in NMI context, and then perf can be converted to use that instead
      of normal RCU.
      
      Link: http://lkml.kernel.org/r/20180904162611.6a120068@gandalf.local.home
      
      Cc: x86-ml <x86@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reported-by: Borislav Petkov <bp@alien8.de>
      Tested-by: Borislav Petkov <bp@alien8.de>
      Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Fixes: e6753f23 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • nds32: linker script: GCOV kernel may refers data in __exit · 3350139c
      Greentime Hu authored
      This patch fixes the nds32 allmodconfig/allyesconfig build error. A GCOV
      kernel embeds counters in the kernel for each line, and some of them are
      embedded in __exit text. So we need to keep EXIT_TEXT and EXIT_DATA
      if CONFIG_GCOV_KERNEL=y.
      
      Link: https://lkml.org/lkml/2018/9/1/125
      Signed-off-by: Greentime Hu <greentime@andestech.com>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
    • Merge branch 'akpm' (patches from Andrew) · 0e9b1039
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "17 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        nilfs2: convert to SPDX license tags
        drivers/dax/device.c: convert variable to vm_fault_t type
        lib/Kconfig.debug: fix three typos in help text
        checkpatch: add __ro_after_init to known $Attribute
        mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal
        uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name
        memory_hotplug: fix kernel_panic on offline page processing
        checkpatch: add optional static const to blank line declarations test
        ipc/shm: properly return EIDRM in shm_lock()
        mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
        mm/util.c: improve kvfree() kerneldoc
        tools/vm/page-types.c: fix "defined but not used" warning
        tools/vm/slabinfo.c: fix sign-compare warning
        kmemleak: always register debugfs file
        mm: respect arch_dup_mmap() return value
        mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm().
        mm: memcontrol: print proper OOM header when no eligible victim left
  5. 04 Sep, 2018 27 commits