    Merge tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 076f14be
    Linus Torvalds authored
    Pull x86 entry updates from Thomas Gleixner:
     "The x86 entry, exception and interrupt code rework
    
      This all started about 6 months ago with the attempt to move the Posix
      CPU timer heavy lifting out of the timer interrupt code and just have
      lockless quick checks in that code path. Trivial 5 patches.
    
      This unearthed an inconsistency in the KVM handling of task work and
      the review requested to move all of this into generic code so other
      architectures can share.
    
      Valid request and solved with another 25 patches but those unearthed
      inconsistencies vs. RCU and instrumentation.
    
      Digging into this made it obvious that there are quite some
      inconsistencies vs. instrumentation in general. The int3 text poke
      handling in particular was completely unprotected and with the batched
      update of trace events even more likely to be exposed to endless int3
      recursion.
    
      In parallel the RCU implications of instrumenting fragile entry code
      came up in several discussions.
    
      The conclusion of the x86 maintainer team was to go all the way and
      make the protection against any form of instrumentation of fragile and
      dangerous code paths enforceable and verifiable by tooling.
    
      A first batch of preparatory work hit mainline with commit
      d5f744f9 ("Pull x86 entry code updates from Thomas Gleixner")
    
      That (almost) full solution introduced a new code section
      '.noinstr.text' into which all code which needs to be protected from
      instrumentation of all sorts goes. Any call into instrumentable
      code out of this section has to be annotated. objtool has support to
      validate this.
    
      Kprobes now excludes this section fully which also prevents BPF from
      fiddling with it and all 'noinstr' annotated functions also keep
      ftrace off. The section, kprobes and objtool changes are already
      merged.
    
      The major changes coming with this are:
    
        - Preparatory cleanups
    
        - Annotating of relevant functions to move them into the
          '.noinstr.text' section or enforcing inlining by marking them
          __always_inline so the compiler cannot misplace or instrument
          them.
    
        - Splitting and simplifying the idtentry macro maze so that it is
          now clearly separated into simple exception entries and the more
          interesting ones which use interrupt stacks and have the paranoid
          handling vs. CR3 and GS.
    
        - Move quite some of the low level ASM functionality into C code:
    
           - enter_from and exit to user space handling. The ASM code now
             calls into C after doing the really necessary ASM handling and
             the return path goes back out without bells and whistles in
             ASM.
    
           - exception entry/exit got the equivalent treatment
    
           - move all IRQ tracepoints from ASM to C so they can be placed as
             appropriate which is especially important for the int3
             recursion issue.
    
        - Consolidate the declaration and definition of entry points between
          32 and 64 bit. They share a common header and macros now.
    
        - Remove the extra device interrupt entry maze and just use the
          regular exception entry code.
    
        - All ASM entry points except NMI are now generated from the shared
          header file and the corresponding macros in the 32 and 64 bit
          entry ASM.
    
        - The C code entry points are consolidated as well with the help of
          DEFINE_IDTENTRY*() macros. This makes it possible to ensure at one central
          point that all corresponding entry points share the same
          semantics. The actual function body for most entry points is in an
          instrumentable and sane state.
    
          There are special macros for the more sensitive entry points, e.g.
          INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF.
          They allow putting the whole entry instrumentation and RCU handling
          into safe places instead of the previous "pray that it is correct"
          approach.
    
        - The INT3 text poke handling is now completely isolated and the
          recursion issue banned. Aside from the entry rework this required
          other isolation work, e.g. the ability to force inline bsearch.
    
        - Prevent #DB on fragile entry code, entry relevant memory and
          disable it on NMI, #MC entry, which made it possible to get rid of the
          nested #DB IST stack shifting hackery.
    
        - A few other cleanups and enhancements which have been made
          possible through this and already merged changes, e.g.
          consolidating and further restricting the IDT code so the IDT
          table becomes RO after init which removes yet another popular
          attack vector
    
        - About 680 lines of ASM maze are gone.
    
      There are a few open issues:
    
       - An escape out of the noinstr section in the MCE handler which needs
         some more thought but under the aspect that MCE is a complete
         trainwreck by design and the probability to survive it is low, this
         was not high on the priority list.
    
       - Paravirtualization
    
         When PV is enabled then objtool complains about a bunch of indirect
         calls out of the noinstr section. There are a few straightforward
         ways to fix this, but the other issues vs. general correctness were
         more pressing than parawitz.
    
       - KVM
    
         KVM is inconsistent as well. Patches have been posted, but they
         have not yet been commented on or picked up by the KVM folks.
    
       - IDLE
    
         Pretty much the same problems can be found in the low level idle
         code, especially the parts where RCU stopped watching. This was
         beyond the scope of the more obvious and exposable problems and is
         on the todo list.
    
      The lesson learned from this brain melting exercise to morph the
      evolved code base into something which can be validated and understood
      is that once again the violation of the most important engineering
      principle "correctness first" has caused quite a few people to spend
      valuable time on problems which could have been avoided in the first
      place. The "features first" tinkering mindset really has to stop.
    
      With that I want to say thanks to everyone involved in contributing to
      this effort. Special thanks go to the following people (alphabetical
      order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian
      Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai
      Jiangshan, Marco Elver, Paolo Bonzini, Paul McKenney, Peter Zijlstra,
      Vitaly Kuznetsov, and Will Deacon"
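
The '.noinstr.text' scheme described in the pull message can be illustrated with a minimal userspace sketch. It is not the kernel's implementation: the macro definitions below are simplified stand-ins (the kernel's own `noinstr` and annotation markers carry more attributes and emit metadata for objtool), and `fragile_entry()`, `traceable_work()` and `traced_calls` are hypothetical names invented for this example. The sketch assumes GCC or Clang on an ELF target.

```c
/*
 * Simplified stand-in: place the function in its own text section and
 * keep the compiler from inlining it elsewhere. The kernel definition
 * additionally disables tracing and sanitizer instrumentation.
 */
#define noinstr __attribute__((noinline, section(".noinstr.text")))

/*
 * Stand-ins for the annotation that marks a window in which calls into
 * instrumentable code are permitted. Here they expand to nothing; in
 * the kernel they leave markers that objtool validates.
 */
#define instrumentation_begin() do { } while (0)
#define instrumentation_end()   do { } while (0)

/* Hypothetical counter so the effect of the call is observable. */
int traced_calls;

/* Ordinary, instrumentable code: tracing or probing it is fine. */
void traceable_work(void)
{
	traced_calls++;
}

/* Hypothetical fragile entry function living in '.noinstr.text'. */
noinstr int fragile_entry(int vector)
{
	int handled = vector + 1;	/* work that must stay uninstrumented */

	instrumentation_begin();	/* annotated escape out of the section */
	traceable_work();
	instrumentation_end();

	return handled;
}
```

Calling `traceable_work()` outside the begin/end pair is the kind of unannotated call out of the section that the message says objtool is able to flag.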
    
    * tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits)
      x86/entry: Force rcu_irq_enter() when in idle task
      x86/entry: Make NMI use IDTENTRY_RAW
      x86/entry: Treat BUG/WARN as NMI-like entries
      x86/entry: Unbreak __irqentry_text_start/end magic
      x86/entry: __always_inline CR2 for noinstr
      lockdep: __always_inline more for noinstr
      x86/entry: Re-order #DB handler to avoid *SAN instrumentation
      x86/entry: __always_inline arch_atomic_* for noinstr
      x86/entry: __always_inline irqflags for noinstr
      x86/entry: __always_inline debugreg for noinstr
      x86/idt: Consolidate idt functionality
      x86/idt: Cleanup trap_init()
      x86/idt: Use proper constants for table size
      x86/idt: Add comments about early #PF handling
      x86/idt: Mark init only functions __init
      x86/entry: Rename trace_hardirqs_off_prepare()
      x86/entry: Clarify irq_{enter,exit}_rcu()
      x86/entry: Remove DBn stacks
      x86/entry: Remove debug IDT frobbing
      x86/entry: Optimize local_db_save() for virt
      ...
kvm.c 21.2 KB