1. 27 Apr, 2006 2 commits
    • Zachary Amsden's avatar
      [PATCH] x86/PAE: Fix pte_clear for the >4GB RAM case · 6e5882cf
      Zachary Amsden authored
      Proposed fix for ptep_get_and_clear_full PAE bug.  Pte_clear had the same bug,
      so use the same fix for both.  Turns out pmd_clear had it as well, but pgds
      are not affected.
      
      The problem is rather intricate.  Page table entries in PAE mode are 64-bits
      wide, but the only atomic 8-byte write operation available in 32-bit mode is
      cmpxchg8b, which is expensive (at least on P4), and thus avoided.  But it can
      happen that the processor may prefetch entries into the TLB in the middle of an
      operation which clears a page table entry.  So one must always clear the P-bit
      in the low word of the page table entry first when clearing it.
      
      Since the sequence *ptep = __pte(0) leaves the order of the write dependent on
      the compiler, it must be coded explicitly as a clear of the low word followed
      by a clear of the high word.  Further, there must be a write memory barrier
      here to enforce proper ordering by the compiler (and, in the future, by the
      processor as well).
      
      On > 4GB memory machines, the implementation of pte_clear for PAE was clearly
      deficient, as it could leave virtual mappings of physical memory above 4GB
      aliased to memory below 4GB in the TLB.  The implementation of
      ptep_get_and_clear_full has a similar bug, although not nearly as likely to
      occur, since the mappings being cleared are in the process of being destroyed,
      and should never be dereferenced again.
      
      But, as luck would have it, it is possible to trigger bugs even without ever
      dereferencing these bogus TLB mappings, even if the clear is followed fairly
      soon after with a TLB flush or invalidation.  The problem is that memory above
      4GB may now be aliased into the first 4GB of memory, and in fact, may hit a
      region of memory with non-memory semantics.  These regions include AGP and PCI
      space.  As such, these memory regions are not cached by the processor.  This
      introduces the bug.
      
      The processor can speculate memory operations, including memory writes, as long
      as they are committed with the proper ordering.  Speculating a memory write to
      a linear address that has a bogus TLB mapping is possible.  Normally, the
      speculation is harmless.  But for cached memory, it does leave the falsely
      speculated cacheline unmodified, but in a dirty state.  This cache line will be
      eventually written back.  If this cacheline happens to intersect a region of
      memory that is not protected by the cache coherency protocol, it can corrupt
      data in I/O memory, which is generally a very bad thing to do, and can cause
      total system failure or just plain undefined behavior.
      
      These bugs are extremely unlikely, but the severity is of such magnitude, and
      the fix so simple that I think fixing them immediately is justified.  Also,
      they are nearly impossible to debug.
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      6e5882cf
    • Linus Torvalds's avatar
      Linux v2.6.17-rc3 · 2be4d502
      Linus Torvalds authored
      2be4d502
  2. 26 Apr, 2006 33 commits
  3. 25 Apr, 2006 5 commits