1. 30 Jan, 2017 3 commits
    • Merge branch 'sparc64-non-resumable-user-error-recovery' · 54791b27
      David S. Miller authored
      Liam R. Howlett says:
      
      ====================
      sparc64: Recover from userspace non-resumable PIO & MEM errors
      
      A non-resumable error from userspace is able to cause a kernel panic or trap
      loop due to the setup and handling of the queued traps once in the kernel.
      This patch series addresses both of these issues.
      
      The queues are fixed by simply zeroing the memory before use.
      
      PIO errors from userspace will result in a SIGBUS being sent to the user
      process.
      
      The MEM errors from userspace will result in a SIGKILL and also cause the
      offending pages to be claimed so they are no longer used in future tasks.
      SIGKILL is used to ensure that the process does not try to coredump and result
      in an attempt to read the memory again from within kernel space.  Although
      there is a HV call to scrub the memory (mem_scrub), there is no easy way to
      guarantee that the real memory address(es) are not used by other tasks.
      Clearing the error with mem_scrub would zero the memory and cause the other
      processes to proceed with bad data.
      
      The handling of other non-resumable errors remains unchanged, and such errors
      will still cause a panic.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Handle PIO & MEM non-resumable errors. · 04748724
      Liam R. Howlett authored
      User processes trying to access an invalid memory address via PIO will
      receive a SIGBUS signal instead of causing a panic.  Processes that hit
      memory errors will receive a SIGKILL, since a SIGBUS may result in a
      coredump, which may attempt to repeat the faulting access.
      Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Zero pages on allocation for mondo and error queues. · 7a7dc961
      Liam R. Howlett authored
      Error queues use a non-zero first word to detect if the queues are full.
      Using pages that have not been zeroed may result in false positive
      overflow events.  These queues are set up once during boot so zeroing
      all mondo and error queue pages is safe.
      
      Note that the false positive overflow does not always occur because the
      page allocation for these queues is so early in the boot cycle that
      higher-numbered CPUs get fresh pages.  It is only when traps are serviced
      by lower-numbered CPUs that were given already-used pages that this issue
      is exposed.
      Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
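The false-positive overflow described in the zeroing commit can be illustrated with a small user-space model in C. This is a sketch under stated assumptions, not the sparc64 kernel code: the entry size, the queue length, and the names `struct error_queue`, `alloc_queue()` and `enqueue_error()` are all hypothetical, and filling memory with `0xAB` merely stands in for a recycled page that still holds stale data.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: each queue entry is 8 words (64 bytes), and a
 * non-zero first word marks the slot as occupied (not yet consumed). */
#define ENTRY_WORDS 8
#define NUM_ENTRIES 4

struct queue_entry {
    uint64_t words[ENTRY_WORDS];
};

struct error_queue {
    struct queue_entry *entries;
    size_t tail; /* next slot the producer writes */
};

/* Producer side: returns 0 on success, -1 if the slot looks full. */
int enqueue_error(struct error_queue *q, uint64_t err_word)
{
    struct queue_entry *slot = &q->entries[q->tail];

    if (slot->words[0] != 0)
        return -1; /* non-zero first word => treated as overflow */
    slot->words[0] = err_word;
    q->tail = (q->tail + 1) % NUM_ENTRIES;
    return 0;
}

/* Allocate queue memory; 'zero' selects a zeroed buffer, otherwise the
 * buffer is filled with 0xAB to simulate a previously used page. */
struct queue_entry *alloc_queue(int zero)
{
    size_t bytes = NUM_ENTRIES * sizeof(struct queue_entry);
    struct queue_entry *mem = malloc(bytes);

    if (!mem)
        return NULL;
    memset(mem, zero ? 0x00 : 0xAB, bytes);
    return mem;
}
```

In this model, the first `enqueue_error()` on a queue built with `alloc_queue(0)` already reports an overflow because the stale first word is non-zero, while a queue from `alloc_queue(1)` accepts entries until it is genuinely full; zeroing at allocation time removes the false positive.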
  2. 17 Jan, 2017 1 commit
  3. 27 Dec, 2016 1 commit
  4. 20 Dec, 2016 35 commits
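The signal policy from the 30 Jan 2017 "Handle PIO & MEM non-resumable errors" commit can be summarized as a decision function. This is a minimal sketch, assuming a simplified three-way error classification: `enum nr_err_kind`, `struct nr_decision` and `decide_nr_action()` are hypothetical names, not the kernel's, and the real handler works from hypervisor error reports rather than a pre-classified kind. Only `SIGBUS` and `SIGKILL` come from the standard `<signal.h>`.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical, simplified classification of a non-resumable error. */
enum nr_err_kind { NR_ERR_PIO, NR_ERR_MEM, NR_ERR_OTHER };

enum nr_action { NR_ACT_SIGNAL, NR_ACT_PANIC };

struct nr_decision {
    enum nr_action action;
    int signo;        /* meaningful only when action == NR_ACT_SIGNAL */
    bool retire_page; /* stop handing the faulting page to future tasks */
};

struct nr_decision decide_nr_action(enum nr_err_kind kind, bool from_user)
{
    struct nr_decision d = { NR_ACT_PANIC, 0, false };

    if (!from_user)
        return d; /* this model recovers only userspace errors */

    switch (kind) {
    case NR_ERR_PIO:
        /* Invalid PIO access: deliver SIGBUS to the process. */
        d.action = NR_ACT_SIGNAL;
        d.signo = SIGBUS;
        break;
    case NR_ERR_MEM:
        /* SIGKILL cannot be caught and never dumps core, so nothing
         * re-reads the bad memory; the page is also retired. */
        d.action = NR_ACT_SIGNAL;
        d.signo = SIGKILL;
        d.retire_page = true;
        break;
    default:
        break; /* other non-resumable errors still panic */
    }
    return d;
}
```

The split mirrors the reasoning in the merge message: the default action of `SIGBUS` terminates with a core dump, which for a memory error could make the kernel read the bad memory again, whereas `SIGKILL` terminates without one.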