  1. 04 Dec, 2016 1 commit
  2. 01 Dec, 2016 1 commit
    • EDAC, amd64: Improve amd64-specific printing macros · 5246c540
      Borislav Petkov authored
      Prefix the warn and error macros with the respective string so that
      callers don't have to say "Error" or "Warning". This saves string
      length in the actual calls.
      
      While at it, shorten the calls in reserve_mc_sibling_devs().
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
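The prefixing idea in the commit above can be sketched in user space. This is a minimal illustration, not the driver's code: `DEMO_PREFIX`, `demo_warn()`, `demo_err()` and `report_missing_device()` are hypothetical names, and `snprintf()` stands in for the kernel's printk helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subsystem prefix; the real driver uses its own string. */
#define DEMO_PREFIX "EDAC demo: "

/*
 * The severity word is concatenated with the caller's format string at
 * compile time, so callers no longer spell out "Warning" or "Error".
 * The first variadic argument must be a string literal.
 */
#define demo_warn(buf, len, ...) \
	snprintf((buf), (len), DEMO_PREFIX "Warning: " __VA_ARGS__)
#define demo_err(buf, len, ...) \
	snprintf((buf), (len), DEMO_PREFIX "Error: " __VA_ARGS__)

/* Example caller: the message text and device ID are made up. */
static int report_missing_device(char *buf, size_t len, unsigned int dev_id)
{
	assert(buf != NULL && len > 0);
	return demo_err(buf, len, "device %04x not found\n", dev_id);
}
```

A call such as `report_missing_device(buf, sizeof(buf), 0x1234)` leaves the severity prefix entirely to the macro, which is what lets the call sites get shorter.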
  3. 29 Nov, 2016 5 commits
  4. 28 Nov, 2016 4 commits
  5. 24 Nov, 2016 2 commits
  6. 23 Nov, 2016 3 commits
  7. 22 Nov, 2016 1 commit
    • x86/mce/AMD: Add system physical address translation for AMD Fam17h · f5382de9
      Yazen Ghannam authored
      The Unified Memory Controllers (UMCs) on Fam17h log a normalized address
      in their MCA_ADDR registers. We need to convert that normalized address
      to a system physical address in order to support a few facilities:
      
      1) To offline poisoned pages in DRAM proactively in the deferred error
         handler.
      
      2) To print sysaddr and page info for DRAM ECC errors in EDAC.
      
      [ Boris: fixes/cleanups on top:
      
        * hi_addr_offset = 0 - no need for that branch. Stick it all under the
          HiAddrOffsetEn case. It confines hi_addr_offset's declaration too.
      
        * Move variables to the innermost scope they're used at so that we save
          on stack and not blow it up immediately on function entry.
      
        * Do not modify *sys_addr prematurely - we do not want to exit early
          having partially modified *sys_addr, which callers would then see.
          We either convert to a sys_addr or we don't do anything. And we
          signal that with the retval of the function.
      
        * Rename label out -> out_err - because it is the error path.
      
        * No need to pr_err() in the conversion-failed case: imagine a
          sparsely-populated machine with UMCs which don't have DIMMs. Callers
          should look at the retval instead and issue a printk only when really
          necessary. No need for useless info in dmesg.
      
        * s/temp_reg/tmp/ and other variable names shortening => shorter code.
      
        * Use BIT() everywhere.
      
        * Make error messages more informative.
      
        * Small build fix for the !CONFIG_X86_MCE_AMD case.
      
        * ... and more minor cleanups.
      ]
      Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20161122111133.mjzpvzhf7o7yl2oa@pd.tnic
      [ Typo fixes. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 21 Nov, 2016 5 commits
  9. 17 Nov, 2016 1 commit
    • EDAC, mpc85xx: Implement remove method for the platform driver · 27bda205
      Yanjiang Jin authored
      If we execute the below steps without this patch:
      
        modprobe mpc85xx_edac [The first insmod, everything is well.]
        modprobe -r mpc85xx_edac
        modprobe mpc85xx_edac [insmod again, error happens.]
      
      We would get the error messages as below:
      
        BUG: recent printk recursion!
        Oops: Kernel access of bad area, sig: 11 [#48]
        Modules linked in: mpc85xx_edac edac_core softdog [last unloaded: mpc85xx_edac]
        CPU: 5 PID: 14773 Comm: modprobe Tainted: G D C 4.8.3-rt2
         .vsnprintf
         .vscnprintf
         .vprintk_emit
         .printk
         .edac_pci_add_device
         .mpc85xx_pci_err_probe
         .platform_drv_probe
         .driver_probe_device
         .__driver_attach
         .bus_for_each_dev
         .driver_attach
         .bus_add_driver
         .driver_register
         .__platform_register_drivers
         .mpc85xx_mc_init
         .do_one_initcall
         .do_init_module
         .load_module
         .SyS_finit_module
         system_call
      
      Address this by cleaning up properly when removing the platform driver.
      
      Tested on a T4240QDS board.
      Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com>
      Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: york.sun@nxp.com
      Link: http://lkml.kernel.org/r/1479351380-17109-2-git-send-email-yanjiang.jin@windriver.com
      [ Boris: massage commit message. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
  10. 16 Nov, 2016 5 commits
  11. 15 Nov, 2016 1 commit
  12. 14 Nov, 2016 1 commit
  13. 11 Nov, 2016 2 commits
    • x86/mce/AMD: Fix HWID_MCATYPE calculation by grouping arguments · 859af13a
      Yazen Ghannam authored
      The calculation of the hwid_mcatype value in get_smca_bank_info()
      became incorrect after applying the following commit:
      
        1ce9cd7f ("x86/RAS: Simplify SMCA HWID descriptor struct")
      
      This causes the function to not match a bank to its type.
      
      Disassembly of hwid_mcatype calculation after change:
      
            db:       8b 45 e0                mov    -0x20(%rbp),%eax
            de:       41 89 c4                mov    %eax,%r12d
            e1:       25 00 00 ff 0f          and    $0xfff0000,%eax
            e6:       41 c1 ec 10             shr    $0x10,%r12d
            ea:       41 09 c4                or     %eax,%r12d
      
      Disassembly of hwid_mcatype calculation in original code:
      
           286:       8b 45 d0                mov    -0x30(%rbp),%eax
           289:       41 89 c5                mov    %eax,%r13d
           28c:       c1 e8 10                shr    $0x10,%eax
           28f:       41 81 e5 ff 0f 00 00    and    $0xfff,%r13d
           296:       41 c1 e5 10             shl    $0x10,%r13d
           29a:       41 09 c5                or     %eax,%r13d
      
      Grouping the arguments to the HWID_MCATYPE() macro fixes the issue.
      
      ( Boris suggested adding parentheses in the macro. )
      Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
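The precedence problem behind the two disassembly listings above can be reproduced with a pair of macros. This is a sketch modeled on the commit description, not a copy of the kernel source: the `DEMO_*` masks and both macro names are illustrative. The 0xfff mask and the 0xfff0000 constant are taken from the disassembly; the upper-half mask is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HWID_MASK     0x00000fffu  /* low 12 bits, as in "and $0xfff" */
#define DEMO_MCATYPE_MASK  0xffff0000u  /* assumed: upper 16 bits */

/* Arguments not grouped: "<<" binds tighter than a caller's "&". */
#define HWID_MCATYPE_UNGROUPED(hwid, mcatype) ((hwid << 16) | mcatype)
/* Arguments grouped: each argument is evaluated as a unit first. */
#define HWID_MCATYPE_GROUPED(hwid, mcatype)   (((hwid) << 16) | (mcatype))

static uint32_t id_ungrouped(uint32_t high)
{
	/*
	 * Expands to (high & (0xfff << 16)) | (high >> 16 of the masked
	 * value), i.e. the mask is shifted instead of the masked field.
	 */
	return HWID_MCATYPE_UNGROUPED(high & DEMO_HWID_MASK,
				      (high & DEMO_MCATYPE_MASK) >> 16);
}

static uint32_t id_grouped(uint32_t high)
{
	/* Expands to ((high & 0xfff) << 16) | (masked high >> 16). */
	return HWID_MCATYPE_GROUPED(high & DEMO_HWID_MASK,
				    (high & DEMO_MCATYPE_MASK) >> 16);
}

/* Compile-time check that the grouped form places the ID in bits 16-27. */
static_assert(HWID_MCATYPE_GROUPED(0xfffu, 0u) == 0x0fff0000u,
	      "grouped macro shifts the ID field");
```

With an input of 0x00010096, the grouped form yields 0x00960001 while the ungrouped form yields 0x00010001, so a lookup keyed on the combined value would fail to match, consistent with the symptom the commit describes.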
    • x86/MCE: Correct TSC timestamping of error records · 54467353
      Borislav Petkov authored
      We did have logic in the MCE code which would TSC-timestamp an error
      record only when it is exact - i.e., when it wasn't detected by polling.
      This isn't the case anymore. So let's fix that:
      
      We have a valid TSC timestamp in the error record only when it has been
      a precise detection, i.e., either in the #MC handler or in one of the
      interrupt handlers (thresholding, deferred, ...).
      
      All other error records still have mce.time which contains the wall
      time in order to be able to place the error record in time at least
      approximately.
      
      Also, this fixes another bug where machine_check_poll() would clear
      mce.tsc unconditionally even if we requested precise MCP_TIMESTAMP
      logging.
      
      The proper fix would be to generate timestamp only when it has been
      requested and not always. But that would require a more thorough code
      audit of all mce_gather_info/mce_setup() users. Add a FIXME for now.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony <tony.luck@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: kernel test robot <xiaolong.ye@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: lkp@01.org
      Link: http://lkml.kernel.org/r/20161110131053.kybsijfs5venpjnf@pd.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
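The timestamping rule described in the commit above (wall time always, TSC only for precise detections) might look roughly like this in isolation. It is a simplified user-space sketch: `struct demo_record`, `DEMO_TIMESTAMP` and `demo_fill_record()` are hypothetical stand-ins, and using 0 to mean "no valid TSC" is an assumption based on the commit's mention of clearing mce.tsc.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical flag standing in for a "precise timestamp requested" bit. */
#define DEMO_TIMESTAMP (1u << 0)

struct demo_record {
	uint64_t tsc;   /* cycle counter; 0 here means "no precise value" */
	time_t   time;  /* wall-clock time, filled in for every record */
};

/*
 * Fill in the time fields of a record. tsc_now is passed in by the
 * caller so the sketch does not depend on a hardware counter.
 */
static void demo_fill_record(struct demo_record *r, unsigned int flags,
			     uint64_t tsc_now)
{
	assert(r != NULL);

	/* Every record gets an approximate placement in time. */
	r->time = time(NULL);

	/* Keep the TSC only when the detection was precise. */
	r->tsc = (flags & DEMO_TIMESTAMP) ? tsc_now : 0;
}
```

A polled record would be filled with `flags == 0` and end up with only the wall time, while a record from an exception or interrupt path would pass `DEMO_TIMESTAMP` and keep its counter value.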
  14. 08 Nov, 2016 7 commits
  15. 05 Nov, 2016 1 commit