1. 19 Dec, 2015 6 commits
    • kexec: Fix race between panic() and crash_kexec() · 7bbee5ca
      Hidehiro Kawai authored
      Currently, panic() and crash_kexec() can be called at the same time.
      For example (x86 case):
      
      CPU 0:
        oops_end()
          crash_kexec()
            mutex_trylock() // acquired
              nmi_shootdown_cpus() // stop other CPUs
      
      CPU 1:
        panic()
          crash_kexec()
            mutex_trylock() // failed to acquire
          smp_send_stop() // stop other CPUs
          infinite loop
      
      If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
      fails.
      
      In another case:
      
      CPU 0:
        oops_end()
          crash_kexec()
            mutex_trylock() // acquired
              <NMI>
              io_check_error()
                panic()
                  crash_kexec()
                    mutex_trylock() // failed to acquire
                  infinite loop
      
      Clearly, this is an undesirable result.
      
      To fix this problem, this patch changes crash_kexec() to exclude others
      by using the panic_cpu atomic.
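
      The exclusion described above can be modelled in user space with a
      single atomic owner variable. This is only a sketch under stated
      assumptions, not the patch itself: C11 atomics stand in for the kernel's
      atomic API, and claim_crash_path() / release_crash_path() are
      hypothetical helper names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PANIC_CPU_INVALID (-1)

/* Hypothetical stand-in for the panic_cpu atomic named in the commit message. */
static atomic_int panic_cpu = PANIC_CPU_INVALID;

/*
 * Try to become the single CPU allowed to run the crash path.
 * Exactly one caller can swap PANIC_CPU_INVALID for its own CPU number;
 * every later caller sees a valid owner and gets false.
 */
static bool claim_crash_path(int this_cpu)
{
    int expected = PANIC_CPU_INVALID;

    assert(this_cpu >= 0);
    return atomic_compare_exchange_strong(&panic_cpu, &expected, this_cpu);
}

/* Give ownership back, e.g. if the crash path returns (assumption). */
static void release_crash_path(void)
{
    atomic_store(&panic_cpu, PANIC_CPU_INVALID);
}
```

      In this model, both the oops path and the panic path would call
      claim_crash_path() first, so only one of them ever goes on to stop the
      other CPUs.
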
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: kexec@lists.infradead.org
      Cc: linux-doc@vger.kernel.org
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Minfei Huang <mnfhuang@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: x86-ml <x86@kernel.org>
      Link: http://lkml.kernel.org/r/20151210014630.25437.94161.stgit@softrs
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • panic, x86: Allow CPUs to save registers even if looping in NMI context · 58c5661f
      Hidehiro Kawai authored
      Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
      sends an NMI IPI to CPUs which haven't called panic() to stop them,
      save their register information and do some cleanups for crash dumping.
      However, if such a CPU is infinitely looping in NMI context, we fail to
      save its register information into the crash dump.
      
      For example, this can happen when unknown NMIs are broadcast to all
      CPUs as follows:
      
        CPU 0                             CPU 1
        ===========================       ==========================
        receive an unknown NMI
        unknown_nmi_error()
          panic()                         receive an unknown NMI
            spin_trylock(&panic_lock)     unknown_nmi_error()
            crash_kexec()                   panic()
                                              spin_trylock(&panic_lock)
                                              panic_smp_self_stop()
                                                infinite loop
              kdump_nmi_shootdown_cpus()
                issue NMI IPI -----------> blocked until IRET
                                                infinite loop...
      
      Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is
      blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET,
      so the NMI is not handled and the callback function to save registers is
      never called.
      
      In practice, this can happen on some servers which broadcast NMIs to all
      CPUs when the NMI button is pushed.
      
      To save registers in this case, we need to:
      
        a) Return from NMI handler instead of looping infinitely
        or
        b) Call the callback function directly from the infinite loop
      
      Inherently, a) is risky because NMI is also used to prevent corrupted
      data from being propagated to devices.  So, we chose b).
      
      This patch does the following:
      
      1. Move the infinite looping of CPUs which haven't called panic() in NMI
         context (actually done by panic_smp_self_stop()) outside of panic() to
         enable us to refer to pt_regs. Please note that panic_smp_self_stop() is
         still used for normal context.
      
      2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
         registers and do some cleanups after setting waiting_for_crash_ipi which
         is used for counting down the number of CPUs which handled the callback.
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: kexec@lists.infradead.org
      Cc: linux-doc@vger.kernel.org
      Cc: lkml <linux-kernel@vger.kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs
      [ Cleanup comments, fixup formatting. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • panic, x86: Fix re-entrance problem due to panic on NMI · 1717f209
      Hidehiro Kawai authored
      If panic on NMI happens just after panic() on the same CPU, panic() is
      called recursively. As a result, the kernel stalls after failing to
      acquire panic_lock.
      
      To avoid this problem, don't call panic() in NMI context if we've
      already entered panic().
      
      For that, introduce nmi_panic() macro to reduce code duplication. In
      the case of panic on NMI, don't return from NMI handlers if another CPU
      already panicked.
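
      The decision an NMI handler has to make here can be sketched as a small
      user-space model. This is an illustration, not the nmi_panic() macro
      from the patch: it assumes a single atomic owner variable (called
      panic_cpu here), uses C11 atomics, and the enum and nmi_panic_decide()
      are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

#define PANIC_CPU_INVALID (-1)

/* Hypothetical outcomes for a CPU that wants to panic from NMI context. */
enum nmi_panic_action {
    NMI_CALL_PANIC,    /* nobody has panicked yet: go ahead */
    NMI_SKIP_REENTRY,  /* this CPU is already inside panic(): do not recurse */
    NMI_STOP_HERE,     /* another CPU panicked first: do not return from NMI */
};

/* Assumed owner variable; PANIC_CPU_INVALID means "no panic in progress". */
static atomic_int panic_cpu = PANIC_CPU_INVALID;

static enum nmi_panic_action nmi_panic_decide(int this_cpu)
{
    int owner = PANIC_CPU_INVALID;

    assert(this_cpu >= 0);
    if (atomic_compare_exchange_strong(&panic_cpu, &owner, this_cpu))
        return NMI_CALL_PANIC;

    /* On failure, owner holds the CPU that claimed the panic earlier. */
    return owner == this_cpu ? NMI_SKIP_REENTRY : NMI_STOP_HERE;
}
```

      The second and third outcomes correspond to the two rules in the
      message: no recursive panic() on the same CPU, and no return from the
      NMI handler once another CPU has panicked.
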
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: kexec@lists.infradead.org
      Cc: linux-doc@vger.kernel.org
      Cc: lkml <linux-kernel@vger.kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
      [ Cleanup comments, fixup formatting. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • Merge branch 'linus' into x86/apic · d267b8d6
      Thomas Gleixner authored
      Pull in update changes so we can apply conflicting patches
    • Merge tag 'pm+acpi-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 1eab0e42
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix a potential regression introduced during the 4.3 cycle
        (generic power domains framework), a nasty bug that has been present
        forever (power capping RAPL driver), a build issue (Tegra cpufreq
        driver) and a minor ugliness introduced recently (intel_pstate).
      
        Specifics:
      
         - Fix a potential regression in the generic power domains framework
           introduced during the 4.3 development cycle that may lead to
           spurious failures of system suspend in certain situations (Ulf
           Hansson).
      
         - Fix a problem in the power capping RAPL (Running Average Power
           Limits) driver that causes it to initialize successfully on some
           systems where it is not supposed to do that which is due to an
           incorrect check in an initialization routine (Prarit Bhargava).
      
         - Fix a build problem in the cpufreq Tegra driver that depends on the
           regulator framework, but that dependency is not reflected in
           Kconfig (Arnd Bergmann).
      
         - Fix a recent mistake in the intel_pstate driver where a numeric
           constant is used directly instead of a symbol defined specifically
           for the case in question (Prarit Bhargava)"
      
      * tag 'pm+acpi-4.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        powercap / RAPL: fix BIOS lock check
        cpufreq: intel_pstate: Minor cleanup for FRAC_BITS
        cpufreq: tegra: add regulator dependency for T124
        PM / Domains: Allow runtime PM callbacks to be re-used during system PM
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 4fee35a3
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Three fixes this time, two in SES picked up by KASAN for various types
        of buffer overrun.  The first is a USB array which returns page 8
        whatever is asked for and causes us to overrun with incorrect data
        format assumptions and the second is an invalid iteration of page 10
        (the additional information page).
      
        The final fix is a reversion of a NULL deref fix which caused
        suspend/resume not to be called in pairs leading to incorrect device
        operation (Jens has queued a more proper fix for the problem in
        block)"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        ses: fix additional element traversal bug
        Revert "SCSI: Fix NULL pointer dereference in runtime PM"
        ses: Fix problems with simple enclosures
  2. 18 Dec, 2015 30 commits
  3. 17 Dec, 2015 4 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 73796d8b
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix uninitialized variable warnings in nfnetlink_queue, a lot of
          people reported this...  From Arnd Bergmann.
      
       2) Don't init mutex twice in i40e driver, from Jesse Brandeburg.
      
       3) Fix spurious EBUSY in rhashtable, from Herbert Xu.
      
       4) Missing DMA unmaps in mvpp2 driver, from Marcin Wojtas.
      
       5) Fix race with work structure access in pppoe driver causing
          corruptions, from Guillaume Nault.
      
       6) Fix OOPS due to sh_eth_rx() not checking whether netdev_alloc_skb()
          actually succeeded or not, from Sergei Shtylyov.
      
       7) Don't lose flags when setting IFA_F_OPTIMISTIC in ipv6 code, from
          Bjørn Mork.
      
       8) VXLAN_HD_RCO defined incorrectly, fix from Jiri Benc.
      
       9) Fix clock source used for cookies in SCTP, from Marcelo Ricardo
          Leitner.
      
      10) aurora driver needs HAS_DMA dependency, from Geert Uytterhoeven.
      
      11) ndo_fill_metadata_dst op of vxlan has to handle ipv6 tunneling
          properly as well, from Jiri Benc.
      
      12) Handle request sockets properly in xfrm layer, from Eric Dumazet.
      
      13) Double stats update in ipv6 geneve transmit path, fix from Pravin B
          Shelar.
      
      14) sk->sk_policy[] needs RCU protection, and as a result
          xfrm_policy_destroy() needs to free policies using an RCU grace
          period, from Eric Dumazet.
      
      15) SCTP needs to clone ipv6 tx options in order to avoid use after
          free, from Eric Dumazet.
      
      16) Missing kbuild export of ila.h, from Stephen Hemminger.
      
      17) Missing mdiobus_alloc() return value checking in mdio-mux.c, from
          Tobias Klauser.
      
      18) Validate protocol value range in ->create() methods, from Hannes
          Frederic Sowa.
      
      19) Fix early socket demux races that result in illegal dst reuse, from
          Eric Dumazet.
      
      20) Validate socket address length in pptp code, from WANG Cong.
      
      21) skb_reorder_vlan_header() uses incorrect offset and can corrupt
          packets, from Vlad Yasevich.
      
      22) Fix memory leaks in nl80211 registry code, from Ola Olsson.
      
      23) Timeout loop count handling fixes in mISDN, xgbe, qlge, sfc, and
          qlcnic.  From Dan Carpenter.
      
      24) msg.msg_iocb needs to be cleared in recvfrom() otherwise, for
          example, AF_ALG will interpret it as an async call.  From Tadeusz
          Struk.
      
      25) inetpeer_set_addr_v4 forgets to initialize the 'vif' field, from
          Eric Dumazet.
      
      26) rhashtable enforces the minimum table size not early enough,
          breaking how we calculate the per-cpu lock allocations.  From
          Herbert Xu.
      
      27) Fix FCC port lockup in 82xx driver, from Martin Roth.
      
      28) FOU sockets need to be freed using RCU, from Hannes Frederic Sowa.
      
      29) Fix out-of-bounds access in __skb_complete_tx_timestamp() and
          sock_setsockopt() wrt.  timestamp handling.  From WANG Cong.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (117 commits)
        net: check both type and procotol for tcp sockets
        drivers: net: xgene: fix Tx flow control
        tcp: restore fastopen with no data in SYN packet
        af_unix: Revert 'lock_interruptible' in stream receive code
        fou: clean up socket with kfree_rcu
        82xx: FCC: Fixing a bug causing to FCC port lock-up
        gianfar: Don't enable RX Filer if not supported
        net: fix warnings in 'make htmldocs' by moving macro definition out of field declaration
        rhashtable: Fix walker list corruption
        rhashtable: Enforce minimum size on initial hash table
        inet: tcp: fix inetpeer_set_addr_v4()
        ipv6: automatically enable stable privacy mode if stable_secret set
        net: fix uninitialized variable issue
        bluetooth: Validate socket address length in sco_sock_bind().
        net_sched: make qdisc_tree_decrease_qlen() work for non mq
        ser_gigaset: remove unnecessary kfree() calls from release method
        ser_gigaset: fix deallocation of platform device structure
        ser_gigaset: turn nonsense checks into WARN_ON
        ser_gigaset: fix up NULL checks
        qlcnic: fix a timeout loop
        ...
    • net: check both type and procotol for tcp sockets · ac5cc977
      WANG Cong authored
      Dmitry reported the following out-of-bound access:
      
      Call Trace:
       [<ffffffff816cec2e>] __asan_report_load4_noabort+0x3e/0x40
      mm/kasan/report.c:294
       [<ffffffff84affb14>] sock_setsockopt+0x1284/0x13d0 net/core/sock.c:880
       [<     inline     >] SYSC_setsockopt net/socket.c:1746
       [<ffffffff84aed7ee>] SyS_setsockopt+0x1fe/0x240 net/socket.c:1729
       [<ffffffff85c18c76>] entry_SYSCALL_64_fastpath+0x16/0x7a
      arch/x86/entry/entry_64.S:185
      
      This is because we mistake a raw socket for a tcp socket.
      We should check both sk->sk_type and sk->sk_protocol to ensure
      it is a tcp socket.
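
      The combined check can be illustrated with a small user-space sketch.
      It is not the kernel change itself: struct sock_model is a hypothetical
      stand-in that carries only the two fields named above, and
      is_tcp_socket() is an illustrative helper name.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical, minimal stand-in for the two struct sock fields involved. */
struct sock_model {
    int sk_type;      /* e.g. SOCK_STREAM, SOCK_RAW, SOCK_DGRAM */
    int sk_protocol;  /* e.g. IPPROTO_TCP, IPPROTO_UDP */
};

/*
 * Treat a socket as TCP only when both the type and the protocol agree.
 * Checking the protocol alone would also accept a raw socket that was
 * created with IPPROTO_TCP.
 */
static bool is_tcp_socket(const struct sock_model *sk)
{
    assert(sk != NULL);
    return sk->sk_type == SOCK_STREAM && sk->sk_protocol == IPPROTO_TCP;
}
```

      With this predicate, a raw socket that carries the TCP protocol number
      is rejected, so TCP-specific state would not be read from it.
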
      
      Willem points out __skb_complete_tx_timestamp() needs to be fixed as well.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers: net: xgene: fix Tx flow control · 67894eec
      Iyappan Subramanian authored
      Currently the Tx flow control is based on reading the hardware state,
      which is not accurate since it may not reflect descriptors that
      have not yet reached memory.
      
      To control the Tx flow accurately, change it to be software based.
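
      One way to picture software-based Tx flow control is to count queued
      and completed descriptors in the driver and derive the in-flight number
      from those counters. The sketch below is an assumption-laden model, not
      the xgene driver code: struct tx_ring_model, its thresholds and the
      helper names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software view of one Tx ring. */
struct tx_ring_model {
    uint32_t queued;     /* descriptors handed to the hardware so far */
    uint32_t completed;  /* descriptors whose completion was processed */
    uint32_t hi_thresh;  /* stop the queue at or above this many in flight */
    uint32_t lo_thresh;  /* wake the queue at or below this many in flight */
    bool stopped;        /* models the stopped state of the Tx queue */
};

/* Unsigned subtraction stays correct even when the counters wrap. */
static uint32_t tx_in_flight(const struct tx_ring_model *r)
{
    return r->queued - r->completed;
}

/* Transmit path: account for new descriptors, stop if the ring is busy. */
static void tx_enqueue(struct tx_ring_model *r, uint32_t count)
{
    assert(r != NULL);
    r->queued += count;
    if (tx_in_flight(r) >= r->hi_thresh)
        r->stopped = true;
}

/* Completion path: account for finished descriptors, wake if drained. */
static void tx_complete(struct tx_ring_model *r, uint32_t count)
{
    assert(r != NULL && count <= tx_in_flight(r));
    r->completed += count;
    if (r->stopped && tx_in_flight(r) <= r->lo_thresh)
        r->stopped = false;
}
```

      Because both counters are updated by software at the moment a
      descriptor is queued or completed, the in-flight count does not depend
      on when the hardware state becomes visible.
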
      Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: restore fastopen with no data in SYN packet · 07e100f9
      Eric Dumazet authored
      Yuchung tracked a regression caused by commit 57be5bda ("ip: convert
      tcp_sendmsg() to iov_iter primitives") for TCP Fast Open.
      
      Some Fast Open users do not actually add any data in the SYN packet.
      
      Fixes: 57be5bda ("ip: convert tcp_sendmsg() to iov_iter primitives")
      Reported-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>