1. 19 Jul, 2018 30 commits
    • Pavel Tatashin's avatar
      x86/tsc: Split native_calibrate_cpu() into early and late parts · 03821f45
      Pavel Tatashin authored
      During early boot TSC and CPU frequency can be calibrated using MSR, CPUID,
      and quick PIT calibration methods. The other methods PIT/HPET/PMTIMER are
      available only after ACPI is initialized.
      
      Split native_calibrate_cpu() into early and late parts so they can be
      called separately during early and late tsc calibration.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-26-pasha.tatashin@oracle.com
      03821f45
    • Pavel Tatashin's avatar
      sched/clock: Use static key for sched_clock_running · 46457ea4
      Pavel Tatashin authored
      sched_clock_running may be read every time sched_clock_cpu() is called.
      Yet, this variable is updated only twice during boot, and never changes
      again, therefore it is better to make it a static key.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-25-pasha.tatashin@oracle.com
      46457ea4
    • Pavel Tatashin's avatar
      sched/clock: Enable sched clock early · 857baa87
      Pavel Tatashin authored
      Allow sched_clock() to be used before schec_clock_init() is called.  This
      provides a way to get early boot timestamps on machines with unstable
      clocks.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-24-pasha.tatashin@oracle.com
      857baa87
    • Pavel Tatashin's avatar
      sched/clock: Move sched clock initialization and merge with generic clock · 5d2a4e91
      Pavel Tatashin authored
      sched_clock_postinit() initializes a generic clock on systems where no
      other clock is provided. This function may be called only after
      timekeeping_init().
      
      Rename sched_clock_postinit to generic_clock_inti() and call it from
      sched_clock_init(). Move the call for sched_clock_init() until after
      time_init().
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-23-pasha.tatashin@oracle.com
      5d2a4e91
    • Pavel Tatashin's avatar
      x86/tsc: Use TSC as sched clock early · 4763f03d
      Pavel Tatashin authored
      All prerequesites for enabling TSC as sched clock early in the boot
      process are available now:
      
       - Early attempt of TSC calibration
      
       - Early availablity of static branch patching
      
      If TSC frequency can be established in the early calibration, enable the
      static key which switches sched clock to use TSC.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-22-pasha.tatashin@oracle.com
      4763f03d
    • Pavel Tatashin's avatar
      x86/tsc: Initialize cyc2ns when tsc frequency is determined · e2a9ca29
      Pavel Tatashin authored
      cyc2ns converts tsc to nanoseconds, and it is handled in a per-cpu data
      structure.
      
      Currently, the setup code for c2ns data for every possible CPU goes through
      the same sequence of calculations as for the boot CPU, but is based on the
      same tsc frequency as the boot CPU, and thus this is not necessary.
      
      Initialize the boot cpu when tsc frequency is determined. Copy the
      calculated data from the boot CPU to the other CPUs in tsc_init().
      
      In addition do the following:
      
       - Remove unnecessary zeroing of c2ns data by removing cyc2ns_data_init()
      
       - Split set_cyc2ns_scale() into two functions, so set_cyc2ns_scale() can be
         called when system is up, and wraps around __set_cyc2ns_scale() that can
         be called directly when system is booting but avoids saving restoring
         IRQs and going and waking up from idle.
      Suggested-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-21-pasha.tatashin@oracle.com
      e2a9ca29
    • Pavel Tatashin's avatar
      x86/tsc: Calibrate tsc only once · cf7a63ef
      Pavel Tatashin authored
      During boot tsc is calibrated twice: once in tsc_early_delay_calibrate(),
      and the second time in tsc_init().
      
      Rename tsc_early_delay_calibrate() to tsc_early_init(), and rework it so
      the calibration is done only early, and make tsc_init() to use the values
      already determined in tsc_early_init().
      
      Sometimes it is not possible to determine tsc early, as the subsystem that
      is required is not yet initialized, in such case try again later in
      tsc_init().
      Suggested-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-20-pasha.tatashin@oracle.com
      cf7a63ef
    • Pavel Tatashin's avatar
      ARM/time: Remove read_boot_clock64() · 227e3958
      Pavel Tatashin authored
      read_boot_clock64() is deleted, and replaced with
      read_persistent_wall_and_boot_offset().
      
      The default implementation of read_persistent_wall_and_boot_offset()
      provides a better fallback than the current stubs for read_boot_clock64()
      that arm has with no users, so remove the old code.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-19-pasha.tatashin@oracle.com
      227e3958
    • Pavel Tatashin's avatar
      s390/time: Remove read_boot_clock64() · 00067a6d
      Pavel Tatashin authored
      read_boot_clock64() was replaced by read_persistent_wall_and_boot_offset()
      so remove it.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-18-pasha.tatashin@oracle.com
      00067a6d
    • Pavel Tatashin's avatar
      timekeeping: Default boot time offset to local_clock() · 4b1b7f80
      Pavel Tatashin authored
      read_persistent_wall_and_boot_offset() is called during boot to read
      both the persistent clock and also return the offset between the boot time
      and the value of persistent clock.
      
      Change the default boot_offset from zero to local_clock() so architectures,
      that do not have a dedicated boot_clock but have early sched_clock(), such
      as SPARCv9, x86, and possibly more will benefit from this change by getting
      a better and more consistent estimate of the boot time without need for an
      arch specific implementation.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-17-pasha.tatashin@oracle.com
      4b1b7f80
    • Pavel Tatashin's avatar
      timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset() · 3eca9937
      Pavel Tatashin authored
      If architecture does not support exact boot time, it is challenging to
      estimate boot time without having a reference to the current persistent
      clock value. Yet, it cannot read the persistent clock time again, because
      this may lead to math discrepancies with the caller of read_boot_clock64()
      who have read the persistent clock at a different time.
      
      This is why it is better to provide two values simultaneously: the
      persistent clock value, and the boot time.
      
      Replace read_boot_clock64() with:
      read_persistent_wall_and_boot_offset(wall_time, boot_offset)
      
      Where wall_time is returned by read_persistent_clock() And boot_offset is
      wall_time - boot time, which defaults to 0.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-16-pasha.tatashin@oracle.com
      3eca9937
    • Pavel Tatashin's avatar
      s390/time: Add read_persistent_wall_and_boot_offset() · be2e0e42
      Pavel Tatashin authored
      read_persistent_wall_and_boot_offset() will replace read_boot_clock64()
      because on some architectures it is more convenient to read both sources
      as one may depend on the other. For s390, implementation is the same
      as read_boot_clock64() but also calling and returning value of
      read_persistent_clock64()
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-15-pasha.tatashin@oracle.com
      be2e0e42
    • Pavel Tatashin's avatar
      x86/xen/time: Output xen sched_clock time from 0 · 38669ba2
      Pavel Tatashin authored
      It is expected for sched_clock() to output data from 0, when system boots.
      
      Add an offset xen_sched_clock_offset (similarly how it is done in other
      hypervisors i.e. kvm_sched_clock_offset) to count sched_clock() from 0,
      when time is first initialized.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-14-pasha.tatashin@oracle.com
      38669ba2
    • Pavel Tatashin's avatar
      x86/xen/time: Initialize pv xen time in init_hypervisor_platform() · 7b25b9cb
      Pavel Tatashin authored
      In every hypervisor except for xen pv time ops are initialized in
      init_hypervisor_platform().
      
      Xen PV domains initialize time ops in x86_init.paging.pagetable_init(),
      by calling xen_setup_shared_info() which is a poor design, as time is
      needed prior to memory allocator.
      
      xen_setup_shared_info() is called from two places: during boot, and
      after suspend. Split the content of xen_setup_shared_info() into
      three places:
      
      1. add the clock relavent data into new xen pv init_platform vector, and
         set clock ops in there.
      
      2. move xen_setup_vcpu_info_placement() to new xen_pv_guest_late_init()
         call.
      
      3. Re-initializing parts of shared info copy to xen_pv_post_suspend() to
         be symmetric to xen_pv_pre_suspend
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-13-pasha.tatashin@oracle.com
      7b25b9cb
    • Pavel Tatashin's avatar
      x86/tsc: Redefine notsc to behave as tsc=unstable · fe9af81e
      Pavel Tatashin authored
      Currently, the notsc kernel parameter disables the use of the TSC by
      sched_clock(). However, this parameter does not prevent the kernel from
      accessing tsc in other places.
      
      The only rationale to boot with notsc is to avoid timing discrepancies on
      multi-socket systems where TSC are not properly synchronized, and thus
      exclude TSC from being used for time keeping. But that prevents using TSC
      as sched_clock() as well, which is not necessary as the core sched_clock()
      implementation can handle non synchronized TSC based sched clocks just
      fine.
      
      However, there is another method to solve the above problem: booting with
      tsc=unstable parameter. This parameter allows sched_clock() to use TSC and
      just excludes it from timekeeping.
      
      So there is no real reason to keep notsc, but for compatibility reasons the
      parameter has to stay. Make it behave like 'tsc=unstable' instead.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarDou Liyang <douly.fnst@cn.fujitsu.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com
      fe9af81e
    • Borislav Petkov's avatar
      x86/CPU: Call detect_nopl() only on the BSP · 9b3661cd
      Borislav Petkov authored
      Make it use the setup_* variants and have it be called only on the BSP and
      drop the call in generic_identify() - X86_FEATURE_NOPL will be replicated
      to the APs through the forced caps. Helps to keep the mess at a manageable
      level.
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-11-pasha.tatashin@oracle.com
      9b3661cd
    • Pavel Tatashin's avatar
      x86/jump_label: Initialize static branching early · 8990cac6
      Pavel Tatashin authored
      Static branching is useful to runtime patch branches that are used in hot
      path, but are infrequently changed.
      
      The x86 clock framework is one example that uses static branches to setup
      the best clock during boot and never changes it again.
      
      It is desired to enable the TSC based sched clock early to allow fine
      grained boot time analysis early on. That requires the static branching
      functionality to be functional early as well.
      
      Static branching requires patching nop instructions, thus,
      arch_init_ideal_nops() must be called prior to jump_label_init().
      
      Do all the necessary steps to call arch_init_ideal_nops() right after
      early_cpu_init(), which also allows to insert a call to jump_label_init()
      right after that. jump_label_init() will be called again from the generic
      init code, but the code is protected against reinitialization already.
      
      [ tglx: Massaged changelog ]
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-10-pasha.tatashin@oracle.com
      8990cac6
    • Pavel Tatashin's avatar
      x86/alternatives, jumplabel: Use text_poke_early() before mm_init() · 6fffacb3
      Pavel Tatashin authored
      It supposed to be safe to modify static branches after jump_label_init().
      But, because static key modifying code eventually calls text_poke() it can
      end up accessing a struct page which has not been initialized yet.
      
      Here is how to quickly reproduce the problem. Insert code like this
      into init/main.c:
      
      | +static DEFINE_STATIC_KEY_FALSE(__test);
      | asmlinkage __visible void __init start_kernel(void)
      | {
      |        char *command_line;
      |@@ -587,6 +609,10 @@ asmlinkage __visible void __init start_kernel(void)
      |        vfs_caches_init_early();
      |        sort_main_extable();
      |        trap_init();
      |+       {
      |+       static_branch_enable(&__test);
      |+       WARN_ON(!static_branch_likely(&__test));
      |+       }
      |        mm_init();
      
      The following warnings show-up:
      WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:701 text_poke+0x20d/0x230
      RIP: 0010:text_poke+0x20d/0x230
      Call Trace:
       ? text_poke_bp+0x50/0xda
       ? arch_jump_label_transform+0x89/0xe0
       ? __jump_label_update+0x78/0xb0
       ? static_key_enable_cpuslocked+0x4d/0x80
       ? static_key_enable+0x11/0x20
       ? start_kernel+0x23e/0x4c8
       ? secondary_startup_64+0xa5/0xb0
      
      ---[ end trace abdc99c031b8a90a ]---
      
      If the code above is moved after mm_init(), no warning is shown, as struct
      pages are initialized during handover from memblock.
      
      Use text_poke_early() in static branching until early boot IRQs are enabled
      and from there switch to text_poke. Also, ensure text_poke() is never
      invoked when unitialized memory access may happen by using adding a
      !after_bootmem assertion.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-9-pasha.tatashin@oracle.com
      6fffacb3
    • Thomas Gleixner's avatar
      x86/kvmclock: Switch kvmclock data to a PER_CPU variable · 95a3d445
      Thomas Gleixner authored
      The previous removal of the memblock dependency from kvmclock introduced a
      static data array sized 64bytes * CONFIG_NR_CPUS. That's wasteful on large
      systems when kvmclock is not used.
      
      Replace it with:
      
       - A static page sized array of pvclock data. It's page sized because the
         pvclock data of the boot cpu is mapped into the VDSO so otherwise random
         other data would be exposed to the vDSO
      
       - A PER_CPU variable of pvclock data pointers. This is used to access the
         pcvlock data storage on each CPU.
      
      The setup is done in two stages:
      
       - Early boot stores the pointer to the static page for the boot CPU in
         the per cpu data.
      
       - In the preparatory stage of CPU hotplug assign either an element of
         the static array (when the CPU number is in that range) or allocate
         memory and initialize the per cpu pointer.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-8-pasha.tatashin@oracle.com
      95a3d445
    • Thomas Gleixner's avatar
      x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock · e499a9b6
      Thomas Gleixner authored
      There is no point to have this in the kvm code itself and call it from
      there. This can be called from an initcall and the parameter is cleared
      when the hypervisor is not KVM.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-7-pasha.tatashin@oracle.com
      e499a9b6
    • Thomas Gleixner's avatar
      x86/kvmclock: Mark variables __initdata and __ro_after_init · 42f8df93
      Thomas Gleixner authored
      The kvmclock parameter is init data and the other variables are not
      modified after init.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-6-pasha.tatashin@oracle.com
      42f8df93
    • Thomas Gleixner's avatar
      x86/kvmclock: Cleanup the code · 146c394d
      Thomas Gleixner authored
      - Cleanup the mrs write for wall clock. The type casts to (int) are sloppy
        because the wrmsr parameters are u32 and aside of that wrmsrl() already
        provides the high/low split for free.
      
      - Remove the pointless get_cpu()/put_cpu() dance from various
        functions. Either they are called during early init where CPU is
        guaranteed to be 0 or they are already called from non preemptible
        context where smp_processor_id() can be used safely
      
      - Simplify the convoluted check for kvmclock in the init function.
      
      - Mark the parameter parsing function __init. No point in keeping it
        around.
      
      - Convert to pr_info()
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-5-pasha.tatashin@oracle.com
      146c394d
    • Thomas Gleixner's avatar
      x86/kvmclock: Decrapify kvm_register_clock() · 7a5ddc8f
      Thomas Gleixner authored
      The return value is pointless because the wrmsr cannot fail if
      KVM_FEATURE_CLOCKSOURCE or KVM_FEATURE_CLOCKSOURCE2 are set.
      
      kvm_register_clock() is only called locally so wants to be static.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-4-pasha.tatashin@oracle.com
      7a5ddc8f
    • Thomas Gleixner's avatar
      x86/kvmclock: Remove page size requirement from wall_clock · 7ef363a3
      Thomas Gleixner authored
      There is no requirement for wall_clock data to be page aligned or page
      sized.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-3-pasha.tatashin@oracle.com
      7ef363a3
    • Pavel Tatashin's avatar
      x86/kvmclock: Remove memblock dependency · 368a540e
      Pavel Tatashin authored
      KVM clock is initialized later compared to other hypervisor clocks because
      it has a dependency on the memblock allocator.
      
      Bring it in line with other hypervisors by using memory from the BSS
      instead of allocating it.
      
      The benefits:
      
        - Remove ifdef from common code
        - Earlier availability of the clock
        - Remove dependency on memblock, and reduce code
      
      The downside:
      
        - Static allocation of the per cpu data structures sized NR_CPUS * 64byte
          Will be addressed in follow up patches.
      
      [ tglx: Split out from larger series ]
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-2-pasha.tatashin@oracle.com
      368a540e
    • Thomas Gleixner's avatar
      Merge branch 'linus' into x86/timers · 73ab603f
      Thomas Gleixner authored
      Pick up upstream changes to avoid conflicts
      73ab603f
    • Linus Torvalds's avatar
      Merge tag 'pci-v4.18-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · fb7d1bcf
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
      
       - Fix crashes that happen when PHY drivers are left disabled in the V3
         Semiconductor, MediaTek, Faraday, Aardvark, DesignWare, Versatile,
         and X-Gene host controller drivers (Sergei Shtylyov)
      
       - Fix a NULL pointer dereference in the endpoint library configfs
         support (Kishon Vijay Abraham I)
      
       - Fix a race condition in Hyper-V IRQ handling (Dexuan Cui)
      
      * tag 'pci-v4.18-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: v3-semi: Fix I/O space page leak
        PCI: mediatek: Fix I/O space page leak
        PCI: faraday: Fix I/O space page leak
        PCI: aardvark: Fix I/O space page leak
        PCI: designware: Fix I/O space page leak
        PCI: versatile: Fix I/O space page leak
        PCI: xgene: Fix I/O space page leak
        PCI: OF: Fix I/O space page leak
        PCI: endpoint: Fix NULL pointer dereference error when CONFIGFS is disabled
        PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()
      fb7d1bcf
    • Linus Torvalds's avatar
      Merge tag 'sound-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · f39f28ff
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A rawmidi race fix and three trivial HD-audio quirks"
      
      * tag 'sound-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda/realtek - Yet another Clevo P950 quirk entry
        ALSA: rawmidi: Change resized buffers atomically
        ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk
        ALSA: hda: add mute led support for HP ProBook 455 G5
      f39f28ff
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · b4394c34
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "This fixes an allocation error-path bug in af_alg discovered by
        syzkaller"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: af_alg - Initialize sg_num_bytes in error code path
      b4394c34
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 024ddc0c
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Lots of fixes, here goes:
      
         1) NULL deref in qtnfmac, from Gustavo A. R. Silva.
      
         2) Kernel oops when fw download fails in rtlwifi, from Ping-Ke Shih.
      
         3) Lost completion messages in AF_XDP, from Magnus Karlsson.
      
         4) Correct bogus self-assignment in rhashtable, from Rishabh
            Bhatnagar.
      
         5) Fix regression in ipv6 route append handling, from David Ahern.
      
         6) Fix masking in __set_phy_supported(), from Heiner Kallweit.
      
         7) Missing module owner set in x_tables icmp, from Florian Westphal.
      
         8) liquidio's timeouts are HZ dependent, fix from Nicholas Mc Guire.
      
         9) Link setting fixes for sh_eth and ravb, from Vladimir Zapolskiy.
      
        10) Fix NULL deref when using chains in act_csum, from Davide Caratti.
      
        11) XDP_REDIRECT needs to check if the interface is up and whether the
            MTU is sufficient. From Toshiaki Makita.
      
        12) Net diag can do a double free when killing TCP_NEW_SYN_RECV
            connections, from Lorenzo Colitti.
      
        13) nf_defrag in ipv6 can unnecessarily hold onto dst entries for a
            full minute, delaying device unregister. From Eric Dumazet.
      
        14) Update MAC entries in the correct order in ixgbe, from Alexander
            Duyck.
      
        15) Don't leave partial mangles bpf program in jit_subprogs, from
            Daniel Borkmann.
      
        16) Fix pfmemalloc SKB state propagation, from Stefano Brivio.
      
        17) Fix ACK handling in DCTCP congestion control, from Yuchung Cheng.
      
        18) Use after free in tun XDP_TX, from Toshiaki Makita.
      
        19) Stale ipv6 header pointer in ipv6 gre code, from Prashant Bhole.
      
        20) Don't reuse remainder of RX page when XDP is set in mlx4, from
            Saeed Mahameed.
      
        21) Fix window probe handling of TCP rapair sockets, from Stefan
            Baranoff.
      
        22) Missing socket locking in smc_ioctl(), from Ursula Braun.
      
        23) IPV6_ILA needs DST_CACHE, from Arnd Bergmann.
      
        24) Spectre v1 fix in cxgb3, from Gustavo A. R. Silva.
      
        25) Two spots in ipv6 do a rol32() on a hash value but ignore the
            result. Fixes from Colin Ian King"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (176 commits)
        tcp: identify cryptic messages as TCP seq # bugs
        ptp: fix missing break in switch
        hv_netvsc: Fix napi reschedule while receive completion is busy
        MAINTAINERS: Drop inactive Vitaly Bordug's email
        net: cavium: Add fine-granular dependencies on PCI
        net: qca_spi: Fix log level if probe fails
        net: qca_spi: Make sure the QCA7000 reset is triggered
        net: qca_spi: Avoid packet drop during initial sync
        ipv6: fix useless rol32 call on hash
        ipv6: sr: fix useless rol32 call on hash
        net: sched: Using NULL instead of plain integer
        net: usb: asix: replace mii_nway_restart in resume path
        net: cxgb3_main: fix potential Spectre v1
        lib/rhashtable: consider param->min_size when setting initial table size
        net/smc: reset recv timeout after clc handshake
        net/smc: add error handling for get_user()
        net/smc: optimize consumer cursor updates
        net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
        ipv6: ila: select CONFIG_DST_CACHE
        net: usb: rtl8150: demote allmulti message to dev_dbg()
        ...
      024ddc0c
  2. 18 Jul, 2018 10 commits