- 12 Dec, 2015 3 commits
-
-
Lee Jones authored
The bootloader is responsible for providing platform-specific Dynamic Voltage and Frequency Scaling (DVFS) information via Device Tree. This driver takes the supplied configuration and registers it with the new generic OPP framework, to then be used with CPUFreq. Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Rafael J. Wysocki authored
-
Pi-Cheng Chen authored
Since the return value of the cpufreq driver's ->init() callback is not currently propagated to the device driver model, move resource allocation into ->probe() to handle -EPROBE_DEFER properly. Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
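To illustrate the point about -EPROBE_DEFER, here is a minimal sketch (illustrative names, not the actual mt8173-cpufreq code) of looking up a resource in ->probe(), where a deferral error can still reach the driver core:

    #include <linux/err.h>
    #include <linux/platform_device.h>
    #include <linux/regulator/consumer.h>

    /* Illustrative only: do resource lookup in ->probe() so -EPROBE_DEFER works. */
    static int example_cpufreq_probe(struct platform_device *pdev)
    {
        struct regulator *proc_reg = regulator_get(&pdev->dev, "proc");

        if (IS_ERR(proc_reg)) {
            /*
             * -EPROBE_DEFER returned from ->probe() makes the driver core
             * retry later; returned from cpufreq's ->init() it would never
             * reach the driver model.
             */
            return PTR_ERR(proc_reg);
        }

        /* ... allocate per-CPU DVFS data, then register the cpufreq driver ... */
        return 0;
    }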
-
- 10 Dec, 2015 5 commits
-
-
Viresh Kumar authored
OPP bindings (for a few properties) allow a platform to choose a value/range among a set of available options. The options are present as opp-<prop>-<name>, where the platform needs to supply the <name> string. The OPP properties which allow such an option are: opp-microvolt and opp-microamp. Add support to the OPP-core to parse these bindings, by introducing the dev_pm_opp_{set|put}_prop_name() APIs. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
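A minimal sketch of how a platform driver might use the new API (illustrative function and variable names; this assumes the int-returning form of the API introduced here, which later kernels changed to return an opp_table pointer):

    #include <linux/device.h>
    #include <linux/pm_opp.h>

    /* Sketch: pick which opp-microvolt-<name>/opp-microamp-<name> set to use. */
    static int example_select_opp_variant(struct device *cpu_dev, bool fast_bin)
    {
        int ret;

        /* e.g. the DT carries both opp-microvolt-slow and opp-microvolt-fast */
        ret = dev_pm_opp_set_prop_name(cpu_dev, fast_bin ? "fast" : "slow");
        if (ret)
            return ret;

        /* ... then add the OPP table and register the cpufreq driver ... */
        return 0;
    }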
-
Viresh Kumar authored
OPP bindings allow a platform to enable OPPs based on the version of the hardware they are used for. Add support to the OPP-core to parse these bindings, by introducing dev_pm_opp_{set|put}_supported_hw() APIs. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
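A similar sketch for the supported-hardware binding (again assuming the original int-returning API; the meaning of each version cell is platform-specific and the names here are made up):

    #include <linux/device.h>
    #include <linux/kernel.h>
    #include <linux/pm_opp.h>

    /* Sketch: enable only the opp-supported-hw entries matching this chip. */
    static int example_enable_versioned_opps(struct device *cpu_dev,
                                             u32 cut_version, u32 substrate_version)
    {
        u32 versions[2];

        versions[0] = cut_version;
        versions[1] = substrate_version;

        return dev_pm_opp_set_supported_hw(cpu_dev, versions, ARRAY_SIZE(versions));
    }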
-
Philippe Longepe authored
In cases where we have many I/Os, the global load becomes low and the load algorithm will decrease the requested P-state. Because of that, the I/O overhead will increase and hurt I/O performance. To improve I/O-bound work, we can count the io-wait time as busy time when calculating CPU busy. This change uses get_cpu_iowait_time_us() to obtain the I/O wait time and converts that time into the number of cycles spent waiting on I/O at the TSC rate. At the moment, this trick is only used for Atom. Signed-off-by: Philippe Longepe <philippe.longepe@intel.com> Signed-off-by: Stephane Gasparini <stephane.gasparini@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
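Roughly, the conversion described above looks like the sketch below (illustrative helper, not the driver's exact fixed-point code): the io-wait time in microseconds is scaled by the TSC frequency to get a cycle count that can be added to the busy side of the ratio.

    #include <linux/math64.h>
    #include <linux/types.h>

    /* Sketch: tsc_khz is the TSC frequency in kHz, so cycles = us * tsc_khz / 1000. */
    static u64 example_iowait_us_to_tsc_cycles(u64 delta_iowait_us, unsigned long tsc_khz)
    {
        return div64_u64(delta_iowait_us * tsc_khz, 1000);
    }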
-
Philippe Longepe authored
The current function to calculate cpu utilization uses the average P-state ratio (APerf/Mperf) scaled by the ratio of the current P-state to the max available non-turbo one. This leads to an overestimation of utilization which causes higher-performance P-states to be selected more often and that leads to increased energy consumption. This is a problem for low-power systems, so it is better to use a different utilization calculation algorithm for them. Namely, the Percent Busy value (or load) can be estimated as the ratio of the MPERF counter that runs at a constant rate only during active periods (C0) to the time stamp counter (TSC) that also runs (at the same rate) during idle. That is: Percent Busy = 100 * (delta_mperf / delta_tsc) Use this algorithm for platforms with SoCs based on the Airmont and Silvermont Atom cores. Signed-off-by: Philippe Longepe <philippe.longepe@intel.com> Signed-off-by: Stephane Gasparini <stephane.gasparini@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
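The calculation itself is simple enough to sketch (illustrative helper; the driver does this in fixed-point arithmetic):

    #include <linux/math64.h>
    #include <linux/types.h>

    /* Sketch: MPERF only counts in C0, the TSC always counts, so the ratio is the load. */
    static unsigned int example_percent_busy(u64 delta_mperf, u64 delta_tsc)
    {
        if (!delta_tsc)
            return 0;

        return div64_u64(100 * delta_mperf, delta_tsc);
    }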
-
Philippe Longepe authored
Target systems using different CPUs have different power and performance requirements. They may use different algorithms to get the next P-state based on their power or performance preference. For example, power-constrained systems may not want to use high-performance P-states as aggressively as a full-size desktop or a server platform. A server platform may want to run close to the max to achieve better performance, while laptop-like systems may prefer sacrificing performance for longer battery life. For the above reasons, modify intel_pstate to allow the target P-state selection algorithm to depend on the CPU ID. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Philippe Longepe <philippe.longepe@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
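Structurally, the idea is to hang the selection routine off the per-CPU-model parameter table, roughly as in this sketch (illustrative names, not the actual intel_pstate structures):

    /* Sketch: each CPU model's parameter table carries its own P-state selector. */
    struct example_cpudata;

    struct example_pstate_funcs {
        int (*get_target_pstate)(struct example_cpudata *cpu);
        /* ... other per-model callbacks and tuning parameters ... */
    };

    static int example_next_pstate(const struct example_pstate_funcs *funcs,
                                   struct example_cpudata *cpu)
    {
        /* Atom models can point this at a load-based algorithm, Core at the PID one. */
        return funcs->get_target_pstate(cpu);
    }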
-
- 09 Dec, 2015 11 commits
-
-
Pi-Cheng Chen authored
Sometimes a regulator_get_voltage() call returns a negative value for various reasons (e.g. an underlying I2C bus timeout). Add checks for the return values and fail out early. Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
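The pattern being added is simply this kind of early bail-out (a sketch with illustrative names):

    #include <linux/printk.h>
    #include <linux/regulator/consumer.h>

    /* Sketch: regulator_get_voltage() returns a negative errno on failure. */
    static int example_read_voltage(struct regulator *proc_reg, int *uv)
    {
        int ret = regulator_get_voltage(proc_reg);

        if (ret < 0) {
            pr_err("failed to read voltage: %d\n", ret);
            return ret;
        }

        *uv = ret;
        return 0;
    }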
-
Pi-Cheng Chen authored
Remove the redundant regulator_get_voltage() call to get the Vsram value, since it will be obtained later at the beginning of the voltage tracking loop. Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Pi-Cheng Chen authored
Add CPUFREQ_HAVE_GOVERNOR_PER_POLICY to have an individual set of tunables for each cluster of MT8173. Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Hongtao Jia authored
Register the qoriq cpufreq driver as a cooling device, based on the thermal device tree framework. When the temperature crosses the passive trip point, cpufreq is used to throttle CPUs. Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Jacob Tanenbaum authored
The cpufreq documentation specifies that policy->cpuinfo.transition_latency is the time it takes on this CPU to switch between two frequencies, in nanoseconds (if appropriate, else specify CPUFREQ_ETERNAL). Currently pcc-cpufreq does not expose the value and sets it to zero. I changed the pcc-cpufreq driver and its documentation to conform to the default value specified in Documentation/cpu-freq/cpu-drivers.txt. Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
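In driver terms the convention is just this (a minimal sketch):

    #include <linux/cpufreq.h>

    /* Sketch: advertise CPUFREQ_ETERNAL when the switch latency is unknown, not 0. */
    static int example_cpufreq_policy_init(struct cpufreq_policy *policy)
    {
        policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
        return 0;
    }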
-
Punit Agrawal authored
Register passive cooling devices when initialising cpufreq on big.LITTLE systems. If the device tree provides a dynamic power coefficient for the CPUs, then the bound cooling device will support the extensions that allow it to be used with all the existing thermal governors, including the power allocator governor. A cooling device will be created per individual frequency domain and can be bound to thermal zones via the thermal DT bindings. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Punit Agrawal authored
Support registering cooling devices with a dynamic power coefficient where provided by the device tree. This allows OF-registered cooling device drivers to be used with the power_allocator thermal governor. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
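A sketch of what registration might look like from a cpufreq driver (the helper names and the exact signature of of_cpufreq_power_cooling_register() have varied across kernel versions; this follows the form used around this series, with the DT dynamic-power-coefficient passed as the capacitance):

    #include <linux/cpu_cooling.h>
    #include <linux/cpufreq.h>
    #include <linux/of.h>

    static struct thermal_cooling_device *
    example_register_cooling(struct device_node *cpu_np, struct cpufreq_policy *policy)
    {
        u32 coefficient = 0;

        of_property_read_u32(cpu_np, "dynamic-power-coefficient", &coefficient);

        /* With a coefficient we get the power extensions; without, a plain cooling device. */
        if (coefficient)
            return of_cpufreq_power_cooling_register(cpu_np, policy->related_cpus,
                                                     coefficient, NULL);

        return of_cpufreq_cooling_register(cpu_np, policy->related_cpus);
    }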
-
Punit Agrawal authored
The dynamic power consumption of a device is proportional to the square of the voltage (V) and to the clock frequency (f). It can be expressed as Pdyn = dynamic-power-coefficient * V^2 * f. The coefficient represents the running-time dynamic power consumption in units of mW/MHz/uV^2 and can be used in the above formula to calculate the dynamic power in mW. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
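A back-of-the-envelope check of the formula with made-up numbers (plain floating-point C purely for the arithmetic; the kernel itself uses integer math):

    /*
     * coefficient = 4.0e-13 mW/MHz/uV^2, V = 900000 uV (0.9 V), f = 1000 MHz
     * Pdyn = 4.0e-13 * 900000^2 * 1000 = 324 mW
     */
    static double example_pdyn_mw(double coeff, double volt_uv, double freq_mhz)
    {
        return coeff * volt_uv * volt_uv * freq_mhz;
    }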
-
Rafael J. Wysocki authored
It is possible to get rid of the timer_lock spinlock used by the governor timer function for synchronization, but a couple of races need to be avoided. The first race is between multiple dbs_timer_handler() instances that may be running in parallel with each other on different CPUs. Namely, one of them has to queue up the work item, but it cannot be queued up more than once. To achieve that, atomic_inc_return() can be used on the skip_work field of struct cpu_common_dbs_info. The second race is between an already running dbs_timer_handler() and gov_cancel_work(). In that case the dbs_timer_handler() might not notice the skip_work incrementation in gov_cancel_work() and it might queue up its work item after gov_cancel_work() had returned (and that work item would corrupt skip_work going forward). To prevent that from happening, gov_cancel_work() can be made wait for the timer function to complete (on all CPUs) right after skip_work has been incremented. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
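The scheme can be sketched roughly as below (illustrative names and simplified structure, not the actual governor code; waiting for in-flight timer handlers is only hinted at):

    #include <linux/atomic.h>
    #include <linux/workqueue.h>

    struct example_shared {
        atomic_t skip_work;          /* 0: idle, >0: work queued or being cancelled */
        struct work_struct work;
    };

    static void example_timer_handler(struct example_shared *shared)
    {
        /* Only the first handler to bump the count (0 -> 1) queues the work. */
        if (atomic_inc_return(&shared->skip_work) == 1)
            schedule_work(&shared->work);
        else
            atomic_dec(&shared->skip_work);
    }

    static void example_cancel_work(struct example_shared *shared)
    {
        atomic_inc(&shared->skip_work);      /* block further queuing ...            */
        /* ... wait for timer handlers already running on other CPUs (not shown) ... */
        cancel_work_sync(&shared->work);
        atomic_set(&shared->skip_work, 0);
    }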
-
Viresh Kumar authored
Currently update_sampling_rate() runs over each online CPU and cancels/queues timers on all policy->cpus every time. This should be done just once for any cpu belonging to a policy. Create a cpumask and keep on clearing it as and when we process policies, so that we don't have to traverse through all CPUs of the same policy. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
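The traversal pattern looks roughly like this sketch (illustrative and simplified; the real function also has to re-arm the per-CPU timers):

    #include <linux/cpufreq.h>
    #include <linux/cpumask.h>
    #include <linux/gfp.h>

    static void example_for_each_policy_once(void (*update)(struct cpufreq_policy *))
    {
        struct cpufreq_policy *policy;
        cpumask_var_t cpus;
        int cpu;

        if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
            return;

        cpumask_copy(cpus, cpu_online_mask);

        for_each_cpu(cpu, cpus) {
            policy = cpufreq_cpu_get(cpu);
            if (!policy)
                continue;

            /* Drop the policy's other CPUs so later iterations skip them. */
            cpumask_andnot(cpus, cpus, policy->cpus);

            update(policy);
            cpufreq_cpu_put(policy);
        }

        free_cpumask_var(cpus);
    }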
-
Viresh Kumar authored
cpufreq governors evaluate load at the sampling rate and, based on that, they update the frequency for a group of CPUs belonging to the same cpufreq policy. This is required to be done in a single thread for all policy->cpus, but because we don't want to wake up idle CPUs just for that, we use deferrable work for this. Had we used a single delayed deferrable work item for the entire policy, the CPU required to run its handler might have been idle, and we could have ended up not changing the frequency for the entire group as the load varied. And so we were forced to keep per-cpu works, where only the one that expires first needs to do the real work and the others are rescheduled for the next sampling time. We have been using the more complex solution until now, where we used a delayed deferrable work for this, which is a combination of a timer and a work item. This can be made lightweight by keeping per-cpu deferred timers with a single work item, which is scheduled by the first timer that expires. This patch does just that; the important changes are: - The timer handler will run in irq context and so we need to use a spinlock instead of the timer_mutex, and so a separate timer_lock is created. This also makes the use of the mutex and the lock quite clear, as we know exactly what each is protecting. - A new field 'skip_work' is added to track when the timer handlers can queue a work item. More comments are present in the code. Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
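The shape of the new arrangement, sketched with illustrative names and the timer API of that era (pre-4.15, handler takes an unsigned long):

    #include <linux/spinlock.h>
    #include <linux/timer.h>
    #include <linux/workqueue.h>

    struct example_policy_dbs {
        struct work_struct work;   /* one work item per policy, does the evaluation          */
        spinlock_t timer_lock;     /* timers fire in irq context, so a spinlock, not a mutex */
        bool skip_work;            /* set while the work is queued or being cancelled        */
    };

    static void example_dbs_timer(unsigned long data)
    {
        struct example_policy_dbs *shared = (struct example_policy_dbs *)data;
        unsigned long flags;

        spin_lock_irqsave(&shared->timer_lock, flags);
        if (!shared->skip_work) {
            shared->skip_work = true;    /* only the first expiring timer queues the work */
            schedule_work(&shared->work);
        }
        spin_unlock_irqrestore(&shared->timer_lock, flags);
    }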
-
- 07 Dec, 2015 4 commits
-
-
Viresh Kumar authored
timer_mutex is required to be initialized only while memory for 'shared' is allocated and in a similar way it is required to be destroyed only when memory for 'shared' is freed. There is no need to do the same every time we start/stop the governor. Move code to initialize/destroy timer_mutex to the relevant places. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Viresh Kumar authored
Pass 'policy' as argument to ->gov_dbs_timer() instead of cdbs and dbs_data. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Viresh Kumar authored
We are guaranteed to have works scheduled for policy->cpus, as the policy isn't stopped yet. And so there is no need to check that again. Drop it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Viresh Kumar authored
We are comparing policy->governor against cpufreq_gov_ondemand to make sure that we update the sampling rate only for the concerned CPUs. But that isn't enough. In the case of governor_per_policy, there can be multiple instances of the ondemand governor and we will always end up updating all of them with the current code. What we rather need to do is compare dbs_data with policy->governor_data, which will match only for the policies governed by this dbs_data. This code is also racy, as the governor might be getting stopped at that time and we may end up scheduling work for a policy which we have just disabled. Fix that by protecting the entire function with &od_dbs_cdata.mutex, which will prevent races with policy START/STOP/etc. After these locks are in place, we can safely get the policy via per-cpu dbs_info. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 06 Dec, 2015 10 commits
-
-
Linus Torvalds authored
-
James Simmons authored
The ioctl IOC_LIBCFS_PING_TEST has not been used in ages. The recent nidstring changes moved all the nidstring operations from libcfs to the LNet layer, but this ioctl code was still using a nidstring operation, which caused a circular dependency loop between libcfs and LNet. Signed-off-by: James Simmons <jsimmons@infradead.org> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored (merge from git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs)
Pull vfs fixes from Al Viro: "A couple of fixes (-stable fodder) + dead code removal after the overlayfs fix. I agree that it's better to separate from the fix part to make backporting easier, but IMO it's not worth delaying said dead code removal until the next window" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Don't reset ->total_link_count on nested calls of vfs_path_lookup() ovl: get rid of the dead code left from broken (and disabled) optimizations ovl: fix permission checking for setattr
-
Al Viro authored
We already zero it on the outermost set_nameidata(), so the initialization in path_init() is pointless and wrong. The same DoS exists on pre-4.2 kernels, but there a slightly different fix will be needed. Cc: stable@vger.kernel.org # v4.2 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Miklos Szeredi authored
[Al Viro] The bug is in being too enthusiastic about optimizing ->setattr() away - instead of "copy verbatim with metadata" + "chmod/chown/utimes" (with the former being always safe and the latter failing in case of insufficient permissions) it tries to combine these two. Note that copyup itself will have to do ->setattr() anyway; _that_ is where the elevated capabilities are right. Having these two ->setattr() (one to set verbatim copy of metadata, another to do what overlayfs ->setattr() had been asked to do in the first place) combined is where it breaks. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Linus Torvalds authored (merge from git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Pull scheduler fixes from Thomas Gleixner: "This update contains the following changes: - Fix a signal handling regression in the bit wait functions. - Avoid false positive warnings in the wakeup path. - Initialize the scheduler root domain properly. - Handle gtime calculations in proc/$PID/stat properly. - Add more documentation for the barriers in try_to_wake_up(). - Fix a subtle race in try_to_wake_up() which might cause a task to be scheduled on two cpus - Compile static helper function only when it is used" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix an SMP ordering race in try_to_wake_up() vs. schedule() sched/core: Better document the try_to_wake_up() barriers sched/cputime: Fix invalid gtime in proc sched/core: Clear the root_domain cpumasks in init_rootdomain() sched/core: Remove false-positive warning from wake_up_process() sched/wait: Fix signal handling in bit wait helpers sched/rt: Hide the push_irq_work_func() declaration
-
Linus Torvalds authored (merge from git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Pull x86 fixes from Thomas Gleixner: "Another round of fixes for x86: - Move the initialization of the microcode driver to late_initcall to make sure everything that the init function needs is available. - Make sure that lockdep knows about interrupts being off in the entry code before calling into c-code. - Undo the cpu hotplug init delay regression. - Use the proper conditionals in the mpx instruction decoder. - Fixup restart_syscall for x32 tasks. - Fix the hugepage regression on PAE kernels which was introduced with the latest PAT changes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/signal: Fix restart_syscall number for x32 tasks x86/mpx: Fix instruction decoder condition x86/mm: Fix regression with huge pages on PAE x86 smpboot: Re-enable init_udelay=0 by default on modern CPUs x86/entry/64: Fix irqflag tracing wrt context tracking x86/microcode: Initialize the driver late when facilities are up
-
Linus Torvalds authored (merge from git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Pull SCSI fixes from James Bottomley: "This is quite a bumper crop of fixes: three from Arnd correcting various build issues in some configurations, a lock recursion in qla2xxx. Two potentially exploitable issues in hpsa and mvsas, a potential null deref in st, a revert of a bdi registration fix that turned out to cause even more problems, a set of fixes to allow people who only defined MPT2SAS to still work after the mpt2/mpt3sas merger and a couple of fixes for issues turned up by the hyper-v storvsc driver" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: mpt3sas: fix Kconfig dependency problem for mpt2sas back compatibility Revert "scsi: Fix a bdi reregistration race" mpt3sas: Add dummy Kconfig option for backwards compatibility Fix a memory leak in scsi_host_dev_release() block/sd: Fix device-imposed transfer length limits scsi_debug: fix prevent_allow+verify regressions MAINTAINERS: Add myself as co-maintainer of the SCSI subsystem. sd: Make discard granularity match logical block size when LBPRZ=1 scsi: hpsa: select CONFIG_SCSI_SAS_ATTR scsi: advansys needs ISA dma api for ISA support scsi_sysfs: protect against double execution of __scsi_remove_device() st: fix potential null pointer dereference. scsi: report 'INQUIRY result too short' once per host advansys: fix big-endian builds qla2xxx: Fix rwlock recursion hpsa: logical vs bitwise AND typo mvsas: don't allow negative timeouts mpt3sas: Fix use sas_is_tlr_enabled API before enabling MPI2_SCSIIO_CONTROL_TLR_ON flag
-
Linus Torvalds authored (merge from git://people.freedesktop.org/~airlied/linux)
Pull drm fixes from Dave Airlie: "A bunch of changes across the board; the main thing is some vblank fallout in radeon and nouveau that required some work, but I think this should fix it all. There is also one drm fix for an oops in vmwgfx with how we pass the drm master around. The rest is just some amdgpu, i915, imx and rockchip fixes. Probably more than I'd like at this point, but hopefully things settle down now" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (40 commits) drm/amdgpu: Fixup hw vblank counter/ts for new drm_update_vblank_count() (v3) drm/radeon: Fixup hw vblank counter/ts for new drm_update_vblank_count() (v2) drm/radeon: Retry DDC probing on DVI on failure if we got an HPD interrupt drm/amdgpu: add spin lock to protect freed list in vm (v2) drm/amdgpu: partially revert "drm/amdgpu: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR" v2 drm/amdgpu: take a BO reference for the user fence drm/amdgpu: take a BO reference in the display code drm/amdgpu: set snooped flags only on system addresses v2 drm/nouveau: Fix pre-nv50 pageflip events (v4) drm: Fix an unwanted master inheritance v2 drm/amdgpu: fix race condition in amd_sched_entity_push_job drm/amdgpu: add err check for pin userptr drm/i915: take a power domain reference while checking the HDMI live status drm/i915: add MISSING_CASE to a few port/aux power domain helpers drm/i915/ddi: fix intel_display_port_aux_power_domain() after HDMI detect drm/i915: Introduce a gmbus power domain drm/i915: Clean up AUX power domain handling drm/rockchip: Use CRTC vblank event interface drm/rockchip: Fix module autoload for OF platform driver drm/rockchip: vop: fix window origin calculation ...
-
- 05 Dec, 2015 4 commits
-
-
Linus Torvalds authored (merge from git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Pull crypto fixes from Herbert Xu: "This fixes a couple of crypto drivers that were using memcmp to verify authentication tags. They now use crypto_memneq instead" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: talitos - Fix timing leak in ESP ICV verification crypto: nx - Fix timing leak in GCM and CCM decryption
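The pattern of the fix, as a sketch (illustrative helper; the point is that crypto_memneq() takes time independent of where the first mismatching byte is, while memcmp() returns early):

    #include <crypto/algapi.h>
    #include <linux/errno.h>
    #include <linux/types.h>

    /* Sketch: constant-time authentication-tag check. */
    static int example_verify_icv(const u8 *computed, const u8 *received, unsigned int len)
    {
        return crypto_memneq(computed, received, len) ? -EBADMSG : 0;
    }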
-
Dmitry V. Levin authored
When restarting a syscall with regs->ax == -ERESTART_RESTARTBLOCK, regs->ax is assigned to a restart_syscall number. For x32 tasks, this syscall number must have __X32_SYSCALL_BIT set, otherwise it will be an x86_64 syscall number instead of a valid x32 syscall number. This issue has been there since the introduction of x32. Reported-by: strace/tests/restart_syscall.test Reported-and-tested-by: Elvira Khabirova <lineprinter0@gmail.com> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Cc: Elvira Khabirova <lineprinter0@gmail.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20151130215436.GA25996@altlinux.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Dave Hansen authored
MPX decodes instructions in order to tell which bounds register was violated. Part of this decoding involves looking at the "REX prefix" which is a special instruction prefix used to retrofit support for new registers into old instructions. The X86_REX_*() macros are defined to return actual bit values: #define X86_REX_R(rex) ((rex) & 4) *not* boolean values. However, the MPX code was checking for them like they were booleans. This might have led to us mis-decoding the "REX prefix" and giving false information out to userspace about bounds violations. X86_REX_B() actually is bit 1, so this is really only broken for the X86_REX_X() case. Fix the conditionals up to tolerate the non-boolean values. Fixes: fcc7ffd6 "x86, mpx: Decode MPX instruction to get bound violation information" Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20151201003113.D800C1E0@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
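To make the pitfall concrete, here is a sketch with stand-in macros mirroring the definitions quoted above (EX_ prefixed to mark them as illustrative copies):

    /* Mirrors of the quoted definitions: they return raw bit values, not 0/1. */
    #define EX_REX_B(rex) ((rex) & 1)
    #define EX_REX_X(rex) ((rex) & 2)
    #define EX_REX_R(rex) ((rex) & 4)
    #define EX_REX_W(rex) ((rex) & 8)

    static int example_reg_index(unsigned char rex, int modrm_reg)
    {
        /* Wrong: EX_REX_R(rex) == 1 is never true, since the macro yields 0 or 4. */
        /* Right: test for non-zero (or coerce with !!) before using it as a flag. */
        return modrm_reg + (EX_REX_R(rex) ? 8 : 0);
    }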
-
Dave Airlie authored (merge from git://people.freedesktop.org/~agd5f/linux)
A few more last minute fixes for 4.4 on top of my pull request from earlier this week. The big change here is a vblank regression fix due to commit 4dfd6486 "drm: Use vblank timestamps to guesstimate how many vblanks were missed". Beyond that, a hotplug fix and a few VM fixes. * 'drm-fixes-4.4' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: Fixup hw vblank counter/ts for new drm_update_vblank_count() (v3) drm/radeon: Fixup hw vblank counter/ts for new drm_update_vblank_count() (v2) drm/radeon: Retry DDC probing on DVI on failure if we got an HPD interrupt drm/amdgpu: add spin lock to protect freed list in vm (v2) drm/amdgpu: partially revert "drm/amdgpu: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR" v2 drm/amdgpu: take a BO reference for the user fence drm/amdgpu: take a BO reference in the display code drm/amdgpu: set snooped flags only on system addresses v2 drm/amdgpu: fix race condition in amd_sched_entity_push_job drm/amdgpu: add err check for pin userptr add blacklist for thinkpad T40p drm/amdgpu: fix VM page table reference counting drm/amdgpu: fix userptr flags check
-
- 04 Dec, 2015 3 commits
-
-
Linus Torvalds authored (merge from git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client)
Pull Ceph fix from Sage Weil: "This addresses a refcounting bug that leads to a use-after-free" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: don't put snap_context twice in rbd_queue_workfn()
-
Alex Deucher authored
commit 4dfd6486 "drm: Use vblank timestamps to guesstimate how many vblanks were missed" introduced in Linux 4.4-rc1 makes the drm core more fragile to drivers which don't update hw vblank counters and vblank timestamps in sync with firing of the vblank irq and essentially at leading edge of vblank. This exposed a problem with radeon-kms/amdgpu-kms which do not satisfy above requirements: The vblank irq fires a few scanlines before start of vblank, but programmed pageflips complete at start of vblank and vblank timestamps update at start of vblank, whereas the hw vblank counter increments only later, at start of vsync. This leads to problems like off by one errors for vblank counter updates, vblank counters apparently going backwards or vblank timestamps apparently having time going backwards. The net result is stuttering of graphics in games, or little hangs, as well as total failure of timing sensitive applications. See bug #93147 for an example of the regression on Linux 4.4-rc: https://bugs.freedesktop.org/show_bug.cgi?id=93147 This patch tries to align all above events better from the viewpoint of the drm core / of external callers to fix the problem: 1. The apparent start of vblank is shifted a few scanlines earlier, so the vblank irq now always happens after start of this extended vblank interval and thereby drm_update_vblank_count() always samples the updated vblank count and timestamp of the new vblank interval. To achieve this, the reporting of scanout positions by radeon_get_crtc_scanoutpos() now operates as if the vblank starts radeon_crtc->lb_vblank_lead_lines before the real start of the hw vblank interval. This means that the vblank timestamps which are based on these scanout positions will now update at this earlier start of vblank. 2. The driver->get_vblank_counter() function will bump the returned vblank count as read from the hw by +1 if the query happens after the shifted earlier start of the vblank, but before the real hw increment at start of vsync, so the counter appears to increment at start of vblank in sync with the timestamp update. 3. Calls from vblank irq-context and regular non-irq calls are now treated identical, always simulating the shifted vblank start, to avoid inconsistent results for queries happening from vblank irq vs. happening from drm_vblank_enable() or vblank_disable_fn(). 4. The radeon_flip_work_func will delay mmio programming a pageflip until the start of the real vblank iff it happens to execute inside the shifted earlier start of the vblank, so pageflips now also appear to execute at start of the shifted vblank, in sync with vblank counter and timestamp updates. This to avoid some races between updates of vblank count and timestamps that are used for swap scheduling and pageflip execution which could cause pageflips to execute before the scheduled target vblank. The lb_vblank_lead_lines "fudge" value is calculated as the size of the display controllers line buffer in scanlines for the given video mode: Vblank irq's are triggered by the line buffer logic when the line buffer refill for a video frame ends, ie. when the line buffer source read position enters the hw vblank. This means that a vblank irq could fire at most as many scanlines before the current reported scanout position of the crtc timing generator as the number of scanlines the line buffer can maximally hold for a given video mode. 
This patch has been successfully tested on a RV730 card with DCE-3 display engine and on a evergreen card with DCE-4 display engine, in single-display and dual-display configuration, with different video modes. A similar patch is needed for amdgpu-kms to fix the same problem. Limitations: - Maybe replace the udelay() in the flip_work_func() by a suitable usleep_range() for a bit better efficiency? Will try that. - Line buffer sizes in pixels are hard-coded on < DCE-4 to a value i just guessed to be high enough to work ok, lacking info on the true sizes atm. Probably fixes: fdo#93147 Port of Mario's radeon fix to amdgpu. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (v1) Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com> (v2) Refine amdgpu_flip_work_func() for better efficiency. In amdgpu_flip_work_func, replace the busy waiting udelay(5) with event lock held by a more performance and energy efficient usleep_range() until at least predicted true start of hw vblank, with some slack for scheduler happiness. Release the event lock during waits to not delay other outputs in doing their stuff, as the waiting can last up to 200 usecs in some cases. Also small fix to code comment and formatting in that function. (v2) Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> (v3) Fix crash in crtc disabled case
-
Linus Torvalds authored (merge from git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm)
Pull libnvdimm fixes from Dan Williams: - NFIT parsing regression fixes from Linda. The nvdimm hot-add implementation merged in 4.4-rc1 interpreted the specification in a way that breaks actual HPE platforms. We are also closing the loop with the ACPI Working Group to get this clarification added to the spec. - Andy pointed out that his laptop without nvdimm resources is loading the e820-nvdimm module by default, fix that up to only load the module when an e820-type-12 range is present. * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nfit: Adjust for different _FIT and NFIT headers nfit: Fix the check for a successful NFIT merge nfit: Account for table size length variation libnvdimm, e820: skip module loading when no type-12
-