- 09 Aug, 2012 15 commits
-
-
Daniel Vetter authored
I have a faint hope that the total absence of any locking for the rps code wasn't such a good idea and could very well have caused some rc6-related regressions. Unfortunately we've never managed to reproduce these issues on any of our own machines, so the only way to go about this is to enable it and see what happens.

While at it, kill some stale comments and improve the logging.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
We change the drps/ips sw/hw state from different callers: our own irq handler, the external intel-ips module, and process context. Most of these callers don't take any lock at all. Protect everything by making the mchdev_lock irqsave and grabbing it in all relevant callsites.

Note that we have to convert a few sleeps in the drps enable/disable code to delays, but alas, I'm not volunteering to restructure the code around a few work items.

For paranoia, also add a spin-locked assert to ironlake_set_drps.

v2: Move one access inside the lock protection. Caught by the dev_priv->ips mass-rename ...

v3: Resolve rebase conflict.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
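As a rough illustration of the locking pattern described here (a sketch, not the driver's actual code; everything except the mchdev_lock name is invented), state shared between process context and the irq handler gets an irqsave spinlock, and helpers that expect the lock to be held document that with an assert:

    /* Sketch of the irqsave-lock pattern; not the real i915 code. */
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(mchdev_lock);
    static int cur_delay;   /* hypothetical state also touched from the irq handler */

    /* Must only be called with mchdev_lock held. */
    static void example_set_drps(int delay)
    {
            assert_spin_locked(&mchdev_lock);
            cur_delay = delay;
            /* ... program the hw here ... */
    }

    /* Process-context caller: disable irqs while holding the lock, since
     * the same lock is also taken from the irq handler. */
    static void example_update_drps(int delay)
    {
            unsigned long flags;

            spin_lock_irqsave(&mchdev_lock, flags);
            example_set_drps(delay);
            spin_unlock_irqrestore(&mchdev_lock, flags);
    }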
-
Daniel Vetter authored
Like all the other drps/ips stuff. Hence add the corresponding check, give the function a more precise prefix and move the single reg clearing into the rps handling function, too.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
It's only ever a pointer to the global mchdev_lock, and we don't use it at all. Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
This way it's easier to see what belongs together and what is used by the ilk ips code. Also add some comments that explain the locking.

Note that (cur|min|max)_delay need to be duplicated, because they're also used by the ips code.

v2: Missed one place that the dev_priv->ips change caught ...

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
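A rough sketch of the kind of grouping described here (illustrative only; the struct and field names are not the exact ones from i915_drv.h), with the (cur|min|max)_delay triple deliberately duplicated because the ips code keeps its own copy under a different lock:

    #include <linux/types.h>

    struct example_gen6_rps {               /* gen6+ rps state */
            /* protected by dev->struct_mutex */
            u8 cur_delay;
            u8 min_delay;
            u8 max_delay;
    };

    struct example_ilk_ips {                /* ilk ips state */
            /* protected by mchdev_lock; also reached via the
             * external intel-ips module */
            u8 cur_delay;
            u8 min_delay;
            u8 max_delay;
    };

    struct example_dev_priv {
            struct example_gen6_rps rps;
            struct example_ilk_ips ips;
    };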
-
Daniel Vetter authored
It's no fun if your shell hangs when the driver has gone on vacation and you want to know why ...

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
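The pattern in question looks roughly like the sketch below (not the actual i915_debugfs.c code; it assumes the drm_device pointer was stashed in m->private when the file was created). Taking the mutex interruptibly means a hung driver leaves you with a failed read instead of an unkillable cat:

    #include <drm/drmP.h>
    #include <linux/mutex.h>
    #include <linux/seq_file.h>

    static int example_rps_show(struct seq_file *m, void *unused)
    {
            struct drm_device *dev = m->private;    /* assumption, see above */
            int ret;

            ret = mutex_lock_interruptible(&dev->struct_mutex);
            if (ret)
                    return ret;     /* Ctrl-C works even if the driver is stuck */

            seq_printf(m, "cur delay: %d\n", 0 /* read the rps state here */);

            mutex_unlock(&dev->struct_mutex);
            return 0;
    }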
-
Daniel Vetter authored
- Take the dev->struct_mutex around accesses to the corresponding state (and around adjusting the rps hw state).
- Add an assert to gen6_set_rps to ensure we don't forget about this in the future.
- Don't set up the min/max_freq files if it doesn't apply to the hw. And do the same for the gen6+ cache sharing file while at it.

v2: Move the gen6+ checks into the read/write callbacks. Thanks to the awesome drm midlayer we can't check that when registering the debugfs files, because the driver is not yet fully set up, specifically the ->load callback hasn't run yet. Oh how I despise this disaster ...

v3: Also add a WARN_ON(mutex_is_locked) in set_rps to check the locking.

v4: Use mutex_lock_interruptible, suggested by Chris Wilson.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (for v2)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
The update_gfx_val function called from mark_busy wasn't taking the mchdev_lock, as it should have. Also sprinkle a few spinlock asserts over the code to document things better.

Things are still rather confusing, especially since a few variables in dev_priv are used by both the gen6+ rps code and the ilk ips code, but protected by totally different locks. Follow-on patches will clean that up.

v2: Don't add a deadlock ... hence split up update_gfx_val into a wrapper that grabs the lock and an internal __ variant for callsites within intel_pm.c that have already taken the lock.

v3: Mark the internal helper as static, noticed by Ben Widawsky.

v4: Damien Lespiau had questions about the safety of the ips setup sequence; explain in a comment why it works.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
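The v2 split is the classic locked-wrapper pattern; a minimal sketch (names invented, only the shape matches what the commit describes):

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(mchdev_lock);

    struct example_priv {
            unsigned long gfx_val;
    };

    /* Internal variant: caller already holds mchdev_lock. */
    static unsigned long __example_update_gfx_val(struct example_priv *priv)
    {
            assert_spin_locked(&mchdev_lock);
            /* ... recompute priv->gfx_val here, lock already held ... */
            return priv->gfx_val;
    }

    /* Public wrapper: grabs the lock itself, so callsites that already
     * hold it can use the __ variant without deadlocking. */
    static unsigned long example_update_gfx_val(struct example_priv *priv)
    {
            unsigned long flags, val;

            spin_lock_irqsave(&mchdev_lock, flags);
            val = __example_update_gfx_val(priv);
            spin_unlock_irqrestore(&mchdev_lock, flags);

            return val;
    }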
-
Paulo Zanoni authored
By looking at the current way we're using these definitions I don't think this commit will fix any bug, but programmers from the future are evil and will certainly find ways to combine macro expansion with operator precedence to introduce bugs that are hard to find. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
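The class of bug being guarded against, as a generic illustration (these are not the actual i915_reg.h definitions):

    /* Unparenthesized macros interact badly with operator precedence: */
    #define BAD_FIELD_SHIFT(x)      x << 16
    /* BAD_FIELD_SHIFT(1 | 2) expands to 1 | 2 << 16 == 0x20001, not 0x30000 */

    /* Parenthesize both the argument and the whole expansion: */
    #define GOOD_FIELD_SHIFT(x)     ((x) << 16)
    /* GOOD_FIELD_SHIFT(1 | 2) expands to ((1 | 2) << 16) == 0x30000 */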
-
Paulo Zanoni authored
It's the only part of the i915_reg.h file that looks totally wrongly indented, so I assume my editor config is the correct one. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Paulo Zanoni authored
Don't rely on previous values already set on the register. Everything we're not explicitly setting should be zero for now. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
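That is, compose the full register value from the fields we care about and write it out, instead of inheriting whatever the BIOS left behind. A sketch of the shape (register and bit names here are invented, not the ones from this patch):

    #include "i915_drv.h"                     /* for I915_WRITE() */

    #define EXAMPLE_PORT_CTL        0x60000   /* hypothetical register */
    #define EXAMPLE_PORT_ENABLE     (1 << 31) /* hypothetical bits */
    #define EXAMPLE_PORT_WIDTH_X4   (2 << 1)

    static void example_program_port(struct drm_i915_private *dev_priv)
    {
            u32 temp = 0;                     /* everything not set stays zero */

            temp |= EXAMPLE_PORT_ENABLE;
            temp |= EXAMPLE_PORT_WIDTH_X4;

            I915_WRITE(EXAMPLE_PORT_CTL, temp);     /* no I915_READ beforehand */
    }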
-
Paulo Zanoni authored
Correctly erase the values previously set and also check for 6bpc and 10bpc. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Paulo Zanoni authored
During my tests, everything worked even if the wrong polarity was set. Still, we should try to set the correct values. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Paulo Zanoni authored
Mask the value before changing it and also select DVI when needed. DVI was working in cases where the BIOS was setting the correct value because we were not masking the value before changing it. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
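The masking fix is the usual read-modify-write discipline: clear the whole field first, then OR in the new value. A generic sketch (bit and register names are placeholders, not the ones from this patch):

    #include "i915_drv.h"                   /* for I915_READ()/I915_WRITE() */

    #define EXAMPLE_MODE_MASK       (7 << 24)   /* hypothetical field */
    #define EXAMPLE_MODE_DVI        (1 << 24)
    #define EXAMPLE_MODE_HDMI       (2 << 24)

    static void example_select_mode(struct drm_i915_private *dev_priv,
                                    u32 reg, bool is_dvi)
    {
            u32 temp = I915_READ(reg);

            temp &= ~EXAMPLE_MODE_MASK;     /* erase whatever the BIOS set */
            temp |= is_dvi ? EXAMPLE_MODE_DVI : EXAMPLE_MODE_HDMI;

            I915_WRITE(reg, temp);
    }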
-
Daniel Vetter authored
Handy for lazy people like me, or when people forget to add the output of lspci -nn.

v2: Chris Wilson noticed that we already have this duplicated in the i915_capabilities debugfs file. But there a '\n' separator looks better, which would be a bit too verbose in dmesg. Abuse the preprocessor to extract all of this.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
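The preprocessor trick is presumably an X-macro style flag list, so the flag names only live in one place and can be expanded once for the struct and once for printing, with the separator chosen per consumer. A self-contained userspace sketch of the idea (not the actual i915 macros):

    #include <stdio.h>

    #define EXAMPLE_FOR_EACH_FLAG(func, sep) \
            func(is_mobile) sep              \
            func(has_llc) sep                \
            func(has_bsd_ring)

    /* Expand once as struct members ... */
    #define DEFINE_FLAG(name) unsigned char name:1
    #define SEP_SEMICOLON ;
    struct example_info {
            EXAMPLE_FOR_EACH_FLAG(DEFINE_FLAG, SEP_SEMICOLON);
    };

    /* ... and once more as a format string plus matching arguments. */
    #define PRINT_S(name) "%s"
    #define SEP_EMPTY
    #define PRINT_FLAG(name) (info.name ? #name "," : "")
    #define SEP_COMMA ,

    int main(void)
    {
            struct example_info info = { .is_mobile = 1, .has_llc = 1 };

            printf("flags=" EXAMPLE_FOR_EACH_FLAG(PRINT_S, SEP_EMPTY) "\n",
                   EXAMPLE_FOR_EACH_FLAG(PRINT_FLAG, SEP_COMMA));
            return 0;
    }

This prints "flags=is_mobile,has_llc,"; swapping the separator macros is what lets one list serve both a comma-separated dmesg line and a '\n'-separated debugfs file.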
-
- 08 Aug, 2012 3 commits
-
-
Daniel Vetter authored
In

commit 20b46e59
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jul 26 11:16:14 2012 +0200

    drm/i915: Only set the down rps limit when at the lowest frequency

the computation for the new desired frequency was extracted, but since the desired frequency was passed by value, the adjustments didn't propagate back. Fix this.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
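The bug class is the usual one with extracted helpers that adjust a by-value copy; a trivial illustration, not the actual rps code:

    /* Broken: the helper clamps a by-value copy, the caller never sees it. */
    static void clamp_new_delay_broken(unsigned int new_delay,
                                       unsigned int min, unsigned int max)
    {
            if (new_delay < min)
                    new_delay = min;        /* lost on return */
            if (new_delay > max)
                    new_delay = max;
    }

    /* Fixed: return the adjusted value (or pass a pointer) so the
     * adjustment propagates back to the caller. */
    static unsigned int clamp_new_delay(unsigned int new_delay,
                                        unsigned int min, unsigned int max)
    {
            if (new_delay < min)
                    new_delay = min;
            if (new_delay > max)
                    new_delay = max;
            return new_delay;
    }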
-
Chris Wilson authored
Userspace tries to estimate the cost of ring switching based on whether the GPU and GEM support semaphores. (If we have multiple rings and no semaphores, userspace assumes that the cost of switching rings between batches is exorbitant and will endeavour to keep the next batch on the active ring - as a coarse approximation to tracking both destination and source surfaces.)

Currently userspace has to guess whether semaphores exist based on the chipset generation and the module parameter, i915.semaphores. This is a crude and inaccurate guess as the defaults internally depend upon other chipset features being enabled or disabled, nor does it extend well into the future. By exporting a HAS_SEMAPHORES parameter, we can easily query the driver and obtain an accurate answer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
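From userspace the query is a plain GETPARAM ioctl using the parameter this patch adds; a minimal sketch (error handling trimmed, the header path may differ depending on the installed libdrm/kernel headers, drm_fd is an already-open DRM device node):

    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    /* Returns 1 if the kernel reports semaphore support, 0 otherwise. */
    static int has_semaphores(int drm_fd)
    {
            struct drm_i915_getparam gp;
            int value = 0;

            gp.param = I915_PARAM_HAS_SEMAPHORES;
            gp.value = &value;

            if (ioctl(drm_fd, DRM_IOCTL_I915_GETPARAM, &gp) != 0)
                    return 0;       /* older kernel: parameter unknown */

            return value;
    }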
-
Chris Wilson authored
The requirement that the sync flush be emitted prior to the render cache flush is only true for SandyBridge. On IvyBridge and friends we can just emit the flushes with an inline CS stall.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 26 Jul, 2012 5 commits
-
-
Eugeni Dodonov authored
We should not hit this under any sane conditions, but still, this does not look right.

CC: Chris Wilson <chris@chris-wilson.co.uk>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
CC: stable@vger.kernel.org
Reported-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
We believe we have squashed all issues around the gen6+ rps interrupt generation and why the gpu sometimes got stuck. With that cleared up, there's no user left for the sanitize_pm infrastructure, so let's just rip it out.

Note that 'intel_reg_write 0xa014 0x13070000' is the w/a if we find ourselves stuck again.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
The power docs say that when the gt leaves rc6, it is at the lowest frequency and only about 25 usec later will it switch to the frequency selected in GEN6_RPNSWREQ. If the downclock limit expires in that window and the down limit is set to the lowest possible frequency, the hw will not send the down interrupt. Which leads to a too high gpu clock and wasted power.

Chris Wilson already worked on this with

commit 7b9e0ae6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Apr 28 08:56:39 2012 +0100

    drm/i915: Always update RPS interrupts thresholds along with frequency

but got the logic inverted: the current code sets the down limit as long as we haven't reached it, instead of only once we have reached the lowest frequency.

Note that we can't always set the downclock limit to 0, because otherwise the hw will keep on bugging us with downclock request irqs once the lowest level is reached. For similar reasons also always set the upclock limit, otherwise the hw might poke us again with interrupts.

v2: Chris Wilson noticed that the limit reg is also computed in sanitize_pm. To avoid duplication, extract the code into a common function.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
By selecting the cache level (essentially whether or not the CPU snoops any updates to the bo, and on more recent machines whether it resides inside the CPU's last-level cache), a userspace driver is able to then manage all of its memory within buffer objects, if it so desires. This enables the userspace driver to accelerate uploads and, more importantly, downloads from the GPU and to be able to mix CPU and GPU rendering/activity efficiently.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added code comment about where we plan to stuff platform specific caching control bits in the ioctl struct.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
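For reference, the userspace side of this ends up being one ioctl per buffer object; roughly (a sketch only: header path and error handling elided, handle is an existing GEM bo handle):

    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    /* Ask the kernel to keep this bo CPU-snooped / resident in the LLC. */
    static int make_bo_cacheable(int drm_fd, unsigned int handle)
    {
            struct drm_i915_gem_caching arg;

            arg.handle = handle;
            arg.caching = I915_CACHING_CACHED;      /* vs. I915_CACHING_NONE */

            return ioctl(drm_fd, DRM_IOCTL_I915_GEM_SET_CACHING, &arg);
    }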
-
Chris Wilson authored
Several functions of the GPU have the restriction that differing memory domains cannot be placed next to each other (as the GPU may prefetch beyond the end of one domain and hang as it crosses into the other domain). We use the facility of the drm_mm to mark ranges with a particular color that corresponds to the cache attributes of those pages in order to prevent allocating adjacent blocks of differing memory types.

v2: Rebase on top of drm_mm coloring v2.

v3: Fix rebinding existing gtt_space and add a verification routine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
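The colouring boils down to a drm_mm callback that nudges the start/end of a candidate hole whenever a neighbouring node carries a different colour (i.e. different cache attributes). A heavily simplified sketch, assuming the 2012-era callback signature; the real hook also has to inspect the node that follows the hole:

    #include <drm/drm_mm.h>

    static void example_color_adjust(struct drm_mm_node *node,
                                     unsigned long color,
                                     unsigned long *start,
                                     unsigned long *end)
    {
            /* If the hole borders a node of a different cache colour,
             * shrink it by a page so the GPU prefetcher cannot cross
             * from one cache domain into the other. */
            if (node->color != color)
                    *start += 4096;
    }

    /* Registration would then be roughly:
     *      dev_priv->mm.gtt_space.color_adjust = example_color_adjust;
     */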
-
- 25 Jul, 2012 17 commits
-
-
Ben Widawsky authored
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ben Widawsky authored
Originally I had a macro specifically for DPF support, and Daniel, with good reason, asked me to change it to this. It's not the way I would have gone (and indeed I didn't), but for now there is no distinction, as all platforms with L3 also have DPF.

Note: The good reasons are that dpf is a l3$ feature (at least on current hw), hence I don't expect one to go without the other.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: added note]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Ben Widawsky authored
Basic context support on HSW is no different than previous generations. The size of the context object changes, but that's about it. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
As suggested by Daniel, rip out the independent timers for device and crtc busyness and integrate the manual powermanagement of the display engine into the GEM core and its request tracking. The benefits are that the code is a lot smaller, fewer moving parts and should fit more neatly into the overall activity tracking of the driver. v2: Complete overhaul and removal of the racy timers and workers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
By moving the function to intel_ringbuffer and currying the appropriate parameter, hopefully we make the callsites easier to read and understand. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Otherwise once we use the buffer with a BLT command on gen2/3, we will always regard future command submissions as continuing the fenced access. However, now that we flush/invalidate between every batch we can drop this pessimism. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
Now that we unconditionally flush and invalidate between every batch buffer, we no longer need the complex logic to decide which domains require flushing. Remove it and rejoice. v2 (danvet): Keep around the flip waiting logic. It's gross and broken, I know, but we can't just kill that thing ... even if we just keep it around as a reminder that things are broken. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Rely instead on the insertion of the implicit flush before the seqno breadcrumb. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
As the flush is either performed explicitly immediately after the execbuffer dispatch, or before the serialisation of last_fenced_seqno, we can forgo the explicit i915_gem_flush_ring().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
This is now handled by a global flag to ensure we emit a flush before the next serialisation point (if we failed to queue one previously). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
As we guarantee to emit a flush before emitting the breadcrumb or the next batchbuffer, there is no further need for the flushing list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
As we always flush the GPU cache prior to emitting the breadcrumb, we no longer have to worry about the deferred flush causing the pending_gpu_write to be delayed. So we can instead utilize the known last_write_seqno to hopefully minimise the wait times. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
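In other words a read-only wait only has to wait for the last write, not the last read; a sketch of the idea (hypothetical names, not the exact i915_gem_object_wait_rendering code):

    #include <linux/types.h>

    struct example_ring;
    int example_wait_seqno(struct example_ring *ring, u32 seqno);

    struct example_bo {
            struct example_ring *ring;
            u32 last_read_seqno;
            u32 last_write_seqno;
    };

    static int example_wait_rendering(struct example_bo *obj, bool readonly)
    {
            /* Readers only need the last GPU *write* to have completed;
             * a full wait needs the last read as well. */
            u32 seqno = readonly ? obj->last_write_seqno : obj->last_read_seqno;

            if (seqno == 0)
                    return 0;       /* nothing outstanding */

            return example_wait_seqno(obj->ring, seqno);
    }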
-
Chris Wilson authored
As we move to lazily clearing the GPU write domain only when the buffer becomes inactive, this leaves a window of opportunity for i915_gem_object_pin_to_display_plane() to detect a seemingly inconsistent value. This function is special as it tries to pipeline the operation to avoid the stall and so may not retire the buffer, and we may not get the opportunity to clear the write domain. However, we know all is good, so drop the assertion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
Request preallocation was added to i915_add_request() in order to support the overlay. However, not all users care and can quite happily ignore the failure to allocate the request as they will simply repeat the request in the future. By pushing the allocation down into i915_add_request(), we can then remove some rather ugly error handling in the callers. v2: Nullify request->file_priv otherwise we chase a garbage pointer when retiring requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
With the base addresses shifting around, this is easier to handle. Also move to the real reg offset on vlv. Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Daniel Vetter authored
Will be used more in the next patch. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Chris Wilson authored
The intention is to help select which engine to use for copies with interoperating clients - such as a GL client making a request to the X server to perform a SwapBuffers, which may require copying from the active GL back buffer to the X front buffer. We choose to report a mask of the active rings to future proof the interface against any changes which may allow for the object to reside upon multiple rings. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: bikeshed away the write ring mask and add the explanation Chris sent in a follow-up mail why we decided to use masks.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-