1. 13 Oct, 2015 3 commits
  2. 10 Oct, 2015 1 commit
  3. 09 Oct, 2015 8 commits
    • drm/i915: Partial revert of atomic watermark series · 261a27d1
      Matt Roper authored
      It's been reported that the atomic watermark series triggers some
      regressions on SKL, which we haven't been able to track down yet.  Let's
      temporarily revert these patches while we track down the root cause.
      
      This commit squashes the reverts of:
        76305b1a drm/i915: Calculate watermark configuration during atomic check (v2)
        a4611e44 drm/i915: Don't set plane visible during HW readout if CRTC is off
        a28170f3 drm/i915: Calculate ILK-style watermarks during atomic check (v3)
        de4a9f83 drm/i915: Calculate pipe watermarks into CRTC state (v3)
        de165e0b drm/i915: Refactor ilk_update_wm (v3)
      
      Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-October/077190.html
      Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com>
      Cc: "Vetter, Daniel" <daniel.vetter@intel.com>
      Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      261a27d1
    • drm/i915: Early exit from semaphore_waits_for for execlist mode. · 381e8ae3
      Tomas Elf authored
      When submitting semaphores in execlist mode the hang checker crashes in this
      function because it is only runnable in ring submission mode. The reason this
      is of particular interest to the TDR patch series is because we use semaphores
      as a means of inducing hangs during testing (which is the recommended way to
      induce hangs for gen8+). It's not clear how this is supposed to work in
      execlist mode since:
      
      1. This function requires a ring buffer.
      
      2. Retrieving a ring buffer in execlist mode requires us to retrieve the
      corresponding context, which we get from a request.
      
      3. Retrieving a request from the hang checker is not straightforward since that
      requires us to grab the struct_mutex in order to synchronize against the
      request retirement thread.
      
      4. Grabbing the struct_mutex from the hang checker is not something we will
      do, since that puts us at risk of deadlock: a hung thread might already be
      holding the struct_mutex.
      
      Therefore it's not obvious how we're supposed to deal with this. For now, we're
      doing an early exit from this function, which avoids any kernel panic situation
      when running our own internal TDR ULT.
      
      * v2: (Chris Wilson)
      Turned the execlist mode check into a ringbuffer NULL check to make it more
      submission mode agnostic and less of a layering violation.
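
      A minimal, self-contained sketch of the v2 guard described above (the struct
      names are stand-ins for illustration, not the driver's real types):

      struct sketch_ringbuf { int dummy; };
      struct sketch_engine { struct sketch_ringbuf *buffer; };

      static struct sketch_engine *
      semaphore_waits_for_sketch(struct sketch_engine *ring)
      {
              /* Execlist mode has no legacy ringbuffer to scan, so report
               * "not waiting on a semaphore" instead of dereferencing it. */
              if (!ring->buffer)
                      return 0;

              /* ... the legacy path would scan the ring for semaphore commands ... */
              return 0;
      }
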
      Signed-off-by: Tomas Elf <tomas.elf@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      381e8ae3
    • drm/i915: Remove wrong warning from i915_gem_context_clean · 61fb5881
      Tvrtko Ursulin authored
      commit e9f24d5f
      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Date:   Mon Oct 5 13:26:36 2015 +0100
      
          drm/i915: Clean up associated VMAs on context destruction
      
      Introduced a wrong assumption that all contexts have a ppgtt
      instance. This is not true when full PPGTT is not active, so
      remove the WARN_ON_ONCE from the context cleanup code.
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Michel Thierry <michel.thierry@intel.com>
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      61fb5881
    • drm/i915: Determine the stolen memory base address on gen2 · 0ad98c74
      Ville Syrjälä authored
      There isn't an explicit stolen memory base register on gen2.
      Some old comment in the i915 code suggests we should get it via
      max_low_pfn_mapped, but that's clearly a bad idea on my MGM.
      
      The e820 map in said machine looks like this:
      [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009f7ff] usable
      [    0.000000] BIOS-e820: [mem 0x000000000009f800-0x000000000009ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000000ce000-0x00000000000cffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved
      [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001f6effff] usable
      [    0.000000] BIOS-e820: [mem 0x000000001f6f0000-0x000000001f6f7fff] ACPI data
      [    0.000000] BIOS-e820: [mem 0x000000001f6f8000-0x000000001f6fffff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x000000001f700000-0x000000001fffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fec10000-0x00000000fec1ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000ffb00000-0x00000000ffbfffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
      
      That makes max_low_pfn_mapped = 1f6f0000, so assuming our stolen memory
      would start there would place it on top of some ACPI memory regions.
      So not a good idea as already stated.
      
      The 9MB region after the ACPI regions at 0x1f700000 however looks
      promising given that the machine reports the stolen memory size to be
      8MB. Looking at the PGTBL_CTL register, the GTT entries are at offset
      0x1fee0000, and given that the GTT entries occupy 128KB, it looks like
      the stolen memory could start at 0x1f700000 and the GTT entries would
      occupy the last 128KB of the stolen memory.
      
      After some more digging through chipset documentation, I've determined
      the BIOS first allocates space for something called TSEG (something to
      do with SMM) from the top of memory, and then it allocates the graphics
      stolen memory below that. According to the chipset documentation TSEG
      has a fixed size of 1MB on 855. So that explains the top 1MB in the
      e820 region. And it also confirms that the GTT entries are in fact at
      the end of the stolen memory region.
      
      Derive the stolen memory base address on gen2 the same as the BIOS does
      (TOM-TSEG_SIZE-stolen_size). There are a few differences between the
      registers on various gen2 chipsets, so a few different codepaths are
      required.
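
      As a worked example of the TOM-TSEG_SIZE-stolen_size derivation, plugging in
      the 855 numbers above (plain C arithmetic, not driver code; TOM is assumed to
      be 512 MB, consistent with the e820 map):

      #include <stdio.h>

      int main(void)
      {
              unsigned int tom       = 0x20000000;  /* top of memory: 512 MB */
              unsigned int tseg_size = 1u << 20;    /* fixed 1 MB TSEG on 855 */
              unsigned int stolen    = 8u << 20;    /* 8 MB stolen reported by the BIOS */
              unsigned int base      = tom - tseg_size - stolen;

              printf("stolen base = 0x%08x\n", base);  /* prints 0x1f700000 */
              return 0;
      }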
      
      865G is again a bit more special since it seems to support enough memory
      to hit 4GB address space issues. This means the PCI allocations will
      also affect the location of the stolen memory. Fortunately there
      appears to be the TOUD register which may give us the correct answer
      directly. But the chipset docs are a bit unclear, so I'm not 100%
      sure that the graphics stolen memory is always the last thing the
      BIOS steals. Someone would need to verify it on a real system.
      
      I tested this on my 830 and 855 machines, and so far everything
      looks peachy.
      
      v2: Rewrite to use the TOM-TSEG_SIZE-stolen_size and TOUD methods
      v3: Fix TSEG size for 830
      v4: Add missing 'else' (Chris)
      Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      0ad98c74
    • drm/i915: fix FBC buffer size checks · 856312ae
      Paulo Zanoni authored
      According to my experiments (and later confirmation from the hardware
      developers), the maximum sizes mentioned in the specification delimit
      how far in the buffer the hardware tracking can go. And the hardware
      calculates the size based on the plane address we provide - and the
      provided plane address might not be the real x:0,y:0 point due to the
      compute_page_offset() function.
      
      On platforms that do the x/y offset adjustment trick it will be really
      hard to reproduce a bug, but on the current SKL we can reproduce the
      bug with igt/kms_frontbuffer_tracking/fbc-farfromfence. With this
      patch, we'll go from "CRC assertion failure" to "FBC unexpectedly
      disabled", which is still a failure on the test suite but is not a
      perceived user bug - you will just not save as much power as you could
      if FBC is disabled.
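
      A hedged sketch of the kind of check this implies: compare the "effective"
      plane extent (the offset the hardware actually starts tracking from, plus the
      visible size) against the maximum area FBC can track. The struct, field names
      and limits are illustrative only, not the driver's:

      #include <stdbool.h>

      struct sketch_plane {
              int adjusted_x, adjusted_y;   /* offset left over after page alignment */
              int width, height;            /* visible plane size */
      };

      static bool fbc_tracking_covers_plane(const struct sketch_plane *p,
                                            int max_w, int max_h)
      {
              int effective_w = p->adjusted_x + p->width;
              int effective_h = p->adjusted_y + p->height;

              return effective_w <= max_w && effective_h <= max_h;
      }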
      
      v2, rewrite patch after clarification from the Hardware guys:
        - Rename function so it's clear what the check is for.
        - Use the new intel_fbc_get_plane_source_sizes() function in order
          to get the proper sizes as seen by FBC.
      v3:
        - Rebase after the s/sizes/size/ on the previous patch.
        - Adjust comment wording (Ville).
        - s/used_/effective_/ (Ville).
      
      Testcase: igt/kms_frontbuffer_tracking/fbc-farfromfence (SKL)
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      856312ae
    • drm/i915: fix CFB size calculation · c4ffd409
      Paulo Zanoni authored
      We were considering the whole framebuffer height, but the spec says we
      should only consider the active display height size. There were still
      some unclear questions based on the spec, but the hardware guys
      clarified them for us. According to them (a short sketch of this
      calculation follows the list):
      
      - CFB size = CFB stride * Number of lines FBC writes to CFB
      - CFB stride = plane stride / compression limit
      - Number of lines FBC writes to CFB = MIN(plane source height, maximum
        number of lines FBC writes to CFB)
      - Plane source height =
        - pipe source height (PIPE_SRCSZ register) (before SKL)
        - plane size register height (PLANE_SIZE register) (SKL+)
      - Maximum number of lines FBC writes to CFB =
        - plane source height (before HSW)
        - 2048 (HSW+)
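
      A self-contained sketch of the calculation spelled out above; the function and
      parameter names are illustrative, and the 2048-line cap is the HSW+ value from
      the list:

      #include <stdint.h>

      static uint64_t cfb_size_sketch(uint32_t plane_stride,
                                      uint32_t plane_src_height,
                                      uint32_t compression_limit,  /* e.g. 1, 2 or 4 */
                                      uint32_t max_lines)          /* e.g. 2048 on HSW+ */
      {
              uint32_t cfb_stride = plane_stride / compression_limit;
              uint32_t lines = plane_src_height < max_lines ? plane_src_height
                                                            : max_lines;
              return (uint64_t)cfb_stride * lines;
      }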
      
      For the plane source height, I could just have made our code do
      I915_READ() in order to be more future proof, but since it's not cool
      to do register reads I decided to just recalculate the values we use
      when we actually write to those registers.
      
      With this patch, depending on your machine configuration, a lot of the
      kms_frontbuffer_tracking subtests that used to result in a SKIP due to
      not enough stolen memory now start resulting in a PASS.
      
      v2: Use the clipped src size instead of pipe_src_h (Ville).
      v3: Use the appropriate information provided by the hardware guys.
      v4: Bikesheds: s/sizes/size/, s/fb_cpp/cpp/ (Ville).
      v5: - Don't use crtc->config->pipe_src_x for BDW- (Ville).
          - Fix the register name written in the comment.
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      c4ffd409
    • drm/i915: remove pre-atomic check from SKL update_primary_plane · a42e5a23
      Paulo Zanoni authored
      The comment suggests the check was there for some non-fully-atomic
      case, and I couldn't find a case where we wouldn't correctly
      initialize plane_state, so remove the check.
      
      Let's leave a WARN there just in case.
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      a42e5a23
    • drm/i915: don't allocate fbcon from stolen memory if it's too big · 3badb49f
      Paulo Zanoni authored
      Technology has evolved and now we have eDP panels with 3200x1800
      resolution. In the meantime, the BIOS guys didn't change the default
      32mb for stolen memory. On top of that, we can't assume our users will
      be able to increase the default stolen memory size to more than 32mb -
      I'm not even sure all BIOSes allow that.
      
      So just the fbcon buffer alone eats 22mb of my stolen memory, and due
      to the BDW/SKL restriction of not using the last 8mb of stolen memory,
      all that's left for FBC is 2mb! Since fbcon is not the coolest feature
      ever, I think it's better to save our precious stolen resource for FBC
      and the other guys.
      
      On the other hand, we really want to use as much stolen memory as
      possible, since on some older systems the stolen memory may be a
      considerable percentage of the total available memory.
      
      This patch tries to achieve a little balance using a simple heuristic:
      if the fbcon wants more than half of the available stolen memory,
      don't use stolen memory in order to leave some for FBC and the other
      features.
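
      A minimal sketch of that heuristic; the helper name and parameters are made up
      for illustration, only the "more than half" rule comes from the text above:

      #include <stdbool.h>
      #include <stddef.h>

      static bool fbcon_should_use_stolen(size_t fb_size, size_t stolen_available)
      {
              /* If fbcon would eat more than half of what is left in stolen
               * memory, fall back to a regular allocation and keep stolen
               * memory available for FBC and friends. */
              return fb_size * 2 <= stolen_available;
      }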
      
      The long term plan should be to implement a way to set priorities for
      stolen memory allocation and then evict low priority users when the
      high priority ones need the memory. While we still don't have that,
      let's try to make FBC usable with the simple solution.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      3badb49f
  4. 08 Oct, 2015 5 commits
  5. 07 Oct, 2015 16 commits
  6. 06 Oct, 2015 7 commits
    • drm/i915: Use a task to cancel the userptr on invalidate_range · 380996aa
      Chris Wilson authored
      Whilst discussing possible ways to trigger an invalidate_range on a
      userptr with an aliased GGTT mmapping (and so cause a struct_mutex
      deadlock), the conclusion is that we can, and we must, prevent any
      possible deadlock by avoiding taking the mutex at all during
      invalidate_range. This has numerous advantages, all of which stem from
      avoiding calling a sleeping function from inside an unknown context. In
      particular, it simplifies the invalidate_range because we no longer
      have to juggle the spinlock/mutex and can just hold the spinlock
      for the entire walk. To compensate, we have to make get_pages a bit more
      complicated in order to serialise with a pending cancel_userptr worker.
      As we hold the struct_mutex, we have no choice but to return EAGAIN and
      hope that the worker is then flushed before we retry after reacquiring
      the struct_mutex.
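
      A rough sketch of the get_pages serialisation described above (the struct and
      field names are invented for the example): because struct_mutex is already
      held, get_pages cannot flush the pending cancel worker itself, so it reports
      -EAGAIN and lets the caller retry once the worker has run.

      #include <errno.h>
      #include <stdbool.h>

      struct sketch_userptr_obj {
              bool cancel_pending;    /* set by invalidate_range, under its spinlock */
      };

      static int userptr_get_pages_sketch(struct sketch_userptr_obj *obj)
      {
              if (obj->cancel_pending)
                      return -EAGAIN; /* struct_mutex held: cannot wait for the worker */

              /* ... otherwise pin the user pages as usual ... */
              return 0;
      }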
      
      The important caveat is that the invalidate_range itself is no longer
      synchronous. There exists a small but definite period in time in which
      the old PTE's page remains accessible via the GPU. Note however that the
      physical pages themselves are not invalidated by the mmu_notifier, just
      the CPU view of the address space. The impact should be limited to a
      delay in pages being flushed, rather than a possibility of writing to
      the wrong pages. The only race condition that this worsens is remapping
      a userptr active on the GPU where fresh work may still reference the
      old pages due to struct_mutex contention. Given that userspace is racing
      with the GPU, it is fair to say that the results are undefined.
      
      v2: Only queue (and importantly only take one refcnt) the worker once.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      380996aa
    • drm/i915: Fix userptr deadlock with aliased GTT mmappings · e4b946bf
      Chris Wilson authored
      Michał Winiarski found a really evil way to trigger a struct_mutex
      deadlock with userptr. He found that if he allocated a userptr bo and
      then GTT mmaped another bo, or even itself, at the same address as the
      userptr using MAP_FIXED, he could then cause a deadlock any time we then
      had to invalidate the GTT mmappings (so at will). Tvrtko then found by
      repeatedly allocating GTT mmappings he could alias with an old userptr
      mmap and also trigger the deadlock.
      
      To counteract the deadlock, we make the observation that we only need
      to take the struct_mutex if the object has any pages to revoke, and that
      before userspace can alias with the userptr address space, it must have
      invalidated the userptr->pages. Thus if we can check for those pages
      outside of the struct_mutex, we can avoid the deadlock. To do so we
      introduce a separate flag for userptr objects that we can inspect from
      the mmu-notifier underneath its spinlock.
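
      An illustrative sketch of that idea (the type and flag below are invented for
      the example): the cheap "is there anything to revoke?" test happens under the
      notifier's own spinlock, and struct_mutex is only taken when it is needed.

      #include <stdbool.h>

      struct sketch_mmu_object {
              bool active;    /* pages established for the GPU and not yet revoked */
      };

      static bool invalidate_needs_struct_mutex(const struct sketch_mmu_object *mo)
      {
              /* Called with the mmu-notifier spinlock held, struct_mutex NOT held. */
              return mo->active;
      }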
      
      The patch makes one eye-catching change. That is the removal of serial=0
      after detecting a to-be-freed object inside the invalidate walker. I
      felt setting serial=0 was a questionable pessimisation: it denies us the
      chance to reuse the current iterator for the next loop (before it is
      freed) and being explicit makes the reader question the validity of the
      locking (since the object-free race could occur elsewhere). The
      serialisation of the iterator is through the spinlock; if the object is
      freed before the next loop then the notifier.serial will be incremented
      and we start the walk from the beginning as we detect the invalid cache.
      
      To try and tame the error paths and interactions with the userptr->active
      flag, we have to do a fair amount of rearranging of get_pages_userptr().
      
      v2: Grammar fixes
      v3: Reorder set-active so that it is only set when obj->pages is set
      (and so needs cancellation). Only the order of setting obj->pages and
      the active-flag is crucial. Calling gup after invalidate-range begin
      means the userptr sees the new set of backing storage (and so will not
      need to invalidate its new pages), but we have to be careful not to set
      the active-flag prior to successfully establishing obj->pages.
      v4: Take the active-flag early so we know in the mmu-notifier when we
      have to cancel a pending gup-worker.
      v5: Rearrange the error path so that it is not so convoluted
      v6: Set pinned to 0 when negative before calling release_pages()
      Reported-by: Michał Winiarski <michal.winiarski@intel.com>
      Testcase: igt/gem_userptr_blits/map-fixed*
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      e4b946bf
    • drm/i915: Only update the current userptr worker · 68d6c840
      Chris Wilson authored
      The userptr worker allows for a slight race condition whereupon there may
      be two or more threads calling get_user_pages for the same object. When
      we have the array of pages, then we serialise the update of the object.
      However, the worker should only overwrite the obj->userptr.work pointer
      if and only if it is the active one. Currently we clear it for a
      secondary worker with the effect that we may rarely force a second
      lookup.
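
      A small sketch of the "only the active worker clears the pointer" rule, with
      invented names; a stale worker that lost the race leaves the pointer alone:

      #include <stddef.h>

      struct sketch_work { int token; };
      struct sketch_obj { struct sketch_work *work; };

      static void worker_finish_sketch(struct sketch_obj *obj, struct sketch_work *me)
      {
              if (obj->work == me)    /* am I still the active worker? */
                      obj->work = NULL;
              /* otherwise a newer worker owns the pointer; do not touch it */
      }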
      
      v2: Rebase and rename a variable to avoid 80cols
      v3: Mention v2
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      68d6c840
    • drm/i915: prevent out of range pt in the PDE macros (take 3) · 24dfd073
      Michel Thierry authored
      We tried to fix this in commit fdc454c1 ("drm/i915: Prevent out of
      range pt in gen6_for_each_pde").
      
      But the static analyzer still complains that, just before we break due
      to "iter < I915_PDES", we do "pt = (pd)->page_table[iter]" with an
      iter value that is bigger than I915_PDES. Of course, this isn't really
      a problem since no one uses pt outside the macro. Still, every single
      new usage of the macro will create a new issue for us to mark as a
      false positive.
      
      Also, Paulo re-started the discussion a while ago [1], but it didn't end up
      being implemented.
      
      In order to "solve" this "problem", this patch takes the ideas from
      Chris and Dave, but that check would change the desired behavior of the
      code, because the object (for example pdp->page_directory[iter]) can be
      null during init/alloc, and C would take this as false, breaking the for
      loop immediately.
      
      This has already been verified with "static analysis tools".
      
      [1]http://lists.freedesktop.org/archives/intel-gfx/2015-June/068548.html
      
      v2: Make it a single statement, while preventing the common subexpression
      elimination (Chris)
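
      A sketch of the general pattern (not necessarily the exact macro from the
      patch, and with invented names): the bounds check is evaluated before the
      array access inside a single loop condition, and the trailing ", 1" keeps a
      NULL page-table entry from terminating the loop early.

      #define SKETCH_NUM_PDES 512

      struct sketch_pt { int dummy; };
      struct sketch_pd { struct sketch_pt *page_table[SKETCH_NUM_PDES]; };

      #define sketch_for_each_pde(pt, pd, iter)                          \
              for ((iter) = 0;                                           \
                   (iter) < SKETCH_NUM_PDES &&                           \
                   ((pt) = (pd)->page_table[(iter)], 1);                 \
                   (iter)++)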
      
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Gordon <david.s.gordon@intel.com>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      24dfd073
    • drm/i915: Clean up associated VMAs on context destruction · e9f24d5f
      Tvrtko Ursulin authored
      Prevent leaking VMAs and PPGTT VMs when objects are imported
      via flink.
      
      Scenario is that any VMAs created by the importer will be left
      dangling after the importer exits, or destroys the PPGTT context
      with which they are associated.
      
      This is caused by object destruction not running when the
      importer closes the buffer object handle due to the reference held
      by the exporter. This also leaks the VM since the VMA has a
      reference on it.
      
      In practice these leaks can be observed by stopping and starting
      the X server on a kernel with fbcon compiled in. Every time
      X server exits another VMA will be leaked against the fbcon's
      frame buffer object.
      
      Also on systems where flink buffer sharing is used extensively,
      like Android, this leak has even more serious consequences.
      
      This version takes a general approach from the earlier work
      by Rafael Barbalho (drm/i915: Clean-up PPGTT on context
      destruction) and tries to incorporate the subsequent discussion
      between Chris Wilson and Daniel Vetter.
      
      v2:
      
      Removed immediate cleanup on object retire - it was causing a
      recursive VMA unbind via i915_gem_object_wait_rendering. And
      it is in fact not even needed since by definition the context
      cleanup worker runs only after the last context reference has
      been dropped, hence all VMAs against the VM belonging to the
      context are already on the inactive list.
      
      v3:
      
      Previous version could deadlock since VMA unbind waits on any
      rendering on an object to complete. Objects can be busy in a
      different VM which would mean that the cleanup loop would do
      the wait with the struct mutex held.
      
      This is an even simpler approach where we just unbind VMAs
      without waiting since we know all VMAs belonging to this VM
      are idle, and there is nothing in flight, at the point
      context destructor runs.
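
      A loose sketch of that v3 approach, with invented types: at context
      destruction every VMA bound into this VM is known to be idle, so each one
      can be unbound without the usual wait-for-rendering step.

      struct sketch_vma { struct sketch_vma *next; };
      struct sketch_vm  { struct sketch_vma *inactive_list; };

      static void sketch_vma_unbind_no_wait(struct sketch_vma *vma) { (void)vma; }

      static void sketch_context_clean(struct sketch_vm *vm)
      {
              struct sketch_vma *vma, *next;

              for (vma = vm->inactive_list; vma; vma = next) {
                      next = vma->next;               /* unbind may unlink the VMA */
                      sketch_vma_unbind_no_wait(vma); /* safe: nothing is in flight */
              }
      }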
      
      v4:
      
      Double underscore prefix for __i915_vma_unbind_no_wait and a
      commit message typo fix. (Michel Thierry)
      
      Note that this is just a partial/interim fix since we have a bit of a
      fundamental issue with cleaning up, e.g.

      https://bugs.freedesktop.org/show_bug.cgi?id=87729

      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Testcase: igt/gem_ppgtt.c/flink-and-exit-vma-leak
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Rafael Barbalho <rafael.barbalho@intel.com>
      Cc: Michel Thierry <michel.thierry@intel.com>
      [danvet: Add a note that this isn't everything.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      e9f24d5f
    • kbuild test robot
    • drm/i915: Rename DP link training functions · 2493f21f
      Ander Conselvan de Oliveira authored
      The link training functions had confusing names. The start function
      actually does the clock recovery phase of the link training, and the
      complete function does the channel equalization. So call them that
      instead. Also, every call to intel_dp_start_link_train() was followed
      by a call to intel_dp_complete_link_train(), so add a new start
      function that calls clock_recovery and channel_equalization.
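
      A simplified sketch of the renamed entry point described above (types and
      signatures trimmed down for illustration): the "start" function now simply
      runs the two link-training phases in order.

      struct sketch_dp { int dummy; };

      static void sketch_link_training_clock_recovery(struct sketch_dp *dp) { (void)dp; }
      static void sketch_link_training_channel_equalization(struct sketch_dp *dp) { (void)dp; }

      static void sketch_start_link_train(struct sketch_dp *dp)
      {
              sketch_link_training_clock_recovery(dp);
              sketch_link_training_channel_equalization(dp);
      }
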
      Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      2493f21f