1. 27 Oct, 2022 10 commits
    • drm/i915/perf: Determine gen12 oa ctx offset at runtime · a5c3a3cb
      Umesh Nerlige Ramappa authored
      Some SKUs of same gen12 platform may have different oactxctrl
      offsets. For gen12, determine oactxctrl offsets at runtime.
      
      v2: (Lionel)
      - Move MI definitions to intel_gpu_commands.h
      - Ensure __find_reg_in_lri does not read past context image size
      
      v3: (Ashutosh)
      - Drop unnecessary use of double underscores
      - fix find_reg_in_lri
      - Return error if oa context offset is U32_MAX
      - Error out if oa_ctx_ctrl_offset does not find offset
      
      v4: (Ashutosh)
      - Warn on odd MI LRI_LEN
      - Remove unnecessary check for valid_oactxctrl_offset
      - Drop valid_oactxctrl_offset macro
      
      v5: Drop unrelated comment
      Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
      Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-5-umesh.nerlige.ramappa@intel.com
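The runtime lookup described in the commit above can be illustrated with a small userspace sketch. This is not the driver's code: the name `find_reg_offset`, the `OFFSET_NOT_FOUND` sentinel, and the command encodings in the macros are assumptions made for illustration. Only the overall idea comes from the commit message: walk the context image, scan each MI_LOAD_REGISTER_IMM payload for the register, never read past the image size, and report a U32_MAX-style value when nothing is found.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel for "register not found", mirroring the U32_MAX convention. */
#define OFFSET_NOT_FOUND UINT32_MAX

/* Assumed encoding for this sketch: MI opcode in bits 28:23, LRI = 0x22. */
#define MI_OPCODE(x)     (((x) >> 23) & 0x3fu)
#define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == 0x22u)
/* Assumed: the low 8 bits hold (payload dwords - 1). */
#define MI_LRI_LEN(x)    (((x) & 0xffu) + 1u)

/*
 * Return the dword index of `reg` inside the first LRI payload that
 * contains it, or OFFSET_NOT_FOUND. `len` is the image size in dwords.
 */
static uint32_t find_reg_offset(const uint32_t *state, uint32_t len, uint32_t reg)
{
    uint32_t i = 0;

    assert(state != NULL);

    while (i < len) {
        uint32_t payload, remaining, j;

        if (!IS_MI_LRI_CMD(state[i])) {
            i++;
            continue;
        }

        /* Clamp the payload so the scan never reads past the image. */
        payload = MI_LRI_LEN(state[i]);
        remaining = len - i - 1;
        if (payload > remaining)
            payload = remaining;

        /* The payload is (register, value) pairs: check every other dword. */
        for (j = i + 1; j < i + 1 + payload; j += 2) {
            if (state[j] == reg)
                return j;
        }

        /* Skip the header and its payload, then keep looking. */
        i += 1 + payload;
    }

    return OFFSET_NOT_FOUND;
}
```

A caller would treat `OFFSET_NOT_FOUND` as an error rather than using it as an index, which matches the "Return error if oa context offset is U32_MAX" note in v3.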
    • drm/i915/perf: Fix noa wait predication for DG2 · 2d9da585
      Umesh Nerlige Ramappa authored
      Predication for batch buffer commands changed in XEHPSDV.
      MI_BATCH_BUFFER_START predicates based on MI_SET_PREDICATE_RESULT
      register. The MI_SET_PREDICATE_RESULT register can only be modified
      with MI_SET_PREDICATE command. When configured, the MI_SET_PREDICATE
      command sets MI_SET_PREDICATE_RESULT based on bit 0 of
      MI_PREDICATE_RESULT_2. Use this to configure predication in noa_wait.
      Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
      Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-4-umesh.nerlige.ramappa@intel.com
    • drm/i915/perf: Add 32-bit OAG and OAR formats for DG2 · 81d5f7d9
      Umesh Nerlige Ramappa authored
      Add new OA formats for DG2.
      
      MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18893
      
      v2:
      - Update commit title (Ashutosh)
      - Coding style fixes (Lionel)
      - 64 bit OA formats need UMD changes in GPUvis, drop for now and send in a
        separate series with UMD changes
      
      v3:
      - Update commit message to drop 64 bit related description
      Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
      Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #1
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-3-umesh.nerlige.ramappa@intel.com
    • drm/i915/perf: Fix OA filtering logic for GuC mode · 682aa437
      Umesh Nerlige Ramappa authored
      With GuC mode of submission, GuC is in control of defining the context
      id field that is part of the OA reports. To filter reports, UMD and KMD
      must know what sw context id was chosen by GuC. There is no interface
      between KMD and GuC to determine this, so read the upper-dword of
      EXECLIST_STATUS to filter/squash OA reports for the specific context.
      
      v2: Explain guc id stealing w.r.t OA use case
      Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
      Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-2-umesh.nerlige.ramappa@intel.com
    • drm/i915: Fix CFI violations in gt_sysfs · a8a4f046
      Nathan Chancellor authored
      When booting with CONFIG_CFI_CLANG, there are numerous violations when
      accessing the files under
      /sys/devices/pci0000:00/0000:00:02.0/drm/card0/gt/gt0:
      
        $ cd /sys/devices/pci0000:00/0000:00:02.0/drm/card0/gt/gt0
      
        $ grep . *
        id:0
        punit_req_freq_mhz:350
        rc6_enable:1
        rc6_residency_ms:214934
        rps_act_freq_mhz:1300
        rps_boost_freq_mhz:1300
        rps_cur_freq_mhz:350
        rps_max_freq_mhz:1300
        rps_min_freq_mhz:350
        rps_RP0_freq_mhz:1300
        rps_RP1_freq_mhz:350
        rps_RPn_freq_mhz:350
        throttle_reason_pl1:0
        throttle_reason_pl2:0
        throttle_reason_pl4:0
        throttle_reason_prochot:0
        throttle_reason_ratl:0
        throttle_reason_status:0
        throttle_reason_thermal:0
        throttle_reason_vr_tdc:0
        throttle_reason_vr_thermalert:0
      
        $ sudo dmesg &| grep "CFI failure at"
        [  214.595903] CFI failure at kobj_attr_show+0x19/0x30 (target: id_show+0x0/0x70 [i915]; expected type: 0xc527b809)
        [  214.596064] CFI failure at kobj_attr_show+0x19/0x30 (target: punit_req_freq_mhz_show+0x0/0x40 [i915]; expected type: 0xc527b809)
        [  214.596407] CFI failure at kobj_attr_show+0x19/0x30 (target: rc6_enable_show+0x0/0x40 [i915]; expected type: 0xc527b809)
        [  214.596528] CFI failure at kobj_attr_show+0x19/0x30 (target: rc6_residency_ms_show+0x0/0x270 [i915]; expected type: 0xc527b809)
        [  214.596682] CFI failure at kobj_attr_show+0x19/0x30 (target: act_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
        [  214.596792] CFI failure at kobj_attr_show+0x19/0x30 (target: boost_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
        [  214.596893] CFI failure at kobj_attr_show+0x19/0x30 (target: cur_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
        [  214.596996] CFI failure at kobj_attr_show+0x19/0x30 (target: max_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
        [  214.597099] CFI failure at kobj_attr_show+0x19/0x30 (target: min_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
        [  214.597198] CFI failure at kobj_attr_show+0x19/0x30 (target: RP0_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
        [  214.597301] CFI failure at kobj_attr_show+0x19/0x30 (target: RP1_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
        [  214.597405] CFI failure at kobj_attr_show+0x19/0x30 (target: RPn_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
        [  214.597538] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
        [  214.597701] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
        [  214.597836] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
        [  214.597952] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
        [  214.598071] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
        [  214.598177] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
        [  214.598307] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
        [  214.598439] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
        [  214.598542] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
      
      With kCFI, indirect calls are validated against their expected type
      versus actual type and failures occur when the two types do not match.
      The ultimate issue is that these sysfs functions are expecting to be
      called via dev_attr_show() but they may also be called via
      kobj_attr_show(), as certain files are created under two different
      kobjects that have two different sysfs_ops in intel_gt_sysfs_register(),
      hence the warnings above. When accessing the gt_ files under
      /sys/devices/pci0000:00/0000:00:02.0/drm/card0, which are using the same
      sysfs functions, there are no violations, meaning the functions are
      being called with the proper type.
      
      To make everything work properly, adjust certain functions to match the
      type of the ->show() and ->store() members in 'struct kobj_attribute'.
      Add a macro to generate functions that can be called via either
      dev_attr_{show,store}() or kobj_attr_{show,store}(), so that they can be
      called through both kobject locations without violating kCFI, and adjust
      the attribute groups to account for this.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/1716
      Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
      Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221013205909.1282545-1-nathan@kernel.org
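The approach in the commit above (one shared body, two generated entry points whose types match their respective callers) can be sketched in plain userspace C. Everything named here is hypothetical: `DEFINE_DUAL_SHOW`, `id_common`, `call_via_kobj` and `call_via_dev` are illustration-only names, and the callback signatures are simplified (the real kernel callbacks also take an attribute pointer). The sketch only shows why each indirect call site needs a function of exactly the type it expects.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures (assumption). */
struct kobject { int gt_id; };
struct device { struct kobject kobj; };

/* Two distinct callback types, one per dispatch path. */
typedef int (*kobj_show_fn)(struct kobject *kobj, char *buf, size_t len);
typedef int (*dev_show_fn)(struct device *dev, char *buf, size_t len);

/*
 * Hypothetical macro: generate one wrapper per callback type around a
 * shared <name>_common() body, so neither caller sees a type mismatch.
 */
#define DEFINE_DUAL_SHOW(name)                                            \
static int name##_kobj_show(struct kobject *kobj, char *buf, size_t len) \
{                                                                         \
    return name##_common(kobj, buf, len);                                 \
}                                                                         \
static int name##_dev_show(struct device *dev, char *buf, size_t len)    \
{                                                                         \
    return name##_common(&dev->kobj, buf, len);                           \
}

/* Shared implementation used by both generated wrappers. */
static int id_common(struct kobject *kobj, char *buf, size_t len)
{
    return snprintf(buf, len, "%d\n", kobj->gt_id);
}

DEFINE_DUAL_SHOW(id)

/* Stand-ins for the two dispatchers; each makes a typed indirect call. */
static int call_via_kobj(kobj_show_fn fn, struct kobject *kobj,
                         char *buf, size_t len)
{
    return fn(kobj, buf, len);
}

static int call_via_dev(dev_show_fn fn, struct device *dev,
                        char *buf, size_t len)
{
    return fn(dev, buf, len);
}
```

Registering `id_kobj_show` with the kobject path and `id_dev_show` with the device path gives each indirect call a target of the expected type, which is the property kCFI checks at run time.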
    • i915/i915_gem_context: Remove debug message in i915_gem_context_create_ioctl · 67f99e34
      Karolina Drobnik authored
      We know that as long as GEM context create ioctl succeeds, a context was
      created. There is no need to write about it, especially when such a message
      heavily pollutes dmesg and makes debugging actual errors harder.
      
      Since commit baa89ba3 ("drm/i915/gem: initial conversion to new
      logging macros using coccinelle"), the logging for creating a new user
      context was moved under the driver debug output (for lack of a means for
      per-user logs, and a lack of user-focused drm.debug parameter). This
      only reveals how obnoxious it is to have that spam in the driver debug
      logs, so remove it. [ from Chris Wilson ]
      Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
      Cc: Andi Shyti <andi.shyti@linux.intel.com>
      Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
      Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221025091903.986819-1-karolina.drobnik@intel.com
    • drm/i915: stop abusing swiotlb_max_segment · 78a07fe7
      Robert Beckett authored
      swiotlb_max_segment() used to return either the maximum size that swiotlb
      could bounce or, for Xen PV, PAGE_SIZE, even if swiotlb could bounce-buffer
      larger mappings. This made i915 work on Xen PV: i915 bypasses the
      coherency aspect of the DMA API and can't cope with bounce buffering,
      and the PAGE_SIZE limit avoided bounce buffering in the Xen PV case.
      
      So instead of adding this hack back, check for Xen/PV directly in i915
      for the Xen case and otherwise use the proper DMA API helper to query
      the maximum mapping size.
      
      Replace swiotlb_max_segment() calls with dma_max_mapping_size().
      In i915_gem_object_get_pages_internal(), consider max_segment regardless
      of whether CONFIG_SWIOTLB is enabled. There can be other (IOMMU-related)
      causes of specific max segment sizes.
      
      Fixes: a2daa27c ("swiotlb: simplify swiotlb_max_segment")
      Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      [hch: added the Xen hack, rewrote the changelog]
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221020110308.1582518-1-hch@lst.de
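The selection logic described in the commit above (a direct Xen PV check, otherwise the limit reported by the DMA API) can be sketched as a pure function. This is a userspace illustration under stated assumptions, not the i915 helper: `sketch_max_segment` and `SKETCH_PAGE_SIZE` are hypothetical names, the DMA limit and the Xen PV flag are passed in as plain parameters instead of being queried, and the clamp to `UINT_MAX` plus the page rounding are assumptions added so the result fits an unsigned int segment length.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size for the sketch. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * dma_max_mapping: the value a DMA-API query would report (passed in here).
 * xen_pv:          whether we run as a Xen PV guest (passed in here).
 */
static size_t sketch_max_segment(size_t dma_max_mapping, bool xen_pv)
{
    size_t max = dma_max_mapping;

    /* A limit below one page would round down to zero. */
    assert(dma_max_mapping >= SKETCH_PAGE_SIZE);

    /* Assumption: a segment length must fit in an unsigned int. */
    if (max > UINT_MAX)
        max = UINT_MAX;

    /* Xen PV: single-page segments, so bounce buffering is never needed. */
    if (xen_pv)
        max = SKETCH_PAGE_SIZE;

    /* Round down to a whole number of pages. */
    return max & ~(SKETCH_PAGE_SIZE - 1);
}
```

Keeping the Xen PV special case in the caller, instead of hiding it behind the swiotlb query, is the design point the changelog argues for.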
    • Revert "drm/i915/uapi: expose GTT alignment" · b0feda9c
      Matthew Auld authored
      The process for merging uAPI is to have the UMD side ready, reviewed and
      merged before merging the kernel side. Revert for now until that is ready.
      
      This reverts commit d54576a0.
      Reported-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Michal Mrozek <michal.mrozek@intel.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Stuart Summers <stuart.summers@intel.com>
      Cc: Jordan Justen <jordan.l.justen@intel.com>
      Cc: Yang A Shi <yang.a.shi@intel.com>
      Cc: Nirmoy Das <nirmoy.das@intel.com>
      Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
      Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221024101946.28974-1-matthew.auld@intel.com
    • drm/i915/guc: Remove intel_context:number_committed_requests counter · a7ac9d84
      Alan Previn authored
      With the introduction of the delayed disable-sched behavior,
      we use the GuC's xarray of valid guc-id's as a way to
      identify if new requests had been added to a context
      when the said context is being checked for closure.
      
      Additionally that prior change also closes the race for when
      a new incoming request fails to cancel the pending
      delayed disable-sched worker.
      
      With these two complementary checks, we see no more
      use for intel_context:guc_state:number_committed_requests.
      Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
      Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-3-alan.previn.teres.alexis@intel.com
    • drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis · 83321094
      Matthew Brost authored
      Add a delay, configurable via debugfs (default 34ms), to disable
      scheduling of a context after the pin count goes to zero. Disable
      scheduling is a costly operation as it requires synchronizing with
      the GuC. So the idea is that a delay allows the user to resubmit
      something before doing this operation. This delay is only done if
      the context isn't closed and less than a given threshold
      (default is 3/4) of the guc_ids are in use.
      
      Alan Previn: Matt Brost first introduced this patch back in Oct 2021.
      However no real world workload with measured performance impact was
      available to prove the intended results. Today, this series is being
      republished in response to a real world workload that benefited greatly
      from it along with measured performance improvement.
      
      Workload description: 36 containers were created on a DG2 device where
      each container was performing a combination of 720p 3d game rendering
      and 30fps video encoding. The workload density was configured in a way
      that guaranteed each container to ALWAYS be able to render and
      encode no less than 30fps with a predefined maximum render + encode
      latency time. That means the totality of all 36 containers and their
      workloads were not saturating the engines to their max (in order to
      maintain just enough headroom to meet the min fps and max latencies
      of incoming container submissions).
      
      Problem statement: It was observed that the CPU core processing the i915
      soft IRQ work was experiencing severe load. Using tracelogs and an
      instrumentation patch to count specific i915 IRQ events, it was confirmed
      that the majority of the CPU cycles were caused by the
      gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
      majority of the cycles was determined to be processing a specific G2H
      IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
      by GuC in response to i915 KMD sending H2G requests:
      INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
      whenever a context goes idle so that we can unpin the context from GuC.
      The high CPU utilization % symptom was limiting density scaling.
      
      Root Cause Analysis: Because the incoming execution buffers were spread
      across 36 different containers (each with multiple contexts) but the
      system in totality was NOT saturated to the max, it was assumed that each
      context was constantly idling between submissions. This was causing
      a thrashing of unpinning contexts from GuC at one moment, followed quickly
      by repinning them due to incoming workload the very next moment. These
      event-pairs were being triggered across multiple contexts per container,
      across all containers at the rate of > 30 times per sec per context.
      
      Metrics: When running this workload without this patch, we measured an
      average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
      seconds or ~10 million times over ~25+ mins. With this patch, the count
      dropped to ~480 every 10 seconds, or ~28K over ~10 mins. The
      improvement observed is ~99% for the average counts per 10 seconds.
      
      Design awareness: Selftest impact.
      As a temporary workaround (WA), disable this feature for the selftests.
      Selftests are very timing sensitive and any change in timing can cause
      failure. A follow-up patch will fix up the selftests to understand this delay.
      
      Design awareness: Race between guc_request_alloc and guc_context_close.
      If a context close is issued while there is a request submission in
      flight and a delayed schedule disable is pending, guc_context_close
      and guc_request_alloc will race to cancel the delayed disable.
      To close the race, make sure that guc_request_alloc waits for
      guc_context_close to finish running before checking any state.
      
      Design awareness: GT Reset event.
      If a GT reset is triggered, add a preparation step that makes sure every
      context with a pending delayed-disable-schedule task has that task
      flushed. Move those contexts directly into the closed state after
      cancelling the worker. This is okay because the existing flow flushes all
      yet-to-arrive G2Hs, dropping them anyway.
      Signed-off-by: Matthew Brost <matthew.brost@intel.com>
      Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
      Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-2-alan.previn.teres.alexis@intel.com
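The condition stated at the top of the commit above (delay the disable only if the context is not closed and fewer than a threshold fraction of guc_ids are in use) can be written as a small predicate. This is a sketch under assumptions, not the driver's logic: `struct sketch_policy` and `should_delay_sched_disable` are hypothetical names, and the threshold is modeled as a numerator/denominator pair so the 3/4 default from the commit message can be expressed without floating point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunables; the defaults named in the commit are 34 ms and 3/4. */
struct sketch_policy {
    uint32_t delay_ms;   /* how long to wait before disabling scheduling */
    uint32_t total_ids;  /* size of the guc_id space */
    uint32_t thresh_num; /* threshold numerator, e.g. 3 */
    uint32_t thresh_den; /* threshold denominator, e.g. 4 */
};

/*
 * Return true when the costly disable-scheduling step should be deferred
 * by p->delay_ms instead of being issued immediately.
 */
static bool should_delay_sched_disable(const struct sketch_policy *p,
                                       bool context_closed,
                                       uint32_t ids_in_use)
{
    assert(p != NULL && p->thresh_den != 0);

    /* A closed context will not be resubmitted to, so do not wait. */
    if (context_closed)
        return false;

    /* ids_in_use / total_ids < num / den, cross-multiplied in 64 bits. */
    return (uint64_t)ids_in_use * p->thresh_den <
           (uint64_t)p->total_ids * p->thresh_num;
}
```

Applying the same kind of arithmetic to the reported metrics, going from ~69K to ~480 events per 10 seconds is a reduction of roughly 1 - 480/69000, or about 99.3%, consistent with the ~99% figure in the commit message.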
  2. 26 Oct, 2022 4 commits
  3. 24 Oct, 2022 8 commits
  4. 21 Oct, 2022 1 commit
  5. 20 Oct, 2022 4 commits
  6. 19 Oct, 2022 1 commit
  7. 18 Oct, 2022 1 commit
  8. 17 Oct, 2022 11 commits